Multiple POS tags

Nina · October 24, 2013, 3:14pm

The tagged text shows multiple POS tags and one chunk tag for each token. I think OpenNLP provides a single POS tag and a chunk tag for each token, so I wonder where the other possible POS tags in LT come from. OpenNLP also provides a tree structure of the POS tags; how can I access that tree structure in LT's tagged text, besides the independent POS tags?

dnaber · October 24, 2013, 4:05pm

We’re using OpenNLP only for the chunks, the POS tags don’t come from OpenNLP but from our own embedded dictionary that doesn’t disambiguate all readings. The reason is that statistical disambiguation is a bit dangerous for grammar checking, as it might prevent errors from being found.
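
A minimal sketch of that argument, not LT's actual code: a toy dictionary tagger keeps every reading per token, and a stand-in for a statistical tagger keeps only one. The lexicon entries, tag names, function names, and the sample phrase are all hypothetical and chosen only for illustration.

```python
# Toy lexicon: word -> all possible POS tags (hypothetical entries, not LT's dictionary).
TOY_LEXICON = {
    "these": {"DT"},
    "house": {"NN", "VB"},
}

def tag_all_readings(tokens):
    """Return every dictionary reading per token; unknown words get an empty set."""
    return [(tok, set(TOY_LEXICON.get(tok.lower(), set()))) for tok in tokens]

def keep_single_reading(tagged, choose):
    """Stand-in for a statistical tagger: keep exactly one tag per known token."""
    return [(tok, {choose(tags)} if tags else set()) for tok, tags in tagged]

def flag_these_plus_singular(tagged):
    """Toy rule: flag a token with a singular-noun (NN) reading right after 'these'."""
    hits = []
    for (w1, _), (w2, tags2) in zip(tagged, tagged[1:]):
        if w1.lower() == "these" and "NN" in tags2:
            hits.append(w2)
    return hits

tokens = "these house".split()
ambiguous = tag_all_readings(tokens)

# Assume the statistical step wrongly prefers the verb reading whenever one exists.
prefer_verb = lambda tags: "VB" if "VB" in tags else sorted(tags)[0]
wrong_choice = keep_single_reading(ambiguous, prefer_verb)

print(flag_these_plus_singular(ambiguous))     # the NN reading is still visible to the rule
print(flag_these_plus_singular(wrong_choice))  # the NN reading was discarded, so nothing is flagged
```

With all readings kept, the rule can still fire on the noun reading; once a wrong single tag is chosen, the error goes unnoticed.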

Nina · October 24, 2013, 6:10pm

Many thanks for your quick responses to all my questions.
Is your POS dictionary built on WordNet?
Is there any way we can access the synonyms or related concepts/words through a rule?

dnaber · October 24, 2013, 6:15pm

No, it’s not based on WordNet. Thus, there are no synonyms; it’s just word, inflected form, and POS tag(s).

Nina · November 8, 2013, 4:44pm

Is there any evidence for “The reason is that statistical disambiguation is a bit dangerous for grammar checking, as it might prevent errors from being found.”?

Also, can you please elaborate on whether LT tries all permutations/combinations of the (dictionary-based) POS tags of each token in a sentence when applying rules?

dnaber · November 8, 2013, 5:23pm

There’s no hard evidence in the form of numbers that I know of. LT considers all readings, unless they are disambiguated using disambiguation.xml.
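
As a rough illustration of what "disambiguated" means here, the sketch below applies one hand-written context rule that removes readings, and leaves every other token with all of its readings. It is a toy model in Python, not the disambiguation.xml syntax; the rule itself, the tags, and the sample tokens are assumptions.

```python
def disambiguate(tagged):
    """Toy context rule: right after a determiner (DT), drop verb readings,
    but only if at least one non-verb reading would remain."""
    result = []
    prev_tags = set()
    for tok, tags in tagged:
        if "DT" in prev_tags:
            non_verbs = {t for t in tags if not t.startswith("VB")}
            if non_verbs:
                tags = non_verbs
        result.append((tok, set(tags)))
        prev_tags = tags
    return result

# Hypothetical readings for three tokens.
sentence = [("the", {"DT"}), ("book", {"NN", "VB"}), ("offers", {"NNS", "VBZ"})]
print(disambiguate(sentence))
```

Only "book" loses a reading, because it directly follows a determiner; "offers" keeps both readings, so later rules still see both.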

Nina · November 11, 2013, 2:43pm

Thank you very much.

LT considers all readings… does “all readings” here mean “all POS tags”?

dnaber · November 11, 2013, 3:01pm

Yes, I meant all POS tags. BTW, as your questions are quite advanced, you might want to join our mailing list (languagetool-devel List Signup and Options), there are more developers than here in the forum.

Nina · November 11, 2013, 3:22pm

Thanks Daniel.

Nina · November 11, 2013, 7:46pm

Hi Daniel,

Can you please clarify the documentation at
http://wiki.languagetool.org/using-chunks
It says “…we’re also using the OpenNLP part-of-speech tagger”, so it is not clear whether LT uses the OpenNLP POS tags plus additional POS tags for each token.

Thanks

dnaber · November 11, 2013, 8:03pm

We only use the OpenNLP POS tagger as a step before chunking. So the noun phrases (a.k.a. chunks) are found by OpenNLP, but everything else uses our own POS tagger. This chunking was only introduced in LT 2.3.

Nina · November 11, 2013, 8:15pm

If I wish to use OpenNLP POS tags instead of LT's native tagger, can you provide some pointers to get started? How can I use the OpenNLP POS tags to create the dictionary?
Thanks.

dnaber · November 11, 2013, 10:27pm

That’s currently not possible. I’m also not sure if it is a good idea, as this would be English-only, plus OpenNLP is basically an implementation detail that could change in the next version. Maybe you can tell us more about what you’re trying to do (the mailing list seems to be the better place for that).

willfriends · October 19, 2017, 9:12am

Hello. According to section 3 of the paper “A Rule-Based Style and Grammar Checker” by Daniel Naber, there should be POS tag probabilities. May I know where I can extract those tag probabilities? Thank you very much.

dnaber · October 19, 2017, 10:03am

That paper isn’t quite up-to-date… in LT, we don’t have tag probabilities.

willfriends · October 20, 2017, 11:22am

Thanks a lot.

aafreen · January 18, 2018, 7:06am

POS tagging now also includes disambiguation, but it has many inconsistencies. If I write a rule to capture a noun but not a verb, and the word also has a verb form, then the word is not captured.
Example:

I have few book. (Noun following the word “few”)
If I put “V.*” in an exception, the word is not captured,
because the word “book” can also have a verb form.
Second example:
This paper offers some news. (Singular noun following the word “this”)
Here,
“paper” is the noun following the word “this”, and the above sentence is correct.
But in LT's POS tagging…
“paper” is tagged as an adjective
“offers” is tagged as a noun

LT then flags “offers” and suggests the replacement “offer”.
(Of course, “paper” can act as JJ and “offer” can act as a noun… but not in this case.)
The Stanford POS tagger tags this correctly. Is there any option to use the Stanford POS tagger from LT?
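
The first example can be modelled with a small sketch. It assumes, based only on the behaviour described above, that a token is rejected as soon as any of its readings matches the exception. The function name and the readings given for “book” are hypothetical, and this is a toy model, not LT's matching code.

```python
import re

def token_matches(readings, postag_regex, exception_regex=None):
    """Toy model: match if some reading fits postag_regex
    and no reading at all fits exception_regex."""
    if exception_regex is not None and any(
        re.fullmatch(exception_regex, r) for r in readings
    ):
        return False
    return any(re.fullmatch(postag_regex, r) for r in readings)

book = {"NN", "VB"}  # hypothetical readings: noun and verb

print(token_matches(book, r"NN"))            # noun reading present, no exception
print(token_matches(book, r"NN", r"V.*"))    # verb reading triggers the exception
print(token_matches({"NN"}, r"NN", r"V.*"))  # unambiguous noun still matches
```

Under this assumption, the “V.*” exception excludes every word that merely could be a verb, which is why an ambiguous word like “book” is never captured.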

dnaber · January 18, 2018, 8:37am

We cannot integrate the Stanford tagger due to its license (GPL), so we’d have to use a different tagger. I’m not sure if this is currently on anybody’s TODO list.

aafreen · January 18, 2018, 9:32am

What do you mean by “I’m not sure if this is currently on anybody’s TODO list.”?
The Stanford POS tagger is available offline. It needs no license. Please let me know if there are any other ways to correct the inconsistencies mentioned above.

dnaber · January 18, 2018, 10:00am

Maybe we’re not talking about the same software? I was referring to The Stanford Natural Language Processing Group, which is under GPL.