The tagged text shows multiple POS tags and one chunk tag for each token. As far as I know, OpenNLP provides one specific POS tag and one chunk tag per token, so I wonder where the other possible POS tags in LT come from. OpenNLP also provides a tree structure over the POS tags. How can I get that tree structure from LT's tagged text, in addition to the individual POS tags?
We’re using OpenNLP only for the chunks. The POS tags don’t come from OpenNLP but from our own embedded dictionary, which doesn’t disambiguate all readings. The reason is that statistical disambiguation is a bit dangerous for grammar checking, as it might prevent errors from being found.
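To make the concern about statistical disambiguation concrete, here is a minimal sketch in Python. It is a toy model, not LT's code: the mini-dictionary, the tag sets, the rule, and the "wrong guess" of the single-tag tagger are all assumptions made up for illustration.

```python
# Toy illustration only: dictionary contents and tags are assumptions,
# not taken from LanguageTool's real dictionary.
TOY_DICTIONARY = {
    "i": {"PRP"},
    "have": {"VB", "VBP"},
    "few": {"JJ"},
    "book": {"NN", "VB", "VBP"},
}


def all_readings(word):
    """Dictionary-style tagging: return every possible tag for the word."""
    return TOY_DICTIONARY.get(word.lower(), {"UNKNOWN"})


def single_guess(word):
    """Stand-in for a statistical tagger that keeps exactly one tag.

    The choice of VB for "book" is a contrived wrong guess, used to show
    what happens when the tagger is mistaken on an ungrammatical sentence.
    """
    forced = {"book": "VB"}
    if word.lower() in forced:
        return {forced[word.lower()]}
    return {sorted(all_readings(word))[0]}


def singular_noun_after_few(tokens, tagger):
    """Hypothetical rule: flag a possible singular noun (NN) right after 'few'."""
    hits = []
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.lower() == "few" and "NN" in tagger(cur):
            hits.append(cur)
    return hits


tokens = ["I", "have", "few", "book"]
found_with_all_readings = singular_noun_after_few(tokens, all_readings)
found_with_single_guess = singular_noun_after_few(tokens, single_guess)
print(found_with_all_readings)  # the NN reading is still available to the rule
print(found_with_single_guess)  # the NN reading was discarded by the guess
```

With all readings kept, the rule still sees the NN reading of "book" and can flag the sentence. Once a tagger has committed to a single wrong tag, the rule has nothing left to match.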
Many thanks for your quick responses to all my questions.
Is your POS dictionary built on WordNet?
Is there any way we can access the synonyms or related concepts/words through a rule?
No, it’s not based on WordNet, so there are no synonyms. It’s just word, inflected form, and POS tag(s).
Is there any evidence for “The reason is that statistical disambiguation is a bit dangerous for grammar checking, as it might prevent errors from being found.”?
Also, can you please elaborate on whether LT tries all permutations/combinations of the (dictionary-based) POS tags of each token in a sentence when applying rules?
There’s no hard evidence in the form of numbers that I know of. LT considers all readings, unless they are disambiguated using disambiguation.xml.
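One way to picture "considers all readings" without enumerating every combination of tags is sketched below in Python. This is a simplified model under my own assumptions (a pattern element matches a token if any of that token's tags matches), not a copy of LT's matcher, and the tags in the sample sentence are invented for the example.

```python
import re
from math import prod


def token_matches(readings, tag_regex):
    """A pattern element matches if ANY reading of the token matches it."""
    return any(re.fullmatch(tag_regex, tag) for tag in readings)


def find_matches(sentence, pattern):
    """Return the start indexes where every pattern element matches its token.

    sentence: list of (word, set_of_tags); pattern: list of tag regexes.
    """
    starts = []
    for start in range(len(sentence) - len(pattern) + 1):
        window = sentence[start:start + len(pattern)]
        if all(token_matches(tags, rx) for (_, tags), rx in zip(window, pattern)):
            starts.append(start)
    return starts


# Tags below are assumptions for the example, not real dictionary output.
sentence = [
    ("This", {"DT"}),
    ("paper", {"NN", "JJ", "VB"}),
    ("offers", {"NNS", "VBZ"}),
    ("some", {"DT"}),
    ("news", {"NN"}),
]

# Number of full tag sequences one would have to enumerate explicitly.
tag_sequences = prod(len(tags) for _, tags in sentence)

noun_verb_hits = find_matches(sentence, ["DT", "NN", "VBZ"])
adj_noun_hits = find_matches(sentence, ["DT", "JJ", "NNS"])
print(tag_sequences, noun_verb_hits, adj_noun_hits)
```

In this model each token is checked against one pattern element at a time, so no list of tag permutations is ever built. The flip side is that two patterns relying on different readings of the same words can both match the same span.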
Thank you very much.
“LT considers all readings”… does “all readings” here mean “all POS tags”?
Yes, I meant all POS tags. BTW, as your questions are quite advanced, you might want to join our mailing list (https://lists.sourceforge.net/lists/listinfo/languagetool-devel); there are more developers there than here in the forum.
Can you please clarify the documentation? It says …“we’re also using the OpenNLP part-of-speech tagger”, so it is not clear whether LT uses OpenNLP POS tags in addition to its own POS tags for each token.
We only use the OpenNLP POS tagger as a step before chunking. So the noun phrases (a.k.a. chunks) are found by OpenNLP, but everything else uses our own POS tagger. This chunking was only introduced in LT 2.3.
If I wish to use OpenNLP POS tags instead of LT's native tagger, can you provide some pointers to get started? How can I use the OpenNLP POS tags to create the dictionary?
That’s currently not possible. I’m also not sure if it is a good idea, as this would be English-only, plus OpenNLP is basically an implementation detail that could change for the next version. Maybe you can tell more about what you’re trying to do (the mailing list seems to be the better place for that).
Hello. According to section 3 of the paper “A Rule-Based Style and Grammar Checker” by Daniel Naber, there should be POS tag probabilities. May I know where I can extract those tag probabilities? Thank you very much.
That paper isn’t quite up-to-date… in LT, we don’t have tag probabilities.
Thanks a lot.
The POS tags now also include disambiguation, but there are many inconsistencies. If I write a rule to capture a noun but not a verb, and the word also has a verb form, then the word is not captured.
Example: I have few book. (A noun following the word “few”)
If I put “V.*” in an exception, the word is not captured,
because the word “book” can also have a verb form.
Another example: This paper offers some news. (A singular noun following the word “this”)
“Paper” is the noun following the word “this”, and the above sentence is correct.
But in LT's POS tagging…
“paper” is tagged as an adjective,
“offers” is tagged as a noun,
and the rule matches “offers” and suggests the replacement “offer”.
(Of course, “paper” can act as JJ and “offer” can act as a noun, but not in this case.)
The Stanford POS tagger tags this sentence correctly. Is there any option to use the Stanford POS tagger from LT?
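The “book” case described in this post can be modeled in a few lines of Python. This is only a sketch of the behaviour reported above, under the assumption that an exception rules out a token as soon as any of its readings matches the exception. It is not LT's actual code, and the tag sets are invented for the example.

```python
import re


def matches_with_exception(readings, postag_regex, exception_regex=None):
    """Toy token matcher with an optional exception.

    Assumed semantics: the token is accepted if at least one reading
    matches postag_regex and NO reading matches exception_regex.
    """
    has_wanted = any(re.fullmatch(postag_regex, tag) for tag in readings)
    has_excluded = exception_regex is not None and any(
        re.fullmatch(exception_regex, tag) for tag in readings
    )
    return has_wanted and not has_excluded


# Tag sets are assumptions for illustration.
book = {"NN", "VB", "VBP"}   # noun and verb readings
laptop = {"NN"}              # noun reading only

book_plain = matches_with_exception(book, "NN")
book_with_exception = matches_with_exception(book, "NN", "V.*")
laptop_with_exception = matches_with_exception(laptop, "NN", "V.*")
print(book_plain, book_with_exception, laptop_with_exception)
```

Under this model, adding the “V.*” exception removes every noun that also has a verb reading, which is why “book” is no longer captured while a noun-only word still is.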
We cannot integrate the Stanford tagger due to its license (GPL), so we’d have to use a different tagger. I’m not sure if this is currently on anybody’s TODO list.
What do you mean by “I’m not sure if this is currently on anybody’s TODO list.”?
The Stanford POS tagger is available offline and needs no license. Please let me know if there are any other ways to correct the above-mentioned inconsistencies.
Maybe we’re not talking about the same software? I was referring to https://nlp.stanford.edu/software/tagger.shtml, which is under GPL.