Word relations and postags, matching postags


(Ruud Baars) #1

Having only the derived form, root word and postag in the postag file causes confusion in the lookup functionality when dealing with (root) words that are written the same but are nevertheless different.
Would it be possible to replace the root word with a root id to improve the reverse lookup?


(Knorr) #2

Hi @Ruud_Baars!
I think it would be a good idea if you could give an example of what the problem is and of what you would like to achieve.


(Lodewijk Arie van Brienen) #3

Guessing: the goal is to reduce the risk of contaminating the lookup with results from a homonymous root word.
E.g. die (as in dying) vs. die (as in dice)


(Ruud Baars) #4

There is built-in uncertainty in the postag search. Because a dictionary entry is only word-root-tag, all one can rely on is the grapheme (and the tag). When the same grapheme has two tags, one must rely on the (difficult to write) disambiguator to distinguish between them where possible. But if there are two roots with the same grapheme, how does 'token inflected' work? Does it know the inflections of all the root graphemes, or of only one of them?

And for the reverse lookup: it will find one (or maybe more) roots for a grapheme. Does it then find the matching tag for all of those, or for just one?
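
To make the concern concrete, here is a minimal Python sketch of the die/die case. It is not LanguageTool's actual code or file format: the entry lists, the tag names, the `die#1`/`die#2` root ids and the helper `inflections_of` are all hypothetical and only illustrate what happens when the root is stored as a plain grapheme versus as a distinct id.

```python
from collections import defaultdict

# Hypothetical word-root-tag entries (tags are illustrative only).
# Both homonyms share the plain root string "die".
ENTRIES = [
    ("die", "die", "VB"),
    ("dies", "die", "VBZ"),
    ("died", "die", "VBD"),
    ("dying", "die", "VBG"),
    ("die", "die", "NN"),
    ("dice", "die", "NNS"),
]

# The same entries with a hypothetical root id instead of the root grapheme.
ENTRIES_WITH_ID = [
    ("die", "die#1", "VB"),
    ("dies", "die#1", "VBZ"),
    ("died", "die#1", "VBD"),
    ("dying", "die#1", "VBG"),
    ("die", "die#2", "NN"),
    ("dice", "die#2", "NNS"),
]


def inflections_of(form, entries):
    """Return every form that shares a root with `form` (reverse lookup)."""
    # Map each root to the set of forms listed under it.
    forms_by_root = defaultdict(set)
    for entry_form, root, _tag in entries:
        forms_by_root[root].add(entry_form)

    # Collect all roots the given form can belong to.
    roots = {root for entry_form, root, _tag in entries if entry_form == form}

    # Union the forms of every matching root.
    result = set()
    for root in roots:
        result |= forms_by_root[root]
    return result


# With plain roots, "dice" also pulls in the verb forms.
plain = inflections_of("dice", ENTRIES)
# With root ids, "dice" only yields the noun forms.
with_id = inflections_of("dice", ENTRIES_WITH_ID)
```

In this sketch, `plain` contains "died" and "dying" although "dice" is only a noun form, while `with_id` is limited to "die" and "dice". The ambiguous form "die" itself still maps to both roots in either layout, so a disambiguator would remain necessary for that token.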