Word relations and postag, matching postags

Ruud_Baars · September 2, 2017, 9:33am

Having only the derived form, root word, and postag in the postag file causes confusion in the lookup functionality when dealing with (root) words that are spelled the same but are nevertheless different.
Would it be possible to replace the root word with a root id to improve the reverse lookup?

Knorr · September 2, 2017, 2:40pm

Hi @Ruud_Baars!
I think it would be a good idea if you could give an example of what the problem is and what you would like to achieve.

SkyCharger001 · September 2, 2017, 9:50pm

Guessing: to reduce the risk of contaminating the lookup with results from a homonymous root word.
E.g.: die (as in dying) vs. die (as in dice)

Ruud_Baars · September 3, 2017, 7:13pm

There is built-in uncertainty in the postag search. Because an entry in the dictionary is only word-root-tag, all one can rely on is the grapheme (and tag). When the same grapheme has two tags, one must rely on the (difficult to make) disambiguator to distinguish between them where possible. But if there are two roots with the same grapheme, how does ‘token inflected’ work? Does it know the inflections of all the root graphemes, or of only one of them?

And for the reverse lookup: will it find one (or maybe even more) roots for a grapheme, and then find the matching tag for all of those, or for just one?
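
To make the question concrete, here is a minimal sketch using the die/dice example from earlier in the thread. It is not the actual dictionary format or lookup code: the entries, the tags, and the `die#1` / `die#2` root ids are all made up for illustration, and `forms_by_root` and `roots_by_form` are hypothetical helpers. It only shows what can and cannot be told apart when the root is stored as a plain string versus as an id.

```python
from collections import defaultdict

# Hypothetical (inflected form, root, tag) entries. The tags are illustrative
# and not copied from any real postag file.
ENTRIES = [
    ("die", "die", "VB"),
    ("dies", "die", "VBZ"),
    ("died", "die", "VBD"),
    ("dying", "die", "VBG"),
    ("die", "die", "NN"),
    ("dice", "die", "NNS"),
]

# The same entries with a made-up root id ("die#1", "die#2") in place of the
# root word, as proposed in the first post.
ENTRIES_WITH_ID = [
    ("die", "die#1", "VB"),
    ("dies", "die#1", "VBZ"),
    ("died", "die#1", "VBD"),
    ("dying", "die#1", "VBG"),
    ("die", "die#2", "NN"),
    ("dice", "die#2", "NNS"),
]


def forms_by_root(entries):
    """Map each root key to the set of inflected forms listed under it."""
    index = defaultdict(set)
    for form, root, _tag in entries:
        index[root].add(form)
    return dict(index)


def roots_by_form(entries):
    """Reverse lookup: map each inflected form to the set of root keys."""
    index = defaultdict(set)
    for form, root, _tag in entries:
        index[form].add(root)
    return dict(index)


print(sorted(forms_by_root(ENTRIES)["die"]))
print(sorted(forms_by_root(ENTRIES_WITH_ID)["die#1"]))
print(sorted(roots_by_form(ENTRIES_WITH_ID)["die"]))
```

With the plain root string, the verb forms and "dice" end up in one set under "die", which is the contamination described above. With ids, each root keeps its own inflections, and a reverse lookup of "die" returns both ids explicitly instead of a single ambiguous string.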