I feel we could improve the analyzed token by allowing additional tags that are not POS tags.
I am currently putting some tags in the POS tag dictionary that are helpful but are not pure POS tags. If we had extra tags, we could put such information there.
The idea is to add a field AnalyzedToken.extraTags, e.g.: private Map<String,String> extraTags;
The key here is the category and the value is the set of tags for that category, e.g. “pre-disambig: noun:m:…”.
This could be used in several ways:
The disambiguator could store the original (pre-disambiguation) tags there, if needed.
The tagger could add extra tags so that the disambiguator or the rules can use them later.
The tagger could use the field for dynamic properties (e.g. the token was tagged dynamically, without the dictionary), or users could create additional dictionaries.
It could then be used in the rules with something like this: <token postag="..." extra_tag="category1:tag1">
I think we could add the field and start using it in Java code first, and later extend it to grammar.xml/disambiguation.xml.
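To make the Java-first step more concrete, here is a minimal sketch of how such a field and a "category:tag" check could look. The class TaggedToken and its methods are hypothetical stand-ins for illustration, not the real AnalyzedToken API, and the sketch assumes an exact match on the whole tag string of a category.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a token with extra tags; not the real AnalyzedToken. */
class TaggedToken {
  private final String token;
  private final String posTag;
  // category -> tag string, following the proposed Map<String,String> extraTags field
  private final Map<String, String> extraTags = new HashMap<>();

  TaggedToken(String token, String posTag) {
    this.token = token;
    this.posTag = posTag;
  }

  String getToken() { return token; }

  String getPosTag() { return posTag; }

  /** Stores (or overwrites) the tag string for one category. */
  void putExtraTag(String category, String tag) {
    extraTags.put(category, tag);
  }

  /**
   * Checks a spec of the form "category:tag" (assumed syntax, as in extra_tag="category1:tag1").
   * The spec is split at the first colon only, because the tag itself may contain colons.
   */
  boolean matchesExtraTag(String spec) {
    int sep = spec.indexOf(':');
    if (sep < 0) {
      return false; // no category separator, nothing to match
    }
    String category = spec.substring(0, sep);
    String tag = spec.substring(sep + 1);
    return tag.equals(extraTags.get(category));
  }
}
```

With this sketch, a token that got putExtraTag("pre-disambig", "noun:m") would match the spec "pre-disambig:noun:m" but not "pre-disambig:noun". Whether the rule attribute should match exactly or by regular expression, like posTag can, is left open here.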
The question is what type the field should be:
1. A plain string, String extraTag: this has the benefit of being similar to posTag and using the same mechanisms.
2. Set<String>: unlike posTag, these tags may not be related to each other at all, so a set has the benefit of handling them more independently.
3. Map<String,Set<String>>, category to set of tags: this allows separating the tags into different categories, e.g. “dynamicTagging”, “disambig”, “semantic” etc. would each have their own set of tags.
IMHO the third option is the best designed and most scalable, but it may require bigger changes to the XML handling.
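For comparison, a rough sketch of the Map<String,Set<String>> variant could look like the following. The class name ExtraTags and its methods are made up for illustration, and the category names are the examples mentioned above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical holder for the category-to-tag-set variant; names are illustrative only. */
class ExtraTags {
  private final Map<String, Set<String>> tagsByCategory = new HashMap<>();

  /** Adds one tag to a category, creating the category's set on first use. */
  void add(String category, String tag) {
    tagsByCategory.computeIfAbsent(category, k -> new LinkedHashSet<>()).add(tag);
  }

  /** Returns true if the given category contains the given tag. */
  boolean has(String category, String tag) {
    return tagsByCategory.getOrDefault(category, Collections.emptySet()).contains(tag);
  }

  /** Returns a read-only view of a category's tags (empty if the category is unknown). */
  Set<String> get(String category) {
    return Collections.unmodifiableSet(
        tagsByCategory.getOrDefault(category, Collections.emptySet()));
  }
}
```

Here each category keeps its own independent set, so a tag added under “disambig” does not show up under “semantic”. The price is that a rule attribute would need to address both the category and the tag, which is where the bigger XML changes would come from.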