I feel we could improve AnalyzedToken by allowing additional tags that are not POS tags.
I am currently putting some tags in the POS tag dictionary that are helpful but are not pure POS tags. If we had extra tags, we could put such information there instead.
The idea is to add a field AnalyzedToken.extraTags, e.g.:
private Map<String,String> extraTags;
The key is the category and the value is a set of tags, e.g. “pre-disambig: noun:m:…” or "
This could be used in several ways:
- the disambiguator could put the original (pre-disambig) tags there (if needed)
- the tagger could put additional tags there so that the disambiguator and the rules could use them later
The tagger could also use that field for dynamic properties (e.g. the token was tagged dynamically, without the dictionary), or users could create additional dictionaries that fill it.
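To make the proposal more concrete, here is a minimal sketch of how such a field might sit on a token. TokenWithExtraTags is a hypothetical stand-in, not the real AnalyzedToken, and the accessor names (putExtraTag, getExtraTag, getExtraTags) are my assumptions; only the Map<String,String> field itself comes from the idea above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for AnalyzedToken; only the parts relevant to the proposal.
class TokenWithExtraTags {
  private final String token;
  private final String posTag;
  // category -> tags for that category, as proposed above
  private final Map<String, String> extraTags = new LinkedHashMap<>();

  TokenWithExtraTags(String token, String posTag) {
    this.token = token;
    this.posTag = posTag;
  }

  String getToken() { return token; }

  String getPosTag() { return posTag; }

  // Called by a tagger or disambiguator to attach non-POS information.
  // A second call with the same category overwrites the earlier value.
  void putExtraTag(String category, String tags) {
    extraTags.put(category, tags);
  }

  // Returns the tags stored for the category, or null if there are none.
  String getExtraTag(String category) {
    return extraTags.get(category);
  }

  // Read-only view, so rules cannot modify the tags by accident.
  Map<String, String> getExtraTags() {
    return Collections.unmodifiableMap(extraTags);
  }
}
```

A tagger could then call something like putExtraTag("category1", "tag1") on the reading it produces, and the disambiguator or the rules could read it back later. Whether the map should be mutable after tagging is an open design question.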
It could then be used in the rules with something like this:
<token postag="..." extra_tag="category1:tag1">
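How the extra_tag attribute would be matched is not decided yet. As one possible reading, the sketch below splits the attribute value at the first colon into a category and a tag, and checks whether that tag appears among the colon-separated tags stored for the category. ExtraTagMatcher and matches are hypothetical names, and the colon-separated storage format is my assumption based on the example values above.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper; not part of any existing API.
class ExtraTagMatcher {
  // spec has the form "category:tag", as in extra_tag="category1:tag1".
  // Returns true if the token's extra tags contain the category and the
  // requested tag is one of the ':'-separated tags stored for it.
  static boolean matches(Map<String, String> extraTags, String spec) {
    int sep = spec.indexOf(':');
    if (sep <= 0 || sep == spec.length() - 1) {
      return false; // malformed spec: missing category or missing tag
    }
    String category = spec.substring(0, sep);
    String wanted = spec.substring(sep + 1);
    String stored = extraTags.get(category);
    if (stored == null) {
      return false; // token has nothing in this category
    }
    return Arrays.asList(stored.split(":")).contains(wanted);
  }
}
```

With this reading, a token whose extra tags map category1 to "tag1:tag2" would match extra_tag="category1:tag1" but not extra_tag="category1:tag3". A regular-expression variant, similar to how postag can be matched, would be another option.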