It would be very useful if we could synthesize a word by taking the POS tag from one token and the lemma from another. This need comes up often, for example in Romance languages, where a determiner has to agree with its noun. The feature would save us from writing dozens of rules just to give a proper suggestion, so I would like to implement it.
First, we should agree on the syntax. A <match> inside a <match> is probably not a good idea. Could it be something like this?
<suggestion><match no="2" lemma_from_no="1" postag="N.(..).*" postag_regexp="yes" postag_replace="D..$1.*"/> <match no="2"/></suggestion>
For example, in Spanish, this could be used for generating suggestions like these:
un hombres > unos hombres / un hombre
algún hombres > algunos hombres / algún hombre
este hombres > estos hombres / este hombre
el hombres > los hombres / el hombre
All these suggestions can be generated by writing more and more XML rules, but at some point that becomes unmanageable.
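To make the intended semantics concrete, here is a minimal Python sketch of what I have in mind. It is not LanguageTool code: the `synthesize` function, the `LEXICON` table and the tags in it are hypothetical, chosen only to illustrate the idea (EAGLES-style tags, where the third and fourth characters of a noun tag carry gender and number). The POS tag of the noun is matched against `postag`, the captured group is inserted into `postag_replace`, and the resulting pattern selects forms of the lemma taken from the other token.

```python
import re

# Toy lexicon (hypothetical data): (lemma, POS tag) -> surface form.
LEXICON = {
    ("uno", "DI0MS0"): "un",
    ("uno", "DI0MP0"): "unos",
    ("el", "DA0MS0"): "el",
    ("el", "DA0MP0"): "los",
}


def synthesize(lemma, source_tag, postag, postag_replace):
    """Return the forms of `lemma` whose tag matches the pattern
    built from `source_tag` via `postag` / `postag_replace`."""
    m = re.fullmatch(postag, source_tag)
    if m is None:
        return []
    # Rule files write back-references as $1; Python's re expects \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", postag_replace)
    target = m.expand(template)  # e.g. "D..MP.*"
    return sorted(
        form
        for (lex_lemma, tag), form in LEXICON.items()
        if lex_lemma == lemma and re.fullmatch(target, tag)
    )


# "un hombres": lemma from token 1 ("uno"), POS tag from token 2 (noun, masc. plural).
print(synthesize("uno", "NCMP000", r"N.(..).*", "D..$1.*"))
```

With this toy data, the lemma "uno" combined with the plural noun tag selects "unos", which is the first half of the suggestion "unos hombres". The second half ("un hombre") would still need the noun to be synthesized in the singular, which the existing <match> attributes can already do.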