
Disambiguator: add an entry and its base word

I want to add a POS tag (BNW:STL:ONV) to every word matching [0-9]+.*ig, and also add its inflected form ending in -ige, with the POS tag BNW:STL:VRB.
And the other way around: for a word ending in -ige, add the base form ending in -ig.

How should I do that? The example in "Developing a Disambiguator" (dev.languagetool.org) does not cover all this…

Adding the POS tags is trivial; see the two rules below. Do you need anything else? Assigning a word-specific lemma to each word (e.g. the lemma `9ig` to the word `9ige`) is not possible in disambiguation.xml.

    <!-- Words like 11-jarig: add the uninflected adjective tag -->
    <rule>
        <pattern>
            <marker>
                <token regexp="yes">[0-9]+.*ig</token>
            </marker>
        </pattern>
        <disambig action="add"><wd pos="BNW:STL:ONV"/></disambig>
    </rule>
    <!-- Words like 11-jarige: add the inflected adjective tag -->
    <rule>
        <pattern>
            <marker>
                <token regexp="yes">[0-9]+.*ige</token>
            </marker>
        </pattern>
        <disambig action="add"><wd pos="BNW:STL:VRB"/></disambig>
    </rule>
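Once the tags are added, a grammar rule can pick these words up by tag instead of by spelling. A minimal sketch, assuming that ordinary adjectives in the dictionary carry tags of the same BNW:STL:... form; the rule id, name and message are hypothetical placeholders, and a real rule would also need `<example>` sentences, which are left out here:

```xml
<!-- Hypothetical rule: id, name and message are placeholders. -->
<rule id="HYPOTHETICAL_ADJ_BY_TAG" name="Example: match adjectives by POS tag">
    <pattern>
        <!-- postag_regexp="yes" treats the postag value as a regular expression,
             so both BNW:STL:ONV and BNW:STL:VRB match -->
        <token postag="BNW:STL:.*" postag_regexp="yes"/>
    </pattern>
    <message>Placeholder message.</message>
</rule>
```

With this approach the rule does not need to know whether a word such as 11-jarige got its tag from the dictionary or from the disambiguator.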

That is a pity. Then every form like 11-jarige, -voudige, -bladige, -koppige, etc., up to 150-jarige, has to be in the dictionary.
And there are more words like this, where generating the lemma would be more useful than bloating the dictionary.

I will put it on the growing wish list for Dutch.

But what do you need to do in the rules? Almost anything can be done with this information.
You cannot synthesize a form from the lemma, but you can generate it with:

    <suggestion><match no="1" regexp_match="ige$" regexp_replace="ig"/></suggestion>
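For context, this is roughly where such a `<suggestion>` sits in a grammar.xml rule. A minimal sketch, not a tested rule: the id, name and message text are hypothetical placeholders, and a real rule would also need `<example>` sentences:

```xml
<!-- Hypothetical rule: id, name and message are placeholders. -->
<rule id="HYPOTHETICAL_IGE_TO_IG" name="Example: suggest the -ig form">
    <pattern>
        <!-- token 1, e.g. 11-jarige -->
        <token regexp="yes">[0-9]+.*ige</token>
    </pattern>
    <!-- match no="1" copies token 1 and replaces its trailing "ige" with "ig",
         so 11-jarige would be offered as 11-jarig -->
    <message>Placeholder message: <suggestion><match no="1" regexp_match="ige$" regexp_replace="ig"/></suggestion></message>
</rule>
```

This only builds a suggestion string from the matched token; it does not give the token a new lemma or tag.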

I just want the tagger to tag these words correctly as adjectives, so that every rule treats them as ordinary adjectives.