Disambiguation question

Ruud_Baars · March 17, 2018, 7:30am

The rule below is quite simple. It tries to match a noun POS tag on the word between two common words, a position that usually holds a singular or plural noun. It works fine for words that have one of those tags. But when I feed it a word that only has other tags (this happens in texts in the wild), the result is a strange tag:

<rule name="de_ZNW_van_2" id="DE_ZNW_VAN_2">
    <pattern>
        <token>de</token>
        <marker>
            <token></token>
        </marker>
        <token>van</token>
    </pattern>
    <disambig><match no="1" postag="ZNW.*DE_.*" postag_regexp="yes" /></disambig>
</rule>

java -jar languagetool-commandline.jar -l nl -t
Expected text language: Dutch
Dit is de kindje van mij.
Working on STDIN…
Dit[Dit/null] is[is/ZNW:EKV,zijn/WKW:TGW:3EP] de[de/null] kindje[kind/ZNW.DE_.] van[van/VRZ,van/ZNW:EKV] mij[mij/null].

ZNW.DE_. was the filter, and it has now ended up as the tag, even though the word had different tags assigned to it. I think the result should be null/UNKNOWN.
Don’t you think so?

Somehow, a filterall rule behaves differently.

<rule id="A" name="a">
    <pattern>
        <marker>
            <token>de</token>
            <token postag="ZNW.*DE_.*" postag_regexp="yes"/>
            <token>van</token>
        </marker>
    </pattern>
    <disambig action="filterall"/>
</rule>

I would expect this to do the same, but it does not:
Dit[Dit/null] is[is/ZNW:EKV,zijn/WKW:TGW:3EP] de[de/null] kindje[kind/ZNW:EKV:VRK:HET] van[van/VRZ,van/ZNW:EKV] a[a/ZNW:EKV:DE_]

It leaves the existing tag in place, even though the token does not match the pattern.

Furthermore, when I add a POS tag to a word (with or without a lemma) that already has this tag, the word ends up with two identical tags. This is not a real problem, but it is not correct either, is it?

What am I doing wrong, or what is the disambiguation doing wrong?

What I am actually looking for is a way to drop all existing tags on a word within a pattern and add a single new one.
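
To make that concrete, here is a rough sketch of the kind of rule I have in mind. It assumes that action="replace" with a <wd> element discards the existing readings of the marked token and keeps only the one listed, which I have not verified. The rule id, the name, the fixed token and the lemma/tag values are just placeholders for illustration:

```xml
<!-- Sketch only: id, name, token, lemma and pos are hypothetical placeholders. -->
<rule id="DE_ZNW_VAN_REPLACE" name="de_ZNW_van_replace">
    <pattern>
        <token>de</token>
        <marker>
            <token>kindje</token>
        </marker>
        <token>van</token>
    </pattern>
    <!-- Assumption: "replace" drops all current readings of the marked token
         and leaves only the <wd> reading given here. -->
    <disambig action="replace"><wd lemma="kind" pos="ZNW:EKV:VRK:HET"/></disambig>
</rule>
```

If this is how replace is meant to work, I would expect the marked word to come out with exactly one reading, whatever tags it had before.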