Disambiguator

The postag combination
WKW:TGW:1EP followed by WKW:TGW:1EP
is very unlikely.
When both words also have another postag, I want to remove this combination.

Can that be done?

It is possible with two rules:

<rule>
    <pattern>
        <marker>
            <and>
                <token postag="WKW:TGW:1EP"/>
                <token postag="WKW:TGW:1EP" negate_pos="yes"/>
            </and>    
        </marker>
        <and>
            <token postag="WKW:TGW:1EP"/>
            <token postag="WKW:TGW:1EP" negate_pos="yes"/>
        </and>
    </pattern>
    <disambig action="remove" postag="WKW:TGW:1EP"/>
</rule>
<rule>
    <pattern>
        <and>
            <token postag="WKW:TGW:1EP"/>
            <token postag="WKW:TGW:1EP" negate_pos="yes"/>
        </and>
        <marker>
            <and>
                <token postag="WKW:TGW:1EP"/>
                <token postag="WKW:TGW:1EP" negate_pos="yes"/>
            </and>    
        </marker>
    </pattern>
    <disambig action="remove" postag="WKW:TGW:1EP"/>
</rule>

I think this is not the same. The first rule could remove the postag that is tested for in the second one… (I guess)
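
To make the concern concrete, here is a minimal sketch in Python (not LanguageTool code) that simulates the two rules on a pair of ambiguous tokens. It assumes the rules run one after the other and that each sees the readings left by the previous one. The tag `OTHER` and all function names are hypothetical placeholders.

```python
TAG = "WKW:TGW:1EP"


def is_ambiguous_with_tag(readings, tag=TAG):
    """Mimic the <and> test: the token has `tag` and at least one other reading."""
    return tag in readings and any(r != tag for r in readings)


def apply_rule(tokens, remove_first, tag=TAG):
    """Apply one rule: on a matching pair, drop `tag` from the marked token.

    Assumption: the rule works on the readings as they stand when it runs.
    """
    tokens = [set(t) for t in tokens]  # work on a copy
    for i in range(len(tokens) - 1):
        if is_ambiguous_with_tag(tokens[i], tag) and is_ambiguous_with_tag(tokens[i + 1], tag):
            target = i if remove_first else i + 1
            tokens[target].discard(tag)
    return tokens


def sequential(tokens):
    """Rule 1 (marker on the first token), then rule 2 (marker on the second)."""
    return apply_rule(apply_rule(tokens, remove_first=True), remove_first=False)


def simultaneous(tokens, tag=TAG):
    """The intended behaviour: test on the original readings, remove from both tokens."""
    original = [set(t) for t in tokens]
    result = [set(t) for t in tokens]
    for i in range(len(original) - 1):
        if is_ambiguous_with_tag(original[i], tag) and is_ambiguous_with_tag(original[i + 1], tag):
            result[i].discard(tag)
            result[i + 1].discard(tag)
    return result


pair = [{TAG, "OTHER"}, {TAG, "OTHER"}]
print(sequential(pair))    # second token keeps the tag
print(simultaneous(pair))  # tag removed from both tokens
```

Under that assumption, the first rule strips the tag from the first word, so the second rule no longer matches and the second word keeps its WKW:TGW:1EP reading.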

It would be great to be able to filter out any unlikely postag order.
Something like

    <rule>
    <pattern>
      <token postag="pos1"/>
      <token postag="pos2"/>
      <token postag="pos3"/>
    </pattern>
    <disambig action="remove"><wd pos="pos1"/><wd pos="pos2"/><wd pos="pos3"/></disambig>
    </rule>

(or better still, in shorthand in a file ‘unlikelypostagarrays.txt’)
pos1 pos2 pos3

You are right. This is a problem.

I can add values to the example:

<antipattern><token postag="WKW:TGW:1EP"/><token postag="WKW:TGW:1EP"/></antipattern> <!-- score: 35785.81888886987871956080 -->
    <!-- onderzoek verkloot --> : neither WKW:TGW:1EP
    <!-- word belast -->  : grammar mistake
    <!-- beter begrijp --> first is not WKW:TGW:1EP
    <!-- verwacht haar --> second is not WKW:TGW:1EP
    <!-- vlak veel --> et cetera....
    <!-- fruit haar -->
    <!-- woon jij -->
    <!-- open water -->
    <!-- veel vet -->
    <!-- welk gebaar -->
    <!-- uitstel net -->
    <!-- win jij -->
    <!-- heel ijl -->
    <!-- proces mag -->
    <!-- jong paar -->
    <!-- beroep lang -->
    <!-- nuttig gevoel -->
    <!-- jouw potlood -->
    <!-- boek zie -->
    <!-- stuur kun -->

Maybe removing all of them is a bit too crude. I will have to think this over a bit more. Maybe a filterall with longer, better patterns is a better method.