
Antipattern of just one token

Shouldn't an antipattern like

<antipattern><token postag="UNKNOWN"/></antipattern>

prevent the pattern from triggering when it contains any number of unknown words? It does not seem to do so.

As long as one of the unknown words is in the <pattern> (or was it in the <marker> inside the <pattern>?), I guess so. Can you provide a complete example that is already as simple as possible?
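
To make the expected behaviour concrete, here is a minimal sketch of a rule where the unknown word falls inside the <pattern>. The rule id, the name, and the word "qwxzv" are made up for illustration. It assumes that "qwxzv" is not in the Dutch dictionary and therefore gets the UNKNOWN tag. If the antipattern works the way I think it does, the second example should produce no match.

```xml
<!-- Hypothetical rule for illustration only; id, name and example words are assumptions. -->
<rule id="UNKNOWN_ANTIPATTERN_DEMO" name="Unknown-word antipattern demo">
    <!-- Should protect every token that has no POS tag. -->
    <antipattern><token postag="UNKNOWN"/></antipattern>
    <pattern>
        <token>goed</token>
        <!-- An empty token matches any word, known or unknown. -->
        <marker><token/></marker>
    </pattern>
    <message>bla bla</message>
    <!-- "woord" is a known word, so the antipattern should not apply here. -->
    <example type="incorrect">goed <marker>woord</marker>!</example>
    <!-- Assumes "qwxzv" is tagged UNKNOWN, so the antipattern should block the match. -->
    <example>goed qwxzv!</example>
</rule>
```

If the first example fails in the rule tests even though no token is unknown, that would point to the antipattern matching more than it should.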

Hmm. I cannot reproduce it in the rule editor. I will check it again later.

This is an example for Dutch:

<rulegroup id="TESTJE" name="Testje">
        <antipattern><token postag="UNKNOWN"/></antipattern>
        <antipattern>
                <token postag="SENT_START"/>
                <token postag_regexp="yes" postag="BNW:STL:VRB|WKW:VTD:VRB"/>
                &ZnwEkvDe;
                <token postag="SENT_END" regexp="yes">[.!]</token>
        </antipattern>
        <pattern>
                <token postag="SENT_START"/>
                <token postag_regexp="yes" postag="BNW:STL:VRB|WKW:VTD:VRB"/>
                <marker><token postag_regexp="yes" postag="ZNW.*"/></marker>
                <token postag="SENT_END" regexp="yes">[.!]</token>
        </pattern>
        <message>Hier wordt een de-woord verwacht.</message>
        <example type="incorrect">Goede <marker>restaurant</marker>!</example>
</rulegroup>

The output is:

Checking example sentences of 3496 rules for Dutch...
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.967 s <<< FAILURE! - in
[ERROR] testRules(  Time elapsed: 10.681 s  <<< FAILURE!
Dutch rule TESTJE[1]:
"Goede restaurant!"
Errors expected: 1
Errors found   : 0
Message: Hier wordt een de-woord verwacht.
Analyzed token readings: [/SENT_START*] Goede[Goede/ENM:PER:LST:NIX*,goed/BNW:STL:VRB*]  [ /null*] restaurant[restaurant/ZNW:EKV:HET] ![!/SENT_END*]

There is not a single unknown token in the sentence, yet the antipattern still seems to be triggered.

Can you simplify the rule further, e.g. remove the second antipattern and expand the entities? That makes it easier for a developer to test.

<rule id="TESTJE" name="Testje">
    <antipattern><token postag="UNKNOWN"/></antipattern>
    <pattern><token postag="SENT_START"/><token>goed</token><marker><token>woord</token></marker><token postag="SENT_END"/></pattern>
    <message>bla bla</message>
    <example type="incorrect">goed <marker>woord</marker>!</example>
</rule>

Could indeed be a bug. Feel free to open an issue, but if this happens more often it is probably better to work around it with a <filter>. A developer could write one more easily than debug this issue.
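
As a rough sketch of what that workaround might look like in the rule file: the antipattern is dropped and a <filter> element is placed after the <pattern>. The class name and the args value below are hypothetical. No such filter exists yet, so a developer would first have to write a Java filter class that rejects matches containing untagged words.

```xml
<rule id="TESTJE" name="Testje">
    <pattern>
        <token postag="SENT_START"/>
        <token>goed</token>
        <marker><token>woord</token></marker>
        <token postag="SENT_END"/>
    </pattern>
    <!-- Hypothetical filter class and arguments; neither exists in LanguageTool as far as I know. -->
    <filter class="org.languagetool.rules.nl.SkipUnknownWordsFilter" args="scope:pattern"/>
    <message>bla bla</message>
    <example type="incorrect">goed <marker>woord</marker>!</example>
</rule>
```

The upside of a filter is that the check runs in Java code on the matched tokens, so it does not depend on how antipatterns are evaluated.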

I will open a bug for the record. But there might be a workaround. I will look into that.