Help on rule with optional token


#1

Can someone explain to me why this rule doesn't find the expected error in 'L is an extreme points.'?

<rule id="IS_A_AN_PLURAL" name="IS_A_AN_PLURAL">
 <pattern>
  <token postag='VBZ'></token>
  <token regexp='yes' postag='DT' chunk='B-NP-plural'>a|an</token>
  <token min='0' max='3'></token>
  <marker>
  <token chunk='E-NP-plural'><exception postag='NN'></exception></token>
  </marker>
 </pattern>
 <message>Please check verb-subject agreement. Verb: "<match no="1"/>", subject: "<match no="4"/>"</message>
 <example correction=''>L is an extreme <marker>points</marker>.</example>
 <example>L is an extreme point.</example>
</rule>

Thanks


(Daniel Naber) #2

You can use http://community.languagetool.org/analysis/index?lang=en to see how a sentence gets analysed. In this case, 'extreme points' gets tagged incorrectly due to disambiguation rules (from disambiguation.xml), which the site shows under "Disambiguator log".


#3

But shouldn't this match anyway? 'extreme' should be covered by <token min='0' max='3'></token>, and 'points' has chunk='E-NP-plural', which is what I am looking for.


(Daniel Naber) #4

You're right - does it work if you use <token></token> instead of <token min='0' max='3'></token>?
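
For reference, a sketch of that diagnostic variant: the original rule with only the third token changed. A plain <token></token> matches exactly one arbitrary token, so this version can only fire when a single word sits between the article and the marked noun.

```xml
<rule id="IS_A_AN_PLURAL" name="IS_A_AN_PLURAL">
 <pattern>
  <token postag='VBZ'></token>
  <token regexp='yes' postag='DT' chunk='B-NP-plural'>a|an</token>
  <!-- exactly one arbitrary token, replacing min='0' max='3' -->
  <token></token>
  <marker>
  <token chunk='E-NP-plural'><exception postag='NN'></exception></token>
  </marker>
 </pattern>
 <message>Please check verb-subject agreement. Verb: "<match no="1"/>", subject: "<match no="4"/>"</message>
 <example correction=''>L is an extreme <marker>points</marker>.</example>
 <example>L is an extreme point.</example>
</rule>
```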


#5

Yes, it does match...


(Daniel Naber) #6

You could submit a bug report at https://github.com/languagetool-org/languagetool/issues, but I wouldn't hold my breath for this to get fixed. That part of the code is rather convoluted.


(Ryan) #7

The two sentences and two examples for the min and max attributes from the Development Overview wiki page don't give a lot of info, but I think the max attribute is the reason why the rule does not match the test sentence.

Max appears to operate greedily. I am relatively new to LanguageTool, but I think the max attribute tells LT to try to match up to the next three words in a sentence with the token in question. In this case, “is” matches the POS tag “VBZ”. Then “an” matches via the regular expression. The blank token then matches both “extreme” and “points,” which leaves nothing for the final token and ruins the rest of the pattern.

If you try your rule with the following sentence, you should see that it matches. “This is an extremely large onerous points.” But if you try it again with a slightly longer sentence, the rule once again loses its match. “This is an extremely large onerous tiresome points.”


(Ryan) #8

If you add chunk='I-NP-plural' to the third token in the pattern, you can make it avoid matching the last token of the plural noun phrase. This won't account for noun phrases longer than five tokens (1 article + 0-3 adjectives + 1 noun), but it's still a small improvement.
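
A sketch of what that change might look like, applied to the rule from post #1. This is untested and assumes the chunker tags the words between the article and the final noun as I-NP-plural; if the analysis page shows a different chunk tag for those words, the optional token will not match them.

```xml
<rule id="IS_A_AN_PLURAL" name="IS_A_AN_PLURAL">
 <pattern>
  <token postag='VBZ'></token>
  <token regexp='yes' postag='DT' chunk='B-NP-plural'>a|an</token>
  <!-- restricted to tokens inside the noun phrase, so it cannot consume the final noun -->
  <token min='0' max='3' chunk='I-NP-plural'></token>
  <marker>
  <token chunk='E-NP-plural'><exception postag='NN'></exception></token>
  </marker>
 </pattern>
 <message>Please check verb-subject agreement. Verb: "<match no="1"/>", subject: "<match no="4"/>"</message>
 <example correction=''>L is an extreme <marker>points</marker>.</example>
 <example>L is an extreme point.</example>
</rule>
```

Because the last word of the phrase carries E-NP-plural rather than I-NP-plural, the optional token should stop before it and leave it for the marked token.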