
Word in context with antipattern


(Mike Unwalla) #1

I have a rule that finds a word in context (http://wiki.languagetool.org/tips-and-tricks#toc24). When I add an antipattern (http://wiki.languagetool.org/development-overview#toc13), the rule does not give the result that I expect: if I uncomment the last two examples, testrules reports an error.

<rule id="WORD_IN_CONTEXT_WITH_ANTIPATTERN" name="Test: word in context with antipattern">
  <antipattern>
    <token>car</token>
    <token>boot</token>
  </antipattern>
  <pattern>
    <token postag="SENT_START" skip="-1"><exception scope="next">elephant</exception></token>
    <marker>
      <token skip="-1">trunk<exception scope="next">elephant</exception></token>
    </marker>
    <token postag="SENT_END"><exception scope="current">elephant</exception></token>
  </pattern>
  <message>For AmE, use 'trunk' not 'boot'. Context: http://wiki.languagetool.org/tips-and-tricks#toc24, antipattern: http://wiki.languagetool.org/development-overview#toc13</message>
  <short>Antipattern test</short>
  <example type="incorrect" correction="">The automobile has a small <marker>trunk</marker>.</example>
  <example type="incorrect" correction="">A <marker>trunk</marker> is a type of container.</example>
  <example type="correct">The elephant has a long <marker>trunk</marker>.</example>
  <example type="correct">The <marker>trunk</marker> of the elephant is large.</example>
<!--  <example type="correct">The car boot contains a <marker>trunk</marker>.</example>-->
<!--  <example type="correct">Put the <marker>trunk</marker> in the car boot.</example>-->
</rule>

Should I be able to use an antipattern with the word-in-context method?


(Daniel Naber) #2

Yes, but the text matched by the antipattern needs to overlap the text matched by the pattern. You can probably force that by using skip, maybe like this (not tested):

   <antipattern>
      <token>car</token>
      <token skip="-1">boot</token>
      <token>trunk</token>
   </antipattern>
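As written, this antipattern presumably covers only sentences where "car boot" comes before "trunk", such as "The car boot contains a trunk." For "Put the trunk in the car boot." a second antipattern with the tokens in reverse order is probably needed. A rule can contain more than one <antipattern> element. A sketch, also untested, that keeps "trunk" in both antipatterns so that each one overlaps the text matched by the pattern:

```xml
<!-- Case 1: "car boot" ... "trunk" (the antipattern suggested above) -->
<antipattern>
  <token>car</token>
  <token skip="-1">boot</token>
  <token>trunk</token>
</antipattern>
<!-- Case 2: "trunk" ... "car boot" (reverse order; untested assumption) -->
<antipattern>
  <token skip="-1">trunk</token>
  <token>car</token>
  <token>boot</token>
</antipattern>
```

Both antipatterns go before the <pattern> element of the rule.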

(Mike Unwalla) #3

Thanks, Daniel. That works.


(Marcin Miłkowski) #4

It's also a good idea to limit the number of words that the antipattern immunizes from matching, by using <marker> in the antipattern.

Best,
Marcin


(Mike Unwalla) #5

Marcin,

Thanks, but I don't understand. The antipattern prevents a match. You suggest that I limit the words that are prevented from matching. That makes sense, because I don't want false negatives.

How do I use <marker> in the antipattern to limit the words that are prevented from matching?


(Marcin Miłkowski) #6

What I mean is that in some rules, the antipattern might match too much and prevent the rule from firing in cases where the antipattern only partially overlaps the match. I have had such cases. Your antipattern is quite specific, but it could easily happen with an antipattern based on POS tags: the rule would then not match where it should (just imagine that, because of skipping, the antipattern matches very remote parts of the sentence).

So, to be on the safe side in such cases, I would limit the marker to the first token or the last one.
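Applied to the antipattern from this thread, that might look like the following untested sketch. Presumably only "trunk", the token that overlaps the rule's marker, is then immunized, rather than the whole stretch of text from "car" to "trunk". For an antipattern where "trunk" is the first token, the <marker> would go around the first token instead.

```xml
<antipattern>
  <token>car</token>
  <token skip="-1">boot</token>
  <!-- Only the marked token is immunized; it must still overlap the rule's match -->
  <marker>
    <token>trunk</token>
  </marker>
</antipattern>
```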