False alarm needs to be solved!


<rule>
    <pattern>
        <marker>
            <token>can</token>
        </marker>
        <token>can</token>
        <token postag="VB"></token>
    </pattern>
    <disambig postag="NN"/>
    <example type="untouched">You can!</example>
    <example type="ambiguous" inputform="can[can/MD,can/NN,can/VB,can/VBP]" outputform="can[can/NN]">The <marker>can</marker> can hold the water.</example>
</rule>

Could we treat two adjacent identical words whose parts of speech differ as not being a repetition, and add that check to WordRepeatRule?

We still want to detect “I can can hold the ladder”. So maybe this is one of the cases where we just need to live with the false alarm. After all, it says “possible typo”.

<rule>
    <pattern case_sensitive="no">
        <token regexp="yes">a|the</token>
        <marker>
            <token>can</token>
        </marker>
        <token>can</token>
        <token postag="VB"></token>
    </pattern>
    <disambig postag="NN"/>
    <example type="untouched">You can!</example>
    <example type="ambiguous" inputform="can[can/MD,can/NN,can/VB,can/VBP]" outputform="can[can/NN]">The <marker>can</marker> can hold the water.</example>
</rule>

How about this?

I just tried that but then we miss the alarm for e.g. “This is a test sentence sentence”.

Ok! Could we just add this exception in the WordRepeatRule?

If you provide a patch… I personally don’t think it’s worth the effort.

The issue seems to be that “can” is a strict homonym (both a homograph and a homophone).
So this rule will fail in that case, but this is a rare problem.
If LanguageTool could identify that the first “can” is a noun, we might have a workaround.

Since we have added a rule like the one above to disambiguation.xml, LT can identify the first “can” as NN. Could you suggest a good way to fix this false alarm?

Since it’s such a rare case, as dnaber stated, “I personally don’t think it’s worth the effort”.
However, maybe an exception could be added to the rule: if the first “can” is preceded by “a” or “the” (i.e. a determiner), then it’s probably OK.
I might be wrong, but I think that if there is a determiner before a “can”, it’s probably a noun.
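To make the determiner idea concrete, here is a minimal, self-contained sketch. It does not use LanguageTool's classes: `Token`, `CanRepetitionSketch` and `isDeterminerCanCan` are hypothetical names made up for illustration, and the POS tags are supplied by hand rather than by a tagger. It also ignores details of the real token array, such as a sentence-start token.

```java
import java.util.Arrays;
import java.util.List;

class CanRepetitionSketch {

  // Hypothetical stand-in for a tagged token (not LanguageTool's AnalyzedTokenReadings).
  static final class Token {
    final String text;
    final List<String> posTags;

    Token(String text, String... posTags) {
      this.text = text;
      this.posTags = Arrays.asList(posTags);
    }
  }

  // Returns true if the token at 'position' is the second "can" of "can can"
  // and the first "can" directly follows a determiner (POS tag DT).
  static boolean isDeterminerCanCan(List<Token> tokens, int position) {
    if (position < 2 || position >= tokens.size()) {
      return false; // need room for: determiner, "can", "can"
    }
    boolean repeated = tokens.get(position).text.equalsIgnoreCase("can")
        && tokens.get(position - 1).text.equalsIgnoreCase("can");
    boolean afterDeterminer = tokens.get(position - 2).posTags.contains("DT");
    return repeated && afterDeterminer;
  }

  public static void main(String[] args) {
    // Hand-tagged example sentences; a real tagger would supply these readings.
    List<Token> noun = Arrays.asList(
        new Token("The", "DT"),
        new Token("can", "MD", "NN", "VB", "VBP"),
        new Token("can", "MD", "NN", "VB", "VBP"),
        new Token("hold", "VB"));
    List<Token> typo = Arrays.asList(
        new Token("I", "PRP"),
        new Token("can", "MD", "NN", "VB", "VBP"),
        new Token("can", "MD", "NN", "VB", "VBP"),
        new Token("hold", "VB"));
    System.out.println(isDeterminerCanCan(noun, 2)); // repetition would be allowed
    System.out.println(isDeterminerCanCan(typo, 2)); // repetition would still be flagged
  }
}
```

With this check, “The can can hold the water” would be exempted, while “I can can hold the ladder” would still raise the “possible typo” alarm.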

I’ve just looked at EnglishWordRepeatRule.java and noticed there are a number of exceptions.

For example, “that that”, which looks well defined:
if (wordRepetitionOf("that", tokens, position) && nextPOSIsIn(tokens, position, "NN", "PRP$", "JJ", "VBZ", "VBD")) {
    return true; // "I don't think that that is a problem."
}

However, “had had” looks like it might need a little work:
if (wordRepetitionOf("had", tokens, position)) {
    return true; // "If I had had time, I would have gone to see him."
}

Shouldn’t this also test to see if there is a PRP before the first “had”?

On another point, would this rule retag a “can” preceded by a determiner as a noun?

<rule name="determiner + can ->     NN" id="DT_can">
    <pattern>
        <token postag="DT"><exception postag="PDT" /></token>
        <marker>
            <and>
                <token>can</token>
            </and>
        </marker>
    </pattern>
    <disambig postag="NN" />
</rule>

I’m a little new to this “disambig” token and not 100% sure of its function.

Thanks for your advice. I have added an exception to EnglishWordRepeatRule.java as shown below, and it works well.

if (wordRepetitionOf("can", tokens, position) && POSIsIn(tokens, position, "NN")) {
    return true; // "The can can hold the water."
}

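/**
 * Checks whether the token before the given position has one of the given POS tags.
 *
 * @author Mility
 * @since 2015/5/15
 * @param tokens the analyzed tokens of the sentence
 * @param position index of the current token
 * @param posTags POS tags to look for
 * @return true if the token at position - 1 has any of the given tags
 */
private boolean POSIsIn(AnalyzedTokenReadings[] tokens, int position, String... posTags) {
    // Guard both ends so that position - 1 is a valid index.
    if (position - 1 >= 0 && position - 1 < tokens.length) {
        for (String posTag : posTags) {
            if (tokens[position - 1].hasPartialPosTag(posTag)) {
                return true;
            }
        }
    }
    return false;
}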

You also asked, “Shouldn’t this also test to see if there is a PRP before the first ‘had’?” Is it something like this?
if (wordRepetitionOf("had", tokens, position) && prePOSIsIn(tokens, position, "PRP")) {
    return true; // "If I had had time, I would have gone to see him."
}
private boolean prePOSIsIn(AnalyzedTokenReadings[] tokens, int position, String... posTags) {
    // Guard both ends so that position - 2 is a valid index.
    if (position - 2 >= 0 && position - 2 < tokens.length) {
        for (String posTag : posTags) {
            if (tokens[position - 2].hasPartialPosTag(posTag)) {
                return true;
            }
        }
    }
    return false;
}

If you could turn this into a pull request on GitHub and add a test case, I could add it.

Thanks, I opened two new pull requests, #262 and #263.

Today I made a new pull request at https://github.com/Mility/languagetool/commit/0919d26820eae18da6355e0eb6cd692ee7c71fc5
and added the following rule to disambiguation.xml:

<rule name="a number of nns" id="A_NUMBER_OF_NNS">
    <pattern>
        <token>a</token>
        <token>number</token>
        <token>of</token>
        <marker>
            <token postag="NN|NNS" postag_regexp="yes"></token>
        </marker>
    </pattern>
    <disambig postag="NNS"/>
</rule>

Sometimes two rules report a grammatical error at the same place. How should we handle that?

If the errors are at exactly the same place, the GUI and the website (languagetool.org) will ignore one of the errors. So as long as both errors are valid errors, nothing needs to be done. If one error is misleading, it’s a good idea to avoid it, maybe using the antipattern feature.
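As an illustration of “ignore one of the errors”, the following sketch keeps only the first match for each exact character span. It is not LanguageTool's actual filtering code: `Match`, `SamePositionFilterSketch` and `dropSameSpanDuplicates` are hypothetical names, the rule ids are placeholders, and which of the two errors survives is simply assumed here to be the one reported first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SamePositionFilterSketch {

  // Hypothetical stand-in for a rule match: a rule id plus a character span.
  static final class Match {
    final String ruleId;
    final int from;
    final int to;

    Match(String ruleId, int from, int to) {
      this.ruleId = ruleId;
      this.from = from;
      this.to = to;
    }
  }

  // Keeps the first match seen for each exact (from, to) span and drops later ones.
  // Matches that only partially overlap are all kept.
  static List<Match> dropSameSpanDuplicates(List<Match> matches) {
    Set<String> seenSpans = new HashSet<>();
    List<Match> kept = new ArrayList<>();
    for (Match m : matches) {
      if (seenSpans.add(m.from + ":" + m.to)) {
        kept.add(m);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Match> matches = new ArrayList<>();
    matches.add(new Match("RULE_A", 4, 7));
    matches.add(new Match("RULE_B", 4, 7));   // same span as RULE_A, dropped
    matches.add(new Match("RULE_C", 10, 12));
    for (Match m : dropSameSpanDuplicates(matches)) {
      System.out.println(m.ruleId + " " + m.from + "-" + m.to);
    }
  }
}
```

A filter like this hides a duplicate but cannot tell which of the two messages is the misleading one, which is why avoiding the misleading match in the rule itself is the cleaner fix.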

The standalone version doesn’t seem to ignore one of the errors. How about these pull requests:
https://github.com/Mility/languagetool/commit/0919d26820eae18da6355e0eb6cd692ee7c71fc5
and https://github.com/Mility/languagetool/commit/d7748ecbef2c449c725b51967eaced1c61560bdd
and https://github.com/Mility/languagetool/commit/61e5b74063caae2a00d8c1517e31e261bc33cfd3