False alarm needs to be solved!

Mility · May 12, 2015, 12:43pm

<rule>
  <pattern>
    <marker>
      <token>can</token>
    </marker>
    <token>can</token>
    <token postag="VB"></token>
  </pattern>
  <disambig postag="NN"/>
  <example type="untouched">You can!</example>
  <example type="ambiguous" inputform="can[can/MD,can/NN,can/VB,can/VBP]" outputform="can[can/NN]">The <marker>can</marker> can hold the water.</example>
</rule>

Can we treat two adjacent identical words whose parts of speech differ as not being a repetition, and add this to WordRepeatRule?

dnaber · May 12, 2015, 4:09pm

We still want to detect “I can can hold the ladder”. So maybe this is one of the cases where we just need to live with the false alarm. After all it says “possible typo”.

Mility · May 13, 2015, 7:29am

<rule>
  <pattern case_sensitive="no">
    <token regexp="yes">a|the</token>
    <marker>
      <token>can</token>
    </marker>
    <token>can</token>
    <token postag="VB"></token>
  </pattern>
  <disambig postag="NN"/>
  <example type="untouched">You can!</example>
  <example type="ambiguous" inputform="can[can/MD,can/NN,can/VB,can/VBP]" outputform="can[can/NN]">The <marker>can</marker> can hold the water.</example>
</rule>

How about this?

dnaber · May 13, 2015, 11:38am

I just tried that but then we miss the alarm for e.g. “This is a test sentence sentence”.

Mility · May 13, 2015, 12:29pm

Ok! Could we just add this exception in the WordRepeatRule?

dnaber · May 13, 2015, 1:57pm

If you provide a patch… I personally don’t think it’s worth the effort.

PeterLawrence · May 14, 2015, 2:08pm

The issue seems to be that “can” is a strict homonym (both a homograph and a homophone).
So this rule will fail in that case, but this is a rare problem.
If LanguageTool could identify that the first “can” was a noun, we might have a workaround.

Mility · May 14, 2015, 3:17pm

Since we have added a rule like the one above to disambiguation.xml, LT can identify the first “can” as NN. Could you suggest a good way to solve this false alarm?

PeterLawrence · May 14, 2015, 9:35pm

Since it’s such a rare case, as dnaber stated, “I personally don’t think it’s worth the effort”.
However, maybe an exception could be added to the rule: if the first “can” is preceded by “a” or “the” (i.e. a determiner), then it’s probably OK.
I might be wrong, but I think that if there is a determiner before a “can”, it’s probably a noun.
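
To make the suggestion concrete, here is a minimal sketch of the determiner check, written against plain strings rather than LanguageTool's token classes. The class name, the method name and the hard-coded determiner list are all assumptions for illustration; a real exception would more likely rely on POS tags.

```java
import java.util.Locale;
import java.util.Set;

class DeterminerCanSketch {

  // Hypothetical, hard-coded determiner list; a real rule would use POS tags.
  private static final Set<String> DETERMINERS = Set.of("a", "the");

  /**
   * Returns true if tokens[position - 1] and tokens[position] are both "can"
   * and the token before them is a determiner, so the repetition is probably
   * a noun followed by a modal verb and need not be reported.
   */
  static boolean isDeterminerCanCan(String[] tokens, int position) {
    if (position < 2 || position >= tokens.length) {
      return false; // no room for determiner + "can" before this token
    }
    boolean repeated = tokens[position - 1].equalsIgnoreCase("can")
        && tokens[position].equalsIgnoreCase("can");
    String before = tokens[position - 2].toLowerCase(Locale.ROOT);
    return repeated && DETERMINERS.contains(before);
  }

  public static void main(String[] args) {
    String[] ok = {"The", "can", "can", "hold", "the", "water"};
    String[] typo = {"I", "can", "can", "hold", "the", "ladder"};
    System.out.println(isDeterminerCanCan(ok, 2));   // true: not reported
    System.out.println(isDeterminerCanCan(typo, 2)); // false: still reported
  }
}
```

With this check, “I can can hold the ladder” would still be reported, because “I” is not a determiner.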

PeterLawrence · May 15, 2015, 9:36am

I’ve just looked at EnglishWordRepeatRule.java and noticed there are a number of exceptions.

For example, “that that”, which looks well defined:
if (wordRepetitionOf("that", tokens, position) && nextPOSIsIn(tokens, position, "NN", "PRP$", "JJ", "VBZ", "VBD")) {
  return true; // "I don't think that that is a problem."
}

However, “had had” looks like it might need a little work:
if (wordRepetitionOf("had", tokens, position)) {
  return true; // "If I had had time, I would have gone to see him."
}

Shouldn’t this also test to see if there is a PRP before the first “had”?

On another point, would this rule retag a “can” preceded by a determiner as a noun?

<rule name="determiner + can -> NN" id="DT_can">
    <pattern>
        <token postag="DT"><exception postag="PDT" /></token>
        <marker>
            <and>
                <token>can</token>
            </and>
        </marker>
    </pattern>
    <disambig postag="NN" />
</rule>

I’m a little new to this “disambig” element and not 100% sure of its function.

Mility · May 15, 2015, 10:40am

Thanks for your advice. I have added an exception to EnglishWordRepeatRule.java like the one below, and it works well.

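if (wordRepetitionOf("can", tokens, position) && POSIsIn(tokens, position, "NN")) {
  return true; // "The can can hold the water."
}

/**
 * Checks whether the first word of the repetition (the token at position - 1)
 * has one of the given POS tags.
 *
 * @author Mility
 * @since 2015/5/15
 * @param tokens the analyzed tokens of the sentence
 * @param position index of the second word of the repetition
 * @param posTags POS tags to look for (partial match)
 * @return true if the token at position - 1 has any of the given tags
 */
private boolean POSIsIn(AnalyzedTokenReadings[] tokens, int position, String... posTags) {
  // Guard both ends: the original check (tokens.length > position - 1) did not exclude position 0.
  if (position >= 1 && position - 1 < tokens.length) {
    for (String posTag : posTags) {
      if (tokens[position - 1].hasPartialPosTag(posTag)) {
        return true;
      }
    }
  }
  return false;
}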

You asked, “Shouldn’t this also test to see if there is a PRP before the first ‘had’?” Is it like this?
if (wordRepetitionOf("had", tokens, position) && prePOSIsIn(tokens, position, "PRP")) {
  return true; // "If I had had time, I would have gone to see him."
}
// Checks whether the token before the first word of the repetition (position - 2) has one of the given POS tags.
private boolean prePOSIsIn(AnalyzedTokenReadings[] tokens, int position, String... posTags) {
  // Guard both ends: the original check (tokens.length > position - 2) did not exclude positions 0 and 1.
  if (position >= 2 && position - 2 < tokens.length) {
    for (String posTag : posTags) {
      if (tokens[position - 2].hasPartialPosTag(posTag)) {
        return true;
      }
    }
  }
  return false;
}

dnaber · May 18, 2015, 6:48am

If you could turn this into a pull request at github and add a test case, I could add it.
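
As a rough idea of what such a test case needs to cover, here is a self-contained sketch that exercises both proposed exceptions. It does not use LanguageTool's classes: `Tok`, `ignore` and the class name are hypothetical stand-ins, the POS tags are assigned by hand, and the partial tag match only imitates what `hasPartialPosTag` is used for in the posts above. A real test would go through the actual rule and tagger instead.

```java
import java.util.Arrays;
import java.util.List;

class WordRepeatExceptionSketch {

  /** Hypothetical stand-in for a tagged token: surface form plus POS tags. */
  static final class Tok {
    final String text;
    final List<String> tags;

    Tok(String text, String... tags) {
      this.text = text;
      this.tags = Arrays.asList(tags);
    }

    /** True if any tag of this token contains the given string. */
    boolean hasPartialPosTag(String posTag) {
      for (String tag : tags) {
        if (tag.contains(posTag)) {
          return true;
        }
      }
      return false;
    }
  }

  /** True if the tokens at position - 1 and position both equal word. */
  static boolean wordRepetitionOf(String word, Tok[] tokens, int position) {
    return position > 0
        && tokens[position - 1].text.equals(word)
        && tokens[position].text.equals(word);
  }

  /** True if the repetition ending at position should not be reported. */
  static boolean ignore(Tok[] tokens, int position) {
    // "The can can hold the water.": first "can" is tagged as a noun
    if (wordRepetitionOf("can", tokens, position)
        && tokens[position - 1].hasPartialPosTag("NN")) {
      return true;
    }
    // "If I had had time": a pronoun directly precedes the first "had"
    if (wordRepetitionOf("had", tokens, position)
        && position >= 2 && tokens[position - 2].hasPartialPosTag("PRP")) {
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Tok[] nounCan = {new Tok("The", "DT"), new Tok("can", "NN"), new Tok("can", "MD"), new Tok("hold", "VB")};
    Tok[] typoCan = {new Tok("I", "PRP"), new Tok("can", "MD"), new Tok("can", "MD"), new Tok("hold", "VB")};
    Tok[] hadHad = {new Tok("If", "IN"), new Tok("I", "PRP"), new Tok("had", "VBD"), new Tok("had", "VBN")};
    System.out.println(ignore(nounCan, 2)); // true
    System.out.println(ignore(typoCan, 2)); // false
    System.out.println(ignore(hadHad, 3));  // true
  }
}
```

Note that a partial match on "PRP" also accepts "PRP$" (possessive pronouns), which may or may not be what is wanted.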

Mility · May 18, 2015, 9:43am

Thanks, I opened two new pull requests, #262 and #263.

Mility · May 22, 2015, 8:40am

Today I made a new pull request at https://github.com/Mility/languagetool/commit/0919d26820eae18da6355e0eb6cd692ee7c71fc5
and added the following rule to disambiguation.xml:

<rule name="a number of nns" id="A_NUMBER_OF_NNS">
  <pattern>
    <token>a</token>
    <token>number</token>
    <token>of</token>
    <marker>
      <token postag="NN|NNS" postag_regexp="yes"></token>
    </marker>
  </pattern>
  <disambig postag="NNS"/>
</rule>

Mility · May 23, 2015, 8:30am

Sometimes two grammar errors are reported at the same place. How should we handle that?

dnaber · May 23, 2015, 8:47am

If the errors are at exactly the same place, the GUI and the website (languagetool.org) will ignore one of the errors. So as long as both errors are valid errors, nothing needs to be done. If one error is misleading, it’s a good idea to avoid it, maybe using the antipattern feature.
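
For illustration only, one way a front end could drop a second error at exactly the same position is sketched below. This is not LanguageTool's actual implementation; the `Match` class, the rule ids and the "keep the first match" policy are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SamePositionFilterSketch {

  /** Hypothetical stand-in for a rule match: character span plus rule id. */
  static final class Match {
    final int from;
    final int to;
    final String ruleId;

    Match(int from, int to, String ruleId) {
      this.from = from;
      this.to = to;
      this.ruleId = ruleId;
    }
  }

  /** Keeps the first match for each exact (from, to) span and drops later ones. */
  static List<Match> dropSamePosition(List<Match> matches) {
    Set<String> seenSpans = new HashSet<>();
    List<Match> kept = new ArrayList<>();
    for (Match m : matches) {
      // Set.add returns false if this span was already seen
      if (seenSpans.add(m.from + ":" + m.to)) {
        kept.add(m);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Match> matches = new ArrayList<>();
    matches.add(new Match(4, 11, "RULE_A"));
    matches.add(new Match(4, 11, "RULE_B")); // same span as RULE_A, dropped
    matches.add(new Match(20, 25, "RULE_C"));
    for (Match m : dropSamePosition(matches)) {
      System.out.println(m.ruleId + " " + m.from + "-" + m.to);
    }
  }
}
```

Overlapping but not identical spans are kept by this sketch, which matches the “exactly the same place” wording above.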

Mility · May 23, 2015, 8:52am

The standalone version doesn’t seem to ignore one of the errors. How about these commits:
https://github.com/Mility/languagetool/commit/0919d26820eae18da6355e0eb6cd692ee7c71fc5
and https://github.com/Mility/languagetool/commit/d7748ecbef2c449c725b51967eaced1c61560bdd
and https://github.com/Mility/languagetool/commit/61e5b74063caae2a00d8c1517e31e261bc33cfd3