[rule suggestion] "can is" typo

Are the forums the correct place to suggest rules? The rule editor just links to the generic LanguageTool support page, which suggests the forum as the first entry point, so here goes:

Sometimes you end up typing both ‘can’ and ‘is’ in a sentence, because you’ve reformulated it a bunch of times.
Here is my suggested rule:

<!-- English rule, 2020-07-26 -->
<rule id="CAN_IS" name="can is">
 <pattern>
  <token postag='CD|PRP|PRP\$|JJ|JJS|JJR|NN:U|DT' postag_regexp='yes' negate_pos='yes'></token>
  <marker>
  <token>can</token>
  <token>is</token>
  </marker>
 </pattern>
 <message>Did you mean either can or is?</message>
 <example correction=''>Solving this more accurately <marker>can is</marker> important for many applications Can is there way to help you? Finding ways to improve can is important for the future.</example>
 <example>A soda can is a beverage container.</example>
 <example>That can is expired.</example>
 <example>Your can is expired.</example>
 <example>Looks the old garbage can is being sold.</example>
 <example>Looks like the old can is being sold.</example>
 <example>Doing all that you can is good enough.</example>
 <example>I guess doing all that one can is the best anyone can ask for.</example>
 <example correction=''>Can is there way to help you?</example>
 <example>The oldest can is worth 500$</example>
 <example>An even older can is only worth 2$</example>
 <example correction=''>Finding ways to improve can is important for the future.</example>
</rule>

This works fine for most sentences, but it still gives a false positive for “Maybe live it up while you still can is the best approach”, which I can’t figure out how to fix.

“All one can is cry”

Ah yeah. I assume you mean this as an example wrong sentence, but the rule wouldn’t work for that either. Any idea how to write it better?

Ask @tiff or @Mike_Unwalla

There is no rule without exceptions. Just try it on a large text collection and check how many false and true positives you get, and in what ratio. A very rare false positive is not a problem, and the most common exceptions can be added to the rule as such. Building rules often seems easy, but then real language use kicks in :slight_smile:

“Maybe ‘live it up while you still can’ is the best approach.” should be a closer match to the spoken form.

@atnas, thanks for your contribution.

If you have a rule that solves or partly solves an existing issue in the languagetool-org/languagetool issue tracker on GitHub, you can add your rule as a comment to that issue.

If there is no issue for the problem that your rule solves, I suggest that you make an issue and give examples of the problem. Then, add your rule as a comment. That’s my personal preference. @dnaber, do you have a preference/suggestion?

As @Ruud_Baars wrote, try the rule on a large corpus. As a start, you can test the rule with 250,000 sentences. Use the rule editor (“Check a LanguageTool XML rule”) with devMode.

The rule that you supplied does not have the correct syntax. It contains examples such as this:
<example correction=''>Can is there way to help you?</example>

The rule editor gives a warning that tells you that <marker> is missing from the ‘incorrect’ examples.
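
As a minimal sketch, here is how two of the ‘incorrect’ examples might look once the marker is added. This assumes the marker should wrap exactly the two tokens that the rule's own <marker> covers, and I have not run it through the rule editor:

```xml
<!-- Sketch only: 'incorrect' examples with the flagged words wrapped in <marker>. -->
<!-- Assumption: the marked span matches the rule's <marker> (the tokens "can is"). -->
<example correction=''><marker>Can is</marker> there way to help you?</example>
<example correction=''>Finding ways to improve <marker>can is</marker> important for the future.</example>
```

Examples without a correction attribute, such as “A soda can is a beverage container.”, are treated as correct sentences and do not need a marker.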

As @SkyCharger001 shows, the problem is missing quote marks. Possibly, in the source text the phrase was in italics. LanguageTool does not know that. Thus, there is a false warning. There is nothing we can do about it.

I ran your rule in the rule editor (devMode). I found these false positives:

  • This whimsical watering can is filled with 8 oz. of delicious lemon pretzels.
  • After the bottle/can is empty, let the heater run for half an hour.
  • … and features a cover of his song “Southern Can Is Mine.”
  • Among younger speakers, can is more common, with tin referring to a … [Missing quote marks. There is no way to fix this false positive.]
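
Some of the false positives above could probably be suppressed with antipatterns. The following is an untested sketch, not a verified fix. It assumes that the antipatterns are placed inside the rule before its <pattern>, and that “bottle/can” is tokenized as three tokens (“bottle”, “/”, “can”):

```xml
<!-- Sketch only: place inside <rule id="CAN_IS"> before its <pattern>. -->
<!-- Suppresses "watering can is", where "can" is a noun. -->
<antipattern>
 <token>watering</token>
 <token>can</token>
 <token>is</token>
</antipattern>
<!-- Assumption: "bottle/can" is split into "bottle", "/", "can". -->
<antipattern>
 <token>/</token>
 <token>can</token>
 <token>is</token>
</antipattern>
```

Whether this is worth doing depends on how often these cases turn up in the corpus test. The song title and the sentence with the missing quote marks are not covered by this sketch.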

I agree, issues are a good place for rules (if you’re familiar with Github pull requests, that’s even better).