Hi, found this in the communication of a co-worker and thought I’d make a rule out of it:
<!-- English rule, 2016-10-20 -->
<rule id="SATSEATSET" name="sat/seat/set">
  <pattern>
    <token postag='PRP'></token>
    <!-- Only the verb is marked, so the suggestion replaces just that token -->
    <marker>
      <token regexp='yes'>sea?t</token>
    </marker>
    <token>together</token>
  </pattern>
  <message>Instead of "\2", did you mean <suggestion><match no="2" regexp_match="(s)ea?(t)" regexp_replace="$1a$2"/></suggestion>, the past tense form of "sit"?</message>
  <short>Did you mean "sat"?</short>
  <example correction='sat'>We <marker>set</marker> together.</example>
  <example>We sat together.</example>
</rule>
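For anyone who wants to try the pattern outside LanguageTool, here is a rough Python sketch of the same idea. It is only an approximation: the `PRONOUNS` list is my own stand-in for the `PRP` part-of-speech tag (the real tag also covers forms like "us" or "them"), and `suggest_sat` is a hypothetical helper name, not anything from LanguageTool.

```python
import re

# Assumption: a short fixed pronoun list stands in for the PRP POS tag.
PRONOUNS = ("i", "you", "he", "she", "it", "we", "they")

# pronoun + set/seat + "together", whole words only, case-insensitive.
# Groups 2 and 3 mirror the (s)ea?(t) capture groups used in the rule.
PATTERN = re.compile(
    r"\b(" + "|".join(PRONOUNS) + r")\s+(s)ea?(t)\s+(together)\b",
    re.IGNORECASE,
)


def suggest_sat(sentence: str) -> str:
    """Rewrite 'set'/'seat' as 'sat' wherever the three-word pattern matches."""
    return PATTERN.sub(r"\g<1> \g<2>a\g<3> \g<4>", sentence)


if __name__ == "__main__":
    for text in ("We set together.", "We set the table together."):
        print(text, "->", suggest_sat(text))
```

Sentences where another word sits between the verb and "together" are left alone, just as the three-token pattern in the rule would skip them.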
I searched the NOW corpus (English-Corpora: NOW), which has 2.8 billion words, and found 3 incorrect sentences with the structure pronoun + set/seat + together. Thus, I think that this rule is a candidate for the statistics rules (confusion_sets.txt).
@dnaber, when you get time, please look at these pairs and, if applicable, put them in confusion_sets.txt:
seat/sat
set/sat
I’ve added three pairs: seat/sat, set/sat, seat/set. They work quite well, catching between roughly 35% and 75% of confusions with low false alarm rates. They will become active on languagetool.org later today. Thanks!