English: set/sat/seat rule

eska · October 20, 2016, 8:44am

Hi, I found this in a co-worker’s message and thought I’d make a rule out of it:

<!-- English rule, 2016-10-20 -->
<rule id="SATSEATSET" name="sat/seat/set">
 <pattern>
  <token postag='PRP'></token>
  <marker>
   <token regexp='yes'>sea?t</token>
  </marker>
  <token>together</token>
 </pattern>
 <message>Instead of "<match no="2"/>", did you mean <suggestion><match no="2" regexp_match="(s)ea?(t)" regexp_replace="$1a$2"/></suggestion>, the past tense form of "sit"?</message>
 <short>Did you mean "sat"?</short>
 <example correction='sat'>We <marker>set</marker> together.</example>
 <example>We sat together.</example>
</rule>
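To make the pattern concrete, here is a minimal Python sketch of what the rule is meant to match. It is not LanguageTool's matcher: it assumes a hypothetical hard-coded pronoun list (`PRONOUNS`) in place of the `PRP` part-of-speech tag, a simple regex tokenizer, and that the token regexp has to match the whole token, case-insensitively. The names `find_sat_errors` and `suggest` are illustrative only.

```python
import re

# Hypothetical stand-in for the PRP (personal pronoun) tag; the real rule
# relies on LanguageTool's POS tagger, which this sketch does not use.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

# Same regex as the rule's second token; assumed to match the whole token.
SET_SEAT = re.compile(r"sea?t", re.IGNORECASE)


def find_sat_errors(sentence):
    """Return (token_index, word) for each set/seat in PRON + set/seat + together."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    hits = []
    for i in range(len(tokens) - 2):
        pron, verb, nxt = tokens[i], tokens[i + 1], tokens[i + 2]
        if (pron.lower() in PRONOUNS
                and SET_SEAT.fullmatch(verb)
                and nxt.lower() == "together"):
            hits.append((i + 1, verb))
    return hits


def suggest(word):
    # Mirrors regexp_match="(s)ea?(t)" regexp_replace="$1a$2": set/seat -> sat.
    return re.sub(r"(s)ea?(t)", r"\1a\2", word, flags=re.IGNORECASE)


for s in ["We set together.", "They seat together.", "We sat together.", "Set the table."]:
    for idx, word in find_sat_errors(s):
        print(f"{s!r}: token {idx} {word!r} -> {suggest(word)!r}")
```

The pronoun requirement is what keeps sentences such as "Set the table." from being flagged; without a tagger, the hard-coded list is only a rough approximation of it.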

Mike_Unwalla · October 20, 2016, 4:11pm

@eska,

Thank you.

I searched the NOW corpus (English-Corpora: NOW), which has 2.8 billion words, and found only 3 incorrect sentences with the structure pronoun + set/seat + together. Because the error is so rare in this exact structure, I think that it is a better candidate for the statistics rules (confusion_sets.txt).

@dnaber, when you have time, please look at these pairs and, if applicable, add them to confusion_sets.txt:
seat/sat
set/sat
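Unlike the pattern rule, a statistics rule compares how often each word of a confusion pair occurs in the surrounding context. The sketch below only illustrates that general idea and is not LanguageTool's implementation: the trigram counts in `TRIGRAMS` are invented for the example, and the add-one smoothing and the threshold `factor=10.0` are hypothetical choices.

```python
from collections import defaultdict

# Confusion pairs as discussed in this thread; each pair works in both directions.
PAIRS = [("seat", "sat"), ("set", "sat"), ("seat", "set")]

# Hypothetical trigram counts; a real system would look these up in a large
# n-gram corpus instead of a hand-written dict.
TRIGRAMS = {
    ("we", "sat", "together"): 900,
    ("we", "set", "together"): 4,
    ("we", "seat", "together"): 1,
    ("to", "set", "the"): 5000,
    ("to", "sat", "the"): 2,
    ("to", "seat", "the"): 40,
}


def build_alternatives(pairs):
    """Map every word to the set of words it can be confused with."""
    alts = defaultdict(set)
    for a, b in pairs:
        alts[a].add(b)
        alts[b].add(a)
    return alts


def check(tokens, alts, counts, factor=10.0):
    """Flag words whose trigram context is far more frequent with an alternative."""
    tokens = [t.lower() for t in tokens]
    findings = []
    for i in range(1, len(tokens) - 1):
        word = tokens[i]
        left, right = tokens[i - 1], tokens[i + 1]
        orig = counts.get((left, word, right), 0)
        for alt in sorted(alts.get(word, ())):
            cand = counts.get((left, alt, right), 0)
            # Add-one smoothing so unseen trigrams never divide by zero.
            if (cand + 1) / (orig + 1) > factor:
                findings.append((i, word, alt))
    return findings


alts = build_alternatives(PAIRS)
print(check(["We", "set", "together"], alts, TRIGRAMS))
print(check(["to", "set", "the"], alts, TRIGRAMS))
```

Because the check only looks at context frequencies, the same pair list covers "set/sat" in any sentence, not just after a pronoun and before "together".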

dnaber · October 20, 2016, 5:56pm

I’ve added three pairs: seat/sat, set/sat, seat/set. They work quite well, catching between roughly 35% and 75% of confusions with low false alarm rates. They will become active on languagetool.org later today. Thanks!
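The thread does not say how the 35% to 75% figures were measured. As a hedged illustration, one common way to report such results is recall (the share of real confusions that get flagged) alongside precision (the share of flags that are correct); one reading of a low false alarm rate is high precision. The function name and the counts below are hypothetical.

```python
def recall_and_precision(true_positives, false_negatives, false_positives):
    """Return (recall, precision) from a confusion rule's evaluation counts."""
    actual_errors = true_positives + false_negatives
    flagged = true_positives + false_positives
    if actual_errors == 0 or flagged == 0:
        raise ValueError("need at least one real error and one flag")
    recall = true_positives / actual_errors  # share of confusions caught
    precision = true_positives / flagged     # share of flags that are right
    return recall, precision


# Hypothetical counts: 100 confusions, 60 caught, 3 false alarms.
recall, precision = recall_and_precision(60, 40, 3)
print(f"recall={recall:.0%}, precision={precision:.1%}")
```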