[en] Dash rules just like in pt_PT

Hello!

Could someone implement dash rules similar to the ones implemented by Tiago Santos?

PT:

EN:

Thanks!

Kind regards,

Marco, the rules are similar. Just translate the message and example strings and push to GitHub. Double-check with the English maintainers, just in case.

P.S. I now notice that there are entities that need to be created and localized. They are used here:
<token regexp='yes'>\d+;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
This is for compatibility with the half dash rule.
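To illustrate how such entities end up inside the token regex, here is a minimal Python sketch. It is not LanguageTool code: `expand_entities` and `token_matches` are hypothetical helpers, the entity values are the English ones posted in this thread, and it assumes that a token must match the whole regex, ignoring case. Under those assumptions, the `;` right after `\d+` becomes part of the first alternative, so a bare number only matches when `\d+|` is used instead.

```python
import re

# Entity values as posted for English in this thread.
ENTITIES = {
    "meses_ano_abrev": "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec",
    "meses_ano": "january|february|march|april|may|june|july|august"
                 "|september|october|november|december",
    "dias_semana_abrev": "mon|tue|wed|thu|fri|sat|sun",
    "dias_semana": "monday|tuesday|wednesday|thursday|friday|saturday|sunday",
}


def expand_entities(pattern: str) -> str:
    """Splice each &name; reference into the pattern, as an XML parser would."""
    return re.sub(r"&(\w+);", lambda m: ENTITIES[m.group(1)], pattern)


def token_matches(pattern: str, token: str) -> bool:
    """Assumption: the whole token must match, case-insensitively."""
    return re.fullmatch(expand_entities(pattern), token, re.IGNORECASE) is not None


# The token regex exactly as posted above, and a variant with "|" after \d+.
AS_POSTED = r"\d+;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;"
WITH_PIPE = r"\d+|&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;"

if __name__ == "__main__":
    for token in ("1901", "January", "March", "sun"):
        print(token, token_matches(AS_POSTED, token), token_matches(WITH_PIPE, token))
```

Whether the real tokenizer and matcher behave exactly like this would need to be checked against the rule tests.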

Done!

Could you, or someone else, check that everything is okay, fix it if necessary, and commit it?

I noticed that the bottom of your second rule had examples in Spanish but it is very easy to fix.

Thanks!

  <!DOCTYPE rules [
    <!ENTITY meses_ano_abrev "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec">
    <!ENTITY meses_ano "january|february|march|april|may|june|july|august|september|october|november|december">
    <!ENTITY dias_semana_abrev "mon|tue|wed|thu|fri|sat|sun">
    <!ENTITY dias_semana "monday|tuesday|wednesday|thursday|friday|saturday|sunday">


  <rulegroup id="DASH_RULE" name='Hyphen, Half-Dash and Dash'>
    <!-- Created by Tiago F. Santos, 2017-01-23 -->
      <url>https://pt.wikipedia.org/wiki/Travessão</url>
    <rule>
      <pattern>
          <token postag='SENT_START'/>
          <token min='0' regexp='yes'>["«»“”]</token>
        <marker>
          <token regexp='yes'>-|–</token>
        </marker>
      </pattern>
      <message>In dialogues and enumerations you must use the dash.</message>
        <suggestion>—</suggestion>
      <short>In this situation you must use the dash.</short>
      <example correction='—'><marker>-</marker> What is that, mother?</example>
      <example correction='—'>« <marker>-</marker> What is that, mother?</example>
      <example>— It's your birthday present, my daughter.</example>
    </rule>
    <rule>
      <antipattern>
          <token regexp='yes'>\d+;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
          <token regexp='yes'>-|–</token>
          <token regexp='yes'>\d+;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
      </antipattern>
      <pattern>
        <marker>
          <token spacebefore="yes" regexp='yes'>-|–</token>
        </marker>
          <token spacebefore="yes"/>
      </pattern>
      <message>If you do not want to join two words, you must use the dash.</message>
        <suggestion>—</suggestion>
      <short>In this situation you must use the dash.</short>
      <example correction='—'>In these educational establishments there were enrollments <marker>-</marker> mostly from elementary school — and a total of teachers.</example>
      <example correction='—'>Institute Ricci de Macau <marker>-</marker> Association of cultural promotion of the Company of Jesus in Macau</example>
      <example>In the Midwest and northwest portion are higher elevations, reaching 500 meters above sea level, highlighting Serra do Tumucumaque and Sierra Lombarda.</example>
    </rule>
    <rule>
      <pattern>
          <token regexp='yes'>\d+|;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
          <token regexp='yes'>-|—</token>
          <token regexp='yes'>\d+|;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
      </pattern>
      <message>If you want to indicate a period of time, you must use half-dash.</message>
        <suggestion>\1 – \3</suggestion>
        <suggestion>\1–\3</suggestion>
      <short>In this situation you must use the half-dash.</short>
      <example correction='1901 – 1978|1901–1978'>Vitorino Nemésio (<marker>1901 - 1978</marker>) — writer and university professor.</example>
    </rule>
  </rulegroup>

    <rule id="ENUMERATION_AND_DASHES" name="Enumerations with dashes: 1.2.-">
    <!-- Localized and improved by Tiago F. Santos, Portuguese rule, 2016-10-27 -->
      <pattern>
          <token regexp="yes">\d[\d.]*</token>
          <token>.</token>
          <token regexp="yes" min="1" max="3">[—–‒-]</token>
      </pattern>
      <message>The dashes are unnecessary in enumerations.</message>
        <suggestion>\1.</suggestion>
      <example correction="1."><marker>1.-</marker> Introduction</example>
      <example correction="1."><marker>1.--</marker> Introduction</example>
    <!-- <example correction="1.3."><marker>1.3—</marker> Introducció</example>-->
      <example>50-30</example>
      <example>...with a rock...—, I believe that...</example>
    </rule>

Great. Don’t forget to update your credits as the translator.
I believe that rule (ENUMERATION_AND_DASHES) was localized from the Catalan file. I added that information here and made a few improvements using the new tokenizer.
If the English tokenizer works in a similar way (which I believe it does), you may want to replace the pattern with this:

<rule id="ENUMERATION_AND_DASHES" name="Enumerações com travessões: 1.2.-">
<!-- Localized and improved from Catalan by Tiago F. Santos, 2016-10-27 -->
  <pattern>
  <token postag='SENT_START'/>
  <token regexp="yes">\d+(\.\d+)?</token>
  <token>.</token>
  <token regexp="yes" min="1" max="3">[—–‒-]</token>
  </pattern>
  <message>Os travessões são desnecessários em enumerações.</message>
<suggestion>\2.</suggestion>

This avoids potential false positives in the middle of a sentence (not detected by the usual regression test).
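A rough plain-regex analogue of that change, as a sketch only: it treats the start of the string as a stand-in for `SENT_START`, works on raw text rather than LanguageTool tokens, and uses hypothetical names (`ANYWHERE`, `AT_START`, `fix`). Both variants use the same number pattern so that only the anchoring differs.

```python
import re

DASHES = "—–‒-"               # the same characters as in the rule's last token
NUMBER = r"\d+(?:\.\d+)?"     # e.g. "1" or "1.3"

# Number, full stop, then one to three dashes, anywhere in the text.
ANYWHERE = re.compile(rf"({NUMBER})\.[{DASHES}]{{1,3}}")
# The same, but only at the very start (stand-in for SENT_START).
AT_START = re.compile(rf"^({NUMBER})\.[{DASHES}]{{1,3}}")


def fix(text: str, pattern: re.Pattern) -> str:
    """Drop the dashes after an enumeration number, keeping 'number.'."""
    return pattern.sub(r"\1.", text, count=1)


if __name__ == "__main__":
    for sample in ("1.- Introduction", "1.3.— Introduction", "See item 2.- below"):
        print(repr(fix(sample, ANYWHERE)), repr(fix(sample, AT_START)))
```

In this sketch the unanchored variant also rewrites the mid-sentence `2.-`, while the anchored one leaves it alone.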

@tiagosantos

Could you commit it?

:slight_smile:

I am so scared to screw things up in EN :frowning:

Is there anyone who can do it?

Thanks!

@tiagosantos

Yes, I seem very active…

:slight_smile:

That is because I had an accident at work (a dislocated shoulder) and I am on leave for weeks… probably around two months… I have been using my right hand to write :slight_smile:

Updated the code and improved the translation a bit:


  <!DOCTYPE rules [
    <!ENTITY meses_ano_abrev "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec">
    <!ENTITY meses_ano "january|february|march|april|may|june|july|august|september|october|november|december">
    <!ENTITY dias_semana_abrev "mon|tue|wed|thu|fri|sat|sun">
    <!ENTITY dias_semana "monday|tuesday|wednesday|thursday|friday|saturday|sunday">


  <rulegroup id="DASH_RULE" name='Hyphen, Half-Dash and Dash'>
    <!-- Created by Tiago F. Santos, 2017-01-23 -->
      <url>https://pt.wikipedia.org/wiki/Travessão</url>
    <rule>
      <pattern>
          <token postag='SENT_START'/>
          <token min='0' regexp='yes'>["«»“”]</token>
        <marker>
          <token regexp='yes'>-|–</token>
        </marker>
      </pattern>
      <message>In dialogues and enumerations you must use a dash.</message>
        <suggestion>—</suggestion>
      <short>In this situation you must use the dash.</short>
      <example correction='—'><marker>-</marker> What is that, mother?</example>
      <example correction='—'>« <marker>-</marker> What is that, mother?</example>
      <example>— It's your birthday present, my daughter.</example>
    </rule>
    <rule>
      <antipattern>
          <token regexp='yes'>\d+;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
          <token regexp='yes'>-|–</token>
          <token regexp='yes'>\d+;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
      </antipattern>
      <pattern>
        <marker>
          <token spacebefore="yes" regexp='yes'>-|–</token>
        </marker>
          <token spacebefore="yes"/>
      </pattern>
      <message>If you do not want to join two words, you must use a dash.</message>
        <suggestion>—</suggestion>
      <short>In this situation you must use the dash.</short>
      <example correction='—'>In these educational establishments there were enrollments <marker>-</marker> mostly from elementary school — and a total of teachers.</example>
      <example correction='—'>Institute Ricci de Macau <marker>-</marker> Association of cultural promotion of the Company of Jesus in Macau</example>
      <example>In the Midwest and Northwest portion are higher elevations, reaching 500 meters above sea level, highlighting Serra do Tumucumaque and Sierra Lombarda.</example>
    </rule>
    <rule>
      <pattern>
          <token regexp='yes'>\d+|;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
          <token regexp='yes'>-|—</token>
          <token regexp='yes'>\d+|;&meses_ano;|&meses_ano_abrev;|&dias_semana;|&dias_semana_abrev;</token>
      </pattern>
      <message>If you want to indicate a period of time, you must use a half-dash.</message>
        <suggestion>\1 – \3</suggestion>
        <suggestion>\1–\3</suggestion>
      <short>In this situation you must use a half-dash.</short>
      <example correction='1901 – 1978|1901–1978'>Vitorino Nemésio (<marker>1901 - 1978</marker>) — writer and university teacher.</example>
    </rule>
  </rulegroup>

    <rule id="ENUMERATION_AND_DASHES" name="Enumerations with dashes: 1.2.-">
    <!-- Localized and improved from Catalan by Tiago F. Santos, 2016-10-27 -->
      <pattern>
          <token postag='SENT_START'/>
          <token regexp="yes">\d+(\.\d+)?</token>
          <token>.</token>
          <token regexp="yes" min="1" max="3">[—–‒-]</token>
      </pattern>
      <message>Dashes are unnecessary in enumerations.</message>
        <suggestion>\2.</suggestion>
      <example correction="1."><marker>1.-</marker> Introduction</example>
      <example correction="1."><marker>1.--</marker> Introduction</example>
      <example correction="1.3."><marker>1.3.—</marker> Introduction</example>
      <example>50-30</example>
      <example>...with a rock...—, I believe that...</example>
    </rule>

Excellent, but you should address the English maintainers. For now, Portuguese is the only language to which I volunteer my time.

@marcoagpinto , I noticed these problems.

  1. A link to the English Wikipedia is necessary.
  2. Use the English entity definitions in the rules.
  3. ‘«’ is not standard English.
  4. I have never heard the term ‘half-dash’. The standard terms are ‘hyphen’ (-), ‘m-dash’ (—), ‘n-dash’ (–). Chicago Manual of Style, 15th edition also mentions the ‘2-em dash’ and the ‘3-em dash’.
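For reference, the dash characters that appear in the rule patterns in this thread can be identified by their Unicode names. The short Python sketch below uses only the standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata

# The characters used in the rule patterns in this thread.
DASH_CHARS = "-‒–—"


def describe(ch: str) -> tuple[str, str]:
    """Return the code point (as U+XXXX) and the official Unicode name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    for ch in DASH_CHARS:
        code_point, name = describe(ch)
        print(code_point, name, repr(ch))
```

Unicode itself uses the names HYPHEN-MINUS, FIGURE DASH, EN DASH and EM DASH for these four characters.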

@Mike_Unwalla

Is it okay now?

  <!DOCTYPE rules [
    <!ENTITY months_year_abrev "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec">
    <!ENTITY months_year "january|february|march|april|may|june|july|august|september|october|november|december">
    <!ENTITY days_week_abrev "mon|tue|wed|thu|fri|sat|sun">
    <!ENTITY days_week "monday|tuesday|wednesday|thursday|friday|saturday|sunday">


  <rulegroup id="DASH_RULE" name='Hyphen, n-dash and m-dash'>
    <!-- Created by Tiago F. Santos, 2017-01-23 -->
	<!-- Localised to English by Marco A.G.Pinto, 2017-04-02 -->
      <url>https://en.wikipedia.org/wiki/Dash#Em_dash</url>
    <rule>
      <pattern>
          <token postag='SENT_START'/>
          <token min='0' regexp='yes'>["«»“”]</token>
        <marker>
          <token regexp='yes'>-|–</token>
        </marker>
      </pattern>
      <message>In dialogues and enumerations you must use an m-dash.</message>
        <suggestion>—</suggestion>
      <short>In this situation you must use an m-dash.</short>
      <example correction='—'><marker>-</marker> What is that, mother?</example>
      <example correction='—'>« <marker>-</marker> What is that, mother?</example>
      <example>— It's your birthday present, my daughter.</example>
    </rule>
    <rule>
      <antipattern>
          <token regexp='yes'>\d+|&months_year;|&months_year_abrev;|&days_week;|&days_week_abrev;</token>
          <token regexp='yes'>-|–</token>
          <token regexp='yes'>\d+|&months_year;|&months_year_abrev;|&days_week;|&days_week_abrev;</token>
      </antipattern>
      <pattern>
        <marker>
          <token spacebefore="yes" regexp='yes'>-|–</token>
        </marker>
          <token spacebefore="yes"/>
      </pattern>
      <message>If you do not want to join two words, you must use an m-dash.</message>
        <suggestion>—</suggestion>
      <short>In this situation you must use an m-dash.</short>
      <example correction='—'>In these educational establishments there were enrollments <marker>-</marker> mostly from elementary school — and a total of teachers.</example>
      <example correction='—'>Institute Ricci de Macau <marker>-</marker> Association of cultural promotion of the Company of Jesus in Macau</example>
      <example>In the Midwest and Northwest portion are higher elevations, reaching 500 meters above sea level, highlighting Serra do Tumucumaque and Sierra Lombarda.</example>
    </rule>
    <rule>
      <pattern>
          <token regexp='yes'>\d+|&months_year;|&months_year_abrev;|&days_week;|&days_week_abrev;</token>
          <token regexp='yes'>-|—</token>
          <token regexp='yes'>\d+|&months_year;|&months_year_abrev;|&days_week;|&days_week_abrev;</token>
      </pattern>
      <message>If you want to indicate a period of time, you must use an n-dash.</message>
        <suggestion>\1 – \3</suggestion>
        <suggestion>\1–\3</suggestion>
      <short>In this situation you must use an n-dash.</short>
      <example correction='1901 – 1978|1901–1978'>Vitorino Nemésio (<marker>1901 - 1978</marker>) — writer and university teacher.</example>
    </rule>
  </rulegroup>

    <rule id="ENUMERATION_AND_DASHES" name="Enumeration with dashes: 1.2.-">
    <!-- Localized and improved from Catalan by Tiago F. Santos, 2016-10-27 -->
	<!-- Localised to English by Marco A.G.Pinto, 2017-04-02 -->
      <pattern>
          <token postag='SENT_START'/>
          <token regexp="yes">\d+(\.\d+)?</token>
          <token>.</token>
          <token regexp="yes" min="1" max="3">[—–‒-]</token>
      </pattern>
      <message>Dashes are unnecessary in enumeration.</message>
        <suggestion>\2.</suggestion>
      <example correction="1."><marker>1.-</marker> Introduction</example>
      <example correction="1."><marker>1.--</marker> Introduction</example>
      <example correction="1.3."><marker>1.3.—</marker> Introduction</example>
      <example>50-30</example>
      <example>...with a rock...—, I believe that...</example>
    </rule>

Please make any minor fixes that are necessary, such as the «.

Thanks!

This is only in the internal pattern, and it makes the rule robust on ‘non-standard’ texts. Such texts do exist, especially on the Internet.

«The en dash, n dash, n-rule, or “nut” (–) is traditionally half the width of an em dash. In modern fonts, the length of the en dash is not standardized …»
From en.wikipedia.org. Although it is not technical, it is accurate. Anyway, Wikipedia is never a good reference.

By the way, I have reassessed things and am making a minor contribution.
Daniel has made huge strides on the n-gram front.

I am sharing with the forum an automated triage of his Big Data extraction, which he shared on GitHub. The criteria for this extraction exceed the current minimum standard used by the already selected n-gram pairs (p=1.000, more than 1000 combined samples, and using the minimum factor that complies with both criteria). They just need synonyms.
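As a sketch of how those criteria could be re-checked, the Python below parses lines in the format of this list. It is not the triage script itself: `ConfusionPair`, `parse_line` and `meets_criteria` are hypothetical names, and reading `p` as precision, `r` as recall, and the two trailing numbers as per-word sample counts is an assumption.

```python
import re
from typing import NamedTuple


class ConfusionPair(NamedTuple):
    word1: str
    word2: str
    factor: int
    p: float  # assumed to be precision
    r: float  # assumed to be recall
    n1: int   # assumed sample count for word1
    n2: int   # assumed sample count for word2


# word1; word2; factor;   #  p=..., r=..., n1+n2
LINE_RE = re.compile(
    r"^(\w+);\s*(\w+);\s*(\d+);\s*#\s*p=([\d.]+),\s*r=([\d.]+),\s*(\d+)\+(\d+)$"
)


def parse_line(line: str) -> ConfusionPair:
    """Parse one line of the list; raise ValueError if it does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    w1, w2, factor, p, r, n1, n2 = m.groups()
    return ConfusionPair(w1, w2, int(factor), float(p), float(r), int(n1), int(n2))


def meets_criteria(pair: ConfusionPair) -> bool:
    """p=1.000 and more than 1000 combined samples, as stated in the post."""
    return pair.p == 1.0 and pair.n1 + pair.n2 > 1000


if __name__ == "__main__":
    sample = "abridged; abridges; 100000;             #  p=1.000, r=0.745, 999+36"
    pair = parse_line(sample)
    print(pair, meets_criteria(pair))
```

The "minimum factor" part of the criteria is not checked here, because it depends on the underlying extraction data rather than on the list itself.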

Cheers!

abridged; abridges; 100000;             #  p=1.000, r=0.745, 999+36
accommodated; accommodates; 100000;     #  p=1.000, r=0.286, 1000+1000
acidity; avidity; 1000000;              #  p=1.000, r=0.206, 1000+127
acorns; scorns; 100000;                 #  p=1.000, r=0.437, 999+192
acquiesced; acquiesces; 1000000;        #  p=1.000, r=0.111, 895+169
addressed; addressee; 100000;           #  p=1.000, r=0.630, 1000+550
addressed; addresses; 1000000;          #  p=1.000, r=0.458, 1000+1000
adjudicated; adjudicates; 100000;       #  p=1.000, r=0.492, 951+143
aged; ages; 10000;                      #  p=1.000, r=0.503, 991+1000
alleviated; alleviates; 10000;          #  p=1.000, r=0.504, 1000+336
antiqued; antiques; 1000000;            #  p=1.000, r=0.284, 34+1000
aped; apes; 100;                        #  p=1.000, r=0.846, 66+1000
apes; apex; 10000;                      #  p=1.000, r=0.252, 1000+999
arched; arches; 100000;                 #  p=1.000, r=0.458, 1000+1000
atoned; stoned; 100000;                 #  p=1.000, r=0.218, 198+1000
averred; averted; 1000000;              #  p=1.000, r=0.268, 217+1000
axed; aced; 10000;                      #  p=1.000, r=0.296, 1000+154
bartering; battering; 100000;           #  p=1.000, r=0.389, 376+1000
barters; batters; 10000;                #  p=1.000, r=0.347, 59+1000
basal; nasal; 1000000;                  #  p=1.000, r=0.323, 999+1000
batched; batches; 100000;               #  p=1.000, r=0.721, 70+1000
bats; bags; 10000;                      #  p=1.000, r=0.206, 997+999
battering; nattering; 1000000;          #  p=1.000, r=0.511, 1000+61
bearing; beating; 100000;               #  p=1.000, r=0.282, 1000+1000
belated; belayed; 10000;                #  p=1.000, r=0.652, 1000+29
belled; belles; 100000;                 #  p=1.000, r=0.326, 48+1000
biddies; buddies; 100000;               #  p=1.000, r=0.222, 22+1000
bilking; billing; 10000;                #  p=1.000, r=0.621, 45+1000
birding; girding; 1000;                 #  p=1.000, r=0.185, 1000+57
births; girths; 100000;                 #  p=1.000, r=0.549, 999+50
blasting; boasting; 1000000;            #  p=1.000, r=0.138, 1000+1000
blinders; blunders; 1000000;            #  p=1.000, r=0.144, 277+1000
blob; blog; 1000000;                    #  p=1.000, r=0.309, 1000+997
blog; blot; 1000000;                    #  p=1.000, r=0.201, 996+1000
blower; glower; 1000000;                #  p=1.000, r=0.273, 1000+29
blunders; bounders; 100000;             #  p=1.000, r=0.357, 1000+53
bolster; booster; 1000000;              #  p=1.000, r=0.326, 1000+999
boodle; noodle; 10000;                  #  p=1.000, r=0.245, 116+1000
bossed; bosses; 1000000;                #  p=1.000, r=0.372, 154+1000
botched; botches; 100000;               #  p=1.000, r=0.454, 1000+95
bounced; bounded; 1000000;              #  p=1.000, r=0.346, 1000+1000
bowels; vowels; 100000;                 #  p=1.000, r=0.297, 774+996
braced; braces; 1000000;                #  p=1.000, r=0.235, 1000+1000
braced; graced; 10000;                  #  p=1.000, r=0.336, 1000+1000
braces; braves; 100000;                 #  p=1.000, r=0.170, 1000+999
brad; bead; 100000;                     #  p=1.000, r=0.185, 988+1000
braided; braised; 1000000;              #  p=1.000, r=0.249, 1000+410
brains; grains; 100000;                 #  p=1.000, r=0.188, 999+1000
briefs; griefs; 100000;                 #  p=1.000, r=0.222, 1000+90
broadsided; broadsides; 1000000;        #  p=1.000, r=0.459, 46+1000
brooms; grooms; 10000;                  #  p=1.000, r=0.105, 755+1000
bubbled; bubbles; 10000;                #  p=1.000, r=0.338, 293+1000
buds; bids; 100000;                     #  p=1.000, r=0.258, 1000+998
buffered; buffeted; 1000000;            #  p=1.000, r=0.187, 832+297
buffers; buffets; 1000000;              #  p=1.000, r=0.134, 1000+213
bulky; bully; 10000;                    #  p=1.000, r=0.311, 1000+1000
bullies; gullies; 100000;               #  p=1.000, r=0.110, 1000+1000
bunched; bunches; 1000000;              #  p=1.000, r=0.476, 569+1000
burros; burrow; 1000000;                #  p=1.000, r=0.273, 353+1000
bushy; gushy; 1000000;                  #  p=1.000, r=0.387, 1000+71
busying; busting; 10000;                #  p=1.000, r=0.451, 61+1000
butted; buttes; 100000;                 #  p=1.000, r=0.239, 435+1000
butted; gutted; 100000;                 #  p=1.000, r=0.475, 435+1000
canonized; canonizes; 1000000;          #  p=1.000, r=0.704, 1000+38
capitulated; capitulates; 1000000;      #  p=1.000, r=0.143, 1000+104
carrels; cartels; 1000000;              #  p=1.000, r=0.413, 101+1000
cawing; casing; 100000;                 #  p=1.000, r=0.487, 27+1000
censer; denser; 10000;                  #  p=1.000, r=0.534, 254+1000
chimed; chimes; 100000;                 #  p=1.000, r=0.223, 470+999
chinks; chunks; 100000;                 #  p=1.000, r=0.378, 121+1000
chorused; choruses; 100000;             #  p=1.000, r=0.419, 37+1000
clears; cleats; 100000;                 #  p=1.000, r=0.422, 1000+456
cling; clung; 10000;                    #  p=1.000, r=0.319, 1000+852
cochleae; cochlear; 1000000;            #  p=1.000, r=0.693, 21+1000
collage; collate; 100000;               #  p=1.000, r=0.318, 1000+484
collared; collated; 1000;               #  p=1.000, r=0.413, 998+1000
collide; collude; 100000;               #  p=1.000, r=0.102, 999+180
collided; colluded; 1000000;            #  p=1.000, r=0.289, 1000+407
colliding; colluding; 10000;            #  p=1.000, r=0.280, 1000+422
colonnaded; colonnades; 1000000;        #  p=1.000, r=0.241, 436+586
comically; conically; 1000;             #  p=1.000, r=0.245, 1000+86
commemorated; commemorates; 10000;      #  p=1.000, r=0.465, 1000+999
conch; cinch; 1000000;                  #  p=1.000, r=0.236, 1000+271
condole; console; 100000;               #  p=1.000, r=0.445, 22+997
confer; conger; 100000;                 #  p=1.000, r=0.315, 1000+999
consigned; consignee; 10000;            #  p=1.000, r=0.798, 1000+100
cookie; coolie; 100000;                 #  p=1.000, r=0.104, 997+707
cookies; coolies; 1000000;              #  p=1.000, r=0.111, 1000+521
cooler; cooker; 1000000;                #  p=1.000, r=0.354, 1000+1000
coordinated; coordinates; 100000;       #  p=1.000, r=0.520, 1000+992
copied; copies; 100000;                 #  p=1.000, r=0.491, 1000+1000
corroborated; corroborates; 1000;       #  p=1.000, r=0.527, 1000+384
couched; couches; 100000;               #  p=1.000, r=0.473, 574+607
crawls; drawls; 100000;                 #  p=1.000, r=0.422, 1000+24
craws; crass; 100;                      #  p=1.000, r=0.509, 21+1000
cried; dried; 100000;                   #  p=1.000, r=0.487, 996+996
cued; curd; 100000;                     #  p=1.000, r=0.346, 519+1000
culled; dulled; 1000000;                #  p=1.000, r=0.206, 1000+225
cur; cir; 10000;                        #  p=1.000, r=0.393, 1000+1000
curds; cures; 100000;                   #  p=1.000, r=0.274, 388+1000
dabbled; dabbles; 1000000;              #  p=1.000, r=0.109, 1000+194
dame; came; 100000;                     #  p=1.000, r=0.588, 992+989
dangs; fangs; 100000;                   #  p=1.000, r=0.634, 65+1000
dashed; dashes; 1000000;                #  p=1.000, r=0.275, 1000+1000
decried; decries; 1000000;              #  p=1.000, r=0.170, 1000+227
deducing; seducing; 1000;               #  p=1.000, r=0.331, 424+1000
defaced; defaces; 10000;                #  p=1.000, r=0.526, 1000+70
dehydrated; dehydrates; 1000000;        #  p=1.000, r=0.472, 1000+79
demarcated; demarcates; 100000;         #  p=1.000, r=0.557, 1000+167
demoralized; demoralizes; 100000;       #  p=1.000, r=0.544, 1000+21
demur; femur; 100000;                   #  p=1.000, r=0.698, 94+1000
denominated; denominates; 1000;         #  p=1.000, r=0.733, 1000+21
deplored; deplores; 10000;              #  p=1.000, r=0.148, 938+239
deportee; deported; 1000000;            #  p=1.000, r=0.718, 133+1000
devolved; devolves; 1000000;            #  p=1.000, r=0.310, 1000+309
diffs; doffs; 10;                       #  p=1.000, r=0.313, 989+22
dignified; signified; 1000000;          #  p=1.000, r=0.282, 1000+1000
diked; dikes; 100000;                   #  p=1.000, r=0.576, 79+999
dimes; domes; 1000000;                  #  p=1.000, r=0.108, 1000+1000
dinning; donning; 1000;                 #  p=1.000, r=0.364, 259+1000
disagreed; disagrees; 1000000;          #  p=1.000, r=0.116, 999+1000
discharged; discharges; 1000000;        #  p=1.000, r=0.451, 1000+1000
disciplined; disciplines; 1000000;      #  p=1.000, r=0.435, 1000+1000
discoursed; discourses; 1000000;        #  p=1.000, r=0.301, 57+1000
disgraced; disgraces; 100000;           #  p=1.000, r=0.327, 1000+76
dismantled; dismantles; 1000000;        #  p=1.000, r=0.712, 1000+146
disparage; disparate; 10000;            #  p=1.000, r=0.706, 1000+1000
disproved; disproves; 1000000;          #  p=1.000, r=0.416, 1000+244
dissociated; dissociates; 1000000;      #  p=1.000, r=0.255, 741+391
dockers; dockets; 100000;               #  p=1.000, r=0.102, 1000+182
dockets; sockets; 100000;               #  p=1.000, r=0.328, 182+999
dogged; fogged; 10000;                  #  p=1.000, r=0.379, 999+120
dome; dime; 1000000;                    #  p=1.000, r=0.127, 998+999
domesticated; domesticates; 100000;     #  p=1.000, r=0.554, 1000+44
dote; doge; 1000;                       #  p=1.000, r=0.773, 131+1000
doubled; doubles; 1000;                 #  p=1.000, r=0.238, 1000+992
drabs; drags; 10000;                    #  p=1.000, r=0.534, 51+999
dramatized; dramatizes; 10000;          #  p=1.000, r=0.467, 1000+421
draws; craws; 100000;                   #  p=1.000, r=0.405, 1000+21
drenched; drenches; 100000;             #  p=1.000, r=0.474, 1000+57
dues; dies; 100000;                     #  p=1.000, r=0.327, 1000+993
dullness; fullness; 100000;             #  p=1.000, r=0.212, 300+999
dwindled; swindled; 10000;              #  p=1.000, r=0.578, 1000+385
dyeing; eyeing; 1000000;                #  p=1.000, r=0.383, 1000+396
dyes; dues; 10000;                      #  p=1.000, r=0.475, 1000+1000
eared; dared; 100;                      #  p=1.000, r=0.880, 999+1000
eaters; esters; 1000000;                #  p=1.000, r=0.341, 1000+1000
elide; elude; 1000000;                  #  p=1.000, r=0.319, 100+1000
elucidated; elucidates; 1000000;        #  p=1.000, r=0.470, 1000+226
emir; emit; 1000;                       #  p=1.000, r=0.567, 1000+1000
emptied; empties; 1000000;              #  p=1.000, r=0.241, 1000+1000
enabled; enables; 100000;               #  p=1.000, r=0.190, 1000+1000
ennobled; ennobles; 10000;              #  p=1.000, r=0.783, 1000+39
eons; dons; 1000000;                    #  p=1.000, r=0.132, 597+1000
epitomized; epitomizes; 100000;         #  p=1.000, r=0.306, 787+388
equated; equates; 100000;               #  p=1.000, r=0.287, 1000+999
eroded; erodes; 1000000;                #  p=1.000, r=0.383, 1000+440
ester; eater; 10000;                    #  p=1.000, r=0.321, 999+998
etas; eras; 1000000;                    #  p=1.000, r=0.412, 69+1000
excluded; excludes; 1000000;            #  p=1.000, r=0.350, 1000+1000
excreted; excretes; 1000000;            #  p=1.000, r=0.774, 1000+166
exemplified; exemplifies; 1000000;      #  p=1.000, r=0.357, 1000+1000
exiled; exiles; 1000000;                #  p=1.000, r=0.233, 1000+999
exp; esp; 1000000;                      #  p=1.000, r=0.147, 1000+991
expatriated; expatriates; 1000000;      #  p=1.000, r=0.207, 140+1000
fabled; gabled; 10000;                  #  p=1.000, r=0.462, 1000+1000
facial; racial; 10000;                  #  p=1.000, r=0.575, 1000+1000
fame; dame; 1000;                       #  p=1.000, r=0.582, 977+992
fares; fared; 10000;                    #  p=1.000, r=0.369, 999+1000
fascinated; fascinates; 1000000;        #  p=1.000, r=0.654, 1000+226
fated; fared; 100000;                   #  p=1.000, r=0.860, 1000+1000
faulted; vaulted; 1000000;              #  p=1.000, r=0.320, 1000+1000
feels; reels; 1000000;                  #  p=1.000, r=0.330, 999+998
fells; fella; 10000;                    #  p=1.000, r=0.158, 995+1000
fenced; fences; 1000000;                #  p=1.000, r=0.362, 1000+1000
fend; fens; 100000;                     #  p=1.000, r=0.697, 1000+996
fens; dens; 1000000;                    #  p=1.000, r=0.132, 996+1000
fertilized; fertilizes; 1000000;        #  p=1.000, r=0.705, 1000+161
fetching; retching; 100;                #  p=1.000, r=0.529, 958+94
fiats; fists; 10000;                    #  p=1.000, r=0.345, 146+1000
fiddling; riddling; 10000;              #  p=1.000, r=0.324, 1000+142
figs; fogs; 100000;                     #  p=1.000, r=0.371, 1000+390
figs; fits; 1000000;                    #  p=1.000, r=0.349, 1000+1000
flags; flats; 100000;                   #  p=1.000, r=0.120, 995+999
flaming; foaming; 1000000;              #  p=1.000, r=0.107, 1000+553
flaring; glaring; 1000000;              #  p=1.000, r=0.313, 747+1000
flays; flats; 10;                       #  p=1.000, r=0.618, 27+999
flips; flops; 1000000;                  #  p=1.000, r=0.213, 1000+1000
flirting; flitting; 100000;             #  p=1.000, r=0.342, 999+125
flossy; glossy; 1000000;                #  p=1.000, r=0.483, 123+1000
fluctuated; fluctuates; 100000;         #  p=1.000, r=0.172, 1000+878
foamed; roamed; 10000;                  #  p=1.000, r=0.630, 171+1000
foil; foul; 100000;                     #  p=1.000, r=0.193, 1000+999
frenzied; frenzies; 100;                #  p=1.000, r=0.685, 930+81
fumbled; rumbled; 100000;               #  p=1.000, r=0.112, 1000+179
fumbling; rumbling; 1000000;            #  p=1.000, r=0.111, 488+658
fumed; fumes; 100000;                   #  p=1.000, r=0.627, 163+1000
funnels; runnels; 10000;                #  p=1.000, r=0.265, 1000+653
galling; balling; 10000;                #  p=1.000, r=0.190, 545+1000
garb; barb; 100000;                     #  p=1.000, r=0.169, 1000+999
garbed; barbed; 100000;                 #  p=1.000, r=0.576, 185+999
gases; bases; 1000000;                  #  p=1.000, r=0.273, 999+1000
gashing; bashing; 100000;               #  p=1.000, r=0.246, 46+1000
gasp; hasp; 1000000;                    #  p=1.000, r=0.243, 997+88
gates; bates; 1000000;                  #  p=1.000, r=0.202, 995+991
gauged; gauges; 1000000;                #  p=1.000, r=0.395, 771+1000
gazetted; gazettes; 100000;             #  p=1.000, r=0.568, 1000+338
gear; hear; 1000000;                    #  p=1.000, r=0.402, 994+1000
gears; hears; 1000;                     #  p=1.000, r=0.611, 997+1000
genders; tenders; 100000;               #  p=1.000, r=0.301, 1000+1000
generated; generates; 100000;           #  p=1.000, r=0.348, 1000+1000
gestured; gestures; 1000000;            #  p=1.000, r=0.492, 182+1000
ginned; gunned; 10000;                  #  p=1.000, r=0.875, 49+1000
glace; glade; 1000000;                  #  p=1.000, r=0.109, 918+1000
glee; flee; 10000;                      #  p=1.000, r=0.400, 998+999
glued; glues; 1000000;                  #  p=1.000, r=0.456, 1000+423
goggled; goggles; 10000;                #  p=1.000, r=0.558, 28+1000
goggles; toggles; 100000;               #  p=1.000, r=0.275, 1000+175
golfing; goofing; 1000000;              #  p=1.000, r=0.242, 1000+257
gorging; forging; 1000000;              #  p=1.000, r=0.345, 78+1000
grains; trains; 1000000;                #  p=1.000, r=0.168, 1000+995
grander; grandee; 10000;                #  p=1.000, r=0.674, 1000+818
grayed; grated; 100;                    #  p=1.000, r=0.757, 136+1000
graze; braze; 1000000;                  #  p=1.000, r=0.593, 1000+101
gyro; tyro; 1000;                       #  p=1.000, r=0.177, 1000+307
hag; gag; 1000000;                      #  p=1.000, r=0.125, 1000+1000
hailed; hauled; 100000;                 #  p=1.000, r=0.336, 1000+1000
hailed; jailed; 1000000;                #  p=1.000, r=0.207, 1000+1000
hammed; jammed; 10000;                  #  p=1.000, r=0.625, 52+1000
hamstring; hamstrung; 1000;             #  p=1.000, r=0.722, 1000+235
harrow; narrow; 1000000;                #  p=1.000, r=0.334, 997+998
haying; hating; 100000;                 #  p=1.000, r=0.318, 74+999
hearer; heater; 1000;                   #  p=1.000, r=0.308, 292+999
hearers; heaters; 100000;               #  p=1.000, r=0.456, 379+999
heedlessly; needlessly; 10000;          #  p=1.000, r=0.345, 36+1000
hens; gens; 100000;                     #  p=1.000, r=0.167, 992+999
hie; hid; 100000;                       #  p=1.000, r=0.417, 414+1000
hilt; jilt; 100000;                     #  p=1.000, r=0.534, 1000+93
hilt; gilt; 1000000;                    #  p=1.000, r=0.190, 1000+1000
hinged; hinted; 1000000;                #  p=1.000, r=0.387, 1000+1000
hinted; hunted; 10000;                  #  p=1.000, r=0.505, 1000+999
hitch; hutch; 1000;                     #  p=1.000, r=0.170, 999+999
hitter; hotter; 1000;                   #  p=1.000, r=0.661, 999+999
hitting; hotting; 100000;               #  p=1.000, r=0.754, 995+20
hoaxed; hoaxes; 100000;                 #  p=1.000, r=0.147, 149+1000
hooped; hopped; 1000000;                #  p=1.000, r=0.289, 309+800
hops; hips; 100000;                     #  p=1.000, r=0.162, 1000+1000
hues; hies; 10;                         #  p=1.000, r=0.685, 1000+46
humeral; numeral; 100000;               #  p=1.000, r=0.633, 321+1000
hurdled; hurdles; 1000000;              #  p=1.000, r=0.438, 51+959
hyped; hypes; 10000;                    #  p=1.000, r=0.350, 999+46
idle; isle; 100000;                     #  p=1.000, r=0.265, 998+994
impoverished; impoverishes; 10000;      #  p=1.000, r=0.837, 1000+21
inborn; unborn; 10000;                  #  p=1.000, r=0.543, 752+1000
incest; invest; 1000000;                #  p=1.000, r=0.407, 1000+1000
inciting; inditing; 100;                #  p=1.000, r=0.793, 1000+21
included; includes; 1000000;            #  p=1.000, r=0.165, 990+989
indicated; indicates; 1000000;          #  p=1.000, r=0.356, 999+993
infilled; unfilled; 1000000;            #  p=1.000, r=0.196, 611+561
inmate; innate; 1000000;                #  p=1.000, r=0.405, 1000+1000
insure; unsure; 1000;                   #  p=1.000, r=0.845, 1000+1000
intruded; intrudes; 10000;              #  p=1.000, r=0.348, 935+391
ionized; ionizes; 100000;               #  p=1.000, r=0.659, 1000+156
irate; orate; 100000;                   #  p=1.000, r=0.330, 1000+139
irks; iris; 10000;                      #  p=1.000, r=0.417, 216+994
irradiated; irradiates; 10000;          #  p=1.000, r=0.710, 1000+31
irrigate; irritate; 100000;             #  p=1.000, r=0.271, 1000+892
irrigated; irritated; 1000000;          #  p=1.000, r=0.348, 997+1000
irrigating; irritating; 1000000;        #  p=1.000, r=0.380, 324+1000
irrigation; irritation; 1000000;        #  p=1.000, r=0.356, 998+1000
jailed; mailed; 1000000;                #  p=1.000, r=0.330, 1000+999
jangled; mangled; 1000000;              #  p=1.000, r=0.200, 22+1000
jeeps; keeps; 1000000;                  #  p=1.000, r=0.296, 1000+997
jilt; kilt; 100000;                     #  p=1.000, r=0.224, 93+999
joked; jokes; 100000;                   #  p=1.000, r=0.292, 1000+1000
jugs; mugs; 10000;                      #  p=1.000, r=0.115, 950+1000
jumps; mumps; 100000;                   #  p=1.000, r=0.294, 1000+1000
kilos; kills; 1000000;                  #  p=1.000, r=0.292, 853+998
knee; knew; 100000;                     #  p=1.000, r=0.688, 991+998
laminae; laminar; 100000;               #  p=1.000, r=0.525, 391+1000
lanced; lances; 1000000;                #  p=1.000, r=0.364, 114+1000
languished; languishes; 10000;          #  p=1.000, r=0.251, 1000+91
lapsing; lapwing; 1000000;              #  p=1.000, r=0.181, 221+1000
latched; latches; 100000;               #  p=1.000, r=0.179, 581+573
limed; limes; 100;                      #  p=1.000, r=0.475, 46+999
limping; lumping; 100000;               #  p=1.000, r=0.286, 646+636
lipped; lopped; 100000;                 #  p=1.000, r=0.549, 1000+114
liquidated; liquidates; 1000000;        #  p=1.000, r=0.625, 1000+52
livable; lovable; 10000;                #  p=1.000, r=0.253, 782+1000
lobes; loves; 100000;                   #  p=1.000, r=0.557, 1000+997
loci; loco; 1000000;                    #  p=1.000, r=0.279, 1000+1000
loiters; looters; 1000;                 #  p=1.000, r=0.591, 22+1000
looped; lopped; 100000;                 #  p=1.000, r=0.207, 1000+114
lounged; lounges; 100000;               #  p=1.000, r=0.394, 34+1000
loved; lived; 1000;                     #  p=1.000, r=0.552, 999+1000
lunched; lunches; 10000;                #  p=1.000, r=0.658, 176+1000
lunches; lynches; 10000;                #  p=1.000, r=0.755, 1000+97
lusted; listed; 1000000;                #  p=1.000, r=0.807, 149+998
lynched; lynches; 1000000;              #  p=1.000, r=0.613, 1000+97
maims; mains; 1000;                     #  p=1.000, r=0.580, 57+1000
mangled; mangles; 1000000;              #  p=1.000, r=0.249, 1000+254
marrows; narrows; 1000000;              #  p=1.000, r=0.151, 62+1000
mashed; mashes; 1000000;                #  p=1.000, r=0.455, 999+80
matted; mattes; 10000;                  #  p=1.000, r=0.371, 630+378
mesa; mess; 100000;                     #  p=1.000, r=0.270, 991+998
metamorphosed; metamorphoses; 100000;   #  p=1.000, r=0.250, 1000+1000
mired; mires; 100000;                   #  p=1.000, r=0.598, 1000+322
molds; moles; 1000000;                  #  p=1.000, r=0.116, 999+1000
moored; mooted; 100000;                 #  p=1.000, r=0.248, 1000+1000
mosses; misses; 10000;                  #  p=1.000, r=0.453, 999+1000
muskeg; musket; 1000000;                #  p=1.000, r=0.353, 337+1000
mute; mite; 100000;                     #  p=1.000, r=0.212, 999+999
nagging; bagging; 100000;               #  p=1.000, r=0.301, 1000+689
nailing; bailing; 1000000;              #  p=1.000, r=0.154, 1000+565
neater; beater; 1000000;                #  p=1.000, r=0.301, 506+918
negatived; negatives; 1000000;          #  p=1.000, r=0.434, 33+1000
nerved; nerves; 1000000;                #  p=1.000, r=0.730, 76+1000
neuritic; neurotic; 1000000;            #  p=1.000, r=0.400, 29+1000
nods; bods; 10000;                      #  p=1.000, r=0.412, 988+69
nullified; nullifies; 1000000;          #  p=1.000, r=0.403, 1000+243
oaf; oar; 1000000;                      #  p=1.000, r=0.202, 182+1000
oafs; oars; 1000000;                    #  p=1.000, r=0.519, 22+1000
oared; pared; 1000000;                  #  p=1.000, r=0.663, 360+1000
obliterated; obliterates; 1000000;      #  p=1.000, r=0.544, 1000+207
obsoleted; obsoletes; 100000;           #  p=1.000, r=0.718, 999+81
officiated; officiates; 10000;          #  p=1.000, r=0.214, 1000+212
outfight; outright; 1000000;            #  p=1.000, r=0.423, 29+1000
overhear; overheat; 100000;             #  p=1.000, r=0.114, 505+640
overstated; overstayed; 1000000;        #  p=1.000, r=0.512, 1000+153
overstated; overstates; 100000;         #  p=1.000, r=0.527, 1000+199
pancaked; pancakes; 100000;             #  p=1.000, r=0.569, 28+997
pantomimed; pantomimes; 100000;         #  p=1.000, r=0.247, 30+979
pants; pangs; 1000000;                  #  p=1.000, r=0.251, 1000+410
paraded; parades; 100000;               #  p=1.000, r=0.393, 1000+1000
paroled; parolee; 10000;                #  p=1.000, r=0.779, 1000+119
participated; participates; 1000000;    #  p=1.000, r=0.400, 1000+1000
patted; parted; 1000000;                #  p=1.000, r=0.643, 180+999
pedigreed; pedigrees; 100000;           #  p=1.000, r=0.317, 173+946
perforated; perforates; 1000000;        #  p=1.000, r=0.571, 1000+49
perfumed; perfumes; 1000000;            #  p=1.000, r=0.153, 514+1000
perpetrated; perpetrates; 100000;       #  p=1.000, r=0.703, 1000+42
perpetuated; perpetuates; 100000;       #  p=1.000, r=0.397, 1000+648
picketed; pocketed; 10000;              #  p=1.000, r=0.118, 581+551
picketing; pocketing; 1000000;          #  p=1.000, r=0.174, 989+419
piked; pikes; 100000;                   #  p=1.000, r=0.122, 99+1000
piled; piles; 1000000;                  #  p=1.000, r=0.399, 1000+1000
pinched; punched; 1000000;              #  p=1.000, r=0.228, 857+1000
pleaded; pleased; 1000000;              #  p=1.000, r=0.590, 1000+1000
plumed; plumes; 100000;                 #  p=1.000, r=0.329, 482+1000
poets; ports; 1000000;                  #  p=1.000, r=0.123, 999+1000
ponds; pones; 10000;                    #  p=1.000, r=0.657, 998+28
posed; poses; 10000;                    #  p=1.000, r=0.230, 1000+1000
posy; posh; 100000;                     #  p=1.000, r=0.121, 277+1000
pouched; pouches; 100000;               #  p=1.000, r=0.490, 360+1000
predatory; prefatory; 1000000;          #  p=1.000, r=0.381, 1000+418
predeceased; predeceases; 10000;        #  p=1.000, r=0.664, 1000+21
predicated; predicates; 10000;          #  p=1.000, r=0.670, 1000+1000
predisposed; predisposes; 1000;         #  p=1.000, r=0.660, 799+213
prefaced; prefaces; 100000;             #  p=1.000, r=0.492, 1000+841
preoccupied; preoccupies; 10000;        #  p=1.000, r=0.875, 1000+24
pretest; pretext; 1000000;              #  p=1.000, r=0.512, 93+1000
prides; prudes; 1000;                   #  p=1.000, r=0.751, 1000+89
proliferated; proliferates; 100000;     #  p=1.000, r=0.251, 1000+106
prom; pron; 1000000;                    #  p=1.000, r=0.130, 999+513
promoter; prompter; 1000000;            #  p=1.000, r=0.156, 999+196
prophesied; prophesies; 1000000;        #  p=1.000, r=0.211, 1000+431
pruned; prunes; 1000000;                #  p=1.000, r=0.287, 1000+723
puddle; piddle; 100000;                 #  p=1.000, r=0.339, 1000+159
pullet; puller; 10000;                  #  p=1.000, r=0.201, 82+932
pulley; pullet; 10000;                  #  p=1.000, r=0.482, 1000+82
puttee; putter; 100000;                 #  p=1.000, r=0.256, 55+1000
quaked; quakes; 10000;                  #  p=1.000, r=0.371, 22+998
quantified; quantifies; 100000;         #  p=1.000, r=0.533, 1000+461
quarried; quarries; 10000;              #  p=1.000, r=0.523, 1000+1000
quibbled; quibbles; 10000;              #  p=1.000, r=0.310, 34+997
radiated; radiates; 1000000;            #  p=1.000, r=0.296, 1000+934
raking; faking; 100000;                 #  p=1.000, r=0.137, 928+1000
randomized; randomizes; 100000;         #  p=1.000, r=0.725, 999+31
ranted; ranged; 1000000;                #  p=1.000, r=0.624, 153+998
rattling; tattling; 100000;             #  p=1.000, r=0.350, 968+72
receded; recedes; 100000;               #  p=1.000, r=0.235, 1000+415
rectified; rectifies; 10000;            #  p=1.000, r=0.670, 1000+71
reefs; reeds; 1000000;                  #  p=1.000, r=0.268, 1000+998
referential; reverential; 1000000;      #  p=1.000, r=0.332, 1000+271
reimbursed; reimburses; 10000;          #  p=1.000, r=0.626, 1000+184
rejuvenated; rejuvenates; 100;          #  p=1.000, r=0.579, 1000+75
rekindled; rekindles; 1000000;          #  p=1.000, r=0.291, 1000+240
rel; eel; 1000;                         #  p=1.000, r=0.389, 1000+1000
render; fender; 1000000;                #  p=1.000, r=0.269, 998+997
retook; retool; 100000;                 #  p=1.000, r=0.257, 1000+282
rife; fife; 1000000;                    #  p=1.000, r=0.316, 1000+1000
riffs; tiffs; 100000;                   #  p=1.000, r=0.173, 1000+76
rills; fills; 10000;                    #  p=1.000, r=0.623, 102+1000
rink; fink; 100000;                     #  p=1.000, r=0.157, 999+997
riyals; royals; 1000000;                #  p=1.000, r=0.140, 188+999
robs; fobs; 100000;                     #  p=1.000, r=0.414, 945+179
safer; sager; 10000;                    #  p=1.000, r=0.327, 1000+1000
safer; wafer; 10000;                    #  p=1.000, r=0.461, 1000+1000
sanctified; sanctifies; 10000;          #  p=1.000, r=0.548, 1000+61
scapulae; scapular; 1000000;            #  p=1.000, r=0.175, 196+915
scarfed; scarred; 100000;               #  p=1.000, r=0.472, 27+999
sci; xci; 1000;                         #  p=1.000, r=0.153, 997+81
scorched; scorches; 1000000;            #  p=1.000, r=0.478, 1000+62
scour; scout; 1000000;                  #  p=1.000, r=0.139, 838+998
sculptured; sculptures; 100000;         #  p=1.000, r=0.267, 1000+1000
sect; secy; 100000;                     #  p=1.000, r=0.464, 999+58
secy; sexy; 1000;                       #  p=1.000, r=0.504, 58+999
seedy; weedy; 10000;                    #  p=1.000, r=0.244, 1000+576
sens; dens; 1000000;                    #  p=1.000, r=0.243, 999+1000
shale; shake; 10000;                    #  p=1.000, r=0.371, 1000+995
sheen; shewn; 1000000;                  #  p=1.000, r=0.295, 997+77
ship; shop; 1000000;                    #  p=1.000, r=0.102, 994+998
shores; snores; 10000;                  #  p=1.000, r=0.338, 1000+39
shout; snout; 10000;                    #  p=1.000, r=0.147, 993+1000
shrews; shrewd; 1000000;                #  p=1.000, r=0.391, 1000+1000
shrink; shrunk; 100000;                 #  p=1.000, r=0.364, 1000+1000
siding; aiding; 10000;                  #  p=1.000, r=0.500, 1000+1000
signified; signifies; 100000;           #  p=1.000, r=0.144, 1000+1000
silhouetted; silhouettes; 1000000;      #  p=1.000, r=0.218, 452+1000
silting; wilting; 100000;               #  p=1.000, r=0.222, 732+321
skates; slates; 100000;                 #  p=1.000, r=0.189, 1000+999
skew; slew; 100000;                     #  p=1.000, r=0.292, 1000+1000
skimming; slimming; 100000;             #  p=1.000, r=0.335, 1000+328
skip; slip; 1000000;                    #  p=1.000, r=0.106, 998+998
skipped; slipped; 100;                  #  p=1.000, r=0.350, 1000+999
skis; skid; 10000;                      #  p=1.000, r=0.225, 999+1000
slicer; sliver; 100000;                 #  p=1.000, r=0.331, 376+984
slicer; slider; 100000;                 #  p=1.000, r=0.114, 376+1000
slops; slips; 1000000;                  #  p=1.000, r=0.250, 70+1000
slot; slit; 10000;                      #  p=1.000, r=0.303, 1000+1000
slows; sloes; 100000;                   #  p=1.000, r=0.718, 1000+45
sniffing; snuffing; 1000000;            #  p=1.000, r=0.199, 998+49
spates; spares; 100;                    #  p=1.000, r=0.644, 52+1000
sped; aped; 1000000;                    #  p=1.000, r=0.621, 1000+66
spike; spoke; 1000000;                  #  p=1.000, r=0.331, 995+997
spindled; spindles; 100000;             #  p=1.000, r=0.539, 74+1000
spite; spire; 10000;                    #  p=1.000, r=0.735, 1000+999
spitting; spotting; 10000;              #  p=1.000, r=0.228, 1000+998
spoil; spool; 1000000;                  #  p=1.000, r=0.280, 999+1000
sprats; sprays; 1000000;                #  p=1.000, r=0.342, 73+1000
stab; stag; 1000000;                    #  p=1.000, r=0.156, 999+999
stabs; stags; 1000;                     #  p=1.000, r=0.322, 998+999
staging; stating; 100000;               #  p=1.000, r=0.493, 1000+1000
stagnated; stagnates; 10000;            #  p=1.000, r=0.295, 939+72
staked; stakes; 1000000;                #  p=1.000, r=0.185, 1000+1000
stapled; staples; 1000000;              #  p=1.000, r=0.116, 384+1000
stiff; stuff; 1000000;                  #  p=1.000, r=0.229, 1000+996
stifled; stifles; 1000000;              #  p=1.000, r=0.339, 894+206
stint; stunt; 100000;                   #  p=1.000, r=0.107, 1000+999
stockpiled; stockpiles; 1000000;        #  p=1.000, r=0.272, 614+864
stride; strife; 1000000;                #  p=1.000, r=0.224, 1000+999
stride; strode; 1000000;                #  p=1.000, r=0.213, 1000+1000
striped; stripes; 100000;               #  p=1.000, r=0.188, 1000+1000
strive; strove; 1000000;                #  p=1.000, r=0.226, 1000+1000
strobe; strove; 1000000;                #  p=1.000, r=0.525, 999+1000
stuccoed; stuccoes; 100000;             #  p=1.000, r=0.335, 1000+84
subset; sunset; 100000;                 #  p=1.000, r=0.394, 1000+991
substantiated; substantiates; 1000;     #  p=1.000, r=0.687, 1000+281
subsumed; subsumes; 1000000;            #  p=1.000, r=0.615, 1000+258
suffused; suffuses; 10;                 #  p=1.000, r=0.847, 1000+49
sulks; silks; 1000000;                  #  p=1.000, r=0.257, 79+1000
surged; surges; 1000000;                #  p=1.000, r=0.220, 1000+1000
sutured; sutures; 1000000;              #  p=1.000, r=0.576, 253+1000
swearer; sweater; 10000;                #  p=1.000, r=0.500, 78+1000
swears; sweats; 100000;                 #  p=1.000, r=0.413, 1000+355
sword; swore; 1000000;                  #  p=1.000, r=0.552, 1000+1000
tackled; tackles; 10000;                #  p=1.000, r=0.257, 1000+996
tacks; racks; 100000;                   #  p=1.000, r=0.254, 526+1000
taints; taunts; 100000;                 #  p=1.000, r=0.159, 179+1000
tanks; ranks; 1000000;                  #  p=1.000, r=0.137, 997+998
taped; gaped; 1000000;                  #  p=1.000, r=0.385, 1000+77
teamed; reamed; 10000;                  #  p=1.000, r=0.677, 1000+79
tend; rend; 1000000;                    #  p=1.000, r=0.644, 1000+426
tends; rends; 100000;                   #  p=1.000, r=0.745, 1000+63
tensed; tenses; 1000000;                #  p=1.000, r=0.336, 296+1000
tent; gent; 1000000;                    #  p=1.000, r=0.130, 997+999
tided; tides; 1000000;                  #  p=1.000, r=0.519, 26+1000
tides; rides; 100000;                   #  p=1.000, r=0.171, 1000+1000
timed; rimed; 10000;                    #  p=1.000, r=0.594, 1000+33
tinged; ringed; 1000000;                #  p=1.000, r=0.342, 1000+998
tipping; ripping; 10000;                #  p=1.000, r=0.134, 1000+1000
toed; toes; 100000;                     #  p=1.000, r=0.766, 1000+999
toiling; tooling; 10000;                #  p=1.000, r=0.276, 375+1000
told; gold; 100000;                     #  p=1.000, r=0.444, 1000+948
tolling; tilling; 100000;               #  p=1.000, r=0.115, 888+717
torpedoed; torpedoes; 100000;           #  p=1.000, r=0.558, 1000+1000
tote; rote; 100000;                     #  p=1.000, r=0.166, 1000+1000
totting; rotting; 1000;                 #  p=1.000, r=0.676, 39+998
tout; rout; 10000;                      #  p=1.000, r=0.455, 999+998
traced; traces; 1000000;                #  p=1.000, r=0.638, 1000+1000
trampled; tramples; 100000;             #  p=1.000, r=0.507, 1000+113
traumatized; traumatizes; 10000;        #  p=1.000, r=0.693, 1000+37
tripe; gripe; 10000;                    #  p=1.000, r=0.214, 893+599
tuba; tubs; 1000;                       #  p=1.000, r=0.306, 1000+999
tucking; ticking; 100000;               #  p=1.000, r=0.560, 260+1000
tums; gums; 1000000;                    #  p=1.000, r=0.495, 126+1000
typified; typifies; 1000000;            #  p=1.000, r=0.480, 1000+440
understated; understates; 10000;        #  p=1.000, r=0.527, 1000+120
unsettled; unsettles; 10000;            #  p=1.000, r=0.715, 1000+60
upstage; upstate; 100000;               #  p=1.000, r=0.503, 401+998
vane; cane; 1000000;                    #  p=1.000, r=0.121, 1000+998
vanished; vanishes; 1000000;            #  p=1.000, r=0.118, 997+1000
vaster; caster; 1000000;                #  p=1.000, r=0.128, 106+1000
vatted; batted; 100000;                 #  p=1.000, r=0.676, 34+1000
vend; fend; 100000;                     #  p=1.000, r=0.787, 128+1000
vending; fending; 10000;                #  p=1.000, r=0.861, 999+835
venerated; venerates; 100000;           #  p=1.000, r=0.653, 1000+160
victimized; victimizes; 100;            #  p=1.000, r=0.836, 1000+34
vie; fie; 10000;                        #  p=1.000, r=0.451, 1000+617
vignetted; vignettes; 100000;           #  p=1.000, r=0.402, 24+999
vindicated; vindicates; 1000000;        #  p=1.000, r=0.444, 1000+123
visualized; visualizes; 10000;          #  p=1.000, r=0.631, 1000+251
volts; colts; 100000;                   #  p=1.000, r=0.340, 1000+997
vow; cow; 1000000;                      #  p=1.000, r=0.105, 997+995
vowed; cowed; 1000000;                  #  p=1.000, r=0.528, 1000+274
vowed; bowed; 1000000;                  #  p=1.000, r=0.244, 1000+1000
wafts; warts; 100000;                   #  p=1.000, r=0.566, 51+1000
wager; eager; 100000;                   #  p=1.000, r=0.397, 1000+999
waltzed; waltzes; 100000;               #  p=1.000, r=0.226, 39+1000
webs; wens; 10000;                      #  p=1.000, r=0.551, 1000+120
wedded; weeded; 100000;                 #  p=1.000, r=0.625, 919+381
wetter; setter; 10000;                  #  p=1.000, r=0.376, 1000+999
widely; wifely; 100;                    #  p=1.000, r=0.968, 1000+76
wilt; silt; 1000000;                    #  p=1.000, r=0.319, 999+1000
winced; winded; 1000000;                #  p=1.000, r=0.779, 27+1000
witched; witches; 1000000;              #  p=1.000, r=0.494, 312+1000
witters; sitters; 1000000;              #  p=1.000, r=0.591, 32+1000
wowing; sowing; 100000;                 #  p=1.000, r=0.670, 64+999
wretches; wretched; 1000000;            #  p=1.000, r=0.334, 157+1000
yeas; teas; 100000;                     #  p=1.000, r=0.442, 109+1000
yours; hours; 1000000;                  #  p=1.000, r=0.376, 971+988
yours; tours; 100000;                   #  p=1.000, r=0.484, 971+994
zeroed; zeroes; 100000;                 #  p=1.000, r=0.289, 255+1000

Hi @marcoagpinto,

Testrules gives an error message for rule ENUMERATION_AND_DASHES. I could not find the cause of the problem:

Running pattern rule tests for English... Exception in thread "main" java.lang.AssertionError: English rule ENUMERATION_AND_DASHES[1]:
"1.3.? Introduction"
Errors expected: 1
Errors found   : 0

I edited the rulegroup DASH_RULE (indentation to match the other rules, and the default entity names). Testrules validates it successfully. But when I run the Maven tests, I get this error (in part):

Failed tests:
  JLanguageToolTest.testEnglish:114->assertNoError:127 Did not expect an error in test sentence: 'The sea ice is highly variable - frozen solid during cold, calm weather and broke...', but got: [DASH_RULE:31-32:If you do not want to join two words, you must use an m-dash.] expected:<0> but was:<1>

Tests run: 67, Failures: 1, Errors: 0, Skipped: 5

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] languagetool-parent ................................ SUCCESS [  0.440 s]
[INFO] LanguageTool Style and Grammar Checker Core ........ SUCCESS [ 39.459 s]
[INFO] English module for LanguageTool .................... FAILURE [ 41.215 s]

But that error makes no sense to me. The sentence "The sea ice…" is not in the rule. (I know that Maven can be ‘confused’, so I ran a clean test a couple of times.)

Here’s my edited version of DASH_RULE:

<rulegroup id="DASH_RULE" name='Hyphen, n-dash and m-dash'>
    <!-- Created by Tiago F. Santos, 2017-01-23 -->
    <!-- Localised to English by Marco A.G.Pinto, 2017-04-02 -->
    <url>https://en.wikipedia.org/wiki/Dash#Em_dash</url>
    <rule>
        <pattern>
            <token postag='SENT_START'/>
            <token min='0' regexp='yes'>["«»“”]</token>
            <marker>
                <token regexp='yes'>-|–</token>
            </marker>
        </pattern>
        <message>In dialogues and enumerations you must use an m-dash.</message>
        <suggestion>—</suggestion>
        <short>Use an m-dash.</short>
        <example correction='—'><marker>-</marker> What is that, mother?</example>
        <example correction='—'>« <marker>-</marker> What is that, mother?</example>
        <example>— It's your birthday present, my daughter.</example>
    </rule>
    <rule>
        <antipattern>
            <token regexp='yes'>\d+|&months;|&abbrevMonths;|&weekdays;|&abbrevWeekdays;</token>
            <token regexp='yes'>-|–</token>
            <token regexp='yes'>\d+|&months;|&abbrevMonths;|&weekdays;|&abbrevWeekdays;</token>
        </antipattern>
        <pattern>
            <marker>
                <token spacebefore="yes" regexp='yes'>-|–</token>
            </marker>
            <token spacebefore="yes"/>
        </pattern>
        <message>If you do not want to join two words, you must use an m-dash.</message>
        <suggestion>—</suggestion>
        <short>Use an m-dash.</short>
        <example correction='—'>In these educational establishments there were enrollments <marker>-</marker> mostly from elementary school — and a total of teachers.</example>
        <example correction='—'>Institute Ricci de Macau <marker>-</marker> Association of cultural promotion of the Company of Jesus in Macau</example>
        <example>In the Midwest and Northwest portion are higher elevations, reaching 500 meters above sea level, highlighting Serra do Tumucumaque and Sierra Lombarda.</example>
    </rule>
    <rule>
        <pattern>
            <token regexp='yes'>\d+|&months;|&abbrevMonths;|&weekdays;|&abbrevWeekdays;</token>
            <token regexp='yes'>-|—</token>
            <token regexp='yes'>\d+|&months;|&abbrevMonths;|&weekdays;|&abbrevWeekdays;</token>
        </pattern>
        <message>If you want to indicate a period of time, you must use an n-dash.</message>
        <suggestion>\1 – \3</suggestion>
        <suggestion>\1–\3</suggestion>
        <short>Use an n-dash.</short>
        <example correction='1901 – 1978|1901–1978'>Vitorino Nemésio (<marker>1901 - 1978</marker>) — writer and university teacher.</example>
    </rule>
</rulegroup>

testEnglish is a Java test. The hyphen in its test sentence must be replaced with an m-dash. Alternatively, the sentence can be marked as one that contains an error.
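To make the failure easier to see, here is a minimal sketch in plain Java that approximates what the second DASH_RULE pattern looks for: a hyphen or n-dash with a space on both sides. It is only an illustration with `java.util.regex`, not LanguageTool's pattern engine; the class and method names are made up, and the antipattern for dates is ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough stand-in for the second DASH_RULE pattern; hypothetical, not LanguageTool code. */
class SpacedHyphenSketch {

    // A hyphen or n-dash (\u2013) with whitespace on both sides, i.e. not joining two words.
    private static final Pattern SPACED_HYPHEN =
            Pattern.compile("(?<=\\s)[-\u2013](?=\\s)");

    /** Returns the offset of the first spaced hyphen/n-dash, or -1 if there is none. */
    static int firstMatchOffset(String sentence) {
        Matcher m = SPACED_HYPHEN.matcher(sentence);
        return m.find() ? m.start() : -1;
    }

    /** Mimics the rule's suggestion: replace each spaced hyphen/n-dash with an m-dash (\u2014). */
    static String applySuggestion(String sentence) {
        return SPACED_HYPHEN.matcher(sentence).replaceAll("\u2014");
    }

    public static void main(String[] args) {
        // Beginning of the sentence quoted in the Maven failure (the log truncates the rest).
        String fragment = "The sea ice is highly variable - frozen solid during cold, calm weather";
        System.out.println(firstMatchOffset(fragment)); // 31, the start offset reported by Maven
        System.out.println(applySuggestion(fragment));
    }
}
```

The offset 31 lines up with the `DASH_RULE:31-32` position in the Maven output, which is why a test sentence that is expected to be error-free now fails once the rule is active.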

Running pattern rule tests for English… Exception in thread “main” java.lang.AssertionError: English rule ENUMERATION_AND_DASHES[1]:
“1.3.? Introduction”

The question mark should be replaced by a hyphen (-) or an n-dash (aka half-dash).

@Yakov

Can you help?

It is a very important rule since not even M$ Word 2016 has it.

Thanks!

I do not understand. ‘Testrules’ tells me that the rules in rulegroup DASH_RULE are correct. Maven tells me that one rule is not correct.

The question mark is in the testrules error message for rule ENUMERATION_AND_DASHES. Marco’s rule above has this example:

As I wrote previously, I could not find the cause of the problem.

I fixed it in languagetool-language-modules/en/src/test/java/org/languagetool/JLanguageToolTest.java

You rock!

Thank you!

Added DASH_RULE (commit cfab2e5, “[en] Add rulegroup DASH_RULE”, in languagetool-org/languagetool on GitHub).

Thanks @Yakov. @tiagosantos, now I understand.

@marcoagpinto I didn’t add ENUMERATION_AND_DASHES because of the testrules error.

I tested it on my computer and testrules.sh passed, but Travis still complains. It could be a Unicode-to-ASCII conversion error. I have now reverted that commit. It is an interesting situation and I believe it has to do with the tokenizer.
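For anyone wondering how a Unicode-to-ASCII problem could produce the "?" seen in the testrules message, here is a small sketch using only the Java standard library. The input string is an assumption: the thread only shows the already-damaged "1.3.? Introduction", so the n-dash in the original example is a guess, and the class name is made up.

```java
import java.nio.charset.StandardCharsets;

/** Shows how an n-dash degrades to '?' when text is forced through US-ASCII. */
class AsciiRoundTripSketch {

    /** Encodes the text as US-ASCII and decodes it again; unmappable characters become '?'. */
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical example text; \u2013 is the n-dash.
        String withNDash = "1.3.\u2013 Introduction";
        System.out.println(throughAscii(withNDash)); // prints: 1.3.? Introduction
    }
}
```

If something like this happens when the rule file or the console output is read with the wrong encoding on one machine but not on another, it would explain why the test passes locally and fails elsewhere. That is only one possible cause, though.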

EnglishWordTokenizer.java

  @Override
  public String getTokenizingCharacters() {
    return super.getTokenizingCharacters() + "–";  // n-dash
  }

I will look into it, but I will have to leave in a few moments.
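As a rough illustration of why the tokenizing characters matter here, the sketch below splits text with `java.util.StringTokenizer`, once without and once with the n-dash among the delimiters. The delimiter set is a hypothetical, much shorter one than a real tokenizer would use, and the class is not LanguageTool code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical sketch of delimiter-based tokenizing; not the real EnglishWordTokenizer. */
class TokenizingCharsSketch {

    // Assumed, deliberately tiny delimiter set for the example.
    static final String BASE_DELIMS = " .,()";
    static final String N_DASH = "\u2013";

    /** Splits text on the given delimiter characters, keeping each delimiter as its own token. */
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String range = "1901" + N_DASH + "1978";
        System.out.println(tokenize(range, BASE_DELIMS));          // one token
        System.out.println(tokenize(range, BASE_DELIMS + N_DASH)); // three tokens
    }
}
```

A rule that expects the dash as a separate `<token>` can only match in the second case, so whether a given dash character is in the tokenizing set changes which patterns fire.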