[en-gb] Rule for Oxford spelling: help needed

Mike_Unwalla · August 20, 2018, 5:07pm

I want to create rules that find words that do not use the Oxford spelling. I have a prototype rule for singular nouns, but it does not always give the results that I expect: see the ‘False negative’ examples. I cannot figure out why the rule does not find these words. Any ideas?

<rule id="OXFORD_SPELLING_NOUNS" name="Oxford spelling of nouns (~ization not ~isation)">
  <pattern>
    <token regexp="yes">([a-z]+?)(?:isation)<exception postag="NNP"/></token>
  </pattern>
  <filter class="org.languagetool.rules.en.EnglishPartialPosTagFilter"
      args="no:1 regexp:(?i)\b([a-z]+?)(?:isation)\b postag_regexp:NN(:UN?)?"/>
  <message>The word '\1' is not the Oxford spelling. Use the Oxford spelling '<suggestion><match no="1" regexp_match="([a-z]+?)(?:isation)\b" regexp_replace="$1ization"/></suggestion>'.</message>
  <url>https://blog.oxforddictionaries.com/2011/03/28/ize-or-ise/</url>
  <short>Oxford spelling: ~ization nouns</short>
  <example correction="organization">The word "<marker>organisation</marker>" is not the Oxford spelling.</example>
  <example>The word '<marker>organization</marker>' is the Oxford spelling.</example>
  <example>The word '<marker>optimization</marker>' is correct.</example>
  <example correction="actualization"><marker>actualisation</marker></example>
  <example correction="alphabetization"><marker>alphabetisation</marker></example>
  <example correction="atomization"><marker>atomisation</marker></example>
  <example correction="authorization"><marker>authorisation</marker></example>
  <example correction="Localization"><marker>Localisation</marker> is not the same as translation.</example>
  <example correction="randomization">The <marker>randomisation</marker> of the data was not easy.</example>
  <example>False negative. The word '<marker>optimisation</marker>' is not the Oxford spelling.</example>
  <example>False negative. <marker>acclimatisation</marker></example>
  <example>False negative. <marker>amortisation</marker></example>
  <example>False negative. <marker>anaesthetisation</marker></example>
</rule>

dnaber · August 20, 2018, 6:42pm

Have you tried simplifying that regex, especially leaving off the \b? If that doesn’t help, try simplifying it step by step until you find the part that causes the problem.

Mike_Unwalla · August 21, 2018, 9:12am

@danielnaber, thanks. The problem was with the parentheses, not with the \b.

Incorrect: args="no:1 regexp:(?i)\b([a-z]+?)(?:isation)\b
Correct: args="no:1 regexp:(?i)\b([a-z]+?isation)\b

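With the incorrect regexp, the first capturing group contained only the stem (organ, alphabet, atom, etc.), so the filter checked the POS tag of the stem, not of the full word. Stems that happen to be NN words passed. But the stem ‘optim’ is an unknown word, so it has no NN tag and the rule did not find ‘optimisation’. With the correct regexp, the first capturing group is the full word, so the filter finds the POS tag of the full word.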
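The following is a minimal Java sketch of why the capturing group matters. It assumes, as the findings above suggest, that the filter looks up the POS tag only for the text of capturing group 1 of the `regexp` argument. It also assumes that the regexp is applied to the whole token (Java's `Matcher.matches()`). The class `PartialTagDemo`, its methods, and the word set `ASSUMED_NN_WORDS` are hypothetical. The set is a tiny stand-in for LanguageTool's English POS dictionary, not LanguageTool's API.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialTagDemo {

    // Hypothetical stand-in for the POS dictionary: words we assume are tagged NN.
    static final Set<String> ASSUMED_NN_WORDS = Set.of(
            "organ", "atom", "author",
            "organisation", "atomisation", "authorisation", "optimisation");

    // The two regexps from the thread (Java string escaping doubles the backslashes).
    static final String INCORRECT = "(?i)\\b([a-z]+?)(?:isation)\\b";
    static final String CORRECT = "(?i)\\b([a-z]+?isation)\\b";

    // Returns the text of capturing group 1, or null if the whole word does not match.
    static String partialToken(String regex, String word) {
        Matcher m = Pattern.compile(regex).matcher(word);
        return m.matches() ? m.group(1) : null;
    }

    // Simulates the filter: accept only if group 1 has an (assumed) NN tag.
    static boolean accepted(String regex, String word) {
        String part = partialToken(regex, word);
        return part != null && ASSUMED_NN_WORDS.contains(part.toLowerCase());
    }

    public static void main(String[] args) {
        for (String word : List.of("organisation", "optimisation")) {
            System.out.println(word
                    + "  incorrect: group 1 = " + partialToken(INCORRECT, word)
                    + ", accepted = " + accepted(INCORRECT, word)
                    + "  correct: group 1 = " + partialToken(CORRECT, word)
                    + ", accepted = " + accepted(CORRECT, word));
        }
    }
}
```

With the incorrect regexp, group 1 is ‘organ’ for ‘organisation’ (an NN word, so the match is kept) but ‘optim’ for ‘optimisation’ (unknown, so the match is dropped). With the correct regexp, group 1 is the full word in both cases.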