Help needed with 'suppress_misspelled'

Mike_Unwalla · August 21, 2018, 3:18pm

For the Oxford spelling rules for verbs, I want to prevent a rule from giving a message for verbs such as advertise, advise, appraise, and chastise. I can do that by putting the verbs into an exception on the token. To make the rule as accurate as possible, I would also like to use ‘suppress_misspelled’, because I probably do not have a full list of verbs that cannot be spelled with -ize. But I cannot get it to do what I want.
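
As a rough sketch of the exception approach described above: the exception list contains only the four verbs named in this post, not a complete list, and the fragment has not been run against LanguageTool.

```xml
<!-- Sketch only: the exception holds just the four example verbs from the post. -->
<pattern>
    <token regexp="yes">[a-z]+ise<exception regexp="yes">advertise|advise|appraise|chastise</exception></token>
</pattern>
```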

“You can even suppress the whole rule from being matched if you use the same attribute for any suggestion element” (Development Overview - LanguageTool Wiki). I think this sentence means that I must put ‘suppress_misspelled’ into both the suggestion and the match, as is done in some of the rules in English grammar.xml.

The rule that follows correctly ignores ‘advise’ but gives a message for ‘televise’. Any ideas why?

<rule id="TEST_SUPPRESS_MISSPELLED1" name="Test: suppress_misspelled">
    <pattern>
        <token regexp="yes">([a-z]+?)ise</token>
    </pattern>
    <filter class="org.languagetool.rules.en.EnglishPartialPosTagFilter"
        args="no:1 regexp:(?i)([a-z]+?ise) postag_regexp:VBP?"/>
    <message>TEST1. The word '\1' is not the Oxford spelling. Use '<suggestion suppress_misspelled="yes"><match suppress_misspelled="yes" no="1" regexp_match="([a-z]+?)ise" regexp_replace="$1ize"/></suggestion>'.</message>
    <example correction="organize">The verb '<marker>organise</marker>' is not the Oxford spelling.</example>
    <example>The word '<marker>organize</marker>' is the Oxford spelling.</example>
    <example>We <marker>advise</marker> you to be careful.</example>
    <example correction="televize">They will <marker>televise</marker> the football match.</example>
</rule>

dnaber · August 22, 2018, 7:25am

I see no obvious reason, so I guess this would need real debugging in Java. Just so I understand: “Oxford spelling” is not just en-GB but something more specific, and it should be possible to enable it via rules?

Mike_Unwalla · August 22, 2018, 7:58am

Oxford spelling is a style preference that is applicable to en-GB. The Oxford Dictionaries blog has a good summary: Oxford Languages | The Home of Language Data.

My plan is to create a set of rules that a user can enable to check for Oxford spelling (that is, find ~ise spellings).

Mike_Unwalla · August 22, 2018, 3:26pm

The unexpected behaviour is not a problem for me now.

I found an alternative method: put the POS in the token and use regexp_match and regexp_replace in the suggestion. With that method, suppress_misspelled works as I expect.
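
The actual rule was not posted, so what follows is only a guess at what that alternative might look like. The rule id, name, message text, and examples are hypothetical. Putting suppress_misspelled on both the suggestion and the match is an assumption carried over from the first rule, and the sketch has not been tested in LanguageTool.

```xml
<!-- Hypothetical sketch: the POS restriction sits on the token instead of in a filter. -->
<rule id="TEST_SUPPRESS_MISSPELLED2" name="Test: suppress_misspelled with POS on the token">
    <pattern>
        <!-- Match words ending in 'ise' that are tagged VB or VBP. -->
        <token regexp="yes" postag="VBP?" postag_regexp="yes">[a-z]+ise</token>
    </pattern>
    <!-- The match builds the ~ize form; suppress_misspelled should drop it when that form is not in the dictionary. -->
    <message>TEST2. The word '\1' is not the Oxford spelling. Use '<suggestion suppress_misspelled="yes"><match suppress_misspelled="yes" no="1" regexp_match="(?i)([a-z]+)ise" regexp_replace="$1ize"/></suggestion>'.</message>
    <example correction="organize">We <marker>organise</marker> events.</example>
    <!-- Expected, not verified: no message, because 'televize' is not a valid spelling. -->
    <example>They will televise the football match.</example>
</rule>
```

An exception list for known verbs such as advise could still be added to the token as a second safeguard.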

fkoglin · August 26, 2018, 10:00pm

Oxford spelling actually has its own IETF code, en-GB-oxendict. However, the differences aren’t so large that it couldn’t be implemented with a handful of optional rules in en-GB.

Mike_Unwalla · August 27, 2018, 7:58am

Yes, I know. But thanks for making sure that I know.

This morning I found an OpenOffice .oxt file: Download en_gb-oed.oxt (Apache OpenOffice Extensions). Possibly, we could use that as a source, but I don’t know about copyright status.

Also, @dnaber, I think that the Oxford spelling rules are a good candidate for inclusion in the premium version of LT only. If you agree, tell me, and I will send you the prototype Oxford spelling rules that I have.

dnaber · August 27, 2018, 9:30am

Thanks. The concept of the premium rules is currently that they are active by default, i.e. there isn’t even a UI to enable rules on the website. I understand the Oxford rules would be optional and not active by default?

aafreen · August 27, 2018, 9:36am

How did you specify this filter option? For which classes can we use it?

Mike_Unwalla · August 27, 2018, 10:21am

Yes (if I put the rules into LT). But I thought that maybe the rules would be better only in the premium version, because they are useful for professional proofreaders.

Mike_Unwalla · August 27, 2018, 10:27am

As best I know, the only documentation for EnglishPartialPosTagFilter is in CHANGES.txt (for LT 2.8). Line 165 and following lines tell you how to use the filter.