
Help with hyphenated terms including POS

I’m trying to write a rule that will spot adverbs hyphenated with past participles (e.g., “a carefully-written letter”). I have written the rule below, but it does not work. Can anybody say why?

<rule>
    <antipattern>
        <token regexp="yes">family|fly|early|ugly|friendly|lovely|daily|butterfly|dolly|lily|jelly|belly|filly|folly|ally|dilly|early|gully|holly|imply|italy|jolly|july|lolly|mayfly|molly|ply|rally|rely|reply|sally|sicily|sly|tally|telly|willy|wily</token>
        <token spacebefore="no" regexp="yes">-</token>
        <token postag="VBN" spacebefore="no"/>
    </antipattern>
    <pattern>
        <token regexp="yes">[a-z]+ly</token>
        <token regexp="yes" spacebefore="no">[-]</token>
        <token postag="VBN" spacebefore="no"></token>
    </pattern>
    <message>Most adverbial phrases do not need hyphens: **<suggestion>\1 \3</suggestion>**</message>
</rule>

Currently, “carefully-written” (or any other hyphenated word) is analyzed as a single token, so a pattern made of three separate tokens never matches it.

We have discussed this issue a few times. We would need to change the tokenizer to address this problem, but that would be a lot of work (CC: @udomai).

An uncompounder could do this. Or specific Java code.

So, quick thought about the rule by @Maximum:

Their pattern has to be expressed as one single token. The suggestion can then use regexp_replace to replace the hyphen with a space. If the suggestion is marked suppress_misspelled="yes", this should prevent the rule from popping up when the resulting words are not in the dictionary. Right?
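To make the idea concrete, here is a rough, untested sketch of such a single-token rule. The rule id, the rule name, the "ed|en" suffix heuristic standing in for the missing VBN check, and the short exception list are all my own assumptions for illustration. Whether suppress_misspelled behaves as hoped for a two-word suggestion is not confirmed here.

```xml
<!-- Sketch only: id, name, suffix heuristic and exception list are illustrative assumptions -->
<rule id="HYPHENATED_LY_ADVERB_SKETCH" name="Hyphenated -ly adverb (sketch)">
    <pattern>
        <!-- The hyphenated word arrives as ONE token, so match it with one regexp.
             The "ed|en" ending is a crude stand-in for a past participle. -->
        <token regexp="yes">[a-z]+ly-[a-z]+(?:ed|en)
            <!-- Skip first parts that end in -ly but are not adverbs -->
            <exception regexp="yes">(?:family|friendly|early|daily|ugly)-.*</exception>
        </token>
    </pattern>
    <!-- Replace the hyphen inside token 1 with a space to build the suggestion -->
    <message>Most adverbial phrases do not need hyphens: <suggestion suppress_misspelled="yes"><match no="1" regexp_match="-" regexp_replace=" "/></suggestion></message>
    <example correction="carefully written">a <marker>carefully-written</marker> letter</example>
</rule>
```

The suffix test will miss irregular participles such as “known” or “built” and will accept non-participles that happen to end in “ed” or “en”, which is exactly the limitation of not being able to check POS tags on the parts.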

Possibly, but then it is not possible to check the POS tags of the individual parts.

@Maximum, @jaumeortola,

You can use EnglishPartialPosTagFilter to make a rule to find hyphenated adverbs. We do this in grammar-premium4/HYPHENATED_LY_ADVERB_ADJECTIVE.
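Since the filter is undocumented, the following is only a sketch written from memory of how partial POS tag filters are typically used in the rule files, not a copy of the premium rule. The package name, the argument names (no, regexp, postag_regexp), the rule id and the rule name are assumptions that should be checked against the Java source before use.

```xml
<!-- Sketch only: filter arguments and package name are assumptions, verify against the source -->
<rule id="HYPHENATED_LY_PARTICIPLE_SKETCH" name="Hyphenated -ly adverb with participle (sketch)">
    <pattern>
        <!-- One token, because the tokenizer does not split at the hyphen -->
        <token regexp="yes">[a-z]+ly-[a-z]+</token>
    </pattern>
    <!-- Assumed behaviour: the text captured by the single group in "regexp"
         is POS-tagged on its own, and the match is kept only if one of its
         tags matches "postag_regexp" (here: past participle). -->
    <filter class="org.languagetool.rules.en.EnglishPartialPosTagFilter" args="no:1 regexp:[a-z]+ly-(.*) postag_regexp:VBN"/>
    <message>Most adverbial phrases do not need hyphens: <suggestion><match no="1" regexp_match="-" regexp_replace=" "/></suggestion></message>
    <example correction="carefully written">a <marker>carefully-written</marker> letter</example>
</rule>
```

In this sketch only the second part is tagged, so non-adverb first parts such as “family” or “friendly” would still need an exception list like the one in the original antipattern.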

@udomai, as best I remember, there is no documentation for EnglishPartialPosTagFilter. Can we have some, please? (Also, I did not find a search box on https://dev.languagetool.org/, so I could not check whether any documentation exists.)