Help with hyphenated terms including POS

Maximum · May 24, 2021, 4:54pm

I’m trying to write a rule that will spot adverbs hyphenated with past participles (e.g., a carefully-written letter). I have written the rule below, but it does not work. Can anybody say why?

<rule>
    <antipattern>
        <token regexp="yes">family|fly|early|ugly|friendly|lovely|daily|butterfly|dolly|lily|jelly|belly|filly|folly|ally|dilly|early|gully|holly|imply|italy|jolly|july|lolly|mayfly|molly|ply|rally|rely|reply|sally|sicily|sly|tally|telly|willy|wily</token>
        <token spacebefore="no" regexp="yes">-</token>
        <token postag="VBN" spacebefore="no"/>
    </antipattern>
    <pattern>
        <token regexp="yes">[a-z]+ly</token>
        <token regexp="yes" spacebefore="no">[-]</token>
        <token postag="VBN" spacebefore="no"></token>
    </pattern>

    <message>Most adverbial phrases do not need hyphens: **<suggestion>\1 \3</suggestion>**</message>
</rule>

jaumeortola · May 24, 2021, 5:26pm

Currently, “carefully-written” (or any word with hyphens) is analyzed as one token.

We have discussed this issue a few times. We would need to change the tokenizer to address it, but that would be a lot of work (CC: @udomai).

Ruud_Baars · May 25, 2021, 5:25am

An uncompounder could do this. Or specific Java code.

udomai · May 25, 2021, 6:28am

So, quick thought about the rule by @Maximum:

Their pattern has to be expressed as one single token. The suggestion can then use regexp_replace to turn the hyphen into a space. If the message has suppress_misspelled="yes", this will prevent the rule from popping up if the resulting words are not in the dictionary. Right?
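
A minimal sketch of that single-token idea, untested. The rule id, the name, the short exception list, and the example sentence are made up for illustration, and the `(ed|en)` ending is only a rough stand-in for "past participle", since the parts of a single token cannot be matched by POS tag in the pattern:

<rule id="HYPHENATED_LY_ADVERB_SKETCH" name="Hyphenated -ly adverb before a past participle (sketch)">
    <pattern>
        <!-- the whole hyphenated word is one token, so the regexp has to cover both parts -->
        <token regexp="yes">[a-z]+ly-[a-z]+(ed|en)
            <!-- hypothetical, shortened list of -ly words that are not adverbs -->
            <exception regexp="yes">(family|early|daily|friendly)-.*</exception>
        </token>
    </pattern>
    <!-- suppress_misspelled is intended to hide the match when the suggestion is not in the dictionary -->
    <message suppress_misspelled="yes">Most adverbial phrases do not need hyphens: <suggestion><match no="1" regexp_match="-" regexp_replace=" "/></suggestion></message>
    <example correction="carefully written">She sent a <marker>carefully-written</marker> letter.</example>
</rule>

Here `<match no="1" .../>` refers to the first (and only) token of the pattern and rewrites it for the suggestion.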

Ruud_Baars · May 25, 2021, 7:47am

Possibly, but checking the POS tags of the individual parts is not possible then.

Mike_Unwalla · May 26, 2021, 7:35am

@Maximum, @jaumeortola,

You can use EnglishPartialPosTagFilter to make a rule to find hyphenated adverbs. We do this in grammar-premium4/HYPHENATED_LY_ADVERB_ADJECTIVE.
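
A rough sketch of how such a rule might look. This is not the premium rule itself. The rule id, name, and example sentence are invented for illustration, and the `args` keys (`no`, `regexp`, `postag_regexp`) are given from memory of how partial POS tag filters are used in other rules, so please treat them as assumptions and check them against the source:

<rule id="HYPHENATED_LY_ADVERB_FILTER_SKETCH" name="Hyphenated -ly adverb before a past participle (filter sketch)">
    <pattern>
        <!-- one token that contains the adverb, the hyphen, and the second word -->
        <token regexp="yes">[a-z]+ly-[a-z]+</token>
    </pattern>
    <!-- assumption: the filter tags the text captured by the first regexp group
         and keeps the match only if that part can be tagged VBN -->
    <filter class="org.languagetool.rules.en.EnglishPartialPosTagFilter" args="no:1 regexp:.*ly-(.*) postag_regexp:VBN"/>
    <message>Most adverbial phrases do not need hyphens: <suggestion><match no="1" regexp_match="-" regexp_replace=" "/></suggestion></message>
    <example correction="carefully written">She sent a <marker>carefully-written</marker> letter.</example>
</rule>

As sketched, the filter only checks the part after the hyphen, so non-adverbs such as "family" before the hyphen would still need an exception or antipattern.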

@udomai, as best I remember, there is no documentation for EnglishPartialPosTagFilter. Can we have some, please? (Also, I did not find a search box on https://dev.languagetool.org/, so I could not check to make sure that there is no documentation.)