Hello,
We implemented some XML rules to comply with a specific English style guide that requires straight quotes (') to be replaced with typographical quotes (’), for example:
<rule id="rule10" name="possessive form">
<regexp type="exact">(\p{Alpha}+)'(\p{Alpha}+)</regexp>
<message>Use a smart single quotation mark for apostrophe</message>
<suggestion>\1’\2</suggestion>
<url>xxx</url>
<example correction="actress’s">An <marker>actress's</marker> role</example>
</rule>
This worked fine up to LanguageTool 5.2, but since 5.3 some of the rules also fire when the input already contains a typographical quote, which leads to infinite replacement loops. This appears to be due to the recent efforts to unify the handling of quotes, and to the fact that the English tokenizer now replaces typographical quotes with straight ones:
Hari’s car.
→
<S> Hari[Hari/NNP,B-NP-singular|E-NP-singular]'s['s/POS,B-NP-singular] car[car/NN,E-NP-singular].[./.,</S>./PCT,O]
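To make the loop concrete, here is a minimal Python sketch of what we think is happening. It is not LanguageTool code: `normalize`, `rule_fires` and `apply_suggestion` are hypothetical names, the `normalize` step is only our assumption about what the tokenizer does, and `[^\W\d_]` is a rough stand-in for `\p{Alpha}`, which Python's `re` module does not support.

```python
import re

# Rough equivalent of (\p{Alpha}+)'(\p{Alpha}+) from rule10.
# [^\W\d_] approximates "any Unicode letter" in Python's re module.
RULE = re.compile(r"([^\W\d_]+)'([^\W\d_]+)")


def normalize(text: str) -> str:
    """Assumed tokenizer behaviour: typographic apostrophe -> straight one."""
    return text.replace("\u2019", "'")


def rule_fires(text: str, normalize_first: bool) -> bool:
    """Check whether the rule matches, with or without prior normalization."""
    candidate = normalize(text) if normalize_first else text
    return RULE.search(candidate) is not None


def apply_suggestion(text: str) -> str:
    """Apply the rule's suggestion: replace ' with the typographic apostrophe."""
    return RULE.sub("\\1\u2019\\2", text)


corrected = apply_suggestion("Hari's car.")

# Without normalization (our understanding of 5.2), the corrected text is left alone.
print(rule_fires(corrected, normalize_first=False))

# With normalization (our understanding of 5.3), the rule fires again on corrected text.
print(rule_fires(corrected, normalize_first=True))
```

Under these assumptions, the second check succeeds on text that has already been corrected, so accepting the suggestion never makes the match go away.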
Is there a way to deactivate this automatic replacement? Alternatively, can we expect the approach used for French/Catalan (Handling apostrophes, quotes and other typography issues (French, Catalan...) · Issue #3390 · languagetool-org/languagetool · GitHub) to be implemented for English as well? A “containsTypographicApostrophe” tag would solve our issue.
The “user setting for typographical/typewriter apostrophes” ([fr] incosistent apostrophe suggestion · Issue #5239 · languagetool-org/languagetool · GitHub) would also be an alternative; is it planned for the future?
Thank you,
Adrien