Infinite replacement loop with typographical apostrophe

Hello,

We implemented some XML rules to comply with a specific English style guide in which straight quotes (') must be replaced with typographical quotes (’), for example:

<rule id="rule10" name="possessive form">
  <regexp type="exact">(\p{Alpha}+)'(\p{Alpha}+)</regexp>
  <message>Use a smart single quotation mark for apostrophe</message>
  <suggestion>\1’\2</suggestion>
  <url>xxx</url>
  <example correction="actress’s">An <marker>actress's</marker> role</example>
</rule>

This worked fine up to LanguageTool 5.2, but since 5.3 some of these rules also fire when the input already contains a typographical apostrophe, which leads to infinite replacement loops. This seems to be caused by the recent effort to unify the handling of quotes: the English tokenizer now replaces typographical quotes with straight ones, as the analysis of this sentence shows:

Hari’s car.

<S> Hari[Hari/NNP,B-NP-singular|E-NP-singular]'s['s/POS,B-NP-singular] car[car/NN,E-NP-singular].[./.,</S>./PCT,O]
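To see why this normalization produces a loop, here is a minimal Python simulation. It assumes that the rule's regexp is matched against the normalized text and that a client applies each suggestion at the match offsets and then re-checks; `normalize`, `check` and `fix_until_clean` are hypothetical helpers for illustration, not LanguageTool APIs.

```python
import re

# Python stand-in for the rule's <regexp>: [^\W\d_] approximates \p{Alpha},
# which Python's built-in `re` module does not support.
RULE = re.compile(r"([^\W\d_]+)'([^\W\d_]+)")
TYPO_APOS = "\u2019"  # typographical apostrophe

def normalize(text: str) -> str:
    # Assumed tokenizer behaviour since 5.3: the typographical apostrophe
    # becomes a straight one. Same length, so offsets stay aligned.
    return text.replace(TYPO_APOS, "'")

def check(text: str, normalize_first: bool):
    """Return (start, end, suggestion) for the first match, or None."""
    analyzed = normalize(text) if normalize_first else text
    m = RULE.search(analyzed)
    if m is None:
        return None
    return m.start(), m.end(), m.group(1) + TYPO_APOS + m.group(2)

def fix_until_clean(text: str, normalize_first: bool, max_rounds: int = 5):
    """Apply suggestions like a client would; stop when clean or at max_rounds."""
    rounds = 0
    while rounds < max_rounds:
        match = check(text, normalize_first)
        if match is None:
            break
        start, end, suggestion = match
        text = text[:start] + suggestion + text[end:]
        rounds += 1
    return text, rounds

print(fix_until_clean("Hari's car.", normalize_first=False))  # ('Hari’s car.', 1)
print(fix_until_clean("Hari's car.", normalize_first=True))   # ('Hari’s car.', 5)
```

Without normalization the text is clean after one correction; with it, the rule keeps suggesting "Hari’s" for a span that already reads "Hari’s", and only the `max_rounds` guard stops the loop.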

Is there a way to deactivate this automatic replacement? Or can we expect the same behaviour as for French/Catalan (Handling apostrophes, quotes and other typography issues (French, Catalan...) · Issue #3390 · languagetool-org/languagetool · GitHub) to be implemented for English as well? A “containsTypographicApostrophe” tag would solve our issue.
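In essence, the proposed tag means keeping the original token text next to its normalized form, so that a rule can still tell ’ from ' after unification. A minimal Python sketch of that idea follows; `Token`, `contains_typographic_apostrophe`, `tokenize` and `flagged` are hypothetical names and do not reflect LanguageTool's actual data model or API.

```python
import re
from dataclasses import dataclass

TYPO_APOS = "\u2019"
# Toy word pattern: letters, optionally joined by a straight or typographical apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)?")

@dataclass
class Token:
    original: str    # surface form as typed by the user
    normalized: str  # form the rules see after quote unification

    @property
    def contains_typographic_apostrophe(self) -> bool:
        # Hypothetical flag in the spirit of the proposed tag.
        return TYPO_APOS in self.original

def tokenize(text: str) -> list[Token]:
    """Toy tokenizer: records both forms, normalizing ’ to '."""
    return [Token(w, w.replace(TYPO_APOS, "'")) for w in WORD.findall(text)]

def flagged(text: str) -> list[str]:
    """Tokens the possessive rule should flag: straight apostrophe in the original only."""
    return [t.original for t in tokenize(text)
            if "'" in t.normalized and not t.contains_typographic_apostrophe]

print(flagged("Hari's car and Hari’s bike."))  # ["Hari's"]
```

The rule still matches on the normalized form, as the other rules expect, while the flag on the original form suppresses the match when the user already typed ’.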

The “user setting for typographical/typewriter apostrophes” ([fr] incosistent apostrophe suggestion · Issue #5239 · languagetool-org/languagetool · GitHub) would also be an alternative; is it planned for the future?

Thank you,
Adrien

Thanks for the report. I am aware of this issue, and I will try to provide a solution as soon as possible.


Here is the solution: [en] new rule and filter TYPOGRAPHICAL_APOSTROPHE · languagetool-org/languagetool@9c5ec14 · GitHub

It is not a complete solution: additional XML rules are still needed for an apostrophe after a word-final “s” (United States') and for single quotation marks (‘this is a quotation’).
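Rules for the two remaining cases could follow the same approach. Below is a rough Python sketch of two patterns. It assumes that a straight apostrophe after a word-final "s" and not followed by a letter is a plural possessive, and that a pair of straight quotes not attached to letters on the outside encloses a quotation. Both patterns and the `suggest` helper are hypothetical, not the rules from the commit, and ambiguous input (for example a quotation that ends in a word like "States") would still need care.

```python
import re

LETTER = r"[^\W\d_]"  # approximates \p{Alpha} in Python's `re`

# Hypothetical: straight apostrophe after a word-final "s", not followed by a letter.
PLURAL_POSSESSIVE = re.compile(rf"({LETTER}*s)'(?!{LETTER})")
# Hypothetical: a pair of straight single quotes around a phrase.
SINGLE_QUOTED = re.compile(rf"(?<!{LETTER})'([^']+)'(?!{LETTER})")

def suggest(text: str) -> str:
    # Quotations first, so an opening quote is never mistaken for an apostrophe.
    text = SINGLE_QUOTED.sub(lambda m: "\u2018" + m.group(1) + "\u2019", text)
    # Then plural possessives such as "United States'".
    return PLURAL_POSSESSIVE.sub(lambda m: m.group(1) + "\u2019", text)

print(suggest("the United States' policy"))  # the United States’ policy
print(suggest("'this is a quotation'"))      # ‘this is a quotation’
```

Inner apostrophes such as "Hari's" match neither pattern and remain the job of the existing possessive rule.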

That was fast, thanks a lot!
We will adapt our rules similarly for other cases.