Infinite replacement loop with typographical apostrophe


We implemented some XML rules to comply with a specific English style guide, under which straight quotes (') should be replaced with typographical quotes (’), such as:

<rule id="rule10" name="possessive form">
  <regexp type="exact">(\p{Alpha}+)'(\p{Alpha}+)</regexp>
  <message>Use a smart single quotation mark for apostrophe</message>
  <example correction="actress’s">An <marker>actress's</marker> role</example>
</rule>

It worked fine up to LanguageTool 5.2, but since 5.3 some of the rules also fire when the input already contains a typographical quote, leading to infinite replacement loops. This appears to be due to the recent efforts to unify the handling of quotes, and to the fact that the English tokenizer now replaces typographical quotes with straight ones:

Hari’s car.

<S> Hari[Hari/NNP,B-NP-singular|E-NP-singular]'s['s/POS,B-NP-singular] car[car/NN,E-NP-singular].[./.,</S>./PCT,O]
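
To make the loop concrete, here is a minimal Python sketch of what we think is happening. It is not LanguageTool code: `normalize`, `apply_rule` and `passes_until_stable` are hypothetical helpers, the tokenizer behaviour is reduced to a single character replacement, and `[^\W\d_]` stands in for `\p{Alpha}` because Python's `re` module does not support that class.

```python
import re

# Hypothetical stand-in for the XML rule above: letters, straight apostrophe, letters.
RULE = re.compile(r"([^\W\d_]+)'([^\W\d_]+)")


def normalize(text):
    """Mimic the assumed tokenizer step: typographic apostrophe -> straight one."""
    return text.replace("\u2019", "'")


def apply_rule(text, normalize_first):
    """Return (text after the suggested replacement, whether the rule matched)."""
    seen = normalize(text) if normalize_first else text
    if RULE.search(seen) is None:
        return text, False
    # The suggestion always uses the typographic apostrophe.
    return RULE.sub("\\1\u2019\\2", seen), True


def passes_until_stable(text, normalize_first, limit=5):
    """Count how often the rule fires before the text stops matching (capped at limit)."""
    fired = 0
    for _ in range(limit):
        text, matched = apply_rule(text, normalize_first)
        if not matched:
            break
        fired += 1
    return fired


# Matching on the raw text: one correction, then the rule stays quiet.
print(passes_until_stable("Hari's car.", normalize_first=False))      # 1
# Matching on normalized text: already-correct input keeps matching until the cap.
print(passes_until_stable("Hari\u2019s car.", normalize_first=True))  # 5
```

Under these assumptions, the rule can never see that its own suggestion has been applied, because the character it inserts is mapped back to a straight quote before matching.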

Is there a way to deactivate this automatic replacement? Or can we expect the same behaviour as for French/Catalan to be implemented for English as well? A “containsTypographicApostrophe” tag would solve our issue.

The “user setting for typographical/typewriter apostrophes” would also be an alternative; is it planned for the future?

Thank you,

Thanks for the report. I am aware of this issue, and I will try to provide a solution as soon as possible.

Here is the solution:

It is not a full solution. Other XML rules are needed for an apostrophe after “s” (United States') and for quotation marks (‘this is a quotation’).

That was fast, thanks a lot!
We will adapt our rules similarly for other cases.