Replace double quotes with guillemets

periodista · June 22, 2021, 10:23am

Using the Docker image erikvl87/languagetool, I added a custom rule to the TYPOS category in grammar.xml, as explained in the development-overview documentation. It should replace double quotes with guillemets.

The rule looks like this:

<category id="TYPOS" name="M&#246;gliche Tippfehler">
...
    <rule id="GUILLEMENTS" name="Guillements als Anführungszeichen verwenden">
        <regexp>(^"|\s")([[\d\p{L}\p{Punct}&#38;&#38;[^"]]\s]*)([\d\p{L}\p{Punct}&#38;&#38;[^"]]{1})"</regexp>
        <message>Bitte die französischen Anführungszeichen verwenden <suggestion><match no="1" regexp_match="&#34;" regexp_replace="" />»\2\3«</suggestion></message>
        <example correction="»\2\3«"><marker>"\2\3"</marker></example>
    </rule>

But a double quoted text with multiple sentences is not matched.

e.g. "Tom goes out. Tom returns."

Quoted text with clauses separated by commas is matched.

When I try my regex in a Java regex tool, the example above is matched.
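
For reference, here is a minimal sketch of that standalone check in plain Java. It assumes the XML entities in the rule are decoded (`&#38;&#38;` becomes `&&`, the character-class intersection operator) and it only approximates the rule's suggestion by hand. The class and method names (`GuillemetRegexDemo`, `replaceQuotes`) are made up for illustration and are not part of LanguageTool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuillemetRegexDemo {
    // The regexp from the rule, with XML entities decoded (&#38;&#38; -> &&).
    static final Pattern QUOTED = Pattern.compile(
        "(^\"|\\s\")([[\\d\\p{L}\\p{Punct}&&[^\"]]\\s]*)([\\d\\p{L}\\p{Punct}&&[^\"]]{1})\"");

    // Hand-written approximation of the rule's suggestion: keep any leading
    // whitespace captured in group 1, then wrap groups 2 and 3 in guillemets.
    static String replaceQuotes(String text) {
        Matcher m = QUOTED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String lead = m.group(1).replace("\"", "");
            String replacement = lead + "\u00BB" + m.group(2) + m.group(3) + "\u00AB";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The whole text is passed in one piece, so the multi-sentence quote is found.
        System.out.println(replaceQuotes("\"Tom goes out. Tom returns.\""));
    }
}
```

Run this way, the regex sees the whole text at once, which is not necessarily how the rule engine feeds text to it.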

I think the sentence tokenizer is splitting the text into sentences before my custom rule is evaluated.

Any suggestions on how to solve this case with just a custom rule in grammar.xml?

Thanks in advance for any help!

dnaber · June 22, 2021, 11:15am

XML rules only work on the sentence level. You need to use a Java rule to span sentence boundaries.
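
To illustrate the effect, here is a rough sketch in plain Java, not LanguageTool code. It assumes a naive splitter (`splitSentences`, a hypothetical helper that breaks after `.`, `!` or `?` followed by whitespace) as a stand-in for the real sentence tokenizer, which works differently. Once the regex from the rule is applied to each sentence on its own, neither half of the quoted text contains both quote characters, so nothing matches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SentenceLevelDemo {
    // The regexp from the rule, with XML entities decoded (&#38;&#38; -> &&).
    static final Pattern QUOTED = Pattern.compile(
        "(^\"|\\s\")([[\\d\\p{L}\\p{Punct}&&[^\"]]\\s]*)([\\d\\p{L}\\p{Punct}&&[^\"]]{1})\"");

    // Assumption: a naive stand-in for a sentence tokenizer. It splits on
    // whitespace that follows '.', '!' or '?'. The real tokenizer is more elaborate.
    static List<String> splitSentences(String text) {
        return Arrays.asList(text.split("(?<=[.!?])\\s+"));
    }

    // Applies the regex to one sentence at a time, as a sentence-level rule would.
    static boolean matchesAnySentence(String text) {
        for (String sentence : splitSentences(text)) {
            if (QUOTED.matcher(sentence).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String twoSentences = "\"Tom goes out. Tom returns.\"";
        String oneSentence = "\"Tom goes out, Tom returns.\"";
        System.out.println(splitSentences(twoSentences));
        System.out.println("whole text matches:    " + QUOTED.matcher(twoSentences).find());
        System.out.println("per sentence matches:  " + matchesAnySentence(twoSentences));
        System.out.println("comma variant matches: " + matchesAnySentence(oneSentence));
    }
}
```

A rule that has to see both quote characters therefore needs access to the whole text rather than to one sentence at a time.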

periodista · June 22, 2021, 11:49am

Thanks for your answer. I was afraid that it would be necessary to add a Java rule.