How to find the tab character?

Mike_Unwalla · October 23, 2020, 1:12pm

@udomai, update 2020-10-23

I changed the rule to match the tab by its Unicode code point, CHARACTER TABULATION (U+0009). I also used tokens rather than regexp. Unicode code points work fine in NON_STANDARD_ALPHABETIC_CHARACTERS.

<rule id="TAB_CHARACTER2" name="Find a tab character">
    <pattern>
        <token/>
        <token regexp="yes">\b(\u0009|\u043E)\b</token>
        <token/>
    </pattern>
    <message>Found a tab v2</message>
    <short>Tab</short>
    <example correction="">Cyrillic small <marker>letter о is</marker> found.</example>
    <example correction="">Tab character between <marker>two	words</marker>.</example>
    <example>No tab character.</example>
</rule>

Testrules reports this failure:
Exception in thread "main" org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule TAB_CHARACTER2[1] in file /org/languagetool/rules/en/grammar.xml: Tab character between twowords."
Errors expected: 1
Errors found : 0
Message: Found a tab v2

Note: the message shows "twowords", without the tab.

LT does not ‘see’ the tab:
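
A possible explanation, sketched in Python rather than in LT's Java, so the regex semantics may differ in detail. It rests on two assumptions that are mine and not confirmed here: that LT checks the regexp against each token on its own, and that whitespace is dropped before tokens reach the pattern. The helper names `token_matches` and `split_on_whitespace` are hypothetical and are not part of LT.

```python
import re

# The regexp from the rule's second token, copied verbatim.
TOKEN_REGEX = re.compile(r"\b(\u0009|\u043E)\b")


def token_matches(token: str) -> bool:
    """Return True if the whole token matches the rule's regexp."""
    return TOKEN_REGEX.fullmatch(token) is not None


def split_on_whitespace(sentence: str) -> list:
    """Hypothetical stand-in for a tokenizer that discards whitespace."""
    return sentence.split()


sentence = "Tab character between two\twords."

# A tab standing alone has no word character next to it,
# so neither \b can succeed and the token never matches.
tab_alone = token_matches("\t")

# The Cyrillic letter is a word character, so both \b anchors succeed.
cyrillic_alone = token_matches("\u043E")

# If whitespace is dropped while tokenizing, no token contains a tab at all.
tokens = split_on_whitespace(sentence)
tab_in_tokens = any("\t" in t for t in tokens)

print(tab_alone, cyrillic_alone, tab_in_tokens)
```

Under these assumptions the tab fails for two separate reasons: the `\b` anchors cannot match around a lone whitespace character, and the tab may never be offered to the token as text in the first place. The Cyrillic example passes because the letter is an ordinary word token.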