
Problem with regexp-based rule


(Alex) #1

Hello
I have the following text: xxxxxxxx = yyyyyyyyy. Only the yyyyyyyyy part needs spell checking, so I added this rule:
[rule name="1" id="2"]
[pattern]
[token regexp="yes"]^(.*?)=[/token]
[/pattern]
[disambig action="immunize"/]
[/rule]

but the regexp above is not working. It works fine in Java or JavaScript, but here I see no effect. When I change the regexp to, for example, ^(.*?)abc, everything is OK if the text is xxxxxxxxabc yyyyyyyyyy.
Could you suggest what is wrong, please?


(Andriy) #2

In this case you'll have 3 tokens: "xxxxxxxx", "=", "yyyyyyyy". A token regexp is matched against one token at a time, so you have to build your rule to take into account 3 tokens, not one. Something like:
[pattern]
[marker]
[token][/token]
[/marker]
[token]=[/token]
[token][/token]
[/pattern]
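
To see why the original regexp had no visible effect while ^(.*?)abc did, here is a rough Python sketch. It is not LanguageTool code: toy_tokenize is a hypothetical stand-in for the real tokenizer, and it assumes each token regexp is tested against a single token as a whole.

```python
import re

def toy_tokenize(text):
    # Hypothetical simplification: a token is a run of word characters
    # or a single non-space symbol such as "=". The real tokenizer may differ.
    return re.findall(r"\w+|[^\w\s]", text)

def tokens_matching(pattern, text):
    # Assumption: the regexp is applied to each token separately,
    # never to the whole sentence.
    return [t for t in toy_tokenize(text) if re.fullmatch(pattern, t)]

print(tokens_matching(r"^(.*?)=", "xxxxxxxx = yyyyyyyyy"))      # ['=']
print(tokens_matching(r"^(.*?)abc", "xxxxxxxxabc yyyyyyyyyy"))  # ['xxxxxxxxabc']
```

Under these assumptions the first regexp can only ever hit the "=" token itself, never "xxxxxxxx", while the second one hits the whole word "xxxxxxxxabc", which matches what Alex observed.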


(Alex) #3

Actually, they may not be separated by a space. The text may come in any of the following forms:
xxxxx=yyyyy
or
xxxx = yyyyy
or
xxxx= yyyyy

So, should I create several rules for all these scenarios?


(Andriy) #4

A space is only one of the token separators; for most languages, = splits tokens as well, so a single rule covers all three cases.
You can observe how tokens are split here:
http://community.languagetool.org/analysis/analyzeText
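
As a rough illustration only (the analysis page above is the authoritative check), this Python sketch uses a hypothetical split_tokens helper that treats "=" as a separator and ignores whitespace. It is an assumption about the behaviour described here, not the actual LanguageTool tokenizer, which may represent whitespace differently.

```python
import re

def split_tokens(text):
    # Hypothetical simplification: words and single symbols become tokens,
    # whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)

variants = ["xxxxx=yyyyy", "xxxxx = yyyyy", "xxxxx= yyyyy"]
for v in variants:
    print(repr(v), "->", split_tokens(v))
```

With this simplification all three spellings produce the same three tokens, which is why one three-token pattern is enough.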