Searching for specific Unicode characters

SafeTex · December 1, 2012, 8:07pm

Hello

I’d like to search for specific Unicode characters

So far, I have this

\P{u00AB}

where U00AB = «

But it doesn’t work, and I just can’t spend more time on LT at the moment because of work.

Can anyone help me with my code?

Thanks

dnaber · December 2, 2012, 6:06pm

On Sa 01.12.2012, 12:07:37 you wrote:

\P{u00AB}
where U00AB = «

You don’t need to escape characters, as the grammar.xml is UTF-8 and can
contain the characters directly:

«

–
http://www.danielnaber.de
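
As an illustration of the point above (not LanguageTool code), here is a minimal Python sketch showing that the literal « and the escape for U+00AB are the same character once the source file is read as UTF-8, so either form finds the same positions. The helper name `find_char` and the sample sentence are made up for this example.

```python
import re

def find_char(text, char):
    """Return the index of every occurrence of `char` in `text`."""
    # re.escape keeps the search literal even if `char` is a regex metacharacter.
    return [m.start() for m in re.finditer(re.escape(char), text)]

sample = "He said «hello» twice: «hello»."

# The literal character and its \u escape denote the same one-character string.
literal_hits = find_char(sample, "«")
escaped_hits = find_char(sample, "\u00ab")

print(literal_hits == escaped_hits)
print(literal_hits)
```

The escape is only needed when the file's encoding cannot represent the character directly.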

Vincenzo_Turco1 · July 3, 2013, 9:51am

Hi Daniel,
I’m trying to do exactly the same.
I’d like to detect \n and \t (newline and tab character).
Following your suggestion, I typed those characters directly into the grammar.xml.
My pattern for \n is just this:

The problem I have is that it’s detecting newlines almost anywhere in the file now.
I guess there must be something I’m missing, can that be done?
thanks, regards
Vincenzo

Vincenzo_Turco1 · July 3, 2013, 1:05pm

Just an update: I tried to use a regexp, and my pattern is like this:

MY_REGEX_HERE

I’ve tried multiple regexes like:
\n \u000A

They worked OK in both
http://regexpal.com/ and RegexPlanet (online regular expression testing for Java).
However, none of them worked in LanguageTool.

Can anyone help with this, please?
Thanks, regards
Vincenzo

dnaber · July 3, 2013, 4:57pm

I’m trying to do exactly the same.
I’d like to detect \n and \t (newline and tab character).

This is not possible with XML rules; you’d need to write a Java rule.
Actually, just detecting single characters can be done easily in any
programming language, so using LT may be overkill here.

Regards
Daniel
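
For anyone who takes the "any programming language" route, a minimal sketch in plain Python (it does not use LanguageTool at all) might look like the following. The function name `find_control_whitespace` and the sample text are hypothetical.

```python
def find_control_whitespace(text):
    """Return (index, name) pairs for each newline or tab in `text`."""
    names = {"\n": "newline", "\t": "tab"}
    return [(i, names[ch]) for i, ch in enumerate(text) if ch in names]

sample = "first line\n\tindented second line"

for index, name in find_control_whitespace(sample):
    print(f"{name} at offset {index}")
```

The offsets are character positions in the raw string, so they would need mapping if they are to be reported alongside LanguageTool matches.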


Vincenzo_Turco1 · July 4, 2013, 3:55pm

Hi Daniel,
thanks for your help.
I’m using LT because it’s already used in my project, so it would be good to have all logic using the same technology.
I guess that blank spaces can’t be detected either.
I’m trying to detect leading spaces in a sentence with regex like: ^ .* or ^\s.*
but it won’t work.
Can you please confirm that blanks can’t be detected this way too?
Thanks, regards
Vincenzo

dnaber · July 4, 2013, 9:05pm

Can you please confirm that blanks can’t be detected this way too?

That’s right, no kind of whitespace can be detected by rules. There’s
one exception described here, but I don’t know if that’s enough for you:

http://wiki.languagetool.org/tips-and-tricks#toc13

Regards
Daniel
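
Since rules cannot see whitespace, a check for leading blanks has to run on the raw text outside LanguageTool. Below is a rough Python sketch under one loud assumption: it treats each line as the unit, not each sentence, because sentence splitting is not available here. The names `LEADING_WS` and `lines_with_leading_whitespace` are made up for this example.

```python
import re

# ^ matches at the start of every line because of re.MULTILINE.
# [ \t]+ is used instead of \s so the match cannot run across a newline.
LEADING_WS = re.compile(r"^[ \t]+", re.MULTILINE)

def lines_with_leading_whitespace(text):
    """Return 1-based numbers of lines that start with a space or tab."""
    return [text.count("\n", 0, m.start()) + 1
            for m in LEADING_WS.finditer(text)]

sample = "ok\n bad\n\talso bad\nfine"
print(lines_with_leading_whitespace(sample))
```

A line consisting only of spaces or tabs is also reported, which may or may not be wanted.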
