
Searching for specific Unicode characters


(SafeTex) #1

Hello

I'd like to search for specific Unicode characters.

So far, I have this:


\P{u00AB}

where U00AB = «

But it doesn't work, and I can't spend more time on LT at the moment because of work.

Can anyone help me with my code?

Thanks


(Daniel Naber) #2

On Sat 01.12.2012, 12:07:37 you wrote:


> \P{u00AB}
>
> where U00AB = «

You don't need to escape characters: grammar.xml is UTF-8, so it can
contain the characters directly:


«
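For reference, here is a standalone sketch of why the original pattern fails. It assumes that `regexp` tokens in grammar.xml are compiled with Java's `java.util.regex` (an assumption, not stated in this thread). The character itself, the regex escape `\u00AB`, and the Java 7+ hex escape `\x{AB}` all find «. `\P{u00AB}`, however, is read as "not in the Unicode property named u00AB". No such property exists, so the pattern does not even compile. The class name `GuillemetSearch` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GuillemetSearch {

    /** Returns the index of the first match of regex in text, or -1 if there is none. */
    static int firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    /** Returns true if regex is a syntactically valid java.util.regex pattern. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample text with French guillemets U+00AB and U+00BB.
        // javac turns each backslash-u escape in this literal into the real character.
        String text = "Il dit \u00ABbonjour\u00BB.";

        // 1. The character itself (what Daniel suggests putting in grammar.xml).
        System.out.println(firstMatch("\u00AB", text));    // 7
        // 2. The regex engine's own escape: Pattern receives backslash, u, 0, 0, A, B.
        System.out.println(firstMatch("\\u00AB", text));   // 7
        // 3. Hex escape by code point, supported by java.util.regex since Java 7.
        System.out.println(firstMatch("\\x{AB}", text));   // 7
        // \P{...} is a negated property/block class, not a code point escape.
        // There is no property called u00AB, so compilation fails.
        System.out.println(compiles("\\P{u00AB}"));         // false
    }
}
```

The same reasoning applies to the lowercase `\p{...}` form: both take a property or block name such as `InLatin-1_Supplement`, never a code point.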

--
http://www.danielnaber.de


(Vincenzo Turco) #3

Hi Daniel,
I'm trying to do exactly the same.
I'd like to detect \n and \t (newline and tab character).
Following your suggestion, I typed those characters directly into the grammar.xml.
My pattern for \n is just this:



The problem is that it now detects newlines almost everywhere in the file.
I guess I'm missing something. Can this be done at all?
Thanks, regards
Vincenzo


(Vincenzo Turco) #4

Just an update: I tried using a regexp; my pattern looks like this:

MY_REGEX_HERE

I've tried several regexes, such as:
\n and \u000A

They worked fine in both
http://regexpal.com/ and http://www.regexplanet.com/advanced/java/index.html
However, none of them worked in LanguageTool.

Can anyone help with this, please?
Thanks, regards
Vincenzo


(Daniel Naber) #5

> I'm trying to do exactly the same.
> I'd like to detect \n and \t (newline and tab character).

This is not possible with XML rules; you'd need to write a Java rule.
Actually, detecting single characters is easy in any programming language,
so using LT might be overkill here.
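To illustrate the point that single characters are easy to find outside the rule engine, here is a minimal plain-Java sketch. It scans a string once and reports every newline and tab with its 1-based line and column. This is not the LanguageTool rule API, which this thread doesn't show. Hooking similar logic into a Java rule is left out. The class and method names (`ControlCharFinder`, `find`, `Hit`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlCharFinder {

    /** One hit: which character was found and where (1-based line and column). */
    static final class Hit {
        final String kind;
        final int line;
        final int column;

        Hit(String kind, int line, int column) {
            this.kind = kind;
            this.line = line;
            this.column = column;
        }

        @Override
        public String toString() {
            return kind + " at " + line + ":" + column;
        }
    }

    /** Scans text once and records every newline ('\n') and tab ('\t'). */
    static List<Hit> find(String text) {
        List<Hit> hits = new ArrayList<>();
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                hits.add(new Hit("TAB", line, column));
                column++;
            } else if (c == '\n') {
                hits.add(new Hit("NEWLINE", line, column));
                line++;       // the next character starts a new line
                column = 1;
            } else {
                column++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "First line\tend\nSecond line\n";
        for (Hit h : find(text)) {
            System.out.println(h);
        }
        // TAB at 1:11
        // NEWLINE at 1:15
        // NEWLINE at 2:12
    }
}
```

With Windows line endings, the `'\r'` before each `'\n'` is simply counted as an ordinary column here. Add another branch if carriage returns should be reported too.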

Regards
Daniel



(Vincenzo Turco) #6

Hi Daniel,
thanks for your help.
I'm using LT because it's already used in my project, so it would be good to implement all the logic with the same technology.
I guess blank spaces can't be detected either.
I'm trying to detect leading spaces in a sentence with regexes like ^ .* or ^\s.*, but they don't work.
Can you please confirm that blanks can't be detected this way too?
Thanks, regards
Vincenzo


(Daniel Naber) #7

> Can you please confirm that blanks can't be detected this way too?

That's right: no kind of whitespace can be detected by rules. There's
one exception, described here, but I don't know whether it's enough for you:

http://wiki.languagetool.org/tips-and-tricks#toc13
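The regex idea itself is sound. Per Daniel, the problem is that LT rules never see whitespace. Outside the rule engine, the check is short. Below is a minimal sketch, assuming the sentences are already available as plain Java strings (how they are split is not covered here). `lookingAt()` anchors the match at the start of the input, so no `^` is needed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LeadingWhitespaceCheck {

    // One or more whitespace characters (space, tab, newline, ...).
    private static final Pattern LEADING_WS = Pattern.compile("\\s+");

    /** True if the sentence begins with whitespace; lookingAt() only matches at index 0. */
    static boolean hasLeadingWhitespace(String sentence) {
        return LEADING_WS.matcher(sentence).lookingAt();
    }

    /** Returns the indices of all sentences that start with whitespace. */
    static List<Integer> offending(List<String> sentences) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (hasLeadingWhitespace(sentences.get(i))) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentences =
                Arrays.asList("Fine.", " One leading space.", "\tA tab.", "Also fine.");
        System.out.println(offending(sentences)); // [1, 2]
    }
}
```

`Pattern.compile("^\\s").matcher(s).find()` would behave the same way. With `matches()` and `^\s.*`, note that `.` does not match line breaks unless `Pattern.DOTALL` is set.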

Regards
Daniel
