XML-Rule with regex: Some cases are found, others aren't


(tjf) #1

Hello again,

unfortunately, I have got another question. Maybe someone can help.

I want to create an XML rule that triggers when the two names are joined by anything other than a hyphen, i.e. when the text is not "Cassel-Caspers".

Therefore I created the following term: <token regexp="yes">Cassel[^-]Caspers</token>

It works for many characters (e.g. "Cassel+Caspers", "Cassel_Caspers", "Cassel*Caspers", ...) but not for others (e.g. "Cassel.Caspers", "Cassel/Caspers", ...). Does anybody know how to write a term that matches every character except "-"?

Thanks!

Till


(Daniel Naber) #2

On 21.05.2013 at 20:39, tjf [via LanguageTool User Forum] wrote:

Therefore I created the following term: <token regexp="yes">Cassel[^-]Caspers</token>

The text is tokenized before it is matched against our rules. Thus to
match Cassel/Caspers you'd need to use this:

<token>Cassel</token>
<token>/</token>
<token>Caspers</token>

Cassel.Caspers is yet another case: the dot marks a sentence boundary,
and our XML rules never span those boundaries. Thus this cannot be
matched with XML rules (at least not without changing the way sentence
boundaries are detected).

To see how text is tokenized, you could use Ctrl-T in the stand-alone
client. This will assign each token its part-of-speech tag (or null, if
unknown).
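
The effect of tokenization on a single-token regex can be illustrated with a small Python sketch. This is not LanguageTool's tokenizer: `toy_tokenize`, `single_token_matches`, `token_sequence_matches` and the separator set are hypothetical, chosen only to mirror the behaviour reported in this thread ("/" splits a word, while "+", "_" and "*" do not). Sentence boundaries, as in the "Cassel.Caspers" case, are not modelled.

```python
import re

# Assumption: a toy separator set (whitespace and "/"), not LanguageTool's real one.
SEPARATORS = r"[\s/]"


def toy_tokenize(text):
    """Split text into tokens, keeping each separator as its own token."""
    return [tok for tok in re.split(f"({SEPARATORS})", text) if tok]


def single_token_matches(pattern, text):
    """True if at least one single token fully matches the regex."""
    return any(re.fullmatch(pattern, tok) for tok in toy_tokenize(text))


def token_sequence_matches(patterns, text):
    """True if some run of consecutive tokens matches the regexes in order."""
    tokens = toy_tokenize(text)
    n = len(patterns)
    return any(
        all(re.fullmatch(p, t) for p, t in zip(patterns, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    print(toy_tokenize("Cassel+Caspers"))  # one token
    print(toy_tokenize("Cassel/Caspers"))  # three tokens
    # The one-token regex only sees the text when it was not split.
    print(single_token_matches(r"Cassel[^-]Caspers", "Cassel+Caspers"))
    print(single_token_matches(r"Cassel[^-]Caspers", "Cassel/Caspers"))
    # A three-token pattern is needed once "/" has become its own token.
    print(token_sequence_matches(["Cassel", "/", "Caspers"], "Cassel/Caspers"))
```

In this sketch the regex `Cassel[^-]Caspers` succeeds on "Cassel+Caspers" because the whole string stays one token, and fails on "Cassel/Caspers" because no single token contains all three parts.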

Regards
Daniel

--
http://www.danielnaber.de


(tjf) #3

Thanks again, dear Daniel!