XML rule with regex: some cases are found, others aren't

tjf · May 21, 2013, 6:39pm

Hello again,

Unfortunately, I have another question. Maybe someone can help.

I want to create an XML rule that matches “Cassel” and “Caspers” joined by any character except a hyphen, i.e. everything but “Cassel-Caspers”.

Therefore I created the following term: Cassel[^-]Caspers

It works for many characters (e.g. “Cassel+Caspers”, “Cassel_Caspers”, “Cassel*Caspers”, …), but not for others (e.g. “Cassel.Caspers”, “Cassel/Caspers”, …). Does anybody know how to write an expression that matches every character except “-”?
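Tested as a plain regular expression against whole strings, outside LanguageTool, the expression itself accepts every one of these separators except the hyphen. A minimal Python sketch, with Python's `re` module standing in for a generic regex engine (an assumption; the sample strings are the ones from the question):

```python
import re

# The expression from the question: "Cassel", exactly one character
# that is not "-", then "Caspers".
pattern = re.compile(r"Cassel[^-]Caspers")

samples = ["Cassel+Caspers", "Cassel_Caspers", "Cassel*Caspers",
           "Cassel.Caspers", "Cassel/Caspers", "Cassel-Caspers"]

# fullmatch: the whole string must fit the pattern.
results = {s: pattern.fullmatch(s) is not None for s in samples}

for s, ok in results.items():
    print(s, ok)
# Every sample prints True except "Cassel-Caspers", which prints False.
```

This suggests the difference between the working and failing cases comes from how the text is processed before matching, not from the expression.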

Thanks!

Till

dnaber · May 21, 2013, 8:36pm

On 21.05.2013 20:39, tjf [via LanguageTool User Forum] wrote:

> Therefore I created the following term: Cassel[^-]Caspers

The text is tokenized before it is matched against our rules, and the
slash becomes a token of its own. Thus, to match Cassel/Caspers, you’d
need a sequence of three tokens:

<token>Cassel</token>
<token>/</token>
<token>Caspers</token>
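The effect of tokenizing first can be illustrated with a toy model in Python. Everything here is a hypothetical illustration, not how LanguageTool is implemented: `tokenize`, `matches_single_token` and `matches_sequence` are made-up helpers, the tokenizer only splits on whitespace and "/" (chosen to mirror the cases reported above), and sentence splitting, which matters for Cassel.Caspers, is left out.

```python
import re

# Hypothetical toy tokenizer, NOT LanguageTool's real one: it splits on
# whitespace and keeps "/" as a token of its own. "+", "_", "*" and "-"
# stay inside a word. Sentence splitting is not modeled.
SPLIT_RE = re.compile(r"(/|\s+)")


def tokenize(text):
    # The capturing group makes re.split return the separators as well;
    # whitespace separators and empty strings are then dropped.
    return [p for p in SPLIT_RE.split(text) if p and not p.isspace()]


def matches_single_token(regex, tokens):
    # One-token rule: the regex has to match a complete token on its own.
    rx = re.compile(regex)
    return any(rx.fullmatch(t) for t in tokens)


def matches_sequence(regexes, tokens):
    # Multi-token rule: n consecutive tokens must each fully match
    # the corresponding regex, in order.
    rxs = [re.compile(r) for r in regexes]
    n = len(rxs)
    for i in range(len(tokens) - n + 1):
        if all(rx.fullmatch(t) for rx, t in zip(rxs, tokens[i:i + n])):
            return True
    return False


for text in ["Cassel+Caspers", "Cassel/Caspers"]:
    toks = tokenize(text)
    print(text, toks,
          matches_single_token(r"Cassel[^-]Caspers", toks),
          matches_sequence(["Cassel", "/", "Caspers"], toks))
# Cassel+Caspers ['Cassel+Caspers'] True False
# Cassel/Caspers ['Cassel', '/', 'Caspers'] False True
```

In this model the single-token regex only ever sees "Cassel", "/" and "Caspers" separately, so `[^-]` has no character between the names to match, while the three-token sequence does match.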

Cassel.Caspers is yet another case: the dot marks a sentence boundary,
and our XML rules never span those boundaries. Thus this cannot be
matched with XML rules (at least not without changing the way sentence
boundaries are detected).

To see how a text is tokenized, press Ctrl-T in the stand-alone
client. This shows each token together with its part-of-speech tag
(or null if the tag is unknown).

Regards
Daniel

--
http://www.danielnaber.de

tjf · May 24, 2013, 8:47pm

Thanks again, dear Daniel!