
XML pattern across sentences

Would it be possible to check patterns across detected sentence endings in XML rules? For example,
Vrij. 21 januari
gets a sentence end after “vrij” because of the SRX rules. That is fine, since “vrij” is a normal word and not an abbreviation.
If it were possible to check for
<token>vrij</token><token postag="SENT_END">.</token><token regexp="yes">[0-9]{1,2}</token>
it would be possible to warn that the period after “vrij” is not needed.
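For illustration, the wished-for check written out as a complete grammar.xml rule might look roughly like the sketch below. The rule ID, name and message are made up. The `postag="SENT_END"` condition is dropped, because the period only carries that tag when it really ends a sentence. With the current segmentation this pattern can only match if “vrij.” and the day number end up in the same sentence.

```xml
<!-- Hypothetical rule: ID, name and message are illustrative only. -->
<rule id="VRIJ_PERIOD_BEFORE_DAY" name="Unneeded period after 'vrij' before a day number">
    <pattern>
        <!-- Mark "vrij" + "." so the suggestion replaces both tokens. -->
        <marker>
            <token>vrij</token>
            <token>.</token>
        </marker>
        <!-- A day number of one or two digits must follow. -->
        <token regexp="yes">[0-9]{1,2}</token>
    </pattern>
    <message>No period is needed after 'vrij' here.</message>
    <!-- \1 is the text of the first token, i.e. "vrij" without the period. -->
    <suggestion>\1</suggestion>
    <example correction="Vrij"><marker>Vrij.</marker> 21 januari</example>
    <example>Vrij 21 januari</example>
</rule>
```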

I think this would be quite a lot of work and might require many internal changes, so I don’t think it’s worth it yet.

That is a pity. I will put it on the wish list…

There are text-level rules that can check things across sentences, but I think they are only available in Java (not XML).
Alternatively, you could prevent such cases from being split into separate sentences via SRX. For example, in Ukrainian people often mistakenly put a period after “млн” (million), so if there is a digit before and after “млн.” we make an exception and don’t break the sentence there; the rule can then easily catch the error.
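For the Dutch case, such an SRX exception might look like the sketch below. It assumes the no-break rule is placed before the existing break rules in the Dutch language rule, since in SRX the first matching rule wins at a given position. The `languagerulename` value is an assumption; use the name of the existing Dutch section in the segmentation file. The optional whitespace on both sides is meant to keep the rule working whether the break position falls before or after the space.

```xml
<!-- Assumed name: replace with the existing Dutch languagerule name. -->
<languagerule languagerulename="Dutch">
    <!-- Hypothetical exception: no sentence break after "vrij." when a
         one- or two-digit day number follows. Must precede the break rules. -->
    <rule break="no">
        <beforebreak>\b[Vv]rij\.\s?</beforebreak>
        <afterbreak>\s?\d{1,2}\b</afterbreak>
    </rule>
    <!-- ... existing Dutch break and no-break rules ... -->
</languagerule>
```

With this exception in place, “vrij”, the period and the day number stay in one sentence, so an ordinary token pattern can see all three.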

That would mean adding exceptions to the (already hard to understand) SRX file as well as writing a rule: two changes for one error. That does not feel easy to maintain.

I will just leave it for now.