
SRX rule for "FRITZ!Box"


(Martin von Wittich) #1

Hi,

I'm currently trying to spellcheck our documentation, and I noticed that LanguageTool doesn't like "FRITZ!Box" (the name of a common German DSL router):

www0.iserv.eu ~ # echo 'FRITZ!Box' | java -jar /root/LanguageTool-3.5/languagetool-commandline.jar -l de-DE
Expected text language: German (Germany)
Working on STDIN...
1.) Line 1, column 7, Rule ID: DE_SENTENCE_WHITESPACE
Message: Fügen Sie zwischen Sätzen ein Leerzeichen ein
Suggestion: Box
FRITZ!Box
      ^^^
Time: 655ms for 2 sentences (3.1 sentences/sec)

That wasn't as easy to fix as the other things I've stumbled over, because LanguageTool actually splits the string into two sentences:

www0.iserv.eu ~ # echo 'FRITZ!Box' | java -jar /root/LanguageTool-3.5/languagetool-commandline.jar -l de-DE -t
Expected text language: German (Germany)
Working on STDIN...
<S> FRITZ[FRITZ/null,O]![</S>!/PKT,O]
<S> Box[Box/SUB:AKK:SIN:FEM,Box/SUB:DAT:SIN:FEM,Box/SUB:GEN:SIN:FEM,Box/SUB:NOM:SIN:FEM,</S>,B-NP|NPS]

Fortunately, I had just learned from Daniel's commit in response to another of my bug reports that a file called segment.srx is responsible for splitting sentences. I extracted it from languagetool-core.jar and was able to work out a rule that solves this:

<rule break="no">
<beforebreak>(?i)FRITZ!</beforebreak>
<afterbreak>(?i)Box</afterbreak>
</rule>
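
For anyone wondering why this works, here is a rough Python sketch of how I understand SRX matching: at each position the rules are tried in order, and the first rule whose beforebreak and afterbreak patterns both match decides whether to break there. Only the first rule below is the one from this post. The generic break rule and the `segment` helper are made up for illustration and are not taken from LanguageTool's segment.srx, and Python's regex dialect is not identical to the Java one LanguageTool uses.

```python
import re

# Each rule: (is_break, beforebreak regex, afterbreak regex).
# Order matters: the first rule matching at a position decides.
RULES = [
    (False, r"(?i)FRITZ!", r"(?i)Box"),  # the no-break rule from this post
    (True, r"[.!?]", r"\S"),             # hypothetical, simplified break rule
]

def segment(text, rules=RULES):
    """Split text into sentences using SRX-style before/after rules."""
    sentences, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            # beforebreak must match right up to pos, afterbreak right after it
            if re.search(before + r"\Z", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    sentences.append(text[start:pos])
                    start = pos
                break  # first matching rule wins, later rules are skipped
    sentences.append(text[start:])
    return sentences

print(segment("FRITZ!Box"))             # ['FRITZ!Box']
print(segment("FRITZ!Box", RULES[1:]))  # ['FRITZ!', 'Box']
```

Without the no-break rule, the toy break rule splits after the "!", which mirrors the two-sentence output above. With it, the position between "FRITZ!" and "Box" is claimed by the exception before the break rule is ever consulted.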


(Daniel Naber) #2

Thanks, I've added this to our segment.srx.