Hi,
I’ve noticed that LT automatically understands compound words (I hope this is the right term) that consist of two dictionary words. For example, by default LT doesn’t understand “PNG-Datei”:
www0.iserv.eu ~/LanguageTool-3.5 # echo "PNG-Datei" | java -jar languagetool-commandline.jar -l de-DE
Expected text language: German (Germany)
Working on STDIN...
1.) Line 1, column 1, Rule ID: GERMAN_SPELLER_RULE
Message: Möglicher Rechtschreibfehler gefunden
Suggestion: Pol-Datei
PNG-Datei
^^^^^^^^^
Time: 462ms for 1 sentences (2.2 sentences/sec)
But this is easily fixable by adding “PNG” to hunspell/spelling.txt:
www0.iserv.eu ~/LanguageTool-3.5 # tail -n1 org/languagetool/resource/de/hunspell/spelling.txt
PNG
www0.iserv.eu ~/LanguageTool-3.5 # echo "PNG-Datei" | java -jar languagetool-commandline.jar -l de-DE
Expected text language: German (Germany)
Working on STDIN...
Time: 347ms for 1 sentences (2.9 sentences/sec)
So it understands that “PNG-Datei” is “PNG” combined with “Datei”, and because both are now valid words, “PNG-Datei” is also valid. Unfortunately, hunspell/spelling.txt doesn’t allow entries made up of multiple words, e.g. I can’t add “Portable Network Graphics” to the list; therefore I use a rule in disambiguation.xml:
<rule name="Portable Network Graphics" id="PORTABLE_NETWORK_GRAPHICS">
<pattern>
<token>Portable</token>
<token>Network</token>
<token>Graphics</token>
</pattern>
<disambig action="ignore_spelling"/>
</rule>
This works on its own:
www0.iserv.eu ~/LanguageTool-3.5 # echo "Portable Network Graphics" | java -jar languagetool-commandline.jar -l de-DE
Expected text language: German (Germany)
Working on STDIN...
Time: 352ms for 1 sentences (2.8 sentences/sec)
But it no longer works when I use the phrase as the first part of a compound word:
www0.iserv.eu ~/LanguageTool-3.5 # echo "Portable Network Graphics-Datei" | java -jar languagetool-commandline.jar -l de-DE
Expected text language: German (Germany)
Working on STDIN...
1.) Line 1, column 10, Rule ID: GERMAN_SPELLER_RULE
Message: Möglicher Rechtschreibfehler gefunden
Suggestion: Netbook; Neuwerk
Portable Network Graphics-Datei
^^^^^^^
2.) Line 1, column 18, Rule ID: GERMAN_SPELLER_RULE
Message: Möglicher Rechtschreibfehler gefunden
Suggestion: Graphits-Datei; Graphems-Datei; Graphik-Datei; Graphisch-Datei; Graphit-Datei; Graphite-Datei; Gryphius-Datei
Portable Network Graphics-Datei
^^^^^^^^^^^^^^
Time: 456ms for 1 sentences (2.2 sentences/sec)
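My guess at what is happening, as a rough Python sketch. This is not LT's actual code, only a model of the behaviour I observe. It assumes that the text is tokenized at whitespace (the second error spans “Graphics-Datei” as one unit), that a hyphenated word is accepted when every part is a known word, and that my rule only matches three exact tokens. DICTIONARY, IGNORE_PATTERN and both function names are made up for illustration:

```python
# Hypothetical stand-in for the speller dictionary plus my spelling.txt entry.
# "Portable" is assumed to be known already, since LT never flags it.
DICTIONARY = {"Datei", "Portable", "PNG"}

# Hypothetical stand-in for my disambiguation rule: three exact tokens in a row.
IGNORE_PATTERN = ["Portable", "Network", "Graphics"]


def is_accepted(word):
    """Accept a word if it is known, or if all of its hyphen-separated parts are."""
    if word in DICTIONARY:
        return True
    parts = word.split("-")
    return len(parts) > 1 and all(part in DICTIONARY for part in parts)


def misspelled(text):
    """Return the tokens that would be flagged, assuming whitespace tokenization."""
    tokens = text.split()
    ignored = set()
    n = len(IGNORE_PATTERN)
    # Mark every token covered by an exact match of the ignore pattern.
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == IGNORE_PATTERN:
            ignored.update(range(i, i + n))
    return [t for i, t in enumerate(tokens)
            if i not in ignored and not is_accepted(t)]


print(misspelled("PNG-Datei"))
print(misspelled("Portable Network Graphics"))
print(misspelled("Portable Network Graphics-Datei"))
```

In this model the third token of the last input is “Graphics-Datei”, not “Graphics”, so the pattern never matches and neither “Network” nor “Graphics-Datei” is exempted from the spell check. That matches the two errors reported above.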
Is there a way to adapt my disambiguation rule to fix that?