German dictionary now in a separate project

I’ve moved the German part-of-speech data to its own Maven and git project at languagetool-org/german-pos-dict on GitHub. This way we don’t have to keep the binary files in the LT git repo. Also see #200. This new external project is now a dependency of LT.

If you want to do the same for “your” language, feel free. It does come at a cost, though: you’ll need to make your own releases, and LT can only rely on releases of your artifact, not on snapshots. (Actually, you can use snapshots, but then you need to switch to a release version before we can make an LT release.)
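To make the release constraint concrete, here is a small Java sketch, assuming the standard Maven convention that development builds carry a `-SNAPSHOT` version suffix. The class name `ReleaseCheck` and the artifact coordinates and versions in `main` are hypothetical examples, not LT’s actual dependency list; the helper only reports which dependencies would still need a release version before an LT release.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReleaseCheck {

    /** Maven marks development builds with a "-SNAPSHOT" suffix in the version string. */
    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    /** Returns the artifacts whose version is still a snapshot and would therefore block a release. */
    static List<String> blockingDependencies(Map<String, String> versionsByArtifact) {
        List<String> blocking = new ArrayList<>();
        for (Map.Entry<String, String> e : versionsByArtifact.entrySet()) {
            if (isSnapshot(e.getValue())) {
                blocking.add(e.getKey());
            }
        }
        return blocking;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates and versions, for illustration only.
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("org.example:dict-a", "1.0");
        deps.put("org.example:dict-b", "0.1-SNAPSHOT");
        System.out.println("Needs a release version first: " + blockingDependencies(deps));
    }
}
```

In a real Maven build this check is usually left to release tooling; the sketch just spells out the rule described above.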

Interesting. How does this work for users? If they check out LanguageTool from git, do they automatically get the binary dictionaries from an external git repo? Or are extra steps required to get the dictionaries from those external repositories?

I could try doing the same for French and Breton, but I don’t know yet what all the steps involved are.

It’s an external dependency like many others; Maven will automatically download it from the central Maven repo.

Does that mean we have to worry less about the repo size and can update the binary dictionaries more frequently if necessary?

Yes, but adding words to added.txt (which is still in the same project as before) should remain the preferred way. I’ll update the binaries from time to time, partly because we get updates from korrekturen.de/flexion: Julian von Heyl from korrekturen.de is improving and extending the part-of-speech data, and we can use his work.

This morning I cloned the LanguageTool project.
When I run ./testrules.sh de, I get this error message:
Running XML validation for de/grammar.xml…
Running pattern rule tests for German…
Exception in thread "main" java.lang.RuntimeException: Path /de/german.dict not found in class path at /org/languagetool/resource/de/german.dict
at org.languagetool.databroker.DefaultResourceDataBroker.assertNotNull(DefaultResourceDataBroker.java:203)
at org.languagetool.databroker.DefaultResourceDataBroker.getFromResourceDirAsUrl(DefaultResourceDataBroker.java:150)
at org.languagetool.tagging.BaseTagger.<init>(BaseTagger.java:83)
at org.languagetool.tagging.BaseTagger.<init>(BaseTagger.java:69)
at org.languagetool.tagging.de.GermanTagger.<init>(GermanTagger.java:46)
at org.languagetool.language.German.getTagger(German.java:125)
at org.languagetool.rules.de.CaseRule.<init>(CaseRule.java:527)
at org.languagetool.language.German.getRelevantRules(German.java:169)
at org.languagetool.JLanguageTool.getAllBuiltinRules(JLanguageTool.java:235)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:166)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:76)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:67)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:51)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:156)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:149)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:570)
Running disambiguator rule tests…
Running disambiguation tests for German…
Exception in thread "main" java.lang.RuntimeException: Path /de/german.dict not found in class path at /org/languagetool/resource/de/german.dict
at org.languagetool.databroker.DefaultResourceDataBroker.assertNotNull(DefaultResourceDataBroker.java:203)
at org.languagetool.databroker.DefaultResourceDataBroker.getFromResourceDirAsUrl(DefaultResourceDataBroker.java:150)
at org.languagetool.tagging.BaseTagger.<init>(BaseTagger.java:83)
at org.languagetool.tagging.BaseTagger.<init>(BaseTagger.java:69)
at org.languagetool.tagging.de.GermanTagger.<init>(GermanTagger.java:46)
at org.languagetool.language.German.getTagger(German.java:125)
at org.languagetool.rules.de.CaseRule.<init>(CaseRule.java:527)
at org.languagetool.language.German.getRelevantRules(German.java:169)
at org.languagetool.JLanguageTool.getAllBuiltinRules(JLanguageTool.java:235)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:166)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:151)
at org.languagetool.tagging.disambiguation.rules.DisambiguationRuleTest.testDisambiguationRulesFromXML(DisambiguationRuleTest.java:60)
at org.languagetool.tagging.disambiguation.rules.DisambiguationRuleTest.main(DisambiguationRuleTest.java:230)
Running XML bitext pattern tests…

I think german.dict is missing, or the path in the shell script is wrong.

You’re right. Please add libs/german-pos-dict.jar to the CPATH variable in the script, or simply update from git; it should be fixed now.
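The error in the output above means the class loader could not find the resource /org/languagetool/resource/de/german.dict. One quick way to check whether a given classpath includes it is a tiny Java program like the sketch below. It uses only the standard `Class.getResource` lookup and takes the resource path from the error message; the class name `DictOnClasspath` is hypothetical and not part of LT.

```java
import java.net.URL;

// Sketch: check whether the German binary dictionary is visible on the classpath.
// The resource path is copied from the error message; nothing here is LT API.
class DictOnClasspath {

    static final String GERMAN_DICT = "/org/languagetool/resource/de/german.dict";

    /** Returns true if the given absolute resource path (leading "/") is found on the classpath. */
    static boolean isOnClasspath(String resourcePath) {
        URL url = DictOnClasspath.class.getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        if (isOnClasspath(GERMAN_DICT)) {
            System.out.println("Found: " + DictOnClasspath.class.getResource(GERMAN_DICT));
        } else {
            System.out.println("Not found: " + GERMAN_DICT + " (is german-pos-dict.jar on the classpath?)");
        }
    }
}
```

Compile it and run it with the same classpath the script uses, for example `java -cp "libs/german-pos-dict.jar:." DictOnClasspath` on Linux or macOS. If it prints "Not found", the jar is missing from the classpath.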