Compound nouns in German


(Michael) #1

Hi,

I'm using LanguageTool through the Java API and want to check German text. I create the JLanguageTool instance with:

langTool = new JLanguageTool(new GermanyGerman());
langTool.activateDefaultPatternRules();

Unfortunately, all compound nouns are reported as errors, for example "Handbuch" or "Funktionsweise". Hunspell should be able to cope with such nouns. What do I have to do to activate it?

Best regards,
Michael


(Daniel Naber) #2

Can you reproduce that with a minimal example? This works for
me; it returns one error:

public static void main(String[] args) throws IOException {
    JLanguageTool languageTool = new JLanguageTool(new GermanyGerman());
    languageTool.activateDefaultPatternRules();
    // Only the nonsense word "fdasdasads" should be reported, not the compounds.
    List<RuleMatch> result = languageTool.check("Das Handbuch und die Funktionsweise fdasdasads");
    System.out.println(result);
}


(Michael) #3

Thank you for the code. I found my error. Your example worked exactly as it should. In my own code, however, I had added a new MorfologikSpellerRule, trying to reuse for German the trick that provides an "ignore words" list for English. This seems to interfere with the standard behavior. When I remove it, the German language check works fine.
So, thank you very much for the quick answer.

Nevertheless, this leads to another question :wink: How can I add a list of words to ignore to the German JLanguageTool?
And one more, if you don't mind: I check DocBook XML files. At the moment I use Tika to extract the plain text from the XML files. But sometimes it would be nice to tell LanguageTool to ignore, for example, text that is contained in a code tag. So, is there a way to use and configure XML files directly in LanguageTool?


(Daniel Naber) #4

To answer the second question first: The upcoming release has a feature
for that, quoting from the changelog:

- A new method JLanguageTool.check(AnnotatedText) has been introduced
  that allows you to check text with markup. Use AnnotatedTextBuilder
  to build up the input.

This is already available in the snapshot releases and we'd be glad if
people tested it (only a few days are left until the release).
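
For the DocBook case, the input first has to be split into text that should be checked and markup that should be skipped. The following is only a rough sketch of that preprocessing step using the plain JDK, not LanguageTool's own code. The names DocBookSegmenter, Segment, split and checkableText are made up for this illustration, and the scanner is deliberately naive: it does not handle XML comments, CDATA sections or entities. The comment about addText() and addMarkup() reflects how I understand AnnotatedTextBuilder is meant to be fed, so please verify it against the snapshot you are testing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits XML into checkable text and markup to skip. */
final class DocBookSegmenter {

    /** One piece of the input, in document order. */
    static final class Segment {
        final boolean markup;   // true: skip when checking, false: check it
        final String content;

        Segment(boolean markup, String content) {
            this.markup = markup;
            this.content = content;
        }
    }

    /** Tags are always markup; text inside ignoredTag is treated as markup too. */
    static List<Segment> split(String xml, String ignoredTag) {
        List<Segment> segments = new ArrayList<>();
        int ignoreDepth = 0;    // > 0 while inside an ignored element
        int pos = 0;
        while (pos < xml.length()) {
            if (xml.charAt(pos) == '<') {
                int end = xml.indexOf('>', pos);
                if (end < 0) {
                    end = xml.length() - 1;   // unterminated tag: take the rest
                }
                String tag = xml.substring(pos, end + 1);
                if (!tag.endsWith("/>") && hasName(tag, "<" + ignoredTag)) {
                    ignoreDepth++;
                } else if (hasName(tag, "</" + ignoredTag)) {
                    ignoreDepth = Math.max(0, ignoreDepth - 1);
                }
                segments.add(new Segment(true, tag));
                pos = end + 1;
            } else {
                int next = xml.indexOf('<', pos);
                if (next < 0) {
                    next = xml.length();
                }
                segments.add(new Segment(ignoreDepth > 0, xml.substring(pos, next)));
                pos = next;
            }
        }
        return segments;
    }

    /** True if the tag starts with prefix followed by '>' or whitespace. */
    private static boolean hasName(String tag, String prefix) {
        if (!tag.startsWith(prefix) || tag.length() <= prefix.length()) {
            return false;
        }
        char next = tag.charAt(prefix.length());
        return next == '>' || Character.isWhitespace(next);
    }

    /** Concatenates only the segments that should be checked. */
    static String checkableText(List<Segment> segments) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            if (!s.markup) {
                sb.append(s.content);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<para>Rufen Sie <code>getId()</code> auf.</para>";
        for (Segment s : split(xml, "code")) {
            // With LanguageTool on the classpath, each segment would be passed
            // to AnnotatedTextBuilder: addMarkup(...) for markup segments and
            // addText(...) for text segments (assumed usage, not verified here).
            System.out.println((s.markup ? "markup: " : "text:   ") + s.content);
        }
    }
}
```

Because every character of the input ends up in exactly one segment, the concatenation of all segments equals the original document, which is what lets reported error positions refer to the original XML rather than to the stripped text.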


(Daniel Naber) #5

> Nevertheless, this leads to another question :wink: How can I add a list
> with words to ignore to the German JLanguageTool?

I haven't tried it, but you should subclass GermanSpellerRule, not
MorfologikSpellerRule. The method you will need to override is
ignoreToken().

Regards
Daniel


(JensP) #6

I can confirm this. I subclassed GermanSpellerRule and only changed the value returned by getId to my new name, nothing more. Afterwards I could use addIgnoreTokens to add words to the ignore list.