
N-Gram question

(helz) #1


I'm using my own server with the n-gram data. I downloaded this archive:

When I use a command like this:

java -cp languagetool.jar /media/8E9ED52D9ED50E99/ngram-ru-20150914/ru "some_wrong_word"

it returns 0 occurrences, which is correct. But when I run the server like this:

java -cp languagetool-server.jar org.languagetool.server.HTTPServer --config --port 8082 --allow-origin '*'

and check the same "wrong_word", no error is reported. LT just returns that everything is OK.

Do you have any idea why this happens?

I'm using the latest version of LT.
And here are my "" settings:



(Daniel Naber) #2

The ngrams only work in context, they are not related to all the other checks (like spell check). So if "wrongword" is accepted that's because the spellchecker has it in its dictionary. If you think the word should not be in the dictionary, please open a bug report at

(helz) #3

Yeah, you're right. Sorry for the confusion. My "wrongword" is not actually wrong. It may be wrong based on a word before it.

So, here is how I check it:

java -cp languagetool.jar /media/8E9ED52D9ED50E99/ngram-ru-20150914/ru "context_word_1 verifiable_word" - returns n occurrences and that's right

java -cp languagetool.jar /media/8E9ED52D9ED50E99/ngram-ru-20150914/ru "context_word_2 verifiable_word" - returns 0 occurrences and that's right too, because in this case there is no such combination of words

I also checked both combinations here:
And didn't get the correct result. But I'm sure there is an error in one of these cases.

(Daniel Naber) #4

The Russian confusion rule so far only has two pairs that are checked (не/ни and шасси/шоссе), so if your word isn't one of those, the ngram rule is not active at all for your case. (We only check specific hand-chosen pairs to avoid getting too many false alarms.)
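
To make the point about hand-chosen pairs concrete, here is a minimal toy sketch of the idea, not LanguageTool's actual code. The function names, the `min_ratio` threshold, and the bigram counts are all made up for illustration; only the two pairs come from the post above. It shows why a word combination with 0 occurrences is not flagged unless the word belongs to a configured pair.

```python
# Toy illustration only: this is NOT LanguageTool's implementation.
# The two pairs are the ones named in the thread; everything else is hypothetical.
CONFUSION_PAIRS = [("не", "ни"), ("шасси", "шоссе")]

# Made-up bigram counts standing in for a real n-gram index.
BIGRAM_COUNTS = {
    ("по", "шоссе"): 120,
    ("по", "шасси"): 1,
}


def alternative_for(word):
    """Return the other member of the confusion pair, or None if the word is in no pair."""
    for a, b in CONFUSION_PAIRS:
        if word == a:
            return b
        if word == b:
            return a
    return None


def check(tokens, min_ratio=10):
    """Suggest the alternative only when it is far more frequent after the previous word."""
    suggestions = []
    for i in range(1, len(tokens)):
        word = tokens[i]
        alt = alternative_for(word)
        if alt is None:
            # Word is in no pair: the n-gram data is never consulted,
            # even if this combination has 0 occurrences.
            continue
        prev = tokens[i - 1]
        seen = BIGRAM_COUNTS.get((prev, word), 0)
        alt_seen = BIGRAM_COUNTS.get((prev, alt), 0)
        if alt_seen > 0 and alt_seen >= min_ratio * max(seen, 1):
            suggestions.append((i, word, alt))
    return suggestions


print(check(["по", "шасси"]))
print(check(["context_word_2", "verifiable_word"]))
```

With these made-up counts, the first call suggests шоссе in place of шасси, while the second returns an empty list because neither word is part of a configured pair.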

(helz) #5

Got it. Thanks!