Best way to apply suggestions/corrections automatically

dnaber · January 21, 2018, 2:53pm

That’s what I would try. You can use org.languagetool.dev.bigdata.NGramLookup to compare sequence probabilities (that class is not part of a JAR, so you need to check out the source code). According to that class, “das schwarze Hemd” is (slightly) more probable than “das schwarze Amt”. The complete sequence “das schwarze Hemd” has no occurrences in our n-gram data, though, probably because the Google n-gram data we use has a minimum occurrence value of 40.
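
To make the idea concrete, here is a rough, self-contained sketch of comparing two candidates by sequence probability. It does not use NGramLookup or any other LanguageTool API. The class NgramCompare, its methods, all counts, and the vocabulary size are made up for illustration, and the add-one smoothed bigram model is just one simple way to score a sequence whose full 3-gram is missing from the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: this is NOT LanguageTool's NGramLookup API.
class NgramCompare {

    // Made-up counts; real values would come from an n-gram index.
    static final Map<String, Long> COUNTS = new HashMap<>();
    static {
        COUNTS.put("das", 1_000_000L);
        COUNTS.put("schwarze", 50_000L);
        COUNTS.put("das schwarze", 20_000L);
        COUNTS.put("schwarze Hemd", 120L);
        COUNTS.put("schwarze Amt", 45L);
        // No entry for the full 3-gram "das schwarze Hemd": it is treated as
        // having fallen below the data set's minimum occurrence cutoff.
    }

    // Assumed vocabulary size for add-one smoothing (hypothetical value).
    static final long VOCAB_SIZE = 100_000L;

    // Returns the stored count, or 0 if the n-gram is not in the data.
    static long count(String ngram) {
        return COUNTS.getOrDefault(ngram, 0L);
    }

    // log P(word | prev), add-one smoothed so unseen bigrams are not -infinity.
    static double logBigramProb(String prev, String word) {
        double numerator = count(prev + " " + word) + 1.0;
        double denominator = count(prev) + (double) VOCAB_SIZE;
        return Math.log(numerator / denominator);
    }

    // Sum of log bigram probabilities over the token sequence.
    static double logSequenceProb(List<String> tokens) {
        double sum = 0.0;
        for (int i = 1; i < tokens.size(); i++) {
            sum += logBigramProb(tokens.get(i - 1), tokens.get(i));
        }
        return sum;
    }

    // Returns whichever candidate gives the more probable sequence after the context.
    static String moreProbable(List<String> context, String a, String b) {
        List<String> withA = new ArrayList<>(context);
        withA.add(a);
        List<String> withB = new ArrayList<>(context);
        withB.add(b);
        return logSequenceProb(withA) >= logSequenceProb(withB) ? a : b;
    }

    public static void main(String[] args) {
        List<String> context = Arrays.asList("das", "schwarze");
        System.out.println(moreProbable(context, "Hemd", "Amt"));
    }
}
```

With these invented counts the sketch picks Hemd, even though the full 3-gram has a count of 0, because the score falls back on the shorter n-grams that are present.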

Adding your own n-grams would be even better, if you have enough data and the quality isn’t too bad.