I’ve changed the code a bit to make it easier, but it’s still not trivial. I will send you a JAR directly later today, but this is how the required JAR can be built with a developer set-up:
mvn clean compile assembly:single
In the target directory there will then be a file called languagetool-dev-3.6-SNAPSHOT-jar-with-dependencies.jar, which can be run like this:
java -cp languagetool-dev-3.6-SNAPSHOT-jar-with-dependencies.jar org.languagetool.dev.bigdata.ConfusionRuleEvaluator
It will print the exact usage. To work, it requires not only the pair of words to be checked (like “their” and “there”) but also a plain text file with example sentences that contain these words. I get these examples by running the Unix command grep on a list of sentences extracted from Wikipedia and Tatoeba. You can also specify a Wikipedia XML file instead, but then the whole XML has to be scanned for example sentences, which is much slower.
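A grep call for collecting such example sentences might look roughly like the sketch below. The file names sentences.txt and examples.txt, the tiny sample input, and the options are assumptions for illustration, not necessarily the exact command used here. In particular, -i (ignore case, so sentence-initial "There" is also found) is a guess at what is useful.

```shell
# Hypothetical stand-in for the real sentence list (one sentence per line);
# in practice this would be the file extracted from Wikipedia and Tatoeba.
printf '%s\n' \
  'Their house is over there.' \
  'The cat sat on the mat.' \
  'There is no theory here.' \
  'Therefore we stop.' > sentences.txt

# Keep only the sentences that contain one of the two words.
# -w matches whole words only (so "Therefore" and "theory" do not count),
# -i ignores case, -E enables the | alternation.
grep -w -i -E 'their|there' sentences.txt > examples.txt
```

The resulting examples.txt is then the kind of plain text file that can be handed to the evaluator together with the word pair.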
So in a nutshell, if you just have a few words you can also send them to me and I’ll run this process.