So, here’s a little update:
First of all, the demo page supports Portuguese now and the list of supported confusion pairs has been extended. (NB: The rules for Portuguese are not calibrated, so there can be lots of false alarms.)
It is no longer necessary to create a Java file for each confusion pair. Instead, the rules are generated dynamically from a neuralnetwork/confusion_sets.txt file, which has the same format as the ngram confusion_sets.txt file. This makes adding new rules much easier. Furthermore, the word2vec language models are no longer part of the LanguageTool zip; they are loaded from a folder given in the configuration (just like you can load ngram data from a directory).
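
To give an idea of what "generated dynamically" means, here is a rough sketch of reading confusion pairs from such a file. This is not the actual LanguageTool code. It assumes each line looks like `word1; word2; number`, with `#` starting a comment and the number being optional; the class name `ConfusionSetSketch`, the `Pair` type and the `factor` field are made up for illustration, and the default of 1.0 is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not LanguageTool's real loader.
public class ConfusionSetSketch {

    /** One confusion pair plus the numeric third column (meaning assumed). */
    public static final class Pair {
        public final String first;
        public final String second;
        public final double factor;

        Pair(String first, String second, double factor) {
            this.first = first;
            this.second = second;
            this.factor = factor;
        }
    }

    /** Parses lines in the assumed "word1; word2; number" format. */
    public static List<Pair> parse(List<String> lines) {
        List<Pair> pairs = new ArrayList<>();
        for (String raw : lines) {
            // Drop everything after '#', then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            String[] parts = line.split(";");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected at least two words: " + raw);
            }
            // Assumption: a missing third column falls back to 1.0.
            double factor = parts.length > 2 ? Double.parseDouble(parts[2].trim()) : 1.0;
            pairs.add(new Pair(parts[0].trim(), parts[1].trim(), factor));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "# example entries, not taken from the real file",
            "their; there; 0.5",
            "",
            "to; too");
        for (Pair p : parse(sample)) {
            System.out.println(p.first + " <-> " + p.second + " (" + p.factor + ")");
        }
    }
}
```

With something like this, each parsed pair can be turned into a rule object at startup, so adding a pair means adding one line to the text file rather than writing a new class.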