I’ve been testing this functionality for a few weeks now and I feel it could be quite useful.
One suggestion would be to let the user configure the MIN_SCORE_DIFF and MIN_ALTERNATIVE_SCORE attributes of ConfusionProbabilityRule via the command line interface.
Also, regarding the test for finding alternative suggestions: see the getBetterAlternativeOrNull function of ConfusionProbabilityRule (note that I’ve been looking at the version 2.8/2.9 code).
There’s this if condition…
if (alternativeScore >= bestScore + MIN_SCORE_DIFF && alternativeScore >= MIN_ALTERNATIVE_SCORE)
I found that if bestScore is small, i.e. the current word has a very low probability, and alternativeScore never reaches MIN_ALTERNATIVE_SCORE, you don’t get any alternative suggestions.
In my hacked version I’ve changed the line to this, which seems to help in this rare situation.
if (alternativeScore >= bestScore + MIN_SCORE_DIFF && (alternativeScore >= MIN_ALTERNATIVE_SCORE || bestScore < 0.01))
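To make the difference concrete, here is a minimal standalone sketch of the two conditions side by side. The threshold values and the class and method names are placeholders made up for illustration, not the real LanguageTool constants or API; only the shape of the two conditions comes from the lines above.

```java
class ConfusionThresholdSketch {

    // Placeholder values for illustration only, NOT the real LanguageTool constants.
    static final double MIN_SCORE_DIFF = 0.1;
    static final double MIN_ALTERNATIVE_SCORE = 0.3;
    // Cut-off below which the current word counts as very unlikely (0.01 as in the hacked line).
    static final double LOW_BEST_SCORE = 0.01;

    // The condition as quoted from getBetterAlternativeOrNull.
    static boolean originalCheck(double alternativeScore, double bestScore) {
        return alternativeScore >= bestScore + MIN_SCORE_DIFF
            && alternativeScore >= MIN_ALTERNATIVE_SCORE;
    }

    // The relaxed condition: the absolute minimum is waived when bestScore is tiny.
    static boolean relaxedCheck(double alternativeScore, double bestScore) {
        return alternativeScore >= bestScore + MIN_SCORE_DIFF
            && (alternativeScore >= MIN_ALTERNATIVE_SCORE || bestScore < LOW_BEST_SCORE);
    }

    public static void main(String[] args) {
        // Current word is very unlikely; the alternative is clearly better
        // but still below the absolute minimum.
        double bestScore = 0.001;
        double alternativeScore = 0.2;
        System.out.println(originalCheck(alternativeScore, bestScore)); // false
        System.out.println(relaxedCheck(alternativeScore, bestScore));  // true
    }
}
```

With these made-up numbers the original condition rejects the alternative while the relaxed one accepts it. When bestScore is not tiny, both conditions behave the same.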
Note that I’ve been testing this rule with plausible common typos,
e.g. mistyping confirmation as conformation, which happens because the I and O keys are next to each other on the keyboard. A spell-checker would miss this, since both are valid words.
I’ve been working on a way to use a remote NGram database and have written my own class, ConfusionProbabilityRemoteRule, which I’m using with the standalone interface. The issue is that the “score” function can be quite slow in this case, so for this option to be effective I really only want to call the confusion probability rule for text that is within NGram range of a change.
A remote NGram database is probably not something you would use with the server version of LanguageTool.
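One way to limit the slow remote calls to text near a change could look roughly like the sketch below. It is not part of ConfusionProbabilityRemoteRule or of LanguageTool; the class and method names are hypothetical. It also assumes that a word's score only depends on NGrams of at most ngramSize tokens, so only tokens within ngramSize - 1 positions of an edit need to be scored again.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class ChangedWindowSketch {

    // Hypothetical helper: returns the indexes (into newTokens) of the tokens
    // whose NGram context may have been affected by the edit.
    static SortedSet<Integer> tokensToRescore(List<String> oldTokens,
                                              List<String> newTokens,
                                              int ngramSize) {
        SortedSet<Integer> result = new TreeSet<>();
        int shorter = Math.min(oldTokens.size(), newTokens.size());

        // Length of the unchanged run at the start of both token lists.
        int prefix = 0;
        while (prefix < shorter && oldTokens.get(prefix).equals(newTokens.get(prefix))) {
            prefix++;
        }
        // Length of the unchanged run at the end, not overlapping the prefix.
        int suffix = 0;
        while (suffix < shorter - prefix
                && oldTokens.get(oldTokens.size() - 1 - suffix)
                            .equals(newTokens.get(newTokens.size() - 1 - suffix))) {
            suffix++;
        }

        int changedStart = prefix;
        int changedEnd = newTokens.size() - suffix; // exclusive
        if (changedStart == changedEnd && oldTokens.size() == newTokens.size()) {
            return result; // nothing changed, so nothing to score again
        }

        // Widen the changed span by the NGram reach on both sides.
        int reach = ngramSize - 1;
        int from = Math.max(0, changedStart - reach);
        int to = Math.min(newTokens.size(), changedEnd + reach);
        for (int i = from; i < to; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
            "please", "send", "the", "conformation", "email", "to", "me", "today");
        List<String> after = Arrays.asList(
            "please", "send", "the", "confirmation", "email", "to", "me", "today");
        // With 3-grams, only the changed word and two neighbours on each side qualify.
        System.out.println(tokensToRescore(before, after, 3)); // [1, 2, 3, 4, 5]
    }
}
```

The slow “score” call would then only be made for tokens whose index is in the returned set. This treats the edit as one contiguous changed span, which is a simplification; several separate edits in one text would be covered by a single, larger window.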
It’s a work in progress and I may post more details in the future if I refine it further.
Anyway, I’m looking forward to seeing how this functionality develops in future releases.