Hi All,
I've just started playing with LanguageTool as part of a home project for the kids and was wondering whether someone could help with a few questions. If I've missed some documentation somewhere, please just point me to it…
Use case: ideally, I would like to submit a (very large) number of ‘words’ to LanguageTool, most of which are misspelt or not words at all, and get back just the correctly spelt words. I thought I'd be able to adapt langTool.check(wordList) to my needs, but it's not working as I'd expect. If it flagged every ‘word’ that isn't actually an English word, I could wrap logic around that to make it work.
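Roughly, the wrapping logic I have in mind is the sketch below. It assumes the words are joined with single spaces before calling langTool.check(...), and that each RuleMatch's getFromPos()/getToPos() describe a half-open character range (start inclusive, end exclusive). The names WordFilter and keepUnflagged are my own, and the ranges are plain int pairs so the sketch runs without LanguageTool on the classpath. It can only be as good as the flags it receives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: given the words that were joined with single spaces and
// passed to langTool.check(...), plus the [from, to) character range of each
// RuleMatch, keep only the words that no match overlaps.
final class WordFilter {

    static List<String> keepUnflagged(List<String> words, List<int[]> flaggedRanges) {
        List<String> kept = new ArrayList<>();
        int start = 0; // offset of the current word in the space-joined string
        for (String word : words) {
            int end = start + word.length();
            boolean flagged = false;
            for (int[] range : flaggedRanges) {
                // Two half-open ranges overlap if each one starts before the other ends
                if (range[0] < end && start < range[1]) {
                    flagged = true;
                    break;
                }
            }
            if (!flagged) {
                kept.add(word);
            }
            start = end + 1; // skip the single separating space
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("OWL", "PIG", "DFH", "BAB", "COD", "CID",
                "TIN", "TIM", "ABC", "DEF", "GHI", "JKL");
        // Ranges as reported in the output further down this post
        List<int[]> ranges = List.of(new int[]{8, 11}, new int[]{12, 15}, new int[]{44, 47});
        System.out.println(keepUnflagged(words, ranges));
    }
}
```

With the character ranges reported in the output further down, this keeps OWL, PIG, COD, CID, TIN, TIM, ABC, DEF and GHI, so anything the spell checker lets through still ends up in the result.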
Using the example code below, I submit the following word list:
"OWL PIG DFH BAB COD CID TIN TIM ABC DEF GHI JKL"
Example code for reference:
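```java
import java.io.IOException;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.language.BritishEnglish;
import org.languagetool.rules.Rule;
import org.languagetool.rules.RuleMatch;

// Inside a method of a class with a `logger` field; wordList is the string above
try {
    JLanguageTool langTool = new JLanguageTool(new BritishEnglish());
    // Disable every rule except the dictionary-based spelling rule(s)
    for (Rule rule : langTool.getAllRules()) {
        if (!rule.isDictionaryBasedSpellingRule()) {
            langTool.disableRule(rule.getId());
        }
    }
    List<RuleMatch> matches = langTool.check(wordList);  // the call in question
    for (RuleMatch match : matches) {
        logger.debug("Potential typo at characters " +
            match.getFromPos() + "-" + match.getToPos() + ": " +
            match.getMessage());
        logger.debug("Suggested correction(s): " +
            match.getSuggestedReplacements());
    }
} catch (IOException e) {
    logger.error("LanguageTool check failed", e);
}
```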
The output is:
```
Potential typo at characters 8-11: Possible spelling mistake found.
Potential typo at characters 12-15: Possible spelling mistake found.
Potential typo at characters 44-47: Possible spelling mistake found.
```
So whilst DFH, BAB and JKL are flagged as possible spelling mistakes, CID, TIM, ABC, DEF and GHI are let through. Is this because these words are so ‘badly spelt’ that it has no correction to offer? Is there another rule I could use?
Also, can anyone give me a rough idea of the performance limitations when running on a reasonably specced Windows machine? What is a sensible limit to the number of words I can supply in one go? My use case calls for several thousand, which seems to be problematic…
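For context, what I'm considering is the sketch below: create the JLanguageTool instance once (my assumption is that loading the rules and dictionaries is the slow part) and feed it the words in modest batches rather than one huge string. The class name WordBatcher and the batch size are my own illustrative choices, not documented LanguageTool limits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a large word list into space-joined batches of at
// most batchSize words, so each langTool.check(...) call gets a modest string.
final class WordBatcher {

    static List<String> batches(List<String> words, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i += batchSize) {
            int end = Math.min(i + batchSize, words.size());
            result.add(String.join(" ", words.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("OWL", "PIG", "DFH", "BAB", "COD", "CID",
                "TIN", "TIM", "ABC", "DEF", "GHI", "JKL");
        // In the real program each batch would go to the same JLanguageTool
        // instance, e.g. langTool.check(batch), created once before this loop.
        for (String batch : batches(words, 5)) {
            System.out.println(batch);
        }
    }
}
```

Because character offsets restart at 0 in each batch, any filtering on getFromPos()/getToPos() would have to be done per batch, against that batch's own words.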
And finally, if anyone knows of a better (free) solution or API for my requirements, could you let me know?
Thanks very much in advance
Steve