
Is LanguageTool suitable for my use case?

Hi All,

I've just started playing with LanguageTool as part of a home project for the kids and was wondering if one of you would be able to help with a few questions. If I’ve missed some documentation somewhere, please just point me to it…

Use case: ideally I would like to submit a (very large) number of ‘words’ to LanguageTool (most of which are misspelt or not words at all) and get back just the correctly spelt ones. I thought I’d be able to adapt langTool.check(wordList) to my needs, but it’s not working as I’d expect. If it called out every ‘word’ that isn’t actually a word in the English language, I could wrap logic around this to make it work.

Using the example code I submit:-

Word List: "OWL PIG DFH BAB COD CID TIN TIM ABC DEF GHI JKL"

Example code for reference:-

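// Imports needed by the snippet
import java.io.IOException;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.language.BritishEnglish;
import org.languagetool.rules.Rule;
import org.languagetool.rules.RuleMatch;

try {
    JLanguageTool langTool = new JLanguageTool(new BritishEnglish());
    // Disable everything except the dictionary-based spelling rules
    for (Rule rule : langTool.getAllRules()) {
        if (!rule.isDictionaryBasedSpellingRule()) {
            langTool.disableRule(rule.getId());
        }
    }
    List<RuleMatch> matches = langTool.check(wordList);
    for (RuleMatch match : matches) {
        logger.debug("Potential typo at characters " +
            match.getFromPos() + "-" + match.getToPos() + ": " +
            match.getMessage());
        // match.getSuggestedReplacements() holds the suggested corrections (not logged here)
    }
} catch (IOException e) {
    logger.error("LanguageTool check failed", e);
}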

The output is:-

Potential typo at characters 8-11: Possible spelling mistake found.
Potential typo at characters 12-15: Possible spelling mistake found.
Potential typo at characters 44-47: Possible spelling mistake found.
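
The wrapper logic for turning these offsets back into a list of accepted words could look something like the sketch below. It is only an illustration under a few assumptions: the words are separated by single spaces, the "to" position is exclusive (as the 8-11 range for DFH suggests), and the flagged ranges are hard-coded from the output above rather than read from match.getFromPos() and match.getToPos(). The names UnflaggedWords and keepUnflagged are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class UnflaggedWords {

    /**
     * Returns the space-separated words of text that do not overlap any
     * flagged range. Each range is {from, to}, with "to" treated as exclusive.
     */
    static List<String> keepUnflagged(String text, List<int[]> flagged) {
        List<String> kept = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (text.charAt(pos) == ' ') {   // skip separators
                pos++;
                continue;
            }
            int end = text.indexOf(' ', pos);
            if (end < 0) {
                end = text.length();
            }
            boolean isFlagged = false;
            for (int[] range : flagged) {
                // Overlap test between [pos, end) and [range[0], range[1])
                if (range[0] < end && range[1] > pos) {
                    isFlagged = true;
                    break;
                }
            }
            if (!isFlagged) {
                kept.add(text.substring(pos, end));
            }
            pos = end;
        }
        return kept;
    }

    public static void main(String[] args) {
        String wordList = "OWL PIG DFH BAB COD CID TIN TIM ABC DEF GHI JKL";
        // Ranges copied by hand from the logged output (hypothetical stand-in for RuleMatch data)
        List<int[]> flagged = List.of(new int[] {8, 11}, new int[] {12, 15}, new int[] {44, 47});
        System.out.println(keepUnflagged(wordList, flagged));
        // prints [OWL, PIG, COD, CID, TIN, TIM, ABC, DEF, GHI]
    }
}
```

With the three ranges above, this keeps exactly the nine words that were not called out, which is why CID, TIM, ABC, DEF and GHI still come through.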

So whilst DFH, BAB and JKL are called out as possible spelling mistakes, it lets CID, TIM, ABC, DEF and GHI through. Is this because these words are so ‘badly spelt’ that it has no correction? Is there another rule I could use?

Also, can anyone give me a rough idea of the performance limitations when running on a reasonably specced Windows machine? What is a sensible limit to the number of words I can supply in one go? My use case calls for several thousand, which seems to be problematic…

And finally, if anyone knows of a better (free) solution or API that would be better placed to meet my requirements, could you let me know?

Thanks very much in advance
Steve

You can use http://app.aspell.net/lookup to see if a word is in the dictionary. It’s not exactly the same dataset as the one used by LT, but close. I think many three-character words with all-uppercase letters will be accepted, as these are acronyms or uppercased variants of common words.
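
If accepted acronyms are the main problem, one option might be to skip the spell checker for this step and filter the words against a plain word list of your own choosing, comparing in lowercase. The sketch below is just that idea in plain Java and is not how LanguageTool works internally. The class name WordSetFilter, the method keepKnown and the tiny four-word dictionary are all made up for illustration; a real list would have to be loaded from whichever word list you decide to trust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class WordSetFilter {

    /**
     * Keeps only the whitespace-separated tokens whose lowercase form
     * is present in the given dictionary set.
     */
    static List<String> keepKnown(String text, Set<String> dictionary) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;   // happens when the input is blank
            }
            if (dictionary.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in dictionary; it contains no acronym entries
        Set<String> dictionary = Set.of("owl", "pig", "cod", "tin");
        String wordList = "OWL PIG DFH BAB COD CID TIN TIM ABC DEF GHI JKL";
        System.out.println(keepKnown(wordList, dictionary));
        // prints [OWL, PIG, COD, TIN]
    }
}
```

A hash-set lookup like this costs roughly constant time per word, so several thousand words in one go should not be a concern, though the quality of the result depends entirely on the word list you load.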