isCompoundWord using JLanguageTool

(Idan Morad) #1


I created the method:
public boolean isCompoundWord(String word)
using JLanguageTool. First I deactivate every rule that isn't related to spell checking, and then I try splitting the word at every character position to see whether the resulting parts are free of spelling mistakes. For a text of 9 sentences, calling this method takes far longer than simply checking all the sentences together. Is there a simpler way to implement this function, or a way to use just the JLanguageTool English dictionary to check whether a word exists?
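For illustration, the splitting approach described above could be sketched roughly as follows. This is only a minimal sketch: `CompoundWords` and the `knownWord` predicate are hypothetical names, and the predicate stands in for whatever dictionary lookup is available (it is not a LanguageTool API). It assumes a word counts as compound when both halves of at least one split are known words.

```java
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.IntStream;

// Hypothetical helper, not part of LanguageTool.
final class CompoundWords {
    private CompoundWords() {
    }

    // knownWord is an assumed stand-in for a dictionary lookup
    // (for example a Set<String>::contains or a spell-checker call).
    static boolean isCompoundWord(String word, Predicate<String> knownWord) {
        if (word == null || word.isEmpty()) {
            return false;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        // Try every split position; both halves must be known words.
        return IntStream.range(1, lower.length())
                .anyMatch(i -> knownWord.test(lower.substring(0, i))
                        && knownWord.test(lower.substring(i)));
    }
}
```

With a plain word set as the lookup, each candidate split costs two set lookups instead of a full text check, so the cost depends mostly on how expensive the lookup itself is.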

(Daniel Naber) #2

How exactly did you implement isCompoundWord? Could you post the code?

(Idan Morad) #3
import java.io.IOException;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import com.google.common.base.Strings;
import org.languagetool.JLanguageTool;
import org.languagetool.Language;
import org.languagetool.rules.Rule;
import org.languagetool.rules.RuleMatch;
import org.languagetool.rules.spelling.SpellingCheckRule;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SpellChecker
{
    protected final static Logger logger = LoggerFactory.getLogger(SpellChecker.class);
    private final Language language;
    private final UrlsCleaner urlsCleaner;
    private final Names names;
    private final WordHelper wordHelper;
    private final Acronym acronym;
    private final HtmlTagsRemoval htmlTagsRemoval;
    private final List<String> preliminaryWordsToIgnore;
    private Predicate<Rule> rulePredicate;

    public SpellChecker(Language language, UrlsCleaner urlsCleaner, Names names, WordHelper wordHelper,
                        Acronym acronym, HtmlTagsRemoval htmlTagsRemoval, List<String> wordsToIgnore,
                        Predicate<Rule> rulePredicate)
    {
        this.language = language;
        this.urlsCleaner = urlsCleaner;
        this.names = names;
        this.wordHelper = wordHelper;
        this.acronym = acronym;
        this.htmlTagsRemoval = htmlTagsRemoval;
        this.rulePredicate = rulePredicate;
        this.preliminaryWordsToIgnore = wordsToIgnore;
    }

    public SpellChecker(Language language, UrlsCleaner urlsCleaner, Names names, WordHelper wordHelper,
                        Acronym acronym, HtmlTagsRemoval htmlTagsRemoval, List<String> wordsToIgnore)
    {
        // By default no rule matches the predicate.
        this(language, urlsCleaner, names, wordHelper, acronym, htmlTagsRemoval, wordsToIgnore, rule -> false);
    }

    /**
     * Proofreads the given plain text and returns the mistakes found.
     *
     * @param text          the text to check.
     * @param wordsToIgnore list of words to exclude from the spelling check (usually names). Can be empty.
     * @return list of words with spelling or grammar mistakes.
     */
    public List<String> checkPlainText(String text, List<String> wordsToIgnore)
    {
        // A new JLanguageTool is created and configured on every call.
        JLanguageTool languageTool = new JLanguageTool(language);

        addAcceptedTerms(languageTool, wordsToIgnore);

        deactivateRulesByPredicate(languageTool, rulePredicate);

        List<RuleMatch> matches = new ArrayList<>();
        try
        {
            matches = languageTool.check(text);
        }
        catch (IOException e)
        {
            // The apostrophe is doubled because MessageFormat treats a single quote as a quoting character.
            logger.error(MessageFormat.format("Couldn''t parse text:"
                            + System.lineSeparator() + "{0}"
                            + System.lineSeparator(), text)
                    , e);
        }

        // Matches from every rule other than the spelling rules.
        List<String> listOfGrammarMistakes = matches.stream()
                .filter(match -> !(match.getRule() instanceof SpellingCheckRule))
                .map(match -> text.substring(match.getFromPos(), match.getToPos()))
                .collect(Collectors.toList());

        // Matches from the spelling rules; filtered further below.
        List<String> potentialSpellingMistakes = matches.stream()
                .filter(match -> match.getRule() instanceof SpellingCheckRule)
                .map(match -> text.substring(match.getFromPos(), match.getToPos()))
                .collect(Collectors.toList());

        listOfGrammarMistakes.addAll(cleanSpellingMistakes(potentialSpellingMistakes));

        return listOfGrammarMistakes;
    }

    private void deactivateRulesByPredicate(JLanguageTool languageTool, Predicate<Rule> rulePredicate)
    {
        // Disable every rule that matches the predicate.
        languageTool.getAllRules().stream()
                .filter(rulePredicate)
                .forEach(rule -> languageTool.disableRule(rule.getId()));
    }

    private void addAcceptedTerms(JLanguageTool languageTool, List<String> wordsToIgnore)
    {
        List<String> fullDictionaryToIgnore = new ArrayList<>(preliminaryWordsToIgnore);
        fullDictionaryToIgnore.addAll(wordsToIgnore);

        if (!fullDictionaryToIgnore.isEmpty())
        {
            languageTool.getAllActiveRules().stream()
                    .filter(rule -> rule instanceof SpellingCheckRule)
                    .forEach(rule -> ((SpellingCheckRule) rule).acceptPhrases(fullDictionaryToIgnore));
        }
    }

    private List<String> cleanSpellingMistakes(List<String> listOfPossibleSpellingMistakes)
    {
        // Drop tokens that are not real spelling mistakes: numbers, names, acronyms, URLs, e-mail addresses.
        return listOfPossibleSpellingMistakes.stream()
                .filter(word -> !wordHelper.containNumbers(word))
                .filter(word -> !names.isNameOrPlace(word))
                .filter(word -> !acronym.isAcronym(word))
                .filter(word -> !urlsCleaner.isURL(word))
                .filter(word -> !urlsCleaner.isEmail(word))
                .filter(word -> !word.contains("."))
                .collect(Collectors.toList());
    }

    /**
     * Determines whether the given word is a compound word.
     *
     * @param word the word to check.
     * @return true if the given word is a compound word; false otherwise.
     */
    public boolean isCompoundWord(String word)
    {
        if (Strings.isNullOrEmpty(word))
        {
            return false;
        }

        String wordLower = word.toLowerCase();

        // Split the word at every position and run a full check on "left right" for each split.
        return IntStream.range(1, wordLower.length())
                .mapToObj(index -> wordLower.substring(0, index) + " " + wordLower.substring(index))
                .anyMatch(string -> checkPlainText(string, Collections.emptyList()).isEmpty());
    }
}

(Daniel Naber) #4

Your checkPlainText method re-creates JLanguageTool every time, so this won't be very fast. Try creating and setting it up only once.

(Idan Morad) #5

I need it to be thread-safe, which is why JLanguageTool is re-created every time.
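One possible way to reconcile the two requirements is to build the expensive object once per thread rather than once per call. The sketch below shows that pattern with the standard `ThreadLocal` class. `PerThread` is a hypothetical helper, and the factory passed to it is an assumption: in this thread's setting it would presumably create and configure a JLanguageTool, while the `Language` object is shared.

```java
import java.util.function.Supplier;

// Hypothetical helper: lazily creates one instance per thread and reuses it.
final class PerThread<T> {
    private final ThreadLocal<T> holder;

    PerThread(Supplier<? extends T> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        this.holder = ThreadLocal.withInitial(factory);
    }

    T get() {
        return holder.get();
    }
}
```

Each thread then pays the setup cost once, and no instance is shared between threads. Note that any per-call configuration (such as extra words to ignore) would accumulate on the reused instance, so it may need to be handled separately.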

(Idan Morad) #6

Is there a workaround for my problem?