Performance tuning

Hello,

I am new to LanguageTool and would like to learn how to improve its performance. Any advice will be appreciated.
Here is my code structure:

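private static final Map<String, Language> availableLanguages = new HashMap<>();

for (Text text : article) {
  // Check that availableLanguages contains the text's language code first.
  Language language = availableLanguages.get(text.getLanguageCode());
  // A new JLanguageTool is created for every text.
  JLanguageTool languageTool = new JLanguageTool(language);
  languageTool.check(text.getText());
}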

Will MultiThreadedJLanguageTool help? Should I initialize the Language object on the fly instead of creating a map of all available languages up front?
Thanks in advance!

I suggest profiling your program with a tool such as jvisualvm. It will tell you where the time is spent.

Hi Daniel,
I profiled my program and found that the LanguageTool checking part takes the most time. For pt-BR, new JLanguageTool() took 2.5s on average and languageTool.check() took 15s on average. Do you have any advice on improving the performance?
Thanks!

How long is the text you check? 15 seconds would only be expected if it’s quite long (several KB at least) or if it has many spelling errors (generating the suggestions for misspelled words can be quite slow).

It has ~1k words. The latency is unstable. The first time it took ~15s but later it took ~2s. The initialization took from 2s to 5s.
Here is my code:

for (Text text : texts) {
  resultBuilder.add(
      // The executor runs the lambda on a different thread.
      executor.submit(
          () -> {
            String language = text.getLanguageCode();
            try {
              JLanguageTool languageTool = new JLanguageTool(AVAILABLE_LANGUAGES.get(language));
              List<RuleMatch> ruleMatches = languageTool.check(text.getText());
              return generateResult(ruleMatches);
            } catch (Throwable e) {
              logger.atWarning().withCause(e).log(
                  "LanguageTool failed for text: [%s]", text);
              return null;
            }
          }));
}

If the language is the same for all texts, should I create one JLanguageTool for all threads? Do you have any suggestions for improvement?

I also tried to use MultiThreadedJLanguageTool:

for (Text text : texts) {
  resultBuilder.add(
      executor.submit(
          () -> {
            String language = text.getLanguageCode();
            try {
              MultiThreadedJLanguageTool languageTool = new MultiThreadedJLanguageTool(AVAILABLE_LANGUAGES.get(language));
              List<RuleMatch> ruleMatches = languageTool.check(text.getText());
              return generateResult(ruleMatches);
            } catch (Throwable e) {
              logger.atWarning().withCause(e).log(
                  "LanguageTool failed for text: [%s]", text);
              return null;
            }
          }));
}

but I got the following error:

java.lang.RuntimeException: java.lang.InterruptedException
	at org.languagetool.MultiThreadedJLanguageTool.performCheck(MultiThreadedJLanguageTool.java:195)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:580)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:555)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:512)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:495)

I'm not sure what the issue is here. It seems the threads are interrupting each other, but why? They don't share anything, right?

It's expected that the first check is slower. You should create a new JLanguageTool for each thread. Creating a JLanguageTool is fast; creating the language can be slow.
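The advice above (one language object shared by everyone, one checker per thread) can be sketched roughly as follows. This is only an illustration of the pattern, not LanguageTool code: SharedLanguage and Checker are hypothetical stand-ins for Language and JLanguageTool, and the word-counting check() is a placeholder so that the example runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadChecker {
  // Hypothetical stand-in for an expensive, shareable object such as a Language.
  static final class SharedLanguage {
    static final AtomicInteger created = new AtomicInteger();
    final String code;
    SharedLanguage(String code) { this.code = code; created.incrementAndGet(); }
  }

  // Hypothetical stand-in for a cheap checker that must not be shared between threads.
  static final class Checker {
    static final AtomicInteger created = new AtomicInteger();
    private final SharedLanguage language;
    Checker(SharedLanguage language) { this.language = language; created.incrementAndGet(); }
    // Placeholder "check": counts whitespace-separated words.
    int check(String text) { return text.isEmpty() ? 0 : text.split("\\s+").length; }
  }

  // Created once when the class is loaded and shared by all threads.
  static final SharedLanguage LANGUAGE = new SharedLanguage("pt-BR");

  // One Checker per worker thread, created lazily on that thread's first use.
  static final ThreadLocal<Checker> CHECKER =
      ThreadLocal.withInitial(() -> new Checker(LANGUAGE));

  static List<Integer> checkAll(List<String> texts, int threads) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (String text : texts) {
        futures.add(executor.submit(() -> {
          // Reuses this thread's Checker instead of constructing one per text.
          return CHECKER.get().check(text);
        }));
      }
      List<Integer> results = new ArrayList<>();
      for (Future<Integer> future : futures) {
        results.add(future.get());
      }
      return results;
    } finally {
      executor.shutdown();
    }
  }
}
```

With a fixed pool of N threads, at most N checkers are built no matter how many texts are submitted, and the language is built exactly once. Whether this helps in the real program depends on how much of the 2 to 5 seconds is actually spent in the JLanguageTool constructor.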

Could you post the complete stack trace?

Sorry, the stack trace was truncated. Here is the full one:

java.lang.RuntimeException: java.lang.InterruptedException
	at org.languagetool.MultiThreadedJLanguageTool.performCheck(MultiThreadedJLanguageTool.java:195)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:580)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:555)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:512)
	at org.languagetool.JLanguageTool.check(JLanguageTool.java:495)
	at internal code...
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at internal code...
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:244)
	at org.languagetool.MultiThreadedJLanguageTool.performCheck(MultiThreadedJLanguageTool.java:190)
	... 12 more

The language is declared as a static variable.

private static final Map<String, Language> AVAILABLE_LANGUAGES = Map.of("pt-BR", new BrazilianPortuguese());

It should be created when the class is loaded, so I think creating the JLanguageTool contributes most of the latency.

I'm not sure about the stack trace. Maybe your executor has a timeout that causes this?
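To illustrate how a timeout or cancellation on the calling side could produce a trace like that, here is a small JDK-only sketch. It is an assumption about the cause, not a diagnosis: the outer executor plays the role of your executor, the inner invokeAll() plays the role of the call seen at MultiThreadedJLanguageTool.performCheck in the trace, and the class and method names are made up for the example.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class InterruptDemo {
  // Returns the exception seen by the worker after its Future is cancelled.
  static Throwable runAndCancel() throws Exception {
    ExecutorService outer = Executors.newSingleThreadExecutor();
    ExecutorService inner = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicReference<Throwable> seen = new AtomicReference<>();
    try {
      Future<?> task = outer.submit(() -> {
        try {
          started.countDown();
          // The worker blocks in invokeAll(), waiting for a slow inner task.
          Callable<Void> slow = () -> { Thread.sleep(10_000); return null; };
          inner.invokeAll(List.of(slow));
        } catch (InterruptedException e) {
          // Thrown from FutureTask.get() inside invokeAll(), as in the trace.
          seen.set(e);
        } finally {
          finished.countDown();
        }
      });
      started.await();
      // Simulates a caller-side timeout: cancel(true) interrupts the worker thread.
      task.cancel(true);
      finished.await(5, TimeUnit.SECONDS);
      return seen.get();
    } finally {
      outer.shutdownNow();
      inner.shutdownNow();
    }
  }
}
```

If something in the surrounding framework calls cancel(true) on the Future, or shutdownNow() on the executor, while a check is still running, the worker thread is interrupted from outside. The threads do not need to share anything for this to happen.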

About the performance: do you still consider this an issue? It will be tricky to debug here, I think one would need to see the whole code and profile it, preferably in a production-like setting.