Hey Community!
This is a follow-up to this topic: Fragen zur Optimierung des LanguageTool (Questions about optimizing LanguageTool)
We have forked LanguageTool and built our own image that we run in our Kubernetes cluster. We are facing memory-consumption issues that we have not been able to fix, although we have run quite a few experiments.
The current setup
- we use the annotation HTTP API with possibly long texts in each call (a call can be ~50 kB: 100,000 characters in total, 75,000 of text and 25,000 of annotation markup)
- this is the Dockerfile: neuris-languagetool/Dockerfile.neuris at 5d84db583430b9a443e797efbeda05524d1265fa · digitalservicebund/neuris-languagetool · GitHub
- we don’t use fasttext, as we only process German text
- this is the server.properties: neuris-languagetool/server.properties at 5d84db583430b9a443e797efbeda05524d1265fa · digitalservicebund/neuris-languagetool · GitHub
- we have disabled suggestions as we don’t need them
- this is the deployment configuration in Kubernetes (3.5 GB memory request, 4 GB memory limit, 2 CPU request, 4 CPU limit):
```yaml
resources:
  requests:
    cpu: 2
    memory: 3584Mi
  limits:
    cpu: 4
    memory: 4096Mi
    ephemeral-storage: 1Gi
```
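For context, one of our calls can be sketched roughly like this (a minimal Python sketch; the endpoint and the `data`/`annotation` parameters follow LanguageTool's public `/v2/check` API, but the host, port, and sample texts are made up):

```python
import json
from urllib.parse import urlencode

def build_check_request(chunks, language="de-DE"):
    """Build the form-encoded body for LanguageTool's /v2/check endpoint.

    `chunks` is a list of ("text" | "markup", content) pairs, matching
    the `data` parameter of the annotation API.
    """
    annotations = [{kind: content} for kind, content in chunks]
    return urlencode({
        "language": language,
        "data": json.dumps({"annotation": annotations}),
    })

body = build_check_request([
    ("text", "Das ist ein Beispieltext mit Fehlern."),
    ("markup", "<br/>"),
    ("text", "Und noch ein Satz."),
])
# POST `body` to e.g. http://languagetool:8010/v2/check with
# Content-Type: application/x-www-form-urlencoded
```

In our real calls the annotation list is much longer, adding up to the ~100,000 characters mentioned above.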
Observation
What we observe is that LanguageTool gets OOM-killed by the kernel (meaning the process exceeds the 4 GB limit) after about 15 requests sent shortly after one another (each approx. 50 kB: 100,000 characters in total, 75,000 text, 25,000 annotations, ~80 matches). Apart from this, there was no other load.
What we tried
- we removed fasttext to make sure it doesn’t add to the memory usage - no change
- we tried to set the `-Xmx` of the Java process to 2 GB or 3 GB - no change
- we tried to set `maxCheckThreads` to 4 (equal to the max number of CPUs) and `maxWorkQueueSize` to 50 - no change
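For reference, the combination of these settings looks roughly like this in our image (a fragment, not our exact Dockerfile: the jar path and port are placeholders, while the class name and server options are the standard LanguageTool ones):

```dockerfile
# Fragment only - jar path and port are placeholders.
# Cap heap and metaspace explicitly so that heap + metaspace +
# thread stacks + native allocations stay below the 4 GiB cgroup limit.
ENTRYPOINT ["java", "-Xmx2g", "-XX:MaxMetaspaceSize=256m", \
            "-cp", "languagetool-server.jar", \
            "org.languagetool.server.HTTPServer", \
            "--config", "server.properties", "--port", "8010", "--public"]
```

with `maxCheckThreads=4` and `maxWorkQueueSize=50` set in server.properties.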
To me, it seems suspicious that this amount of requests exceeds 4 GB of RAM: 15 requests × 100,000 characters is only about 1.5 million characters, i.e. a few megabytes of raw string data, so the bulk of the 4 GB must be going somewhere else.
What we considered trying
- giving it more memory - we would like to avoid that

- changing or configuring the garbage collector, e.g. setting the ratio of young and old generation with `-XX:NewRatio`, or trying ZGC or Shenandoah
- configuring metaspace, e.g. `-XX:MaxMetaspaceSize=256m`
- splitting the requests to LanguageTool to achieve a smaller size per request
- up to now, I haven’t tried the combination of `maxCheckThreads` and setting `-Xmx` to 2 GB/3 GB, but I expect no substantial change
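The request-splitting idea could be sketched like this (a hypothetical helper, assuming each annotation item is a dict with a single "text" or "markup" key; real items may also carry "interpretAs"):

```python
def split_annotations(annotations, max_chars=20_000):
    """Split a LanguageTool `data.annotation` list into batches whose
    combined content length stays under `max_chars`, without splitting
    any single annotation item.

    Note: this ignores sentence boundaries, so matches spanning two
    batches can be missed; splitting only at paragraph breaks would
    be safer.
    """
    batches, current, size = [], [], 0
    for item in annotations:
        length = len(next(iter(item.values())))
        if current and size + length > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(item)
        size += length
    if current:
        batches.append(current)
    return batches
```

Each batch would then become its own /v2/check call; match offsets in the responses are relative to their batch, so they would need to be shifted back by the cumulative length of the preceding batches.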
What do you suggest? Is there anything else you would try?