I’m deploying the standalone OSS version of LT to one of our Kubernetes clusters. I’ve tried a few of the community-provided Docker images, and they all exhibit the same issue:
When I run them on my local machine (Apple M2 Pro with 32 GB of RAM), LT is pretty fast, typically responding within a second. When I start the same image inside our cluster (our nodes are comparatively modest 4 CPU/16 GB machines), it’s much slower, with the fastest (!) requests taking around 2s and many others taking much longer (anything from 5s to 30s).
I’m pretty confident that LT doesn’t have an opinion on Kubernetes as such. I’m also aware that the machines our cluster runs on are less capable than my local dev machine. However, I wasn’t expecting a difference this drastic, so I’m trying to find out what the issue is.
First and foremost, I’d love a pointer regarding LT’s scaling characteristics. I’m assuming it doesn’t need GPU cores (does it?), but is it CPU- or memory-bound? For memory, I’m currently launching LT with -Xmx800M. Should this be significantly higher?
If it’s CPU-bound, what kind of server spec would you recommend?
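For reference, I’m starting the server roughly like this inside the container (the exact entrypoint, port, and flags vary between the community images, so treat this as a sketch):

```
java -Xmx800m -jar languagetool-server.jar --port 8010 --public --allow-origin "*"
```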
Since these are going to come up, let me answer them straight away:
I haven’t made sure that fasttext is added to the image. From the documentation, it sounds like it’s only needed for language detection, and we’re always setting the language parameter explicitly.
I also haven’t made an effort to add ngram data. As far as I understand, these are only required for some additional checks? I would still want to add them eventually, but I need to get the tool itself working reliably first.
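For what it’s worth, this is roughly where I’d expect to wire both in later, based on my reading of the docs; the paths below are placeholders, not something I’ve actually set up:

```
# server.properties (passed via --config; paths are placeholders)
# fasttext: only used for automatic language detection
fasttextBinary=/opt/fasttext/fasttext
fasttextModel=/opt/fasttext/lid.176.bin
# ngram data: only used by the additional ngram-based rules
languageModel=/data/ngrams
```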
If either of these was a mistake and possibly the reason for the slow performance, please let me know. Other than that, any input would be highly appreciated. Thanks!
I’ve made some corrections to my initial post to reflect some new findings on my end. I’m not sure how, but I’m no longer able to reproduce the very fast response times of api.languagetool.org, the Homebrew-installable version of languagetool, or the community Docker images that I was seeing at the end of last year. It’s possible that these were a byproduct of me not setting the correct language parameter.
With language set to de-DE, requests to api.languagetool.org are typically processed in around 1s, with the version deployed to our cluster taking around 1.5s-2.5s per request. It’s still a noticeable difference, but at least within the same order of magnitude.
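For concreteness, a “request” here is a plain /v2/check call with the language set explicitly; the timings can be reproduced with curl, e.g.:

```
curl -s -o /dev/null -w "total: %{time_total}s\n" \
  -d "language=de-DE" \
  -d "text=Das ist ein kurzer Beispielsatz für den Test." \
  https://api.languagetool.org/v2/check
```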
Can anyone confirm that these are the expected performance characteristics of LT? Additionally, any hints on further scaling this (more CPU? more RAM?) would still be highly appreciated.
Performance depends on text length and on whether the text or parts of it have been cached (see the cache properties that java -jar languagetool-server.jar --help will show). Also, if the combination of language and other HTTP parameters (like disabled rules) hasn’t been used recently, LT will first need to initialize, and that can take some time.
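For example, a minimal config enabling the result cache could look like this (property names as listed by --help; exact defaults depend on your LT version):

```
# server.properties, passed via: java -jar languagetool-server.jar --config server.properties
# number of cached sentences/results; 0 disables the cache
cacheSize=1000
# how long cached entries stay valid
cacheTTLSeconds=300
# reuse initialized pipelines for requests with the same parameters
pipelineCaching=true
```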
Memory usage depends on the number of active languages, but I’d recommend using much more than 800M if you use several languages (more like 8 GB or more).
Thanks Daniel, it’s good to know that I’m not fundamentally doing anything wrong.
Just for some added context: we’ve only ever been using de-DE, and the texts we’ve been passing to the API for testing have been short sentences of up to 10 words.
I’ll schedule the pods on nodes with 4+ GB of available RAM and configure the JVM accordingly. I’m hoping this is enough since we’re only using de-DE.
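Concretely, I’m thinking of something along these lines; the image name and the JAVA_OPTS mechanism are placeholders (not every community image reads that variable), and the numbers are a first guess rather than anything validated:

```
# excerpt from the Deployment spec (sketch)
containers:
  - name: languagetool
    image: "<community-languagetool-image>"  # placeholder
    env:
      - name: JAVA_OPTS                      # assumes the image's entrypoint passes this to the JVM
        value: "-Xms512m -Xmx3g"
    resources:
      requests:
        cpu: "2"
        memory: "3Gi"
      limits:
        memory: "4Gi"
```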