Use a modern framework for embedded HTTP server discussion

oserikov · February 28, 2018, 3:30pm

Continuing the discussion from Spellchecker improvement discussion.

Are the requests and their execution time collected somewhere (in server logs maybe)?
That could help to reproduce the peak load moments to explore the behavior in detail.
There’s also the idea of learning to predict the request execution time (hello, ML) and then optimizing the order in which requests are executed.
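To make the ordering idea concrete, here is a minimal sketch of a "shortest predicted request first" queue. Everything in it is an assumption for illustration: `predict_seconds` is a hypothetical linear estimate based on input length (a real predictor would be fitted to the logged data), and `PredictedTimeQueue` is not part of LT.

```python
import heapq
import itertools


def predict_seconds(text_length, per_char=0.0001, overhead=0.01):
    """Hypothetical linear estimate; real coefficients would be fitted to the logs."""
    return overhead + per_char * text_length


class PredictedTimeQueue:
    """Serve the request with the smallest predicted execution time first."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties, so equal predictions keep arrival order.
        self._counter = itertools.count()

    def push(self, request_id, text_length):
        entry = (predict_seconds(text_length), next(self._counter), request_id)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Remove and return the id of the request predicted to be fastest."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

One caveat with this kind of ordering: under sustained load, long requests can be starved by a steady stream of short ones, so a real implementation would probably need some aging or a maximum wait time.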

dnaber · February 28, 2018, 4:29pm

Yes, input length, language, and execution time are logged.

Maybe. I’m not sure if it helps, we don’t have hundreds of requests per second per machine.

oserikov · February 28, 2018, 6:36pm

So there are several machines, each receiving fewer than a hundred requests per second, and queries are still sometimes rejected?

oserikov · February 28, 2018, 7:25pm

I think that handling overload is an issue that only arises on languagetool.org’s server side and is not needed by users who download the LT-server for local use. I suggest removing request queuing from the languagetool-server code and implementing it separately, only in LT’s deployment, via a proxy tool (nginx, haproxy, or some smart hand-coded thing). The proxy could also deal with http/https resolution, DDoS, balancing between nodes, etc.

That would remove most of the low-level code from languagetool-server and keep only the API part, if we use springrest or springboot.

What do you think about that approach?

dnaber · February 28, 2018, 7:32pm

No, they are slow sometimes, e.g. 1-2 seconds when <0.1 seconds would be okay. There are several machines, but this happens also on a single machine.

dnaber · February 28, 2018, 7:34pm

We don’t have access to the proxy / load balancer, it’s a standard load balancer running in the cloud.

oserikov · February 28, 2018, 7:41pm

And the instances running the LT-server, are they accessible? We could do message queuing via a separate tool at the instance level.

dnaber · March 1, 2018, 9:02am

I’m not sure what you mean by “accessible” - the servers are accessible via ssh.

oserikov · March 1, 2018, 3:01pm

That’s exactly what I meant. I’m going to load-test the original LT-server implementation along with the sparkjava and springboot proxy-driven implementations.
Now I’m not sure whether I need data containing errors. Maybe I’ll generate some errors artificially, but in any case I’m interested in looking at the <original sentence, corrected sentence> pairs collected by LT. Is it possible for LT to share that data?

dnaber · March 1, 2018, 3:35pm

Even though the users allow us to store their data, it might still contain personal information, so I cannot share it (or only examples which I have checked).

oserikov · March 1, 2018, 3:50pm

So I’ll proceed with generating the errors artificially.

arysin · March 3, 2018, 6:05pm

We could also look into reactive approaches/frameworks; they’ve become very popular lately along with the serverless direction. I believe springframework has some support for this too.

oserikov · March 4, 2018, 3:34am

Sounds reasonable to me, I’ll give it a try. Thanks for the suggestion

dnaber · March 8, 2018, 10:15am

It turns out that the “peak” situations are less bad than they looked; it was at least partially a problem with the measurement. I used curl with the https URL, but the time this command takes includes the HTTPS setup time. A simple workaround is to use curl url1 url2, with url1 and url2 being on the same server. This way, the SSL overhead will only occur once (i.e., one needs to ignore the time for the first URL).
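The same "ignore the first request" idea can be sketched in Python. This is only an illustration under assumptions: the helper names `timed_gets` and `warm_timings` are made up, and it assumes the server keeps the connection alive between requests, so that only the first measurement includes the TCP/TLS setup.

```python
import http.client
import time


def timed_gets(conn, paths):
    """Issue GET requests over one connection and return seconds per request."""
    timings = []
    for path in paths:
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        timings.append(time.perf_counter() - start)
    return timings


def warm_timings(timings):
    """Drop the first measurement, which includes the connection setup."""
    return timings[1:]


# Example with a hypothetical host and paths (not run here):
# conn = http.client.HTTPSConnection("example.org", timeout=10)
# print(warm_timings(timed_gets(conn, ["/", "/"])))
```

`http.client.HTTPSConnection` opens the connection lazily on the first request, which is why the first timing carries the handshake cost. If the server closes the connection between requests, later timings would include a new handshake as well.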

Anyway, I still consider switching to a lightweight framework a goal.

oserikov · March 11, 2018, 2:41pm

I’m planning to compare the framework implementations using http calls, because I don’t think https support can be the bottleneck.
I’m now finishing my JMeter test plan and will then compare SpringBoot and SparkJava performance under http load (the reactive approach mentioned by @arysin is in development); the test plan could easily be extended to support both http and https.
To keep the tests simple and clear, I use the default server config, and the LT calls are made without authorization. Is it OK to load-test with these simplifications, or would you suggest some specific config settings?

dnaber · March 11, 2018, 2:57pm

I think it’s okay for now. How large is the input? This has a large influence on response time. I think the input length should be random, with a distribution we can find out from our log files.
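As a rough sketch of how random input lengths could follow the logged distribution: resample the logged lengths with replacement and cut a source text to each sampled length. The function names and the example numbers are assumptions for illustration, not real log data.

```python
import random


def sample_input_lengths(logged_lengths, n, seed=None):
    """Draw n input lengths by resampling the logged ones with replacement."""
    rng = random.Random(seed)
    return rng.choices(logged_lengths, k=n)


def make_input(text, length):
    """Repeat the source text if needed, then cut it to the requested length."""
    if not text:
        raise ValueError("source text must not be empty")
    repeats = length // len(text) + 1
    return (text * repeats)[:length]


# Example with made-up lengths standing in for values read from the logs:
# lengths = sample_input_lengths([12, 80, 80, 350, 2000], n=1000, seed=42)
# inputs = [make_input("This is an example sentence. ", k) for k in lengths]
```

Resampling the raw values reproduces the empirical distribution without having to fit a parametric one; lengths that appear more often in the logs are drawn more often.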

oserikov · March 11, 2018, 3:45pm

I think it’s convenient to take the input from an input file or a set of files – that’s the easiest way to control the input size and other params.

oserikov · March 12, 2018, 2:36am

BTW, didn’t you plan to replace maven with gradle? The latter is more human-readable etc…

dnaber · March 12, 2018, 8:01am

Yes, but I got stuck because of the complexity (LibreOffice add-on, stand-alone, command-line, …). If you want to work on that, it might be a nice part of a GSoC project.

oserikov · March 12, 2018, 5:44pm

I see the maven-site-plugin in the plugins list; is it used anywhere in the project? I can’t find any reference to it in either the utility .sh scripts or the documentation.