Crash in line-by-line mode. Bug?

Ruud_Baars · June 24, 2017, 1:35pm

Okay, the text file is large. But in line-by-line mode, memory use should not grow much beyond the size of a single line, I guess…

ruud@ruud-laptop:~/Bureaublad/LanguageTool-3.7$ java -jar languagetool-commandline.jar -l nl --line-by-line testmateriaal_lt.txt
Expected text language: Dutch
Warning: running in line by line mode. Cross-paragraph checks will not work.

Working on testmateriaal_lt.txt...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at org.languagetool.commandline.Main.handleLine(Main.java:279)
    at org.languagetool.commandline.Main.runOnFileLineByLine(Main.java:272)
    at org.languagetool.commandline.Main.main(Main.java:460)
ruud@ruud-laptop:~/Bureaublad/LanguageTool-3.7$

SkyCharger001 · June 24, 2017, 2:04pm

That’s the input memory, but what about the memory for (temporary) results?
E.g., take a 120-character line:
if only 120 bytes were allocated, it would crash immediately when LT tries to determine the first word, as the first letter would need to be put in the 121st byte, which would be out of bounds.

dnaber · June 24, 2017, 2:14pm

Are you sure the file has correct line endings (\n on Linux)? Without that, LT might consider all lines to be one line.
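One way to verify the line endings without loading the file into memory is to stream its bytes and count the terminators. The following is only a sketch: the class name LineEndingCheck and its method are made up for illustration and are not part of LanguageTool, and it says nothing about how LT itself splits lines.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of LanguageTool.
public class LineEndingCheck {

    /** Returns {LF-only count, CRLF count, lone-CR count} for the stream. */
    public static long[] countLineEndings(InputStream in) throws IOException {
        long lf = 0, crlf = 0, cr = 0;
        int prev = -1;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                // A '\n' right after '\r' is a Windows-style CRLF pair.
                if (prev == '\r') crlf++; else lf++;
            } else if (prev == '\r') {
                // The previous '\r' was not followed by '\n': a lone CR.
                cr++;
            }
            prev = b;
        }
        if (prev == '\r') cr++; // trailing CR at end of input
        return new long[] {lf, crlf, cr};
    }

    public static void main(String[] args) throws IOException {
        // Buffered so that the byte-by-byte loop stays reasonably fast.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            long[] c = countLineEndings(in);
            System.out.printf("LF: %d, CRLF: %d, lone CR: %d%n", c[0], c[1], c[2]);
        }
    }
}
```

Run as, for example, java LineEndingCheck testmateriaal_lt.txt. A file with proper Linux line endings should report zero for CRLF and lone CR.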

Ruud_Baars · June 25, 2017, 8:45am

Yes, it has \n (just one) after every line.

SkyCharger001 · June 25, 2017, 9:03am

Does the command-line version have an option for reporting the line being processed and its size?
E.g.: working on: line 220 of 3096, 120 characters.

Such an option should make it easier to determine where the problem is coming from.

(Personally, I suspect the problem is caused by flawed overhead handling.)
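Whether the command-line tool has such an option is not established in this thread, but a similar report can be produced outside LT. The sketch below assumes the file is UTF-8 encoded; the class name LineSizeReport is invented for illustration. It reads the file one line at a time and reports the line count and the longest line. Note that readLine() still has to hold one complete line in memory, so a file that effectively consists of a single huge line could exhaust the heap here too.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of LanguageTool.
public class LineSizeReport {

    /**
     * Returns {number of lines, 1-based number of the longest line,
     * length of that line in chars}. The last two values are 0 if
     * the input has no non-empty line.
     */
    public static long[] scan(Reader source) throws IOException {
        long lines = 0, longestAt = 0, longestLen = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // Only the current line is kept; earlier lines can be garbage collected.
            while ((line = reader.readLine()) != null) {
                lines++;
                if (line.length() > longestLen) {
                    longestLen = line.length();
                    longestAt = lines;
                }
            }
        }
        return new long[] {lines, longestAt, longestLen};
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input file is UTF-8 encoded.
        long[] r = scan(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
        System.out.printf("%d lines; longest is line %d with %d characters%n",
                r[0], r[1], r[2]);
    }
}
```

If the longest line turns out to be unexpectedly large, that line is a reasonable first place to look.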

dnaber · June 25, 2017, 9:07am

In that case I’m running out of ideas - I tested with a 900MB file and it works for me (at least it starts, I didn’t wait for everything to be checked). Are you using the latest daily snapshot of LT?

Ruud_Baars · June 25, 2017, 12:43pm

No, not the daily snapshot. I would have to rebuild the whole auto-download, auto-update, auto-replace grammar file routine again.

Ruud_Baars · June 25, 2017, 12:47pm

But anyway, since I have now found out how to address the local server, I only have to decipher the JSON into a workable array to speed things up.

jaumeortola · June 27, 2017, 10:19am

Have you tried something like this to limit the memory usage?

java -Xms1024m -Xmx2048m -jar languagetool-commandline.jar ...
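For reference, -Xms sets the initial heap size and -Xmx the maximum heap size of the JVM. To see which maximum a given java invocation actually ends up with, a tiny program like the following can be run with the same flags. This is only a sketch; the class name HeapInfo is made up for illustration.

```java
// Hypothetical helper, not part of LanguageTool.
public class HeapInfo {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running java -Xmx2048m HeapInfo and then plain java HeapInfo shows the configured maximum next to the default on that machine. The reported value can be slightly below the -Xmx figure, depending on the JVM.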

Ruud_Baars · June 27, 2017, 10:35am

No, I have not. To be honest, I am not going to either. I got the server-based implementation working again, which works a lot better. I also don’t see the need for a lot of memory when reading line by line, unless the whole file is read into memory, which is not needed in line-by-line mode. I will let this pass…