
Crash in line-by-line mode. bug?


(Ruud Baars) #1

Okay, the text file is large. But in line-by-line mode, memory use should not grow beyond the size of a single line, I guess.

ruud@ruud-laptop:~/Bureaublad/LanguageTool-3.7$ java -jar languagetool-commandline.jar -l nl --line-by-line testmateriaal_lt.txt
Expected text language: Dutch
Warning: running in line by line mode. Cross-paragraph checks will not work.

Working on testmateriaal_lt.txt...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.languagetool.commandline.Main.handleLine(Main.java:279)
at org.languagetool.commandline.Main.runOnFileLineByLine(Main.java:272)
at org.languagetool.commandline.Main.main(Main.java:460)
ruud@ruud-laptop:~/Bureaublad/LanguageTool-3.7$


(Lodewijk Arie van Brienen) #2

That's the input memory, but what about the memory for (temporary) results?
E.g. for a 120-character line: if only 120 bytes were allocated, LT would crash as soon as it tried to determine the first word, because the first letter would have to be stored in a 121st byte, which is out of bounds.


(Daniel Naber) #3

Are you sure the file has correct line endings (\n on Linux)? Without that, LT might consider all lines to be one line.
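One way to check the line endings independently of LT is to count the terminators in the raw bytes. The following is a minimal sketch, not part of LanguageTool; the function names `read_chunks` and `count_line_endings` are made up for illustration. It reads the file in chunks, so it should work on large files without loading them whole.

```python
def read_chunks(path, size=1 << 20):
    """Yield the file's raw bytes in chunks of at most `size` bytes."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(size)
            if not chunk:
                break
            yield chunk


def count_line_endings(chunks):
    """Count LF, CRLF and bare CR terminators across an iterable of byte chunks."""
    counts = {"LF": 0, "CRLF": 0, "CR": 0}
    pending_cr = False  # True if the previous chunk ended with "\r"
    for chunk in chunks:
        if not chunk:
            continue
        if pending_cr:
            # Decide whether the trailing "\r" was half of a CRLF pair.
            if chunk.startswith(b"\n"):
                counts["CRLF"] += 1
                chunk = chunk[1:]
            else:
                counts["CR"] += 1
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Defer this "\r" until the next chunk is known.
            pending_cr = True
            chunk = chunk[:-1]
        crlf = chunk.count(b"\r\n")
        counts["CRLF"] += crlf
        counts["LF"] += chunk.count(b"\n") - crlf
        counts["CR"] += chunk.count(b"\r") - crlf
    if pending_cr:
        counts["CR"] += 1
    return counts
```

Calling `count_line_endings(read_chunks("testmateriaal_lt.txt"))` on a file with Linux line endings should report only `LF` counts; non-zero `CR` or `CRLF` counts would point to other line endings.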


(Ruud Baars) #4

Yes, it has \n (just one) after every line.


(Lodewijk Arie van Brienen) #5

Does the command-line version have an option for reporting the line being processed and its size?
E.g.: working on line 220 of 3096, 120 characters.

Such an option should make it easier to determine where the problem is coming from.

(Personally, I suspect the problem is caused by flawed overhead handling.)
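Whether the command-line version offers such a report is not answered in this thread, but the same information can be gathered outside LT. A rough sketch, with the hypothetical helper names `line_sizes` and `longest_line`, that streams over the lines and never holds more than one of them in memory:

```python
def line_sizes(lines):
    """Yield (line_number, character_count) for each line, one at a time."""
    for number, line in enumerate(lines, start=1):
        # Do not count the trailing newline as part of the line.
        yield number, len(line.rstrip("\n"))


def longest_line(lines):
    """Return (line_number, character_count) of the longest line, or (0, 0) if empty."""
    best = (0, 0)
    for number, size in line_sizes(lines):
        if size > best[1]:
            best = (number, size)
    return best


# Example use on a real file. newline="\n" makes Python split on "\n" only,
# so a file that uses other line endings shows up as one very long line.
#
# with open("testmateriaal_lt.txt", encoding="utf-8", newline="\n") as fh:
#     number, size = longest_line(fh)
#     print(f"longest line: {number}, {size} characters")
```

If the longest line turns out to be close to the size of the whole file, the line endings are the likely culprit; if all lines are short, the memory is being used somewhere else.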


(Daniel Naber) #6

In that case I'm running out of ideas - I tested with a 900MB file and it works for me (at least it starts, I didn't wait for everything to be checked). Are you using the latest daily snapshot of LT?


(Ruud Baars) #7

No, not the daily snapshot. I would have to rebuild the auto-download, auto-update, auto-replace of the grammar file all over again.


(Ruud Baars) #8

But anyway, since I found out how to address the local server now, I only have to decipher the JSON into a workable array to speed things up.
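Turning the server's JSON into a workable array could look roughly like the sketch below. It assumes the response has the shape of the `/v2/check` output: a top-level `matches` list whose entries carry `message`, `offset`, `length`, `replacements` and a nested `rule` with an `id`. The helper name `matches_to_rows` and the sample payload are made up for illustration; the payload is hand-written, not real server output.

```python
import json


def matches_to_rows(response_text):
    """Flatten a LanguageTool-style JSON response into a list of plain dicts."""
    data = json.loads(response_text)
    rows = []
    for match in data.get("matches", []):
        rows.append({
            "offset": match["offset"],
            "length": match["length"],
            "rule_id": match["rule"]["id"],
            "message": match["message"],
            # Each replacement is assumed to be an object with a "value" key.
            "suggestions": [r["value"] for r in match.get("replacements", [])],
        })
    return rows


# Hand-written sample in the assumed shape, for illustration only.
SAMPLE = json.dumps({
    "matches": [{
        "message": "Possible spelling mistake found",
        "offset": 0,
        "length": 4,
        "replacements": [{"value": "This"}],
        "rule": {"id": "EXAMPLE_RULE"},
    }]
})

rows = matches_to_rows(SAMPLE)
```

Each row then holds the position, rule and suggestions of one match, so the results for many lines can be collected in a single list and processed afterwards.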


(jaumeortola) #9

Have you tried something like this to limit the memory usage?

java -Xms1024m -Xmx2048m -jar languagetool-commandline.jar ...


(Ruud Baars) #10

No, I have not, and to be honest, I am not going to either. I got the server-based implementation working again, which works a lot better. I also don't see the need for a lot of memory when reading line by line, unless the whole file is read into memory, which is not needed in line-by-line mode. I will let this pass...