Getting the most out of a corpus

Ah, now I understand. I just tried to increase the number of threads, but that didn’t help. That would have been the easy solution. I assume you always need all rules to be active? Running only a few rules could speed things up.

Yes, I need all rules, even if it makes things slower. Having spellcheck on will shrink the data even more and remove more trash.

But still… having all rules on would make me expect higher CPU load, not lower. I am wondering where the idle time is going.

Maybe this helps a bit?

375681 ruud 20 0 21,8g 2,9g 22936 R 60,0 4,7 1140:19 java
375728 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 306:40.72 ForkJoinPool-1-
375729 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 320:04.28 ForkJoinPool-1-
375730 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 311:40.73 ForkJoinPool-1-
375731 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 321:04.31 ForkJoinPool-1-
375732 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 318:23.99 ForkJoinPool-1-
375734 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 319:53.53 ForkJoinPool-1-
375735 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 321:11.40 ForkJoinPool-1-
375736 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 319:08.46 ForkJoinPool-1-
375737 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 309:33.54 ForkJoinPool-1-
375738 ruud 20 0 21,8g 2,9g 22936 R 20,0 4,7 320:44.35 ForkJoinPool-1-
375739 ruud 20 0 21,8g 2,9g 22936 R 20,0 4,7 320:02.02 ForkJoinPool-1-
375740 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 319:38.42 ForkJoinPool-1-
375741 ruud 20 0 21,8g 2,9g 22936 R 20,0 4,7 320:24.65 ForkJoinPool-1-
375742 ruud 20 0 21,8g 2,9g 22936 R 20,0 4,7 318:39.43 ForkJoinPool-1-
375743 ruud 20 0 21,8g 2,9g 22936 R 20,0 4,7 319:04.74 ForkJoinPool-1-
375733 ruud 20 0 21,8g 2,9g 22936 S 13,3 4,7 321:10.75 ForkJoinPool-1-
375680 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.00 java
375682 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.07 GC Thread#0
375683 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.00 G1 Main Marker
375684 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.07 G1 Conc#0
375685 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.01 G1 Refine#0
375686 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:26.17 G1 Young RemSet
375687 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:47.47 VM Thread
375688 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.00 Reference Handl
375689 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.00 Finalizer
375690 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.00 Signal Dispatch
375691 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.00 Service Thread
375692 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:24.28 C2 CompilerThre
375693 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:01.86 C1 CompilerThre
375694 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.05 Sweeper thread
375696 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:26.51 VM Periodic Tas
375697 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:00.04 Common-Cleaner
375707 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:51.91 GC Thread#1
375708 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.03 GC Thread#2
375709 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:51.91 GC Thread#3
375710 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:51.93 GC Thread#4
375711 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.01 GC Thread#5
375712 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:51.98 GC Thread#6
375713 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:51.88 GC Thread#7
375714 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:51.92 GC Thread#8
375715 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.14 GC Thread#9
375716 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.10 GC Thread#10
375717 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.10 GC Thread#11
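
To make a snapshot like this easier to read, the per-thread numbers can be summed per thread group. Below is a minimal Python sketch, not part of LanguageTool. It assumes the default `top -H` column layout (%CPU in the 9th column, thread name in the 12th) and a locale with decimal commas, as in the listing above. The function name `cpu_by_thread_name` is made up for illustration, and `SAMPLE` is just a few lines copied from the listing.

```python
from collections import defaultdict

# A few lines copied from the listing above; paste the full output to analyse it.
SAMPLE = """\
375681 ruud 20 0 21,8g 2,9g 22936 R 60,0 4,7 1140:19 java
375728 ruud 20 0 21,8g 2,9g 22936 S 20,0 4,7 306:40.72 ForkJoinPool-1-
375733 ruud 20 0 21,8g 2,9g 22936 S 13,3 4,7 321:10.75 ForkJoinPool-1-
375682 ruud 20 0 21,8g 2,9g 22936 S 0,0 4,7 0:52.07 GC Thread#0
"""


def cpu_by_thread_name(top_text):
    """Return {thread group: (thread count, summed %CPU)} for `top -H` lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in top_text.splitlines():
        # Assumed layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip headers, blank lines and anything unexpected
        cpu = float(fields[8].replace(",", "."))  # decimal comma -> decimal point
        # Drop trailing counters so "GC Thread#0" and "GC Thread#1" share a group.
        group = fields[11].strip().rstrip("0123456789#-")
        totals[group][0] += 1
        totals[group][1] += cpu
    return {group: (count, round(cpu, 1)) for group, (count, cpu) in totals.items()}


if __name__ == "__main__":
    for group, (count, cpu) in cpu_by_thread_name(SAMPLE).items():
        print(f"{group}: {count} thread(s), {cpu}% CPU in total")
```

Fed with the full listing, this should give 16 ForkJoinPool threads adding up to roughly 313%, next to the single `java` thread at 60%. It only summarises one snapshot; it does not explain why the workers sit in state S most of the time.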

Somebody would need to have a close look at the code. I don’t think there’s a quick-fix solution.

I would be OK with the LanguageTool command line outputting the whole sentence, not just part of it.
For now, I manage to spellcheck the file using PHP.