Getting more hits when testing the results

marcoagpinto · May 22, 2024, 6:44am

Heya,

When I run:
java -Dfile.encoding=UTF-8 -Xmx4500M -jar languagetool-wikipedia.jar check-data -l pt-PT -r MORRER_PERECER_FALECER -f pt-BR.txt --max-sentences 900000 --context-size 100 >0.txt

I don’t get hits such as:

Morra logo!

But they appear in the nightly results:
https://internal1.languagetool.org/regression-tests/via-http/2024-05-21/pt-BR/result_style_MORRER_PERECER_FALECER[1].html

Is there anything I can change in the command to get them?

PS: Regarding the UTF-8 issue I posted the other day, it may be a Windows 11 bug, so I will wait for a Windows update and try again.

Thanks!

marcoagpinto · May 23, 2024, 4:49pm

I believe it may be related to this.

What should I place here to get all hits?

Thanks!

marcoagpinto · May 23, 2024, 6:02pm

Ahhhh…

It is related to this:
Portuguese (Portugal): 56090 input lines ignored (e.g. not between 10 and 300 chars or at least 4 tokens)
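
If I read that message right, the filter would behave roughly like the sketch below. This is only my guess at the logic, not LanguageTool's actual code: the three thresholds come from the message, while the function name `is_accepted` and the regex-based tokenizer are made up for illustration.

```python
import re

# Thresholds taken from the log message; everything else is an assumption.
MIN_CHARS = 10
MAX_CHARS = 300
MIN_TOKENS = 4


def is_accepted(line: str) -> bool:
    """Guess at the input filter: keep lines of 10-300 chars with at least 4 tokens."""
    # Hypothetical tokenizer: words and single punctuation marks, whitespace dropped.
    tokens = re.findall(r"\w+|[^\w\s]", line)
    return MIN_CHARS <= len(line) <= MAX_CHARS and len(tokens) >= MIN_TOKENS


print(is_accepted("Morra logo!"))                         # False: 11 chars, but only 3 tokens
print(is_accepted("Ele vai morrer logo, infelizmente."))  # True: 34 chars, 7 tokens
```

Under that reading, "Morra logo!" is long enough in characters but falls short on tokens, which would explain why it never reaches the rule.
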

How do I change the minimum to 1 token?

Thanks!