Could the full Tatoeba test results for a language also be made available somewhere?
You can create your own analysis like this:
- Download https://languagetool.org/download/temp/languagetool-dev-4.6-SNAPSHOT-20190417-shaded.jar
- Filter the Tatoeba export for Dutch sentences:
grep "nld" sentences.csv >tatoeba-nl.csv
- Run:
java -cp languagetool-dev-4.6-SNAPSHOT-20190417-shaded.jar org.languagetool.dev.dumpcheck.SentenceSourceChecker -l nl -f tatoeba-nl.csv
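One caveat on the filter step: `grep "nld"` matches the string anywhere in a line, so it can also pick up sentences in other languages whose text happens to contain "nld". A stricter variant is sketched below. It assumes the Tatoeba export is tab-separated with the language code in the second column (id, language, text); the helper name `filter_lang` is made up for illustration.

```shell
# Hypothetical helper: keep only rows whose 2nd tab-separated column
# equals the given language code (assumed layout: id <TAB> lang <TAB> text).
filter_lang() {
  awk -F'\t' -v lang="$1" '$2 == lang'
}
```

Usage would be `filter_lang nld < sentences.csv > tatoeba-nl.csv`, after which the `SentenceSourceChecker` call above runs unchanged.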
I just thought… these results are generated in the nightly run anyway, so why not keep them for a day?
Is this shaded jar updated daily? It reports ‘class not found’…
No, I created it manually. Please send the full error message and the way you call it.
Okay, I got the error resolved; it was my mistake. I will check the entire Tatoeba output for false positives.
I found some of the Tatoeba sentences quite ‘artificial’, like sentences created just to contain a particular word. But the corpus will do nicely.
Thanks for helping.
I am now trying to remove all false positives generated on Tatoeba. From this, I conclude that Tatoeba is mostly used by Flemish speakers writing informally: the number of examples containing the word ‘ge’ or ‘gij’ is rather large, and these forms are no longer used in written communication.
The most frequent false positives have been removed now. I will await tonight’s results.