More data for the nightly tests

Hi, I have increased the number of sentences for the nightly regression test (email subject "LanguageTool (open-source) nightly diff test"). This means the next email might contain many changes that are caused by the larger test data rather than by changes in the rules or code. It would be great if you could check it anyway to find false alarms.

I have increased the number of sentences once more, so tonight there will probably be many new matches.

Tonight’s check will again use more data (10,000 additional sentences).

Hi Daniel,

This increase is very welcome.

There is only one thing I would ask. The Wikipedia dump you are using is somewhat old, isn’t it? I guess it is at least five years old, because many of the Wikipedia sentences contain errors that were corrected a long time ago. If you use a current dump, we will get better-quality sentences and fewer true positives in the tests.

While old errors on Wikipedia have been removed, new ones will have been added :slight_smile: I can replace the Catalan Wikipedia dump, but it also means you’ll get one huge list the day I replace it. Let me know if you want me to do that.

Yes, I would like to update the Catalan Wikipedia dump. Thank you.

I’ve replaced the Catalan Wikipedia dump (which was from the end of 2012, BTW) with a current one. Let’s see whether that will work tonight… please let me know if it doesn’t.

I have increased the number of test sentences for all languages by another 5,000 per language.

I’ve done this once more for tonight’s check (we’re at 100,000 sentences per language now).

More data was added again yesterday and is now showing up in the emails.