[nl] Dutch update

I got some unexpected warnings from Git while trying to update the rules and data files, but in the end I succeeded (I think). Git is surely not my thing. So I hope the online version will work after tonight's update …

How have you tested the synthesizer? DutchSynthesizerTest now fails, e.g. zwemmen doesn’t synthesize zwommen anymore, similar for the other synthesizer tests.

DutchTaggerTest doesn’t find tags for these words, is that expected? Dit, het, tje

How do I do these tests?

If these are Java programs… I don't do Java. The code might contain old tags to test against. Since all tags have changed, those tests will fail.

You need a developer setup to run them, i.e. the Java JDK and Maven. Then you can run mvn test. But you don't need to set this up if you don't want to. I now see that the tests fail because of the new tag set. I've got most of the synth tests working again by replacing the tags like this (old -> new):

VBh -> WKW:VLT:INF
NN2 -> ZNW:MRV:DE_
VB3 -> WKW:TGW:3EP
VBp -> WKW:VTD:ONV

Does this look correct? What should the old tag VB:.* be replaced with?

Also, should Dit, het, and tje be tagged? They used to be tagged but are not after your latest update.

I was right, there are old tags in the code of DutchSynthesizerTest.java:

assertEquals("[zwommen]", Arrays.toString(synth.synthesize(dummyToken("zwemmen"), "VBh")));
assertEquals("[Afro-Surinamers]", Arrays.toString(synth.synthesize(dummyToken("Afro-Surinamer"), "NN2")));
assertEquals("[hebt, heeft]", Arrays.toString(synth.synthesize(dummyToken("hebben"), "VB3", true)));
//with regular expressions
assertEquals("[doorgeseind]", Arrays.toString(synth.synthesize(dummyToken("doorseinen"), "VBp", true)));    
assertEquals("[doorsein, doorseint, doorseinden, doorseinde, doorseinen, doorgeseind, doorgeseinde]", Arrays.toString(synth.synthesize(dummyToken("doorseinen"), "VB.*", true)));

VBh now is WKW:VLT:INF
NN2 is now ZNW:MRV:DE_
VB3 is now WKW:TGW:3EP
VBp is now WKW:VTT:ONV
VB.* is now WKW.* and will return more forms than before
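For anyone else updating tests after the tag change, here is a minimal sketch of the mapping above as a lookup table. TagMigration, migrate and matching are hypothetical names made up for this post, not part of LanguageTool, and the sample tag list only contains the tags mentioned in this thread. It also shows why a pattern like WKW.* picks up every verb tag rather than a single form.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of LanguageTool: maps the old Dutch tags
// mentioned in this thread to their replacements.
class TagMigration {

    private static final Map<String, String> OLD_TO_NEW = new LinkedHashMap<>();
    static {
        OLD_TO_NEW.put("VBh", "WKW:VLT:INF");
        OLD_TO_NEW.put("NN2", "ZNW:MRV:DE_");
        OLD_TO_NEW.put("VB3", "WKW:TGW:3EP");
        OLD_TO_NEW.put("VBp", "WKW:VTT:ONV");
        OLD_TO_NEW.put("VB.*", "WKW.*");
    }

    /** Returns the new tag for an old one, or the input unchanged if it is not in the table. */
    static String migrate(String oldTag) {
        return OLD_TO_NEW.getOrDefault(oldTag, oldTag);
    }

    /** Returns the tags from the list that fully match the given tag regex. */
    static List<String> matching(String tagRegex, List<String> tags) {
        Pattern pattern = Pattern.compile(tagRegex);
        return tags.stream()
                .filter(tag -> pattern.matcher(tag).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample tags taken from this thread only; the real tag set is larger.
        List<String> sample = List.of("WKW:VLT:INF", "WKW:TGW:3EP", "WKW:VTT:ONV", "ZNW:MRV:DE_");
        System.out.println(migrate("VBh"));                    // WKW:VLT:INF
        System.out.println(matching(migrate("VB.*"), sample)); // [WKW:VLT:INF, WKW:TGW:3EP, WKW:VTT:ONV]
    }
}
```

In the real test the migrated tag would be passed to synth.synthesize(...) in place of the old one; the expected word lists for the regex case would need to be checked by hand, since WKW.* matches more tags.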

Thanks, the Synthesizer test is working again, DutchTaggerTest not yet (see above).

About dit, het, de: these tags were of no use at all and were never used. These words have so many word types and uses that I removed the tags of very frequent words that cause false alarms all the time. There is more to do on that.

Okay, I’ve now adapted the test and Dutch should be fine again. The new version should be online today at about 22:30 CEST on languagetool.org.

I will check on it tomorrow.