
Sentence splitter

There must have been a change in sentence splitting (SRX) that affects Dutch unexpectedly.
A split is suddenly missed in
https://internal1.languagetool.org/regression-tests/via-http/2020-08-02/nl/index.html

Since I am not at home, I cannot check what happened. Who changed the SRX lately, or is it something else?

Close to the affected input there was a sentence with a soft hyphen, and handling those has changed. Should this occur again, please let me know.

Unexpected results again. It looks like the sentence split has improved, but now there is a sentence whitespace match giving an invalid suggestion, and Morfologik suddenly accepts a weird word.

Must be some side effects…

It might be caused by not all servers running exactly the same version of LT (they differed by 2 or 3 days). This should be fixed soon, so please keep an eye on it.

I am keeping an eye on it, but there seem to be no nightlies.

Tonight's os nightly has a lot of entries, mostly true positives. Did the test input change?

No, but KORT_1 and KORT_2 have lower priority now, causing other (hopefully more specific) matches to appear.
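
To illustrate the effect, here is a minimal sketch in plain Java. It is not LanguageTool's actual code: the `Match` record, the `filter` method, the rule ID `SPECIFIC_RULE`, the priority values and the default priority of 0 are all assumptions made up for this example. It only assumes that, of several overlapping matches, the ones outranked by a higher-priority match are hidden.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PrioritySketch {

    // Hypothetical stand-in for a rule match: rule ID plus character span.
    record Match(String ruleId, int from, int to) {
        boolean overlaps(Match other) {
            return from < other.to && other.from < to;
        }
    }

    // Keep a match only if no overlapping match has a strictly higher priority.
    // Rules without an entry in the map get an assumed default priority of 0.
    static List<Match> filter(List<Match> matches, Map<String, Integer> priorities) {
        List<Match> kept = new ArrayList<>();
        for (Match m : matches) {
            boolean hidden = false;
            for (Match other : matches) {
                if (other != m && m.overlaps(other)
                        && priorities.getOrDefault(other.ruleId(), 0)
                           > priorities.getOrDefault(m.ruleId(), 0)) {
                    hidden = true;
                    break;
                }
            }
            if (!hidden) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(
            new Match("KORT_1", 0, 10),
            new Match("SPECIFIC_RULE", 3, 7)); // hypothetical rule ID

        // KORT_1 outranks the other rule: only the KORT_1 match is kept.
        System.out.println(filter(matches, Map.of("KORT_1", 5)));

        // KORT_1 lowered below the default: only the more specific match is kept.
        System.out.println(filter(matches, Map.of("KORT_1", -5)));
    }
}
```

With the lower value, the overlapping match from the other rule is no longer hidden, which is the kind of change that makes new entries show up in the nightly diff.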

Okay, that is good. Priorities are difficult to tune, since they are in Java, while most rules are XML. Would it be possible to have a rule priority number, e.g. 0-99, in the rule XML?

Technically yes, but with the current approach all the priorities are in the same place, which makes it easy to compare them. Syntax-wise, it’s trivial to add priorities. The code is here:

Yes, but it assumes constant IDs, which is not strange, but not always true in maintenance. I will just do without it until it proves essential. Java code is something I do not feel competent with.

We shouldn’t change IDs. If we do, it breaks users’ configurations, i.e. rules they turned off will suddenly become active again. The only exception I can think of is a very new rule which was just introduced.
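
A minimal sketch of why that happens, assuming the user's configuration simply stores the IDs of the rules that were turned off. The class, the `isActive` method and the renamed ID `KORT_1_NEW` are hypothetical and only serve as illustration, this is not how LanguageTool itself stores settings.

```java
import java.util.Set;

public class DisabledRulesSketch {

    // A rule is active unless the stored configuration lists its exact ID.
    static boolean isActive(String ruleId, Set<String> disabledIds) {
        return !disabledIds.contains(ruleId);
    }

    public static void main(String[] args) {
        // The user turned this rule off some time ago.
        Set<String> disabled = Set.of("KORT_1");

        System.out.println(isActive("KORT_1", disabled));     // prints false
        // Same rule under a new (hypothetical) ID: the old entry no longer matches.
        System.out.println(isActive("KORT_1_NEW", disabled)); // prints true
    }
}
```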

More unexpected true positives in the nightly. What is the cause?

EINDE_ZIN_ONVERWACHT's priority has been decreased so it doesn’t hide more important errors.