Words detected by 2 rules

(Ruud Baars) #1

There is a rule for Dutch that catches ‘etc’ without a period. But there is also a rule detecting doubled words, like ‘etc etc’.

When the input contains ‘etc etc’, only the second error (the double word) is reported; the missing-period error is not.


(Andriy) #2

We filter out overlapping rule matches.
I think some time ago the REST API returned overlapping errors and the UI removed the overlaps, but recently we added the logic to remove overlaps to the LT core code.
I actually have requests from users to somehow indicate when there is more than one error, rather than picking one.

(Ruud Baars) #3

Hm. I understand overlapping errors are a challenge. But in this case, I would rather have the abbreviation rule report twice than have the double-word rule report once. I can fix that by excluding ‘etc’ from the double word list.

(Andriy) #4

It looks like this was done in 3.6, and you can define priorities for rules:

A new method for removing overlapping errors has been implemented. By default, it is enabled for the HTTP API and LibreOffice outputs, and disabled for the command-line output. If necessary, priorities for rules and categories can be set in Language.getPriorityForId(String id). The default value is 0; positive integers mean higher priority and negative integers mean lower priority.
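To make the priority idea concrete for the ‘etc etc’ case, here is a minimal, self-contained Java sketch of how priority-based overlap removal could behave. It does not use LanguageTool's real classes: `Match`, `PRIORITIES`, and the rule IDs are hypothetical, the local `getPriorityForId` only mimics the role of `Language.getPriorityForId(String id)` (default 0, higher wins), and it assumes the double-word rule marks the whole ‘etc etc’ span. LT's actual filter may work differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class OverlapFilterSketch {

    // Hypothetical stand-in for a rule match: rule ID plus character span [from, to).
    record Match(String ruleId, int from, int to) {
        boolean overlaps(Match other) {
            return from < other.to && other.from < to;
        }
    }

    // Hypothetical priority table playing the role of Language.getPriorityForId(String id).
    // Unknown IDs get the default 0; higher values win.
    static final Map<String, Integer> PRIORITIES = Map.of(
            "NL_ETC_WITHOUT_PERIOD", 1,  // hypothetical ID for the abbreviation rule
            "NL_WORD_REPEAT", -1);       // hypothetical ID for the double-word rule

    static int getPriorityForId(String id) {
        return PRIORITIES.getOrDefault(id, 0);
    }

    // Keep matches greedily, highest priority first; drop any match that
    // overlaps one already kept.
    static List<Match> removeOverlaps(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        // Highest priority first, then earlier start; List.sort is stable,
        // so remaining ties keep their input order.
        sorted.sort(Comparator.comparingInt((Match m) -> getPriorityForId(m.ruleId()))
                .reversed()
                .thenComparingInt(Match::from));
        List<Match> kept = new ArrayList<>();
        for (Match candidate : sorted) {
            if (kept.stream().noneMatch(candidate::overlaps)) {
                kept.add(candidate);
            }
        }
        kept.sort(Comparator.comparingInt(Match::from));
        return kept;
    }

    public static void main(String[] args) {
        // Input "etc etc": first "etc" at 0-3, second at 4-7.
        List<Match> matches = List.of(
                new Match("NL_WORD_REPEAT", 0, 7),
                new Match("NL_ETC_WITHOUT_PERIOD", 0, 3),
                new Match("NL_ETC_WITHOUT_PERIOD", 4, 7));
        for (Match m : removeOverlaps(matches)) {
            System.out.println(m.ruleId() + " " + m.from() + "-" + m.to());
        }
    }
}
```

With the priorities above, both abbreviation matches survive and the double-word match is dropped. If both entries are removed from `PRIORITIES`, every rule gets 0, the double-word match listed first wins the tie, and both abbreviation matches are dropped, which is similar to what was reported at the start of the thread.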

(Ruud Baars) #5

I read that once. But the priority method is not easy to implement.