Words detected by 2 rules

Ruud_Baars · August 24, 2018, 2:20pm

There is a rule for Dutch that catches etc without a period. But there is also a rule detecting double words, like ‘etc etc’.

When the input contains etc etc, only the second error is reported.

Why?

arysin · August 24, 2018, 2:42pm

We filter out overlapping rules.
I think some time ago the REST API was returning overlapping errors and removing overlaps was done in the UI, but recently we added the logic to remove overlaps in the LT core code.
I actually have a request from users to somehow indicate when there is more than one error rather than pick one.

Ruud_Baars · August 24, 2018, 2:57pm

Hm. I understand that overlapping errors are a challenge. But in this case, I would rather have the abbreviation rule reported twice than the double word rule once. I can fix that by excluding etc from the double word list.

arysin · August 24, 2018, 3:22pm

Looks like it was done in 3.6 and you can potentially define priority for rules:

github.com

languagetool-org/languagetool/blob/master/languagetool-standalone/CHANGES.md

A new method for removing overlapping errors has been implemented. By default, it is enabled for the HTTP API and LibreOffice outputs, and disabled for the command-line output. If necessary, priorities for rules and categories can be set in Language.getPriorityForId(String id). Default value is 0, positive integers have higher priority and negative integers have lower priority.
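
To make the priority idea concrete, here is a minimal standalone sketch of priority-based overlap removal. It is not LanguageTool's actual code: the `Match` and `OverlapFilter` classes, the rule ids `ETC_WITHOUT_PERIOD` and `DOUBLE_WORD`, and the greedy strategy are all assumptions made for illustration. Only the convention that unlisted rules default to 0 and higher numbers win comes from the changelog text above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a rule match: a rule id plus [start, end) character offsets.
final class Match {
    final String ruleId;
    final int start;
    final int end;

    Match(String ruleId, int start, int end) {
        this.ruleId = ruleId;
        this.start = start;
        this.end = end;
    }

    // Two half-open ranges overlap when each one starts before the other ends.
    boolean overlaps(Match other) {
        return start < other.end && other.start < end;
    }
}

final class OverlapFilter {
    // Made-up rule ids and priorities; anything not listed falls back to 0.
    static final Map<String, Integer> PRIORITIES = Map.of(
        "ETC_WITHOUT_PERIOD", 10,  // assumed id for the abbreviation rule
        "DOUBLE_WORD", -10         // assumed id for the repeated-word rule
    );

    static int priorityFor(String ruleId) {
        return PRIORITIES.getOrDefault(ruleId, 0);
    }

    // Greedy filter: visit matches from highest to lowest priority and keep
    // a match only if it overlaps nothing that has already been kept.
    static List<Match> removeOverlaps(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator
            .comparingInt((Match m) -> priorityFor(m.ruleId)).reversed()
            .thenComparingInt((Match m) -> m.start));

        List<Match> kept = new ArrayList<>();
        for (Match candidate : sorted) {
            boolean clash = kept.stream().anyMatch(candidate::overlaps);
            if (!clash) {
                kept.add(candidate);
            }
        }
        // Return the survivors in text order.
        kept.sort(Comparator.comparingInt((Match m) -> m.start));
        return kept;
    }
}
```

For the input `etc etc`, assume the abbreviation rule matches at offsets 0-3 and 4-7 and the double word rule matches the whole span 0-7. With the priorities above, `OverlapFilter.removeOverlaps` keeps the two abbreviation matches and drops the double word match, which is the outcome asked for earlier in the thread. In LanguageTool itself the equivalent change would presumably go into the Dutch language class by overriding `getPriorityForId`, using the real rule ids.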

Ruud_Baars · August 24, 2018, 3:42pm

I read that once. But the priority method is not easy to implement.