Overlapping errors

LanguageTool rules sometimes create overlapping errors. This is annoying because different user interfaces show different results, and the best rule/suggestion is not always shown.

In addition, I have found buggy behavior in LibreOffice (5.2.3.2 with LT 3.5): if there are overlapping errors, the underlining of other errors in the same paragraph sometimes disappears.

I think we need a general solution: define rules for choosing which errors are shown in every case, so that the UIs no longer have to do it. This has pros and cons.

So what do you think? What criteria could be used to set priorities? Longer strings vs. shorter strings, Java rules vs. XML rules, spelling vs. grammar/style…

In our web UI, the Chrome/Firefox extension, and the Google Docs add-on, we only keep the last of the overlapping rule matches. I think this is because it is hard to show overlapping matches in a web UI.
So I had to rearrange my Java rules, adding the least important ones first.
XML rules are added afterwards, so they always “beat” Java rules.
I am not sure how the LibreOffice extension and the standalone version behave.
We could either try to show all matches in the web UIs or add a “priority” property to rules so they can get some “weight”. Either will require non-trivial work.

In the web UI, for overlapping errors, the last listed one (in the API output) is selected and the rest are ignored. This method simplifies the generation of HTML results in the web editor. The errors are processed backwards so there is no need to track the offset position in the HTML.
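A minimal Python sketch of the backwards processing described above, not the actual web editor code. It assumes the matches arrive sorted by start offset and carry `offset` and `length` fields; the name `mark_errors` and the `span` markup are made up for illustration, and HTML escaping is left out to keep it short.

```python
def mark_errors(text, matches):
    """Wrap each selected match in a span, walking the matches backwards.

    Assumption: `matches` is sorted by start offset, as in the API output.
    Of several overlapping matches, the last listed one wins.
    """
    html = text
    limit = len(text)  # start of the most recently marked (later) match
    for m in reversed(matches):
        start = m["offset"]
        end = start + m["length"]
        if end > limit:
            continue  # overlaps a later match that is already marked: ignore
        # Everything inserted so far sits at or after `limit`, so `start`
        # and `end` are still valid positions in `html`.
        html = (html[:start] + '<span class="error">' + html[start:end]
                + "</span>" + html[end:])
        limit = start
    return html
```

Because every insertion happens to the right of the text still to be processed, the original offsets never have to be adjusted.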
In the other UIs (Google Docs…), all overlapping errors are shown, which is a bit confusing for the user, I think.
I’ve seen this for the browser add-on, but it only works for exact overlap, not for partially overlapping errors.
In LibreOffice the behavior is messy and we have no control over it.
Some time ago @dnaber implemented an attribute to set rule priorities, but it was removed because nobody used it. I am not able to find the code now.

I think this was only for categories, not for individual rules.

In LibreOffice, we should have the same control as elsewhere; we just need to iterate over the matches and clean them up. I am not sure if or how we do that now.

An attribute for categories seems a good solution.

Yes, you are right. We can do it.
Anyway, I’d rather do it for the HTTP API as well, so that most UIs benefit and show the same results. The only downside I can anticipate is for development, where it is sometimes preferable to see all the errors regardless of overlapping. We could keep the results as they are now for the command-line output.

What about a new option like --all-errors that needs to be activated to see all errors? I agree that all UIs should show the same errors, and that they should not need to care about the logic to clean up overlapping errors.

Then we can write a method that takes in a list of rule matches and returns a list of rule matches without overlapping, and use it when appropriate.
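Such a method could look roughly like the Python sketch below. It is only an illustration: the `Match` type, the function name, and the default tie-break (keep the longer match) are assumptions, not something decided in this thread.

```python
from typing import NamedTuple


class Match(NamedTuple):
    rule_id: str
    start: int  # inclusive
    end: int    # exclusive


def remove_overlapping(matches, prefer=lambda m: m.end - m.start):
    """Return the matches with overlaps removed.

    `prefer` scores a match; of two overlapping matches the one with the
    higher score is kept (assumed default: the longer match). On a tie,
    the earlier match stays.
    """
    kept = []
    for m in sorted(matches, key=lambda m: (m.start, m.end)):
        if kept and m.start < kept[-1].end:  # overlaps the last kept match
            if prefer(m) > prefer(kept[-1]):
                kept[-1] = m
        else:
            kept.append(m)
    return kept
```

Passing a different `prefer` function is one way to plug in whatever priority criteria we end up choosing.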

As for priorities, instead of an attribute for categories or rules, every language can have a list of priorities (with category IDs or even rule IDs if necessary). As this list will probably be short, we don’t need a file; it can be just a property in the language definition.
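To make the idea concrete, here is a small Python sketch of such a per-language priority list. All names and numbers are hypothetical (including the IDs in the table); it only shows a lookup in which a rule ID takes precedence over its category ID and unlisted entries get a default of 0.

```python
# Hypothetical priority list for one language: higher number wins, default is 0.
EXAMPLE_PRIORITIES = {
    "TYPOS": 10,         # a category ID
    "STYLE": -10,        # a category ID
    "SOME_RULE_ID": 20,  # an individual rule ID, checked before the category
}


def priority_of(rule_id, category_id, priorities):
    """Look up the rule ID first, then the category ID, else 0."""
    if rule_id in priorities:
        return priorities[rule_id]
    return priorities.get(category_id, 0)


def pick(match_a, match_b, priorities):
    """Of two overlapping matches, return the one to keep (ties keep the first)."""
    pa = priority_of(match_a["rule"], match_a["category"], priorities)
    pb = priority_of(match_b["rule"], match_b["category"], priorities)
    return match_b if pb > pa else match_a
```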

I have created a branch with a possible implementation of this idea. I think it’s easy and clear enough.

The cleanup is now enabled by default and is disabled only for some tests.

I think the branch can be merged as it is now.

The cleanup of overlapping errors is now off by default in the command-line output. This makes sense, because there is no actual overlapping here and seeing all the errors is useful for testing and analyzing.

Regarding how to set the priorities, I think there is no need to make it more complicated (e.g. with an attribute in XML rules). This feature will probably not be used much by most languages.

Okay, I agree. Please remember to also document the change in CHANGES.md.

These nightly differences show the effect of the cleanup of overlapping errors, so everyone can see what is going on with it.

I will turn it off by default in the Wikipedia check. In tests it is better to see all the errors.

I found that adjacent errors (juxtaposed but not overlapping) are also removed. For example, in German: “TRGS - Technische Regeln für Gefahrstoffe”. This is not the desired behavior. I will change it.
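The difference between adjacent and overlapping comes down to a strict versus non-strict comparison. The Python sketch below uses made-up positions (not the actual offsets from the German example) and assumes matches are half-open ranges, with the end position exclusive.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True only if the half-open ranges [start, end) share a character."""
    return a_start < b_end and b_start < a_end


def touches_or_overlaps(a_start, a_end, b_start, b_end):
    """Too eager: also true when one range ends exactly where the other starts."""
    return a_start <= b_end and b_start <= a_end
```

With ranges 0-4 and 4-10, `overlaps` is false while `touches_or_overlaps` is true, which is the kind of check that would drop juxtaposed errors.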

In the GUI, I get unexpected results that I think are related to the rules for overlapping errors. Refer to the annotated screenshots.

Hi, Mike.

I made some changes on December 16. Try some snapshot after that day and check if it is fixed.

Hi @jaumeortola,

These are the results from snapshot 2016-12-21.

Aside: This is what I get with LT 3.5 when both rules are activated:


I would really like to show both errors in LT 3.6. Is that possible, and if so, how?

Okay, I see. We should turn the cleanOverlappingErrorsFilter off by default for the stand-alone version. This UI is intended mostly for testing, and the overlapping errors can be shown without problems. I’ll do it right now.

On the other hand, from you example I discover a problem I had not anticipated, which can affect other UIs:

  • A and B are overlapping errors.
  • A is removed by the cleanOverlappingErrorsFilter.
  • The rule for B is disabled in the UI (but the LT checker knows nothing about it).
  • Result: no error is shown, although A is expected.

There is no solution to this problem, unless we send the information about disabled rules in the UI to the LT checker.
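The A/B scenario can be reproduced with a few lines of Python. This is a toy model, not LT code: the function names, the dictionary fields, and the "last match wins" rule are assumptions made for the illustration.

```python
def clean_overlapping(matches):
    """Keep one match per overlapping group (here: the last one listed).

    Assumption: `matches` is sorted by start offset; `end` is exclusive.
    """
    kept = []
    for m in matches:
        if kept and m["start"] < kept[-1]["end"]:
            kept[-1] = m  # the later match replaces the earlier one
        else:
            kept.append(m)
    return kept


def check(matches, disabled=()):
    """Checker side: drop rules it knows are disabled, then clean overlaps."""
    enabled = [m for m in matches if m["rule"] not in disabled]
    return clean_overlapping(enabled)


def shown_in_ui(server_matches, disabled_in_ui):
    """UI side: hide matches of rules the user has disabled locally."""
    return [m for m in server_matches if m["rule"] not in disabled_in_ui]


A = {"rule": "A", "start": 0, "end": 5}
B = {"rule": "B", "start": 3, "end": 8}

# The checker does not know that B is disabled: A is cleaned away, B is hidden.
nothing_shown = shown_in_ui(check([A, B]), disabled_in_ui={"B"})

# The checker is told about the disabled rule: A survives and is shown.
a_shown = shown_in_ui(check([A, B], disabled={"B"}), disabled_in_ui={"B"})
```

In the first call the user sees no error at all; in the second, where the disabled rule is sent along, A is shown as expected.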

BTW, our browser add-on would have spotted that error (you/your), I can recommend it :slight_smile:

I was indeed using the add-on, but in the settings I had a server other than the default server. The problems of being a tester…

Jaume, thank you.

Hi Jaume, I understand why you want to show only one error from a set of overlapping errors. But, if overlapping errors exist, to not show an error seems like a design flaw. The rules in LT are not perfect. But, as best we can, if we know that there is an error, we should give a message to the user.

Mike, you are right. It is certainly a design flaw (my A&B example). It affects the web browser and the Google Docs add-ons, because they can disable rules without informing the LT server.

A possible solution is just to turn off the overlapping errors filter in the LT server. Then the only UI that will use the filter is LibreOffice. But this is not a good solution for the web UI.

Other solutions involve more changes:

  1. Make the web browser and Google Docs add-ons inform the server about disabled rules.
  2. Add a parameter to the LT API (removeOverlappingErrors?), which can be set to true or false from the UI.

Or we can ignore the issue because hopefully it will be very infrequent.

What do you think, @dnaber?