Sometimes wrong column number from command-line tool

matze · January 15, 2020, 11:48am

The command-line version, languagetool-commandline.jar, sometimes reports column numbers that are too small when the match is on the first line of the text. This happens only on the first line, and it is not restricted to the language de-DE.

Matthias

EDIT: The JSON output is correct. The version is 4.8.

In this example, the column number should be 1:

$ cat t.txt 
das ist ein Test.
$
$ java -jar languagetool-commandline.jar --language de-DE t.txt 
Expected text language: German (Germany)
Working on t.txt...
1.) Line 1, column 0, Rule ID: UPPERCASE_SENTENCE_START
Message: Dieser Satz fängt nicht mit einem großgeschriebenen Wort an
Suggestion: Das
das ist ein Test. 
^^^               
Time: 792ms for 1 sentences (1.3 sentences/sec)

matze · January 16, 2020, 5:01pm

The problem seems to stem from the file JLanguageTool.java. At line 1277 of the original code,

            if (match.getLine() == 0) {

the test aims to identify the first line of the text. However, the object ‘match’ still has the “uninitialised” line number -1. The test should probably use the object ‘newMatch’, whose line number has been set just before:

            if (newMatch.getLine() == 0) {

This change solves the problem described in the original post, but it breaks tests run by

mvn clean test

In my own “working test cases”, everything now works fine.
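
To illustrate the point, here is a minimal, self-contained sketch of the pattern. It does not use the real LanguageTool classes: the class ‘Match’, the two methods, and the "+1 on the first line" adjustment are hypothetical stand-ins, chosen only to show why testing the object whose line number is still -1 skips the first-line branch.

```java
class FirstLineCheckSketch {

    /** Hypothetical stand-in for a rule match; -1 means "line not set yet". */
    static final class Match {
        private int line = -1;
        private int column = -1;

        int getLine() { return line; }
        void setLine(int line) { this.line = line; }
        int getColumn() { return column; }
        void setColumn(int column) { this.column = column; }
    }

    /** Tests the original match, so the first-line branch is never taken. */
    static int columnTestingOriginal(Match match, int lineInText, int columnInLine) {
        Match newMatch = new Match();
        newMatch.setLine(lineInText);
        if (match.getLine() == 0) {            // match.getLine() is still -1 here
            newMatch.setColumn(columnInLine + 1); // assumed first-line adjustment
        } else {
            newMatch.setColumn(columnInLine);  // assumed: no adjustment needed
        }
        return newMatch.getColumn();
    }

    /** Tests the adjusted copy, whose line number was set just before. */
    static int columnTestingAdjusted(Match match, int lineInText, int columnInLine) {
        Match newMatch = new Match();
        newMatch.setLine(lineInText);
        if (newMatch.getLine() == 0) {
            newMatch.setColumn(columnInLine + 1); // assumed first-line adjustment
        } else {
            newMatch.setColumn(columnInLine);  // assumed: no adjustment needed
        }
        return newMatch.getColumn();
    }

    public static void main(String[] args) {
        Match match = new Match(); // line number never initialised
        System.out.println("testing match:    column " + columnTestingOriginal(match, 0, 0));
        System.out.println("testing newMatch: column " + columnTestingAdjusted(match, 0, 0));
    }
}
```

With a match at the start of the first line, the first variant in this sketch reports column 0 and the second reports column 1. On any other line both variants return the same value, which fits the observation that only the first line is affected.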

Matthias

SkyCharger001 · January 16, 2020, 5:42pm

The internal routines most likely start their counts at zero, as this is easier for the computer to work with.
Given the number of rules, it wouldn’t surprise me if, for some of the rules, they forgot to convert the displayed count to the one-based numbering that we humans normally use.

matze · January 16, 2020, 6:48pm

It seems likely to be more complex. The rule in question is handled a bit differently internally than, e.g., the rules for spelling errors and word repetitions. For those rules, everything already worked correctly before.

Matthias

SkyCharger001 · January 16, 2020, 7:26pm

The fact that this rule is handled differently actually makes it more likely that we are dealing with what I described, as it could easily be skipping a step when translating the internal variables for display.
So does complexity: the more complex a rule is, the greater the chance of a (visual) bug being overlooked.

matze · January 17, 2020, 8:16am

I created an issue and a pull request on GitHub. Fortunately, only one test had to be adapted.

Matthias

EDIT: see Issue #2341 on GitHub