@danielnaber @Yakov @marcoagpinto @matheuspoletto
Hope you are all having a great weekend.
Yesterday's regression tests were supposed to be uneventful, but they showed odd results that I am still trying to diagnose.
Running my usual 'find' query for the rules with changes, I got the following results:
ESTAR_SEGURO_QUE +5 (LEGIT)
VERBO_DE_QUE +1 (LEGIT)
ESTAR_CLARO_DE_QUE +3 (LEGIT)
PROFANITY 0 (1 HIDDEN)
3 words fixed -1 (6 matches in the same change description)
Marco rules 0
Ellipsis change pushed by Yakov:
UPPERCASE_SENTENCE_START -111 (MOSTLY LEGIT NEGATIVES OR HIDDEN)
Great results... but:
The regression output has 15,148 lines.
Assuming that each change produces 15 lines (it is actually considerably fewer), the 121 changed matches listed above (5 + 1 + 3 + 1 + 111) would produce a regression output of roughly 1,815 lines (121 * 15).
The actual output is therefore more than 8 times longer than expected.
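For anyone who wants to check the estimate, here is a minimal sketch of the arithmetic. The per-rule numbers are the ones listed above; the 15 lines per change is my deliberately generous assumption, not a measured value.

```python
# Net change in matches per rule, as listed in this post.
changes = {
    "ESTAR_SEGURO_QUE": 5,
    "VERBO_DE_QUE": 1,
    "ESTAR_CLARO_DE_QUE": 3,
    "3 words fixed": -1,
    "UPPERCASE_SENTENCE_START": -111,
}

LINES_PER_CHANGE = 15   # assumed upper bound, the real figure is lower
ACTUAL_LINES = 15148    # size of yesterday's regression output

# Every added or dropped match counts as one change, whatever its sign.
changed_matches = sum(abs(delta) for delta in changes.values())
expected_lines = changed_matches * LINES_PER_CHANGE
ratio = ACTUAL_LINES / expected_lines

print(changed_matches, expected_lines, round(ratio, 1))
```

Since the real number of lines per change is below 15, the true ratio is even higher than this figure.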
The issue seems to be due to earlier changes appearing again, in duplicate, as both new detections and dropped detections.
I do not know how to interpret these extra results.
Can anyone with more experience explain this extra verbosity in the results and what can be done to avoid it?
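To test the duplicate hypothesis, something like the rough sketch below could be run over the regression output. It assumes a diff-like format in which added detections start with `+` and dropped ones start with `-`; that format, the function name `find_round_trips` and the sample lines are all hypothetical, so it would need adapting to the real file.

```python
from collections import Counter

def find_round_trips(lines):
    """Return entries that appear both as added (+) and as dropped (-)."""
    added, removed = Counter(), Counter()
    for line in lines:
        # Skip unified-diff file headers, if the output has any.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            added[line[1:].strip()] += 1
        elif line.startswith("-"):
            removed[line[1:].strip()] += 1
    # An entry present on both sides cancels out and is pure noise.
    return {text: min(added[text], removed[text])
            for text in added if text in removed}

# Made-up sample lines, only to show the idea.
sample = [
    "+UPPERCASE_SENTENCE_START: sentence A",
    "-UPPERCASE_SENTENCE_START: sentence A",
    "+ESTAR_SEGURO_QUE: sentence B",
    "-PROFANITY: sentence C",
]
round_trips = find_round_trips(sample)
print(round_trips)
```

If most of the extra lines show up in this dictionary, the verbosity comes from entries that cancel each other out and not from real changes.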
PS - Considering the hidden query results, the summary fits:
-Portuguese: 4661 total matches
+Portuguese: 4643 total matches
Portuguese: ø0,12 rule matches per sentence
4661 + (5 + 1 + 3 - 1 - 111) = 4558