Hope you are all having a great weekend.
Yesterday’s regression tests were supposed to be calm but they have shown odd results that I am still trying to diagnose.
Running my usual ‘find’ query on the rules that changed, I got the following results:
ESTAR_SEGURO_QUE +5 (LEGIT)
VERBO_DE_QUE +1 (LEGIT)
ESTAR_CLARO_DE_QUE +3 (LEGIT)
INSISTIR_DE_QUE 0
PROFANITY 0 (1 HIDDEN)
3 words fixed -1 (6 matches in the same change description)
Marco rules 0
Ellipsis change pushed by Yakov: UPPERCASE_SENTENCE_START -111 (MOST LEGIT NEGATIVES OR HIDDEN)
Great results… but:
The regression output has 15,148 lines.
Assuming that each change produces 15 lines (it is actually considerably fewer), the 121 changes above (5 + 1 + 3 + 1 + 111) should produce a regression output of roughly 1,815 lines (121 * 15).
The regression verbosity is at least 8 times larger than expected.
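For anyone who wants to check the estimate, here is a minimal Python sketch of the arithmetic. The per-rule counts and the output size are copied from this post; the 15 lines per change is my own generous assumption, and the variable names are only illustrative.

```python
# Net changes per rule, copied from the query results in this post.
changes = {
    "ESTAR_SEGURO_QUE": +5,
    "VERBO_DE_QUE": +1,
    "ESTAR_CLARO_DE_QUE": +3,
    "3 words fixed": -1,
    "UPPERCASE_SENTENCE_START": -111,
}

LINES_PER_CHANGE = 15   # assumption: deliberately generous upper bound
ACTUAL_LINES = 15148    # size of the regression output

# Every added or dropped match counts as one change, whatever its sign.
total_changes = sum(abs(n) for n in changes.values())
expected_lines = total_changes * LINES_PER_CHANGE
ratio = ACTUAL_LINES / expected_lines

print(total_changes, expected_lines, round(ratio, 1))  # 121 1815 8.3
```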
The issue seems to be caused by earlier changes appearing again, in duplicate, as both new detections and dropped detections.
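One way to test this suspicion would be to look for entries that show up on both sides of the diff. The sketch below is only an illustration: it assumes the regression output is diff-like, with added matches prefixed by "+" and dropped matches prefixed by "-", which may not be the exact format. The function name `find_churn` and the sample lines are hypothetical.

```python
from collections import Counter

def find_churn(diff_lines):
    """Count entries that occur both as '+' (new) and '-' (dropped) lines."""
    added = Counter()
    removed = Counter()
    for line in diff_lines:
        # Skip unified-diff file headers, if present.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            added[line[1:].strip()] += 1
        elif line.startswith("-"):
            removed[line[1:].strip()] += 1
    # Counter intersection keeps the minimum count of each shared entry.
    return added & removed

# Hypothetical sample, not real regression output.
sample = [
    "+RULE_A: sentence one",
    "-RULE_A: sentence one",
    "+RULE_B: sentence two",
    "-RULE_C: sentence three",
]
churn = find_churn(sample)
print(dict(churn))  # {'RULE_A: sentence one': 1}
```

If most of the output ends up in `churn`, the extra lines are matches that were removed and re-added unchanged rather than real differences.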
I do not know how to understand these extra results.
Can anyone with more experience explain this extra verbosity in the results and what can be done to avoid it?
PS - Considering the hidden query results, the summary fits:
-Portuguese: 4661 total matches
+Portuguese: 4643 total matches
Portuguese: ø0,12 rule matches per sentence
4661 + (5 + 1 + 3 - 1 - 111) = 4558