Odd regression test results

@danielnaber @Yakov @marcoagpinto @matheuspoletto

Hi everybody,

Hope you are all having a great weekend.
Yesterday’s regression tests were supposed to be calm, but they have shown odd results that I am still trying to diagnose.
https://languagetool.org/regression-tests/20161029/result_pt_20161029.html
Running my usual ‘find’ query for the rules that changed, I got the following results:

ESTAR_SEGURO_QUE              +5    (LEGIT)
VERBO_DE_QUE                  +1    (LEGIT)
ESTAR_CLARO_DE_QUE            +3    (LEGIT)
INSISTIR_DE_QUE                0
PROFANITY                      0    (1 HIDDEN)
3 words fixed                 -1    (6 matches in the same change description)
Marco rules                    0
Ellipsis change pushed by Yakov:
UPPERCASE_SENTENCE_START    -111    (MOST LEGIT NEGATIVES OR HIDDEN)

Great results… but:

The regression output has 15,148 lines.
Assuming that each change produces 15 lines (it is actually considerably fewer), that would produce a regression output of roughly 1,815 lines (121 * 15).

The regression output is at least 8 times larger than expected.
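
For anyone who wants to check the estimate, here is a minimal Python sketch of the arithmetic. The figures are the ones quoted in this post, and the 15 lines per change is the same deliberately generous assumption.

```python
# Per-rule changes quoted above, as absolute values: every change
# produces output lines whether it is an added or a removed match.
changes = [5, 1, 3, 1, 111]
lines_per_change = 15   # assumed upper bound, as stated in the post
actual_lines = 15148    # size of the regression output

expected_lines = sum(changes) * lines_per_change
ratio = actual_lines / expected_lines

print(expected_lines)   # 1815
print(round(ratio, 1))  # 8.3
```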

The issue seems to be due to earlier changes appearing again, in duplicate, as both new detections and dropped detections.

I do not know how to interpret these extra results.
Can anyone with more experience explain this extra verbosity in the results and what can be done to avoid it?

Cheers!

PS - Considering the hidden query results, the summary fits:

-Portuguese: 4661 total matches
+Portuguese: 4643 total matches
 Portuguese: ø0,12 rule matches per sentence

4661 + (5 + 1 + 3 - 1 - 111) = 4558

The regression test result is actually just the output of the Linux command diff. If error matches move around in the result, diff can lose track of them, which makes it look like they are removed at some place and added at another place. So this is nothing to be worried about. Making the output better (i.e. no duplication) would probably be quite some work.
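
To illustrate the effect, here is a minimal Python sketch. It uses Python's `difflib` as a stand-in for the Linux `diff` command (the two do not use the same algorithm), and the match lines are made up for the example. One entry moves from the top of the list to the bottom, and the diff reports it once as removed and once as added, even though nothing was really lost or gained.

```python
import difflib

# Hypothetical match lines; the real report contains full match descriptions.
old = ["match A", "match B", "match C", "match D"]
new = ["match B", "match C", "match D", "match A"]  # "match A" only moved

diff = list(difflib.unified_diff(old, new, lineterm="", n=0))

# Keep only the content lines, skipping the "---" / "+++" file headers.
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]

print(removed)  # ['-match A']
print(added)    # ['+match A']
```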

The reason you see tons of differences is that we changed the sentence tokenizer. Before, the … split the sentence, and now it does not. In the output, the context is limited to the sentence, so every entry whose sentence contains … has changed.
This should be a one-time deal (until you change the sentence tokenizing rules again).
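
A rough sketch of why the context changes, in Python. The `split_sentences` function is a toy splitter written for this example, not LanguageTool's actual tokenizer, and the sample text is made up. With the old behaviour the text after … becomes a sentence of its own; with the new behaviour it stays inside a longer sentence, so any match reported there is printed with a different context and shows up in the diff.

```python
import re

def split_sentences(text, split_on_ellipsis):
    """Toy sentence splitter (hypothetical, not LanguageTool's tokenizer)."""
    enders = ".!?…" if split_on_ellipsis else ".!?"
    # Split on whitespace that follows one of the sentence-ending characters.
    return re.split(r"(?<=[" + enders + r"])\s+", text)

text = "Ele hesitou… depois saiu. Ninguém reparou."

old_sentences = split_sentences(text, split_on_ellipsis=True)
new_sentences = split_sentences(text, split_on_ellipsis=False)

print(old_sentences)  # three sentences, the second starting in lowercase
print(new_sentences)  # two sentences, the first containing the ellipsis
```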

Awesome!

I have reviewed all commits to try to figure out if there was any unintended change and everything was good.
This closes the case. :slight_smile:

I will upload a new batch of rules in a few moments.

Thank you Daniel.

I think the ellipsis sign and three dots are processed as expected now.

That is why I marked them as legit negatives. There are a few cases where uppercase was needed, but they require complex grammar.xml rules.

Thank you for taking the time to look into these matters and for helping us improve the Portuguese correction.

Cheers!

Sorry Arysin for skipping your reply. The page did not update while I was replying.

If the tokenizer requires further changes, I will be aware of this consequence.
Thank you Arysin.

@dnaber

Today, I had again one of those odd test results.
https://languagetool.org/regression-tests/20170324/result_pt-PT_20170324.html

Considering that I had made some extensions to the general agreement rules and to the suggestions, I was expecting a cheerful regression test, but not a 12 MB load of fun.

Actually, only 3 new positives appeared (search + Line), but something seems to have disabled part of the rules during this test. I checked the LT portal and everything seems to be working as usual (slightly improved, actually). The massive changes in disambiguation should not affect that test, since they are related to spellchecking.

What was the change that triggered this event in the regression tests? Nothing in my changes or results points to this type of result.

Sorry to bother you with this again, but these odd events seem to repeat once in a while, and I really want to avoid them.

The process crashed with this exception:

Exception in thread "main" java.lang.RuntimeException: Check failed on sentence: Hamilton, Edith, Mythology, New York: Mentor, 1942
        at org.languagetool.dev.dumpcheck.SentenceSourceChecker.run(SentenceSourceChecker.java:189)
        at org.languagetool.dev.dumpcheck.SentenceSourceChecker.main(SentenceSourceChecker.java:80)
        at org.languagetool.dev.wikipedia.Main.main(Main.java:45)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:169)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:562)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:532)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:497)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:480)
        at org.languagetool.dev.dumpcheck.SentenceSourceChecker.run(SentenceSourceChecker.java:179)
        ... 2 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:162)
        ... 7 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.ArrayList.rangeCheck(ArrayList.java:653)
        at java.util.ArrayList.get(ArrayList.java:429)
        at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRuleReplacer.replace(DisambiguationPatternRuleReplacer.java:97)
        at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRule.replace(DisambiguationPatternRule.java:101)
        at org.languagetool.tagging.disambiguation.rules.XmlRuleDisambiguator.disambiguate(XmlRuleDisambiguator.java:60)
        at org.languagetool.tagging.disambiguation.pt.PortugueseHybridDisambiguator.disambiguate(PortugueseHybridDisambiguator.java:49)
        at org.languagetool.JLanguageTool.getAnalyzedSentence(JLanguageTool.java:769)
        at org.languagetool.MultiThreadedJLanguageTool$AnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:236)
        at org.languagetool.MultiThreadedJLanguageTool$ParagraphEndAnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:247)
        at org.languagetool.MultiThreadedJLanguageTool$ParagraphEndAnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:240)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I had 3 overlapping spelling rules related to New York (New York, New York Times and The New York Times). I am not sure if that is the meaning of Index: 3, Size: 3 in:

Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3.

I have removed the redundancies.

Can this be the source of these problems (overlapping disambiguation rules)?

We’ll see tonight - if the diff is large again because all the removed matches have been added again, then that was the problem.

Many thanks for the prompt reply, Daniel.

It seems that something else is missing. Did the error report change, or is it still throwing the same exception?
Can I get these test logs with a local WikiCheck test, like the one you describe in the link below?

http://wiki.languagetool.org/re-run-nightly-wikipedia-tatoeba-tests

Yes, still the same, but I’ve now added a workaround (fix?) so the issue shouldn’t occur again. I’ll now start the build and regression test manually.

Many thanks Daniel!

Index 3? Most arrays count from 0, not 1, meaning that for a size of 3 only indexes 0, 1 and 2 are valid.
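
A quick way to see this, sketched in Python rather than Java (Python raises `IndexError` where Java's `ArrayList.get` throws `IndexOutOfBoundsException`). The list contents are placeholders and are not taken from the failing rule.

```python
# Any list with three elements; the values are placeholders.
items = ["a", "b", "c"]

valid_indexes = list(range(len(items)))
print(valid_indexes)  # [0, 1, 2]

try:
    items[3]  # one past the last valid index
    failed = False
except IndexError:
    failed = True

print(failed)  # True
```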

It seems that it hasn’t yet solved the problem with the regression tests, but I have now tested the website and the office extension and they are working properly. I have been very busy with work-related matters lately, so I haven’t done the WikiCheck test yet. I will probably only be able to set it up (hopefully) next week.

I think it did, see the email “LanguageTool nightly diff test” at 12:17 yesterday, it contains a huge diff again.

If it is working, great! I have no problems with it. Apologies, but that e-mail did not arrive. I checked the spam folder on the webclient and there is nothing there either. Maybe there was some issue due to the attachment size.
Anyway, from what I have seen in the Portuguese section, all is good for release. I tested the wikipedia-tatoeba test file on LibreOffice and it doesn’t get stuck on checks. The regression tests will probably continue as usual tonight, but I will not add anything but minor fixes today.