Odd regression test results

tiagosantos · October 30, 2016, 1:21pm

@danielnaber @Yakov @marcoagpinto @matheuspoletto

Hi everybody,

Hope you are all having a great weekend.
Yesterday’s regression tests were supposed to be calm but they have shown odd results that I am still trying to diagnose.
https://languagetool.org/regression-tests/20161029/result_pt_20161029.html
Making my usual ‘find’ query to the rules with changes, I have gotten the following results:

ESTAR_SEGURO_QUE            +5    (LEGIT)
VERBO_DE_QUE                +1    (LEGIT)
ESTAR_CLARO_DE_QUE          +3    (LEGIT)
INSISTIR_DE_QUE              0
PROFANITY                    0    (1 HIDDEN)
3 words fixed               -1    (6 matches in the same change description)
Marco rules                  0

Ellipsis change pushed by Yakov:
UPPERCASE_SENTENCE_START  -111    (MOST LEGIT NEGATIVES OR HIDDEN)

Great results… but:

The regression output has 15.148 lines.
Assuming that each change produces 15 lines (it is actually quite a bit less), that would produce a regression test with roughly 1.815 lines (121 * 15).

The regression verbosity is at least 8 times larger than expected.

The issue seems to be due to former changes appearing again, in duplicate, as both new detections and dropped detections.

I do not know how to interpret these extra results.
Can anyone with more experience explain this extra verbosity in the results and what can be done to avoid it?

Cheers!

PS - Considering the hidden query results, the summary fits:

-Portuguese: 4661 total matches
+Portuguese: 4643 total matches
 Portuguese: ø0,12 rule matches per sentence

4.661 + (5 + 1 + 3 - 1 - 111) = 4.558

dnaber · October 30, 2016, 1:42pm

The regression test result is actually just the output of the Linux command diff. If error matches move around in the result, diff can lose track of them, which makes it look like they are removed at some place and added at another place. So this is nothing to be worried about. Making the output better (i.e. no duplication) would probably be quite some work.
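A minimal sketch of the effect described above, using Python's `difflib` as a stand-in for the Linux `diff` command (the two use different algorithms, so this only illustrates the general behaviour). The rule IDs are taken from this thread; the sentence labels and both "runs" are invented for illustration.

```python
import difflib

# Hypothetical match lines from two nightly runs. The rule IDs come from this
# thread; the sentence labels are invented for illustration.
old_run = [
    "ESTAR_SEGURO_QUE: sentence A",
    "VERBO_DE_QUE: sentence B",
    "UPPERCASE_SENTENCE_START: sentence C",
    "PROFANITY: sentence D",
]
# The same four matches, but the first one now appears at the end.
new_run = old_run[1:] + old_run[:1]


def changed_lines(old, new):
    """Return (removed, added) lines as a line-based unified diff reports them."""
    removed, added = [], []
    for line in difflib.unified_diff(old, new, "old", "new", lineterm="", n=0):
        if line.startswith(("---", "+++", "@@")):
            continue  # skip file and hunk headers
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return removed, added


removed, added = changed_lines(old_run, new_run)
print("removed:", removed)
print("added:  ", added)
```

Both runs contain exactly the same matches, yet the moved line is reported once as removed and once as added, which is the kind of duplication that inflates the regression output.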

arysin · October 30, 2016, 1:49pm

The reason you see tons of differences is that we changed the sentence tokenizer. Before, the … was splitting the sentence, and now it does not. In the output, the context is limited to the sentence, so all the output that had sentences with … has changed.
This should be a one-time deal (until you change the sentence tokenizing rules again).
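To illustrate why a tokenizer change alone rewrites so many output lines, here is a toy sketch. The splitter, the `report` helper and the example text are all assumptions made for this illustration; none of it is LanguageTool's actual tokenizer or output format.

```python
import re

# Invented example text containing an ellipsis in mid-sentence.
TEXT = "Ele disse que sim… mas não veio. Depois ligou."


def split_sentences(text, ellipsis_ends_sentence):
    """Toy splitter: break after sentence-final punctuation followed by whitespace."""
    enders = ".!?…" if ellipsis_ends_sentence else ".!?"
    pattern = "(?<=[" + re.escape(enders) + "])\\s+"
    return re.split(pattern, text)


def report(text, word, ellipsis_ends_sentence):
    """One output line per sentence containing `word`; the context is the sentence."""
    sentences = split_sentences(text, ellipsis_ends_sentence)
    return [f"{word} | {s}" for s in sentences if word in s]


old_lines = report(TEXT, "mas", ellipsis_ends_sentence=True)
new_lines = report(TEXT, "mas", ellipsis_ends_sentence=False)
print(old_lines)
print(new_lines)
```

The same word is found in both cases, but because the printed context is the whole sentence, the output line differs, and a line-based diff would show it as one removed and one added line.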

tiagosantos · October 30, 2016, 1:52pm

Awesome!

I have reviewed all commits to try to figure out if there was any unintended change and everything was good.
This closes the case.

I will upload a new batch of rules in a few moments.

Thank you Daniel.

Yakov · October 30, 2016, 2:13pm

I think the ellipsis sign and 3 dots are processed as expected now.

tiagosantos · October 30, 2016, 2:35pm

That is why I marked them as legit negatives. There are a few cases where uppercase was needed, but they require complex grammar.xml rules.

Thank you for taking the time to look into these matters and for helping us improve the Portuguese correction.

Cheers!

tiagosantos · October 30, 2016, 2:54pm

Sorry Arysin for skipping your reply. The page did not update while I was replying.

If the tokenizer requires further changes, I will be aware of this consequence.
Thank you Arysin.

tiagosantos · March 25, 2017, 12:12am

@dnaber

Today I again had one of those odd test results.
https://languagetool.org/regression-tests/20170324/result_pt-PT_20170324.html

Considering that I had made some extensions to the general agreement rules and to the suggestions, I was expecting a cheerful regression test, but not a 12 MB load of fun.

Actually, only 3 new positives appeared (search + Line), but something seems to have disabled part of the rules during this test. I checked the LT portal and everything seems to be working as usual (slightly improved, actually). The massive changes in disambiguation should not affect that test, since they are related to spellchecking.

What was the change that triggered this event in the regression tests? Nothing in my changes or results points to this type of result.

Sorry to bother with this again, but these odd events seem to repeat once in a while, and I really want to avoid them.

dnaber · March 25, 2017, 9:37am

The process crashed with this exception:

Exception in thread "main" java.lang.RuntimeException: Check failed on sentence: Hamilton, Edith, Mythology, New York: Mentor, 1942
        at org.languagetool.dev.dumpcheck.SentenceSourceChecker.run(SentenceSourceChecker.java:189)
        at org.languagetool.dev.dumpcheck.SentenceSourceChecker.main(SentenceSourceChecker.java:80)
        at org.languagetool.dev.wikipedia.Main.main(Main.java:45)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:169)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:562)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:532)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:497)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:480)
        at org.languagetool.dev.dumpcheck.SentenceSourceChecker.run(SentenceSourceChecker.java:179)
        ... 2 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:162)
        ... 7 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.ArrayList.rangeCheck(ArrayList.java:653)
        at java.util.ArrayList.get(ArrayList.java:429)
        at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRuleReplacer.replace(DisambiguationPatternRuleReplacer.java:97)
        at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRule.replace(DisambiguationPatternRule.java:101)
        at org.languagetool.tagging.disambiguation.rules.XmlRuleDisambiguator.disambiguate(XmlRuleDisambiguator.java:60)
        at org.languagetool.tagging.disambiguation.pt.PortugueseHybridDisambiguator.disambiguate(PortugueseHybridDisambiguator.java:49)
        at org.languagetool.JLanguageTool.getAnalyzedSentence(JLanguageTool.java:769)
        at org.languagetool.MultiThreadedJLanguageTool$AnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:236)
        at org.languagetool.MultiThreadedJLanguageTool$ParagraphEndAnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:247)
        at org.languagetool.MultiThreadedJLanguageTool$ParagraphEndAnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:240)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

tiagosantos · March 25, 2017, 2:04pm

I had 3 overlapping spelling rules related to New York (New York, New York Times and The New York Times). I am not sure if that is the meaning of Index: 3, Size: 3 in:

Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3.

I have removed the redundancies.

Can this be the source of these problems (overlapping disambiguation rules)?

dnaber · March 25, 2017, 2:17pm

We’ll see tonight - if the diff is large again because all the removed matches have been added again, then that was the problem.

tiagosantos · March 25, 2017, 2:28pm

Many thanks for the prompt reply, Daniel.

tiagosantos · March 26, 2017, 8:06am

Seems like there is something else missing. Did the error report change, or is it still throwing the same exception?
Can I get these test logs with a local WikiCheck test, like the one you describe in the link below?

http://wiki.languagetool.org/re-run-nightly-wikipedia-tatoeba-tests

dnaber · March 26, 2017, 9:11am

Yes, still the same, but I’ve now added a workaround (fix?) so the issue shouldn’t occur again. I’ll now start the build and regression test manually.

tiagosantos · March 26, 2017, 9:23am

Many thanks Daniel!

SkyCharger001 · March 26, 2017, 9:27am

Index 3? Most arrays count from 0, not 1 (meaning that only indexes 0, 1 and 2 are valid).
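A small sketch of the zero-based indexing point, written in Python rather than Java. The three-element list is hypothetical and says nothing about what the disambiguation code actually held; Python raises `IndexError` where Java's `ArrayList.get` throws `IndexOutOfBoundsException`, and `safe_get` is a helper invented for this example.

```python
def safe_get(items, index):
    """Return items[index], or None when index is outside 0 .. len(items) - 1."""
    if 0 <= index < len(items):
        return items[index]
    return None


tokens = ["The", "New", "York"]  # hypothetical three-element list
valid_indexes = list(range(len(tokens)))  # the only indexes that exist

try:
    tokens[3]  # index equal to the size is one past the last element
    raised = False
except IndexError:
    raised = True

print("valid indexes:", valid_indexes)
print("tokens[3] raised an error:", raised)
print("safe_get(tokens, 3):", safe_get(tokens, 3))
```

An index equal to the size of the list is therefore always out of range, which is consistent with the "Index: 3, Size: 3" message in the trace.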

tiagosantos · March 26, 2017, 11:13pm

It seems that it hasn’t yet solved the problem with the regression tests, but I have now tested the website and the office extension and they are working properly. I have been very busy with work-related matters lately, so I haven’t done the WikiCheck test yet. I will probably only be able to set it up (hopefully) next week.

dnaber · March 27, 2017, 7:19am

I think it did, see the email “LanguageTool nightly diff test” at 12:17 yesterday, it contains a huge diff again.

tiagosantos · March 27, 2017, 8:20am

If it is working, great! I have no problems with it. Apologies, but that e-mail did not arrive. I checked the SPAM folder on the webclient and there is nothing there either. Maybe there was some issue due to the attachment size.
Anyway, from what I have seen in the Portuguese section, all is good for release. I tested the wikipedia-tatoeba test file on LibreOffice and it doesn’t get stuck on checks. The regression tests will probably continue as usual tonight, but I will not add anything but minor fixes today.