[Solved (with workaround)] Antipattern is not working

Hi,

I need help with using the antipattern feature. To test it, I created a rule that detects the marked text; I hope it is correct.

<rule id="test" name="test">
	<pattern>
		<token postag="ADJ:.*:T" postag_regexp='yes'>
			<exception postag="ADJ:.*:T"
			           postag_regexp='yes' negate_pos='yes'/>
		</token>
		<token postag="(NN|NNN):.*:Nom" postag_regexp='yes' />
	</pattern>
	<message>Test</message>
	<example correction="">Ставшие <marker>хрестоматийными слова</marker> «Природой здесь нам суждено в Европу прорубить окно», которые А. С. Пушкин вкладывает в уста Петра I — лишь риторически эффектная фраза.</example>
</rule>

The next step was to copy the token set into an antipattern of the real rule: languagetool/grammar.xml at master · languagetool-org/languagetool · GitHub

However, it doesn't work as expected. The same example (with the correction and marker block removed) is commented out here: languagetool/grammar.xml at master · languagetool-org/languagetool · GitHub

If I uncomment it, I get an error from ./testrules.sh ru:

Running XML pattern tests...
Known languages: [English, English (US), English (GB), English (Australian), English (Canadian), English (New Zealand), English (South African), Persian, French, German, German (Germany), German (Austria), German (Swiss), Simple German, Polish, Catalan, Catalan (Valencian), Italian, Breton, Dutch, Portuguese, Portuguese (Portugal), Portuguese (Brazil), Russian, Asturian, Belarusian, Chinese, Danish, Esperanto, Galician, Greek, Icelandic, Japanese, Khmer, Lithuanian, Malayalam, Romanian, Slovak, Slovenian, Spanish, Swedish, Tamil, Tagalog, Ukrainian, Testlanguage]
Running XML validation for ru/ru-RU/grammar.xml...
No rule file found at /org/languagetool/rules/ru/ru-RU/grammar.xml in classpath
Running pattern rule tests for Russian... Exception in thread "main" java.lang.AssertionError: Russian: Did not expect error in:
  Ставшие хрестоматийными слова «Природой здесь нам суждено в Европу прорубить окно», которые А. С. Пушкин вкладывает в уста Петра I — лишь риторически эффектная фраза.
Matching Rule: Unify_Adj_NN_case[3]
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertFalse(Assert.java:64)
	at org.languagetool.rules.patterns.PatternRuleTest.testCorrectSentences(PatternRuleTest.java:472)
	at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:272)
	at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:198)
	at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:149)
	at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:558)
Running disambiguator rule tests...
Running disambiguation tests for Russian...
25 rules tested (47ms)
Tests successful.
Running XML bitext pattern tests...
Running tests for Russian...
Tests successful.
Validating false-friends.xml...
Validation successfully finished.

Any ideas on how to fix it? Or maybe other suggestions for adding this two-token exception to the rule?

Best regards,
Konstantin Ladutenko

I’m sorry but your links lead to multiple antipatterns, and it’s difficult to say which was used. And it loads very, very slowly. Please copy and paste exactly your antipattern.

BTW: use markers in your antipatterns. There is a rare bug that I couldn’t fix, and without the marker tag, it may fail to work.

Best,
Marcin

Dear Marcin,

Thank you for your help. This is the full example: I have added markers to the antipatterns, and marked the problematic antipattern and example with “begin/end problematic” comments.

	        <rule id="Unify_Adj_NN_case">
		        <antipattern> <!-- begin problematic antipattern -->
			        <marker>
				        <token postag="ADJ:.*:T" postag_regexp='yes'>
					        <exception postag="ADJ:.*:T"
					                   postag_regexp='yes' negate_pos='yes'/>
				        </token>
				        <token postag="(NN|NNN):.*:Nom" postag_regexp='yes' />
			        </marker>
		        </antipattern> <!-- end problematic antipattern -->
		        
		        <antipattern>
			        <marker>
				        <token inflected="yes" postag="ADJ:.*:Nom"
				               postag_regexp='yes'>cвойственный</token>
				        <token postag="(NN|NNN):.*:D" postag_regexp='yes' />
			        </marker>
		        </antipattern>
		        <antipattern>
			        <marker>
				        <token>,</token>
				        <token>а</token>
				        <token>позднее</token>
			        </marker>
		        </antipattern>
		        <pattern>
			        <marker>
				        <unify negate="yes">
					        <feature id="case"/>
					        <token postag="ADJ:.*|ADJ_Com:.*"
					               postag_regexp="yes">
						        <exception scope="previous"
						                   regexp="yes">,</exception>
						        <exception postag="PT:.*|ADV|NN:.*" postag_regexp="yes" />
						        <exception regexp="yes">полн.*|&nation;|&time;|выходн(ых|ые)|средн.*|арифметическ(ое|ие)|подобн.*|отличн.*|книжном|продуктовом|праздную|вещей|возможным|невозможным|собственн(ом|ый)|активн(ым|ыми)|сборной|большее|меньшее</exception>
						        <exception postag="ADJ:Masc:.*"
						                   postag_regexp="yes" regexp="yes">
							        &color;|мобильн.*
						        </exception>
						        <exception case_sensitive="yes">Божию</exception>
					        </token>
					        <token postag="NN:.*|NNN:.*" postag_regexp="yes">
						        <exception regexp="yes">друг|уже|уж|при|после|весь|мировой|молодой|куплю|байке|лажа|лазанью</exception>
						        <exception postag="VB:.*|CONJ|PREP|ADV|ADJ:.*" postag_regexp="yes" />
						        <!-- <and> -->
							    <!--     <exception scope="previous" -->
							    <!--            regexp="yes">&human;</exception> -->
							    <!--     <exception postag="(NN|NNN):.*:Nom" postag_regexp="yes" /> -->
						        <!-- </and> -->
<!-- 						        делала	делать VB:Past:Fem  -->
<!-- политически   -->
<!-- приемлемой	приемлемый  -->
<!-- ADJ:Fem:D  -->
<!-- ADJ:Fem:P  -->
<!-- ADJ:Fem:R  -->
<!-- ADJ:Fem:T  -->
<!-- отмену	отмена  -->
<!-- NN:Fem:Sin:V  -->
						        <!-- Слова без буквы ё имеют неправильный падеж -->
						        <exception regexp="yes">жилье|белье|щеткой|стерней|сырье</exception>
						        <exception case_sensitive="yes">Tом</exception>
						        <exception regexp="yes">&dual;</exception>
					        </token>
				        </unify>				        
			        </marker>
			        <or>
				        <token>
					        <exception regexp="yes">[\p{Punct}]</exception>
					        <exception postag="NN:.*" postag_regexp="yes"/>
					        <exception>щеткой</exception>
				        </token>
				        <token postag="SENT_END" />
			        </or>
		        </pattern>
		        <message>Прилагательное не согласуется с
		        существительным по падежу:
		        </message>
		        <!-- begin problematic example -->
		        <!-- <example>Ставшие хрестоматийными слова «Природой здесь нам суждено в Европу прорубить окно», которые А. С. Пушкин вкладывает в уста Петра I — лишь риторически эффектная фраза.</example> -->
		        <!-- end problematic example -->
		        <!-- <example correction="">Я терпеть не могу эту <marker>глупою женщину</marker>.</example> -->
		        <example>Среди них наибольшей известностью пользовались простые шутки.</example>
		        <example>Среди них большей известностью пользовались простые шутки.</example>
		        <example>Крепкие столы.</example>
		        <example>Крепкий стол.</example>
		        <example>Крепкая ограда.</example>
		        <example>Умные уже тут.</example>
		        <example>Мы чужие друг другу.</example>
		        <example>Белый стул.</example>
		        <example>Больных гриппом просим не входить.</example>
		        <example>Если загорелся зелёный машинам, то можно ехать.</example>
		        <example>Если загорелся зелёный кошке, то ехать нельзя.</example>
		        <example>Преподавать русский инострацам сложно.</example>
		        <example>Корзина была полная ягод.</example>
		        <example>Воспользуемся средним чисел а и б.</example>
		        <example>После выходных работник смотрелся не важно.</example>
		        <example correction=""><marker>Железная кроватью</marker>.</example>
                <example>Вдали виднелись красивые сосны.</example>
                <example correction="">Вдали виднелись <marker>красивой ёлка</marker>.</example>
                <example>Мебель сделана из красивой сосны.</example>
                <example>При императрице Екатерине Великой Россия успешно воевала с Турцией</example>
                <example>При Ярославе Мудром митрополит впервые был избран из числа русских священников</example>
                <example>В бассейне Нижней Таймыры есть такие объекты как Река Мамонта</example>

                <!-- <example correction="">Столы сделаны из <marker>красивых куски</marker> древесины.</example> -->
                <!-- <example correction="">Столы сделаны из <marker>красивых деревья</marker>.</example> -->

                <example>Был построен добротный дом.</example>
                <example correction="">Был построен <marker>добротный дому</marker>.</example>
                <example>Пушистая кошка грелась на солнышке.</example>
                <example correction="">Эта <marker>пушистая кошкой</marker> грелась.</example>
                <example>Вдали виднелось синее небо.</example>
                <example correction="">Вдали виднелось <marker>синее неба</marker>.</example>
                <example>Добротная усадьба.</example>
                <example correction=""><marker>Добротная усадьбе</marker>.</example>
                <example>Нет крепкого сна.</example>
                <example correction="">Нет <marker>крепкому сна</marker>.</example>
                <example>Вода стекала по мягкому жёлобу.</example>
                <example correction="">Вода стекала по <marker>мягкого жёлобу</marker>.</example>
                <example>Мы увидели бурную реку.</example>
                <example correction="">Мы увидели <marker>бурной реку</marker>.</example>
                <example>Они справились с бурной рекой</example>
                <example correction="">Они справились с <marker>бурную рекой</marker>.</example>
                <example>Они думали о родном доме.</example>
                <example correction="">Они думали о <marker>родного доме</marker>.</example>
</rule>

Apparently, you are right. There must be a bug somewhere.

Hard to say where… I’m afraid you need to run the debugger and set a breakpoint in the disambiguator. It’s used to immunize tokens locally, so check whether tokens are immunized by the rule “Unify_Adj_NN_case”. I would also try modifying the antipattern: remove the exception and see what happens.

I have simplified the rule so that it can be pasted directly into Check a LanguageTool XML rule, with Russian selected as the language.

Rule 1

This one is just to test the pattern. The rule editor handles it correctly.

<rule id="test" name="test">
	<pattern>
		<token postag="ADJ:.*:T" postag_regexp='yes'>
			<exception postag="ADJ:.*:T"
		   postag_regexp='yes' negate_pos='yes'/>
		</token>
		<token postag="(NN|NNN):.*:Nom" postag_regexp='yes' />
	</pattern>
	<message>Test</message>
	<example correction=""> <marker>хрестоматийными слова</marker>  «Природой </example>
</rule>

Rule 2

This one is to test the antipattern behaviour. I copied the Rule 1 pattern here as an antipattern and changed the error example into a correct (no-error) one.

<rule id="Unify_Adj_NN_case">
	<antipattern> <!-- begin problematic antipattern -->
		<marker>
			<token postag="ADJ:.*:T" postag_regexp='yes'>
				<exception postag="ADJ:.*:T"
				           postag_regexp='yes' negate_pos='yes'/>
			</token>
			<token postag="(NN|NNN):.*:Nom" postag_regexp='yes' />
		</marker>
	</antipattern> <!-- end problematic antipattern -->
	<pattern>
		<marker>
	 	<token postag="ADJ:.*"	postag_regexp="yes"/>
		<token postag="NN:.*" postag_regexp="yes"/>
		</marker>
	</pattern>
	<message>test:
	</message>
	 <example> <marker>хрестоматийными слова</marker> «Природой </example> 
	<example correction=""> <marker>хрестоматийное слово</marker></example>
</rule>

This gives me an error:

There are problems with your rule:
The rule found an unexpected error in ' хрестоматийными слова «Природой '
Please make sure you selected the correct language for your rule - your selection was: Russian

The error disappears if I use a typewriter quote " instead of the French one «.

So this is the root of the problem. Can anyone fix it in the LT internals? (Sorry, Java is new to me and seems to be outside my professional scope.)

Hm, this is really weird – the POS tags remain the same in both cases. Why should the quote change anything?

BTW, you can simplify the second regexp from the antipattern:

<token postag="NNN?:.*:Nom" postag_regexp='yes' />

This should be a little faster. Still, there must be some weird problem with the disambiguator. I’ll try to have a look.
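For what it's worth, the simplified regexp suggested above accepts the same tags as the original alternation. Here is a minimal Java sketch, assuming the regexp is matched against the whole POS tag string. The class name and the sample tags are made up for illustration and are not taken from the Russian tagger.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of LanguageTool.
class PostagRegexDemo {
    // Original form used in the antipattern.
    static final Pattern ALTERNATION = Pattern.compile("(NN|NNN):.*:Nom");
    // Simplified form: the third N is optional.
    static final Pattern OPTIONAL_N = Pattern.compile("NNN?:.*:Nom");

    // True when both regexps give the same verdict on the full tag string.
    static boolean sameVerdict(String tag) {
        return ALTERNATION.matcher(tag).matches() == OPTIONAL_N.matcher(tag).matches();
    }

    public static void main(String[] args) {
        // Illustrative tags only (assumed shapes).
        String[] tags = {"NN:Fem:Sin:Nom", "NNN:Fem:Sin:Nom", "NN:Fem:Sin:V", "ADJ:Fem:Nom"};
        for (String tag : tags) {
            System.out.println(tag + " -> alternation=" + ALTERNATION.matcher(tag).matches()
                    + ", optional=" + OPTIONAL_N.matcher(tag).matches());
        }
    }
}
```

Both forms describe exactly the prefixes NN and NNN, so the choice between them is about readability and, at most, a small speed difference.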

Putting the example
хрестоматийными слова «Природой
into the text analysis page (Text Analysis - LanguageTool)
gives the expected set of POS tags with no disambiguations. I would expect that at least French is also affected by this issue :unamused: as is any other language that uses French quotes as the standard ones…

Using ? in a regexp clashes with the question mark found in natural language, so it is not used this way in the Russian grammar.xml. I hope Java can optimize suboptimal regexps; without explicit benchmarking there is no reason to make the rules harder for maintainers to read and write… The current performance of the Russian XML seems to be good enough for personal and server needs, so I would propose solving optimization problems only once they become noticeable…

Java does not really make any optimizations. The difference will not be huge but for me, this is really more difficult to read. Probably, this depends on how well-versed you are with regexes.

Anyway, I had a look at the test rule you left commented out in the file: it matches without any problem. You have an explicit requirement after the unification here:

<or>
	<token>
		<exception regexp="yes">[\p{Punct}]</exception>
		<exception postag="NN:.*" postag_regexp="yes"/>
		<exception>щеткой</exception>
	</token>
	<token postag="SENT_END" />
</or>

So, the normal quote differs from the « quotation mark. Quite obviously, " matches \p{Punct} (note: you don’t need brackets there, it’s already a class), while « does not. That’s the difference between the two cases you used.
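The difference comes from how java.util.regex defines \p{Punct}: by default it is the POSIX class, which covers ASCII punctuation only, so « (U+00AB) falls outside it. A minimal sketch of that behaviour follows. The class and method names are hypothetical, and it only demonstrates the regex classes themselves, not how LanguageTool compiles the rule's regexps.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of LanguageTool.
class PunctClassDemo {
    // Default mode: \p{Punct} is the POSIX class (ASCII punctuation only).
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");
    // With UNICODE_CHARACTER_CLASS, \p{Punct} follows the Unicode Punctuation property.
    static final Pattern UNICODE_PUNCT =
            Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);
    // Alternative: keep the default class and list the guillemets explicitly.
    static final Pattern PUNCT_OR_GUILLEMET = Pattern.compile("[\\p{Punct}\u00AB\u00BB]");

    static boolean matchesAscii(String token) {
        return ASCII_PUNCT.matcher(token).matches();
    }

    static boolean matchesUnicode(String token) {
        return UNICODE_PUNCT.matcher(token).matches();
    }

    static boolean matchesExplicit(String token) {
        return PUNCT_OR_GUILLEMET.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "\"" is the typewriter quote; \u00AB and \u00BB are the French quotes.
        for (String token : new String[] {"\"", "\u00AB", "\u00BB"}) {
            System.out.println(token + " ascii=" + matchesAscii(token)
                    + " unicode=" + matchesUnicode(token)
                    + " explicit=" + matchesExplicit(token));
        }
    }
}
```

In a rule, listing « and » next to \p{Punct} in the exception is probably the least invasive option, since it does not depend on any regex flags.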

I replaced the whole thing by the dummy:
<token>abc</token>

and then:

Железная кроватью abc

does match, and

хрестоматийными слова abc

does not. Which is nice, but still, I cannot understand why the antipattern does not work in another case…

OK, I found a workaround: just put the marker around the

<token postag="(NN|NNN):.*:Nom" postag_regexp='yes' />

So that the antipattern is like this:

<antipattern> <!-- begin problematic antipattern -->
	<token postag="ADJ:.*:T" postag_regexp='yes'>
		<exception postag="ADJ:.*:T" postag_regexp='yes' negate_pos='yes'/>
	</token>
	<marker>
		<token postag="NN.*:Nom" postag_regexp='yes' />
	</marker>
</antipattern> <!-- end problematic antipattern -->

This works fine. As I expected, there must be a problem somewhere in the Java code.

I went ahead and fixed the rule in two places. In general, the problem was twofold: first, unification was not matching properly because the quotation mark « was not covered by the regex in the exception; second, there was some problem with too many tokens being immunized…

Yes, I understand that \p{Punct} will match the ", which is why my simplified rule example (the one I referred to in my previous post as Rule 2) has no exceptions in the pattern.

The workaround doesn't seem to work:

It does work here:

Arghhh!!! :rage: It makes me crazy…

This is really strange, I mean we have exactly the same code (I have played with examples but all seem to work). Anyway, the rule I fixed in the grammar.xml passes the JUnit tests.

Please try adding this valid example to your copy of Rule 2 in the ruleCheck:
<example> хрестоматийными слова o природе </example>
It still doesn't work for me…

It does work in this rule:

<rule id="bugbug">
	<antipattern> <!-- begin problematic antipattern -->
		<token postag="ADJ:.*:T" postag_regexp='yes'>
			<exception postag="ADJ:.*:T" postag_regexp='yes' negate_pos='yes'/>
		</token>
		<marker>
			<token postag="NN.*:Nom" postag_regexp='yes' />
		</marker>
	</antipattern> <!-- end problematic antipattern -->
	<pattern>
		<marker>
			<token postag="ADJ:.*" postag_regexp="yes"/>
			<token postag="NN:.*" postag_regexp="yes"/>
		</marker>
	</pattern>
	<message>test: </message>
	<example> <marker>хрестоматийными слова</marker> «Природой </example>
	<example> хрестоматийными слова o природе </example>
	<example correction=""> <marker>хрестоматийное слово</marker></example>
</rule>

Please copy and paste and see what happens.

It is crazy, it works! This is my verbatim copy from the ruleCheck:
<rule id="Unify_Adj_NN_case">
	<antipattern> <!-- begin problematic antipattern -->
		<token postag="ADJ:.*:T" postag_regexp='yes'>
			<exception postag="ADJ:.*:T" postag_regexp='yes' negate_pos='yes'/>
		</token>
		<marker>
			<token postag="NN.*:Nom" postag_regexp='yes' />
		</marker>
	</antipattern> <!-- end problematic antipattern -->
	<pattern>
		<marker>
			<token postag="ADJ:.*" postag_regexp="yes"/>
			<token postag="NN:.*" postag_regexp="yes"/>
		</marker>
	</pattern>
	<message>test: </message>
	<example> <marker>хрестоматийными слова</marker> о природе </example>
	<example correction=""> <marker>хрестоматийное слово</marker></example>
</rule>

If I remove any single character from “Unify_Adj_NN_case”, it works…

I couldn’t believe it, so I ran diff on our rules. There’s no difference except for the name and the markers in the examples. And I pasted your example. So really, it’s weird!

Oh, there may be a bug in the online interface, maybe it injects the rule into the existing Russian file and it already has a rule with this ID.