Antipattern with unification problem


(Konstantin Ladutenko) #1

I would like to extend an existing rule with an antipattern that uses unification. The unification for verb tense is defined here: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ru/src/main/resources/org/languagetool/rules/ru/grammar.xml#L98 as follows:

    <unification feature="tense">
        <equivalence type="INF">
            <token postag=".*:INF(:.*)*" postag_regexp="yes"/>
        </equivalence>
        <equivalence type="IMP">
            <token postag=".*:IMP(:.*)*" postag_regexp="yes"/>
        </equivalence>
        <equivalence type="Fut">
            <token postag=".*:Fut(:.*)*" postag_regexp="yes"/>
        </equivalence>
        <equivalence type="Past">
            <token postag=".*:Past(:.*)*" postag_regexp="yes"/>
        </equivalence>
        <equivalence type="Real">
            <token postag=".*:Real(:.*)*" postag_regexp="yes"/>
        </equivalence>
    </unification>

I am working on the rule <rule default="on" id="Verb_and_Verb" name="Глагол и глагол">, which is defined here: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ru/src/main/resources/org/languagetool/rules/ru/grammar.xml#L5213

The problem is with the antipattern (currently commented out here: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ru/src/main/resources/org/languagetool/rules/ru/grammar.xml#L5329):

    <antipattern>
        <unify>
            <feature id="tense"/>
            <marker>
                <token postag="VB:.*" postag_regexp="yes"/>
                <token postag="VB:.*" postag_regexp="yes"/>
            </marker>
        </unify>
    </antipattern>

It looks valid to me, but it leads to an error:

Running pattern rule tests for Russian... Exception in thread "main" java.lang.RuntimeException: Error analyzing sentence: '<S> Здесь[здесь/ADV,здесь/PRDC] будет[быть/VB:Fut:Sin:P3] строиться[строиться/VB:INF] всё[всё/ADV,весь/PADJ:Neut:Nom,весь/PADJ:Neut:V] больше[больше/ADV,больший/ADJ_Short:Neut,большой/ADJ_Sup] жилья[жильё/NN:Neut:Sin:R].[</S>]'
	at org.languagetool.rules.patterns.PatternRule.match(PatternRule.java:166)
	at org.languagetool.rules.patterns.PatternRuleTest.match(PatternRuleTest.java:492)
	at org.languagetool.rules.patterns.PatternRuleTest.testCorrectSentences(PatternRuleTest.java:470)
	at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:272)
	at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:198)
	at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:149)
	at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:558)
Caused by: java.lang.NullPointerException
	at org.languagetool.rules.patterns.Unifier.isSatisfied(Unifier.java:130)
	at org.languagetool.rules.patterns.Unifier.isUnified(Unifier.java:418)
	at org.languagetool.rules.patterns.AbstractPatternRulePerformer.testUnificationAndGroups(AbstractPatternRulePerformer.java:154)
	at org.languagetool.rules.patterns.AbstractPatternRulePerformer.testAllReadings(AbstractPatternRulePerformer.java:95)
	at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRuleReplacer.replace(DisambiguationPatternRuleReplacer.java:94)
	at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRule.replace(DisambiguationPatternRule.java:101)
	at org.languagetool.rules.Rule.getSentenceWithImmunization(Rule.java:130)
	at org.languagetool.rules.patterns.PatternRule.match(PatternRule.java:162)
	... 6 more

If I replace this unified antipattern with a set of explicitly written antipatterns, one for each tense,

    <antipattern>
        <marker>
            <token postag="VB:IMP:.*" postag_regexp="yes"/>
            <token postag="VB:IMP:.*" postag_regexp="yes"/>
        </marker>
    </antipattern>
    <antipattern>
        <marker>
            <token postag="VB:Fut:.*" postag_regexp="yes"/>
            <token postag="VB:Fut:.*" postag_regexp="yes"/>
        </marker>
    </antipattern>
    <antipattern>
        <marker>
            <token postag="VB:Real:.*" postag_regexp="yes"/>
            <token postag="VB:Real:.*" postag_regexp="yes"/>
        </marker>
    </antipattern>
    <antipattern>
        <marker>
            <token postag="VB:Past:.*" postag_regexp="yes"/>
            <token postag="VB:Past:.*" postag_regexp="yes"/>
        </marker>
    </antipattern>

everything goes OK.

Any ideas what I am doing wrong?


(Konstantin Ladutenko) #2

The previous example was for commit 452cd70e06b60130990bf7777b2f131783b611a8.

The same problem still occurs with the following antipattern for the same rule:

    <antipattern>
        <!-- TODO Should be fully unified!!! -->
        <!-- <unify> -->
            <!-- <feature id="tense"/> -->
            <!-- <feature id="number"/> -->
            <!-- <feature id="person"/> -->
            <marker>
                <token postag="VB:.*" postag_regexp="yes"/>
                <token inflected="yes" postag="VB:.*" postag_regexp="yes">давать</token>
            </marker>
        <!-- </unify> -->
    </antipattern>

When the unification part is uncommented, the error log is:

Running XML pattern tests...
Known languages: [English, English (US), English (GB), English (Australian), English (Canadian), English (New Zealand), English (South African), Persian, French, German, German (Germany), German (Austria), German (Swiss), Simple German, Polish, Catalan, Catalan (Valencian), Italian, Breton, Dutch, Portuguese, Portuguese (Portugal), Portuguese (Brazil), Russian, Asturian, Belarusian, Chinese, Danish, Esperanto, Galician, Greek, Icelandic, Japanese, Khmer, Lithuanian, Malayalam, Romanian, Slovak, Slovenian, Spanish, Swedish, Tamil, Tagalog, Ukrainian, Testlanguage]
Running XML validation for ru/ru-RU/grammar.xml...
No rule file found at /org/languagetool/rules/ru/ru-RU/grammar.xml in classpath
Running pattern rule tests for Russian... Exception in thread "main" java.lang.RuntimeException: Error analyzing sentence: '<S> Здесь[здесь/ADV,здесь/PRDC] будет[быть/VB:Fut:Sin:P3] строиться[строиться/VB:INF] всё[всё/ADV,весь/PADJ:Neut:Nom,весь/PADJ:Neut:V] больше[больше/ADV,больший/ADJ_Short:Neut,большой/ADJ_Sup] жилья[жильё/NN:Neut:Sin:R].[</S>]'
	at org.languagetool.rules.patterns.PatternRule.match(PatternRule.java:166)
	at org.languagetool.rules.patterns.PatternRuleTest.match(PatternRuleTest.java:492)
	at org.languagetool.rules.patterns.PatternRuleTest.testCorrectSentences(PatternRuleTest.java:470)
	at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:272)
	at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:198)
	at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:149)
	at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:558)
Caused by: java.lang.NullPointerException
	at org.languagetool.rules.patterns.Unifier.isSatisfied(Unifier.java:130)
	at org.languagetool.rules.patterns.Unifier.isUnified(Unifier.java:418)
	at org.languagetool.rules.patterns.AbstractPatternRulePerformer.testUnificationAndGroups(AbstractPatternRulePerformer.java:154)
	at org.languagetool.rules.patterns.AbstractPatternRulePerformer.testAllReadings(AbstractPatternRulePerformer.java:95)
	at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRuleReplacer.replace(DisambiguationPatternRuleReplacer.java:94)
	at org.languagetool.tagging.disambiguation.rules.DisambiguationPatternRule.replace(DisambiguationPatternRule.java:101)
	at org.languagetool.rules.Rule.getSentenceWithImmunization(Rule.java:130)
	at org.languagetool.rules.patterns.PatternRule.match(PatternRule.java:162)
	... 6 more

For this example, direct expansion is a real problem, since there are 5 tenses × 3 persons × 2 numbers = 30 variants of the antipattern.
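
To make the size of that expansion concrete, here is a rough sketch of how the explicit antipatterns could be generated by a script instead of written by hand. It is only an illustration under several assumptions: the tense tags come from the unification block quoted in post #1, `Sin` and `P3` appear in the log above, but `PL`, `P1`, `P2` and the exact `tense:number:person` tag layout are guesses that would need to be checked against the real Russian tagset. The function name `expand_antipatterns` is made up for this example.

```python
from itertools import product

# Tense tags as listed in the <unification feature="tense"> block.
TENSES = ["INF", "IMP", "Fut", "Past", "Real"]
# "Sin" and "P3" are visible in the log (VB:Fut:Sin:P3); the other
# values are assumptions and may not match the real tagset.
NUMBERS = ["Sin", "PL"]
PERSONS = ["P1", "P2", "P3"]

# One explicit antipattern: both verbs share the same tense, number, person.
TEMPLATE = """<antipattern>
    <marker>
        <token postag="VB:{tag}" postag_regexp="yes"/>
        <token inflected="yes" postag="VB:{tag}" postag_regexp="yes">давать</token>
    </marker>
</antipattern>"""


def expand_antipatterns():
    """Return one antipattern string per (tense, number, person) combination."""
    return [
        TEMPLATE.format(tag=f"{tense}:{number}:{person}")
        for tense, number, person in product(TENSES, NUMBERS, PERSONS)
    ]


antipatterns = expand_antipatterns()
print(len(antipatterns))  # 30, i.e. 5 tenses * 2 numbers * 3 persons
```

This only works around the exception by brute force; it does not explain the NullPointerException in `Unifier.isSatisfied`, and 30 generated blocks would still be much harder to maintain than a single unified antipattern.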