I am developing a POS dictionary for the Greek language, which I’m trying to integrate into LanguageTool, but I’m having problems creating rules that use the POS tags for matching.
The test rule I’m trying to make is this:
<rule id="EpOus_SingPlu_Agreement" name="Ασυμφωνία αριθμού">
  <pattern>
    <marker>
      <token postag="Ep.*En.*" postag_regexp="yes"></token>
    </marker>
    <token postag="OusPl.*" postag_regexp="yes"></token>
  </pattern>
  <message>Ασυμφωνία αριθμού ανάμεσα στο επίθετο και στο ουσιαστικό. <suggestion><match no="1" postag="EpArsPlOnom"/></suggestion></message>
  <short>Πρόβλημα συμφωνίας αριθμού</short>
  <example correction="Μεγάλα" type="incorrect"><marker>Μεγάλο</marker> προβλήματα.</example>
  <example type="correct">Μεγάλα προβλήματα</example>
</rule>
Running sh testrules.sh el gives the following error:
Running pattern rule tests for Greek…
Exception in thread "main" junit.framework.AssertionFailedError: Greek rule EpOus_SingPlu_Agreement[1]:[/Ep.*En.*, /OusPl.*]:Ασυμφωνία αριθμού:
"Μεγάλο προβλήματα."
Errors expected: 1
Errors found : 0
Message: Ασυμφωνία αριθμού ανάμεσα στο επίθετο και στο ουσιαστικό. \1
Analyzed token readings: [/SENT_START] Μεγάλο[EpArsEnAit/μεγάλος*,EpOuEnAit/μεγάλος*,EpOuEnKlit/μεγάλος*,EpOuEnOnom/μεγάλος*] [ /null*] προβλήματα[OusPlAit/πρόβλημα,OusPlKlit/πρόβλημα,OusPlOnom/πρόβλημα] .[./SENT_END*]
Matches: []
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.TestCase.fail(TestCase.java:227)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:304)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:252)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:187)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:144)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:521)
LanguageTool seems to tag each word correctly, but it doesn’t find the error.
What I want to achieve in the end is this: when the tag sequence Ep.*En.* OusPl.* is encountered, the first word should be replaced with the corresponding word tagged Ep.*Pl.* (that is, “En” in the tag should be replaced with “Pl”).
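To make the intended suggestion concrete, here is a rough sketch of the same rule with the hard-coded EpArsPlOnom replaced by a tag rewrite. It relies on the postag_regexp and postag_replace attributes of the match element, and it assumes that a synthesizer is available for Greek to generate the inflected form, which I have not verified. It only changes the suggestion; it does not address why the pattern finds no match in the test above.

```xml
<!-- Sketch only: same pattern as the rule above; only the <suggestion> differs. -->
<rule id="EpOus_SingPlu_Agreement" name="Ασυμφωνία αριθμού">
  <pattern>
    <marker>
      <!-- singular adjective -->
      <token postag="Ep.*En.*" postag_regexp="yes"/>
    </marker>
    <!-- plural noun -->
    <token postag="OusPl.*" postag_regexp="yes"/>
  </pattern>
  <!-- The postag regex captures the parts of the tag around "En";
       postag_replace rebuilds the tag with "Pl" in its place,
       e.g. EpOuEnOnom becomes EpOuPlOnom.
       Assumption: a Greek synthesizer exists to produce the word form for the new tag. -->
  <message>Ασυμφωνία αριθμού ανάμεσα στο επίθετο και στο ουσιαστικό. <suggestion><match no="1" postag="(Ep.*)En(.*)" postag_regexp="yes" postag_replace="$1Pl$2"/></suggestion></message>
  <short>Πρόβλημα συμφωνίας αριθμού</short>
  <example correction="Μεγάλα" type="incorrect"><marker>Μεγάλο</marker> προβλήματα.</example>
  <example type="correct">Μεγάλα προβλήματα</example>
</rule>
```

Since “Μεγάλο” has several readings (EpArsEnAit, EpOuEnAit, and so on), a rewrite like this would probably yield more than one suggestion, so the correction attribute of the incorrect example may need to list all of them.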
Thanks for your interest in LT. Indeed, your example looks good and I cannot tell what the problem is just by looking at the output. Do you have the code on GitHub so I can check it out and try it?
I’m willing to license my data under any licence needed for it to be as widely used as possible. I don’t plan to put any restrictions on it. I would like to be mentioned if someone uses it, but even that is not a requirement…