
Testing rules for new Greek POS dictionary


(steve) #1

I am developing a POS dictionary for the Greek language, which I'm trying to integrate into LanguageTool, but I'm having some problems creating rules that match on the POS tags.

The test rule I'm trying to make is this:

<rule id="EpOus_SingPlu_Agreement" name="Ασυμφωνία αριθμού">
    <!-- name: "Number disagreement" -->
    <pattern>
        <marker>
            <token postag="Ep.*En.*" postag_regexp="yes"/>
        </marker>
        <token postag="OusPl.*" postag_regexp="yes"/>
    </pattern>
    <!-- message: "Number disagreement between the adjective and the noun." -->
    <message>Ασυμφωνία αριθμού ανάμεσα στο επίθετο και στο ουσιαστικό. <suggestion><match no="1" postag="EpArsPlOnom"/></suggestion></message>
    <!-- short: "Number agreement problem" -->
    <short>Πρόβλημα συμφωνίας αριθμού</short>
    <example correction="Μεγάλα" type="incorrect"><marker>Μεγάλο</marker> προβλήματα.</example>
    <example type="correct">Μεγάλα προβλήματα</example>
</rule>

Running sh testrules.sh el gives the following error:

Running pattern rule tests for Greek...
Exception in thread "main" junit.framework.AssertionFailedError: Greek rule EpOus_SingPlu_Agreement /OusPl.*]:Ασυμφωνία αριθμού:
"Μεγάλο προβλήματα."
Errors expected: 1
Errors found : 0
Message: Ασυμφωνία αριθμού ανάμεσα στο επίθετο και στο ουσιαστικό. \1
Analyzed token readings: [/SENT_START*] Μεγάλο[EpArsEnAit/μεγάλος*,EpOuEnAit/μεγάλος*,EpOuEnKlit/μεγάλος*,EpOuEnOnom/μεγάλος*] [ /null*] προβλήματα[OusPlAit/πρόβλημα,OusPlKlit/πρόβλημα,OusPlOnom/πρόβλημα] .[./SENT_END*]
Matches: []
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.TestCase.fail(TestCase.java:227)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:304)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:252)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:187)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:144)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:521)

LanguageTool seems to tag each word correctly, but it doesn't find the error.

What I want to achieve in the end is this: when a word tagged Ep.*En.* is followed by one tagged OusPl.*, the first word should be replaced with the corresponding word tagged Ep.*Pl.* ("En" should be replaced with "Pl").

Any help is appreciated...
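The tag rewrite described above (keep the gender and case, switch "En" to "Pl") can be sketched as a regular-expression substitution. This is only an illustration under assumptions: the tag layout (Ep, then a gender code such as Ars or Ou, then En or Pl for number, then a case code such as Onom) is read off the token readings in the trace, and singular_to_plural_tag is a hypothetical helper name. In a LanguageTool rule the same substitution would presumably go into the match element, e.g. postag="(Ep.*?)En(.*)" postag_regexp="yes" postag_replace="$1Pl$2", and turning the rewritten tag back into a word form presumably also needs a Greek synthesizer. A fixed tag such as EpArsPlOnom, by contrast, always requests the masculine nominative plural (μεγάλοι), while the test example expects the neuter Μεγάλα.

```python
import re
from typing import Optional

# Assumed tag layout, inferred from the readings in the trace:
#   "Ep" + gender code + number ("En" or "Pl") + case code, e.g. "EpOuEnOnom".
# The lazy first group stops at the first "En" that follows "Ep".
SINGULAR_ADJ_TAG = re.compile(r"^(Ep.*?)En(.*)$")


def singular_to_plural_tag(tag: str) -> Optional[str]:
    """Hypothetical helper: map a singular adjective tag to the plural tag
    with the same gender and case, or return None for any other tag."""
    m = SINGULAR_ADJ_TAG.match(tag)
    if m is None:
        return None
    # Keep everything around the number code, swap "En" for "Pl".
    return f"{m.group(1)}Pl{m.group(2)}"


if __name__ == "__main__":
    # POS tags taken from the readings of "Μεγάλο" and "προβλήματα" in the trace.
    for tag in ["EpArsEnAit", "EpOuEnKlit", "EpOuEnOnom", "OusPlOnom"]:
        print(tag, "->", singular_to_plural_tag(tag))
```

Keeping the gender and case groups from the matched tag, rather than hard-coding one full tag, is what lets a single suggestion cover every gender/case combination that Ep.*En.* matches.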


(Daniel Naber) #2

Thanks for your interest in LT. Indeed, your example looks good and I cannot tell what the problem is just by looking at the output. Do you have the code on GitHub so I can check it out and try it?


(steve) #3

The code is at https://github.com/stevestavropoulos/elspell/tree/master/grammar/languagetool
I have only put the files I changed there, along with a README documenting what I did.

If there is anything more I could do to help, I would be happy to do it.


(pminos) #4

Hi Steve,

I have a question: do you plan to publish your data under an open license?

The Greek language module would benefit from the use of a POS dictionary. I am also developing one, but yours seems to have more word forms.

Regards,
Panagiotis Minos


(Daniel Naber) #5

The problem is that the data in languagetool.dict is in the wrong order. It needs to be fullform, baseform, POS-tag.
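The trace in the first post points the same way: LanguageTool prints each reading as lemma/tag (compare ".[./SENT_END*]"), and for "Μεγάλο" it shows "EpArsEnAit/μεγάλος*", i.e. the tag in the lemma slot and the lemma in the tag slot. So the source file apparently had the POS-tag and baseform columns swapped. Below is a minimal sketch for fixing such a file, assuming tab-separated lines in fullform, POS-tag, baseform order; reorder_line, convert_file and the file names are hypothetical. The corrected file would still have to be rebuilt into the binary dictionary with the usual LanguageTool tooling.

```python
import sys


def reorder_line(line: str, sep: str = "\t") -> str:
    """Turn 'fullform<sep>postag<sep>baseform' into
    'fullform<sep>baseform<sep>postag', the order LanguageTool expects."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != 3:
        raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
    fullform, postag, baseform = fields
    return sep.join((fullform, baseform, postag))


def convert_file(src_path: str, dst_path: str) -> int:
    """Rewrite every non-blank line of src_path into dst_path and return
    the number of entries written."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            dst.write(reorder_line(line) + "\n")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python reorder_dict.py old_order.txt new_order.txt
    print(convert_file(sys.argv[1], sys.argv[2]), "entries written")
```

The script fails loudly on any line that does not have exactly three columns, so a malformed entry is caught before it ends up in the compiled dictionary.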


(steve) #6

Right! Thanks :smile:



(steve) #7

I'm willing to license my data under whatever license is needed for it to be as widely used as possible. I don't plan to put any restrictions on it. I would like to be credited if someone uses it, but not even that is a requirement...

If you are interested in collaborating, you can check out https://github.com/stevestavropoulos/flexy, the program used to create the POS dictionary. I would be happy to talk with you via email; my address is steve at math . upatras . gr



(Daniel Naber) #8

That's great! I hope Panagiotis will contact you about this (as it wouldn't make much sense for me to work on a language I don't speak).