[eo] Proposed rule

MBodin · August 14, 2017, 2:59pm

Hi.
I’m new to LanguageTool, but I do appreciate the page “LanguageTool - Online Grammar, Style & Spell Checker” for the Esperanto language. However, there seem to be mistakes that the checker does not catch. I created the following rule using the “Create a new LanguageTool rule” page, then updated it manually in expert mode. The checker said that the unit tests pass and that the rule did not trigger false alarms on Wikipedia or Tatoeba. Great! Is this the right place to share it so that it can be added to the tool?

    <!-- Esperanto rule, 2017-08-14, by Martin Bodin -->
    <rule id="AKUZATIVA_TIKORELATIVO_RILATA_AL_NEAKUZATIVA_SUBSTANTIVO" name="Akuzativa “Ti”-korelativo rilata al neakuzativa substantivo.">
        <pattern>
            <token postag='V.*tr.*' postag_regexp='yes'></token>
            <token postag='T.*akz.*' postag_regexp='yes'></token>
            <marker>
                <token postag='O.*nak.*' postag_regexp='yes'></token>
            </marker>
        </pattern>
        <message>Aspektas ke ‘<match no="2"/>’ rilatas al ‘<match no="3"/>’, sed ‘<match no="3"/>’ ne akuzativas: probable devintus esti ‘<suggestion><match no="3"/>n</suggestion>’.</message>
        <short>Mankas akuzativo</short>
        <example correction='frazon'>Mi ne ŝatas tiun <marker>frazo</marker>.</example>
        <example>Mi ne ŝatas tiun frazon.</example>
        <example correction='ideojn'>Kiu apogas tiajn <marker>ideoj</marker>?</example>
        <example>Kiu apogas tiajn ideojn?</example>
    </rule>
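To make the pattern concrete, here is a minimal Python sketch of what the three `<token>` elements ask for: three consecutive tokens whose tags match the three `postag` regexes, with the marked third token (the noun) getting an “n” appended as the suggestion. The POS tags are hypothetical, invented only so that the regexes can match, and the sketch assumes a `postag` regex must cover the whole tag; it is not LanguageTool's actual Java matcher.

```python
import re

# The three postag regexes of the rule, in token order.
PATTERN = [r"V.*tr.*", r"T.*akz.*", r"O.*nak.*"]
MARKER = 2  # index of the token wrapped in <marker> (the noun)

# Hypothetical tags: chosen only so the regexes can match; the real
# LanguageTool Esperanto tagset may differ.
SENTENCE = [
    ("Mi", "R nak np"),
    ("ne", "D"),
    ("ŝatas", "V tr as"),
    ("tiun", "T akz np"),
    ("frazo", "O nak np"),
    (".", "SENT_END"),
]


def check(tokens):
    """Return (flagged word, suggestion) for every window matching PATTERN."""
    hits = []
    for i in range(len(tokens) - len(PATTERN) + 1):
        window = tokens[i:i + len(PATTERN)]
        # Assumption: a postag regex must match the whole tag (fullmatch).
        if all(re.fullmatch(p, tag) for p, (_, tag) in zip(PATTERN, window)):
            word = window[MARKER][0]
            hits.append((word, word + "n"))  # the <suggestion>: match 3 + "n"
    return hits


print(check(SENTENCE))  # [('frazo', 'frazon')]
```

With the corrected noun (“frazon”, tagged as accusative) the third regex no longer matches and nothing is flagged, which mirrors the rule's second `<example>`.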

I also wonder whether this is the right way to write a rule. For instance, I wrote the “postag='V.*tr.*' postag_regexp='yes'” part manually, because the generator produced “postag='V' postag='tr'”, which did not pass the XML checks (by the way, the resulting error is displayed in German: is this intended?). I would like your feedback on this rule before writing other similar ones.

Best,
Martin.

MBodin · August 14, 2017, 3:11pm

By the way, which language should I use to propose such an entry in this forum: English or Esperanto?

Martin.

Jan_Schreiber · August 14, 2017, 5:17pm

Many thanks for the rule and welcome to the forum!

As for the error being displayed in German: certainly not intended! Can you tell us the exact error message?

Please use English if possible. I’m not even sure our Esperanto contributors visit this forum on a regular basis.

MBodin · August 14, 2017, 5:32pm

Thank you for the answer!

I will answer your question about the error message in a separate reply: I am currently trying to open the pull request on GitHub for LanguageTool.
Here is my pull request: “[eo] Additional rules.” by Mbodin, pull request #773 on languagetool-org/languagetool (GitHub).

Interestingly, the sanity checks on https://community.languagetool.org/ruleEditor/expert… do not pass: they report an erroneous sentence! I think this proves my rule to be useful, as it has helped detect a mistake in Wikipedia or Tatoeba. The sentence is “Meti tiu libron en la poŝon ne eblas”, which should be “Meti tiun libron en la poŝon ne eblas”.

So I now have to look for this sentence in Wikipedia and Tatoeba.
Martin.

MBodin · August 14, 2017, 5:33pm

To help understand the issue, here is the offending rule:

    <rule id="NEAKUZATIVA_TIKORELATIVO_RILATA_AL_AKUZATIVA_SUBSTANTIVO" name="Neakuzativa “Ti”-korelativo rilata al akuzativa substantivo.">
        <pattern>
            <token postag='V.* tr.*' postag_regexp='yes'></token>
            <marker>
                <token postag='T.* nak.*' postag_regexp='yes'></token>
            </marker>
            <token postag='O.* akz.*' postag_regexp='yes'></token>
        </pattern>
        <message>Aspektas ke ‘<match no="2"/>’ rilatas al ‘<match no="3"/>’, sed ‘<match no="2"/>’ ne akuzativas: probable devintus esti ‘<suggestion><match no="2"/>n</suggestion>’.</message>
        <short>Mankas akuzativo</short>
        <example correction='tiun'>Mi ne ŝatas <marker>tiu</marker> frazon.</example>
        <example>Mi ne ŝatas tiun frazon.</example>
        <example correction='tiajn'>Kiu apogas <marker>tiaj</marker> ideojn?</example>
        <example>Kiu apogas tiajn ideojn?</example>
    </rule>
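The same kind of sketch, adapted to this inverse rule (the marker now wraps the correlative, so “n” is appended to token 2), shows why the sanity check flagged the Tatoeba sentence. The tags are again hypothetical, chosen only so that the rule's regexes, which here require a space before `tr`, `nak` and `akz`, can match; whole-tag matching is an assumption.

```python
import re

# Regexes of the inverse rule; note the space required before tr/nak/akz.
PATTERN = [r"V.* tr.*", r"T.* nak.*", r"O.* akz.*"]
MARKER = 1  # here the <marker> wraps the correlative

# The Tatoeba sentence with hypothetical tags (not the real tagset).
SENTENCE = [
    ("Meti", "V tr i"),
    ("tiu", "T nak np"),
    ("libron", "O akz np"),
    ("en", "P"),
    ("la", "ART"),
    ("poŝon", "O akz np"),
    ("ne", "D"),
    ("eblas", "V nt as"),
]


def check(tokens):
    """Return (flagged word, suggestion) for every window matching PATTERN."""
    hits = []
    for i in range(len(tokens) - len(PATTERN) + 1):
        window = tokens[i:i + len(PATTERN)]
        if all(re.fullmatch(p, tag) for p, (_, tag) in zip(PATTERN, window)):
            word = window[MARKER][0]
            hits.append((word, word + "n"))
    return hits


print(check(SENTENCE))  # [('tiu', 'tiun')]
```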

Martin.

MBodin · August 14, 2017, 5:46pm

The good news is that the offending sentence (Tatoeba sentence number 1450387; for some reason I am not allowed to post the link here) has already been corrected. But this means that the page community.languagetool.org/ruleEditor/expert (sorry, again I am not allowed to post the link) is not up to date with Tatoeba. This is frustrating…

@Jan_Schreiber: Sorry for the multiple posts. Here is the error: « Error: XML validation failed: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 34; Attribut “postag” wurde bereits für Element “token” angegeben. » (in English: attribute “postag” was already specified for element “token”). I do have German in my browser’s language preferences (the Accept-Language HTTP header), but with a very low priority: Spanish, Esperanto, English, French, and Portuguese all rank higher than German. It seems unlikely that this message lacks a translation in every one of those higher-priority languages.
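Martin's argument assumes the site picks the message language from the weighted Accept-Language header. A minimal Python sketch of that q-value negotiation, assuming a hypothetical header that lists his preferences with German last and hypothetical sets of available translations (this is not LanguageTool's server code): German should only win when none of the higher-weighted languages is available.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q-value (ties keep header order)."""
    prefs = []
    for pos, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        lang = parts[0].strip().lower()
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                q = float(value)
        if lang and q > 0:  # q=0 means "not acceptable"
            prefs.append((-q, pos, lang))
    return [lang for _, _, lang in sorted(prefs)]


def pick_language(header, available, fallback):
    """Return the most preferred available language, else the fallback."""
    for lang in parse_accept_language(header):
        primary = lang.split("-")[0]  # "pt-BR" -> "pt"
        if lang in available:
            return lang
        if primary in available:
            return primary
    return fallback


# Hypothetical header listing the preferences described above, German last.
HEADER = "es,eo;q=0.9,en;q=0.8,fr;q=0.7,pt;q=0.6,de;q=0.5"

print(pick_language(HEADER, {"en", "de"}, fallback="en"))  # en
print(pick_language(HEADER, {"de"}, fallback="en"))  # de
```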

Here is how to reproduce it. Go to community.languagetool.org/ruleEditor2/index (I still cannot post a link… this is frustrating) and create a simple rule. The rule should have at least one token of the form “Part-of-speech” with more than one item in it, for instance “O akz” (a noun in the accusative). The generator translates this into the XML “<token postag='O' postag='akz'></token>”, which does not validate.
A valid XML would be “<token postag='O akz'></token>”.
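The validation failure is a plain XML well-formedness issue: an attribute may appear only once on an element, so any conforming parser rejects the generated token, not only the one behind the rule editor. A minimal check with Python's standard-library ElementTree, using the token strings from this thread (the third is the regexp form used in the rules above):

```python
import xml.etree.ElementTree as ET

# What the rule editor generated for "O akz": two postag attributes.
GENERATED = "<token postag='O' postag='akz'></token>"
# The single-attribute form suggested above.
SINGLE = "<token postag='O akz'></token>"
# The regexp form used in the rules above.
REGEXP = "<token postag='O.*akz.*' postag_regexp='yes'></token>"


def is_well_formed(xml):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml)
    except ET.ParseError as err:
        print("rejected:", err)  # expat reports a duplicate attribute
        return False
    return True


for snippet in (GENERATED, SINGLE, REGEXP):
    print(is_well_formed(snippet))
```

Only the first string is rejected; the other two parse, and `ET.fromstring(SINGLE).get("postag")` returns `'O akz'`.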

So there are two bugs here: first, in the XML generator, which should produce valid XML whenever possible; second, in the XML checker, which should probably not display its error in German unless asked to.

Hoping that it can help.
Martin.

P.S.: The Travis build just finished and passed. So I guess it really is just the web interface’s database that is not up to date.

Jan_Schreiber · August 14, 2017, 6:16pm

This is almost certainly a bug. @Knorr, can you fix this? I can’t.

dnaber · August 14, 2017, 6:50pm

Well, it’s not really supposed to be up-to-date, as we only use Tatoeba as a test corpus. In other words, we’re testing LT, not Tatoeba. The same is true for the Wikipedia data.