Help needed for a slightly complex rule

R_Savelli · April 8, 2016, 12:08pm

Hi, I am trying to write a rule that will catch discrepancies between articulated prepositions and nouns in Italian (for instance masculine followed by feminine or singular followed by plural). I have written a simple rule using the basic editor:

<!-- Italian rule, 2016-04-08 -->
<rule id="TEST" name="test">
 <pattern>
  <token postag='ARTPRE-M:p'></token>
  <token postag='NOUN-M:s'></token>
 </pattern>
 <message></message>
 <example correction=''><marker>ai sistema</marker></example>
 <example>al sistema</example>
</rule>

What I would like to achieve is to have the rule catch different wrong combinations, so:

<token postag='ARTPRE-M:p'></token>
<token postag='NOUN-M:s'></token>

OR

   <token postag='ARTPRE-M:s'></token>
   <token postag='NOUN-M:p'></token>

OR

<token postag='ARTPRE-F:p'></token>
<token postag='NOUN-F:s'></token>

etc.
I’m sure this is possible, but I don’t know how.

Jan_Schreiber · April 9, 2016, 3:04pm

That’s fairly easy. Just make your rule a rulegroup and add all the cases as sub-rules, like so:

<rulegroup id="TEST" name="test">
    <rule>
        <pattern>
            <token postag='ARTPRE-M:p'></token>
            <token postag='NOUN-M:s'></token>
        </pattern>
        <message>Forse cercavi <suggestion>al \2</suggestion>?</message>
        <example correction='al sistema'>PLEASE MAKE THIS <marker>ai sistema</marker> A FULL SENTENCE.</example>
        <example>al sistema</example>
    </rule>
    <rule>
        <pattern>
            <token postag='ARTPRE-M:s' />
            <token postag='NOUN-M:p' />
        </pattern>
        <message>Forse cercavi <suggestion>ai \2</suggestion>?</message>
        <example correction='ai xxx'><marker>al xxx</marker></example>
        <example>ai xxx</example>
    </rule>
</rulegroup>

Basically, you could use the online rule creator to make a rule for each case you want to cover, copy them to a text file, remove the ids, and add <rulegroup id="TEST" name="test"> above and the closing tag </rulegroup> below.

EDIT: Corrected indentation. Thanks, Daniel.
Also, note that in the second pattern I abbreviated the notation to <token postag='ARTPRE-M:s' />. I find that a lot easier to read, and it is well-formed XML, but it’s a matter of taste.

dnaber · April 9, 2016, 3:30pm

Jan, you can use the </> button for markup (i.e. select the XML, click the button), it will then keep its indentation.

R_Savelli · April 12, 2016, 9:20am

Hi Jan,

thank you very much for your help. I think I now know how to use the rulegroup system. However, the “Check XML” function throws an error (even for the code you suggested):
Error: Found 4 rules in XML - please specify only one rule in your XML

Perhaps this happens because rulegroups are not supported by the XML check function, but how can I proceed from here? For instance, how can I check the rule against the corpus of sentences?

Thanks in advance for any help,

dnaber · April 12, 2016, 9:39am

Indeed, the online check cannot work on <rulegroup>; you’d have to test the <rule>s inside the <rulegroup> one by one if you want to use the online checker.
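
To test one sub-rule on its own, it can be pasted into the checker as a standalone rule. A minimal sketch, assuming a temporary id and name (TEST_1 and "test 1" are placeholders, not from the thread), since a rule outside a <rulegroup> needs its own id and name; the message and example sentences are illustrative only:

```xml
<!-- One sub-rule lifted out of the rulegroup for the online checker. -->
<!-- id and name are placeholders; inside a rulegroup they are omitted. -->
<rule id="TEST_1" name="test 1">
    <pattern>
        <token postag="ARTPRE-M:p"/>
        <token postag="NOUN-M:s"/>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <!-- sentence the rule should flag; empty correction means no suggestion is offered -->
    <example correction="">Il libro <marker>degli amico</marker>.</example>
    <!-- sentence the rule should leave alone -->
    <example>Il libro degli amici.</example>
</rule>
```

Once each sub-rule passes on its own, the id and name can be removed again and the rules placed back inside the <rulegroup>.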

R_Savelli · April 12, 2016, 12:10pm

I have grouped the rules and analyzed them separately. Below is the complete rule, with a request for help at the end about avoiding a specific type of false positives:

<rulegroup id="ConcordanzaPreposizioneArticolata-Sostantivo" name="Concordanza preposizione articolata - sostantivo">
<rule>
    <pattern>
      <token postag='ARTPRE-M:p'></token>
      <token postag='NOUN-M:s'></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <example type="correct">Il libro <marker>degli amici</marker>.</example>
    <example type="incorrect">Il libro <marker>degli amico</marker></example>
</rule>
<rule>
<pattern>
  <token postag='ARTPRE-M:s'></token>
  <token postag='NOUN-M:p'></token>
</pattern>
<message>La preposizione articolata non concorda con il sostantivo che la segue</message>
<example type="correct">Il libro <marker>dell'amico</marker>.</example>
<example type="incorrect">Il libro <marker>dello amici</marker></example>
</rule>
<rule>
<pattern>
  <token postag='ARTPRE-F:p'></token>
  <token postag='NOUN-F:s'></token>
</pattern>
<message>La preposizione articolata non concorda con il sostantivo che la segue</message>
<example type="correct">Il libro <marker>dell'amica</marker>.</example>
<example type="incorrect">Il libro <marker>delle amica</marker></example>
</rule>
<rule>
<pattern>
  <token postag='ARTPRE-F:s'></token>
  <token postag='NOUN-F:p'></token>
</pattern>
<message>La preposizione articolata non concorda con il sostantivo che la segue</message>
<example type="correct">Il libro <marker>dell'amica</marker>.</example>
<example type="incorrect">Il libro <marker>della amiche</marker></example>
</rule>
</rulegroup>

In the rules above, I am getting a few false positives. Most of them relate to foreign words, which are unchanged in their singular and plural forms. For instance, according to the analysis, “cinema” is both NOUN-M:s and NOUN-M:p in Italian. Can you suggest a procedure that will allow the rule to ignore nouns that are identical in their singular and plural forms?

dnaber · April 12, 2016, 5:45pm

I haven’t tested it, but you should be able to match words that are e.g. NOUN-M:s but not NOUN-M:p at the same time like this:

<token postag="NOUN-M:s"><exception postag="NOUN-M:p"/></token>
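
Placed inside one of the sub-rules, this could look roughly as follows. This is an untested sketch that assumes the tagger gives “cinema” both the NOUN-M:s and NOUN-M:p readings, as described above; the message text and example sentences are illustrative:

```xml
<rule>
    <pattern>
        <token postag="ARTPRE-M:p"/>
        <!-- a masculine singular noun, unless the same word also has a masculine plural reading -->
        <token postag="NOUN-M:s"><exception postag="NOUN-M:p"/></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <!-- "amico" is singular only, so this should still be flagged -->
    <example correction="">Il libro <marker>degli amico</marker>.</example>
    <!-- "cinema" is invariant, so the exception should keep the rule from firing -->
    <example>Il film esce nei cinema.</example>
</rule>
```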

R_Savelli · April 18, 2016, 1:05pm

Hi Daniel,

thank you very much for your suggestion. I am still not familiar with the exception command, but I have added it to my rules and they now seem to work. I am pasting the whole rule below. Note that Microsoft Wordʼs grammar check offers this rule as standard, so I think it makes sense to add it to LanguageTool:

<rulegroup id="ConcordanzaPreposizioneArticolata-Sostantivo" name="Concordanza preposizione articolata - sostantivo">
<rule>
    <pattern>
      <token postag='ARTPRE-M:p'></token>
      <token postag='NOUN-M:s'><exception postag="NOUN-M:p"/></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <example type="correct">Il libro <marker>degli amici</marker>.</example>
    <example type="incorrect">Il libro <marker>degli amico</marker></example>
</rule>
<rule>
    <pattern>
      <token postag='ARTPRE-M:s'></token>
      <token postag='NOUN-M:p'><exception postag="NOUN-M:s"/></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <example type="correct">Il libro <marker>dell'amico</marker>.</example>
    <example type="incorrect">Il libro <marker>dello amici</marker></example>
</rule>
<rule>
    <pattern>
      <token postag='ARTPRE-F:p'></token>
      <token postag='NOUN-F:s'><exception postag="NOUN-F:p"/></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <example type="correct">Il libro <marker>dell'amica</marker>.</example>
    <example type="incorrect">Il libro <marker>delle amica</marker></example>
</rule>
<rule>
    <pattern>
      <token postag='ARTPRE-F:s'></token>
      <token postag='NOUN-F:p'><exception postag="NOUN-F:s"/></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <example type="correct">Il libro <marker>dell'amica</marker>.</example>
    <example type="incorrect">Il libro <marker>della amiche</marker></example>
</rule>
</rulegroup>
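
Since the four sub-rules differ only in gender and number, a more compact variant may be possible with the postag_regexp attribute. The following is an untested sketch and not one of the rules submitted here. Note that it would also flag pairs whose gender differs (for example ARTPRE-M:p followed by NOUN-F:s), which the four rules above do not, and that a mirrored rule would still be needed for a singular preposition followed by a plural noun:

```xml
<rule>
    <pattern>
        <!-- plural articulated preposition, either gender -->
        <token postag="ARTPRE-[MF]:p" postag_regexp="yes"/>
        <!-- singular noun, either gender, unless the word also has a plural reading -->
        <token postag="NOUN-[MF]:s" postag_regexp="yes"><exception postag="NOUN-[MF]:p" postag_regexp="yes"/></token>
    </pattern>
    <message>La preposizione articolata non concorda con il sostantivo che la segue</message>
    <example correction="">Il libro <marker>degli amico</marker>.</example>
    <example>Il libro degli amici.</example>
</rule>
```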

dnaber · April 18, 2016, 3:31pm

Hi Roberto, thanks for the rules. I have added them to the “Grammatica - Preposizioni” category, please let me know if that doesn’t make sense. They will become available on languagetool.org later today at around 22:15 CEST.