Proposed grammar check for Italian

Hi, as a team of translators, we have started using LanguageTool to check our work.

One error that the tool misses (and which is caught by Wordʼs grammar corrector) is a typo in the articulated preposition “dei/del” that leaves it disagreeing with the word that follows it.

So for instance we may write
Leader del prodotti (INCORRECT)
Leader del prodotto (CORRECT)
Leader dei prodotti (CORRECT)

I have tried to write a rule. It works OK on the online creation tool, but I am unable to make it work on the local Java tool. Here it is:

<!-- Italian rule, 2016-03-04 -->
<rule id="CONCORDANZA_PREPOSIZIONE_ARTICOLATA_MASCHILE_GENERE_PAROLA_SEGUENTE" name="Concordanza preposizione articolata maschile - genere parola seguente">
 <pattern case_sensitive='yes'>
  <marker>
  <token>del</token>
  <token regexp='yes'>.*i|[a-z].*a</token>
  </marker>
 </pattern>
 <message>La parola non concorda con la preposizione articolata che la precede</message>
 <example correction=''>Leader <marker>del prodotti</marker> per endpoint</example>
 <example>Leader dei prodotti per endpoint</example>
</rule>

Hi, I was able to properly insert this rule into my own grammar rules and it now works correctly. Now I hope it can be implemented as a standard rule (it would need some polishing though).

Thanks for your contribution! @Paolo_Bianchini will hopefully take care of this rule.

As there was no reply from Paolo, I just wanted to add this rule. There are some warnings (potential false alarms) when running it through “Check a LanguageTool XML rule”. Have you checked that? For example:

 Vogliamo parlare del fuorionda del presidente russo ?
 Video del proclama letto prima del suicidio

If these are indeed errors, that’s okay. If these are false alarms, do you think you can improve the rule to avoid the false alarms?

Hi Daniel,

I have improved the rule to include more masculine articulated prepositions (del, al, dal, nel, sul). I have then added some exceptions relating to a few rare words that are masculine in Italian in spite of ending with an “a” (e.g. tema, problema, pianeta…).

The rule has passed the test without producing any false positives. I’m sure there are more words that could trigger a false positive, but they are, as I said, pretty rare.

Below is the rule:

<!-- Italian rule, 2016-03-14 -->
<rule id="CONCORDANZA_PREPOSIZIONE_ARTICOLATA_MASCHILE_GENERE_PAROLA_SEGUENTE" name="Concordanza preposizione articolata maschile - genere parola seguente">
 <pattern case_sensitive='yes'>
  <token regexp='yes'>del|al|dal|nel|sul</token>
  <token regexp='yes'>[^A-Z].*i|[a-z].*a<exception>sistema</exception><exception>pianeta</exception><exception>programma</exception><exception>cinema</exception><exception>panorama</exception><exception>tema</exception><exception>patriarca</exception></token>
 </pattern>
 <message>Concordanza preposizione articolata maschile - genere parola seguente</message>
 <example correction=''>leader <marker>sul prodotti</marker></example>
 <example>leader sui prodotti</example>
</rule>

Thanks, I’ve added the rule, it will be part of the upcoming snapshots and it will be online at languagetool.org later tonight.

Hi Roberto,

our automatic regression check found some potential false alarms, maybe you can have a look. The rule id is CONCORDANZA_PREPOSIZIONE (I shortened it a bit):

https://languagetool.org/regression-tests/20160315/result_it_20160315.html

Hi Daniel,

as expected, the rule produces some false positives in case of masculine words ending with an “a”, with some foreign words and with some fairly unusual word combinations.

However, considering the high number of false positives currently produced by LanguageTool for Italian (some of which, in my opinion, depend on a debatable interpretation of some grammar conventions), I think that this rule works quite well, in spite of having been put together without any real knowledge of the tool.

The only thing that could be done to improve it is adding more exceptions (words such as “problema”, “tema” etc. that are masculine in spite of ending in “a”).
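
A minimal sketch of what adding more exceptions could look like, with the list collapsed into a single regular-expression exception so that it is easier to extend. The rule id here is hypothetical, the word list is only a partial selection of masculine nouns ending in “a” (it adds “fuorionda” and “proclama” from the warnings reported earlier in this thread), and it has not been run against the regression corpus:

```xml
<!-- Sketch only: hypothetical id; the exception list is illustrative, not exhaustive -->
<rule id="CONCORDANZA_PREPOSIZIONE_SKETCH" name="Concordanza preposizione articolata maschile - genere parola seguente">
 <pattern case_sensitive='yes'>
  <token regexp='yes'>del|al|dal|nel|sul</token>
  <!-- One regexp exception instead of many single-word exceptions -->
  <token regexp='yes'>[^A-Z].*i|[a-z].*a<exception regexp='yes'>sistema|pianeta|programma|cinema|panorama|tema|patriarca|problema|clima|schema|diploma|poeta|fuorionda|proclama</exception></token>
 </pattern>
 <message>La parola non concorda con la preposizione articolata che la precede</message>
 <example correction=''>leader <marker>sul prodotti</marker></example>
 <!-- Sentences from the earlier warnings, listed as text that should NOT trigger the rule -->
 <example>Vogliamo parlare del fuorionda del presidente russo ?</example>
 <example>Video del proclama letto prima del suicidio</example>
</rule>
```

The plain example lines are meant as sentences the rule must not match, so the rule test should complain if one of those words ever drops out of the exception list.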

Regards,

Roberto Savelli

Hi Roberto, unfortunately not much is happening in Italian support in LT, so contributions (e.g. to fix false alarms) are very welcome.

Hi Daniel,

thanks for implementing the rule. Itʼs a pity that Italian does not get much support, since I think there is still ample room for improvement for the rules in this language. I will try to provide my feedback regularly.

Regards,

Roberto Savelli

Sorry guys, I did not have much time lately to work on the rules. Any help from whoever is interested is more than welcome.
In general the real problem is trying to add rules without generating too many false positives. Sometimes fixing an error in a rule ends up generating a lot of false alarms, so a balance needs to be found. Having said that, I took a look at the rule, which looks good to me even if I don’t like that long list of exceptions. A list like that suggests there are many more such words, so you will probably need an alternative way of handling exceptions: matching on the POS tagger’s tags instead of explicitly listing nouns.
Thanks a lot for your help.
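
As a rough sketch of that idea, the second token could be matched on its part-of-speech tag rather than on its ending. This assumes the Italian tagger emits Morph-it!-style tags such as NOUN-M:s, NOUN-M:p and NOUN-F:s; the exact tag names have not been verified here, and the rule id is hypothetical:

```xml
<!-- Sketch only: hypothetical id; the tag names NOUN-M:s, NOUN-M:p and NOUN-F:.* are assumptions about the Italian tagset -->
<rule id="CONCORDANZA_PREPOSIZIONE_POSTAG_SKETCH" name="Concordanza preposizione articolata maschile - genere parola seguente">
 <pattern case_sensitive='yes'>
  <token regexp='yes'>del|al|dal|nel|sul</token>
  <!-- Match feminine nouns or masculine plural nouns, but skip any word
       that also has a masculine singular reading -->
  <token postag='NOUN-F:.*|NOUN-M:p' postag_regexp='yes'><exception postag='NOUN-M:s'/></token>
 </pattern>
 <message>La parola non concorda con la preposizione articolata che la precede</message>
 <example correction=''>leader <marker>sul prodotti</marker></example>
 <example>leader sui prodotti</example>
</rule>
```

If the tagger marks words like “tema” or “problema” as masculine singular, they would no longer need to be listed one by one. The trade-off is that words missing from the tagger dictionary would not be checked at all.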