European Portuguese (PT-PT) rule contributions

So far there has been no topic for discussing and gathering contributions for European Portuguese.
I hope this one helps keep the forum posts more organized.

    <!-- Concordance error plural - AS > A -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDNCIA_DO_FEMININO_PLURAL_AS_A" name="Erro de concordância do feminino singular">
    <pattern>
        <marker>
            <token postag='D[AI]0FP0|NCFP000|AQ0FP0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
            <token postag='NCFS000|AQ0FS0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural.
    <suggestion><match no="1" postag="(D[AI]0FS0|NCFS000|AQ0FS0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCFP000|AQ0FP0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction='As vacas'><marker>As vaca</marker> são malhadas.</example>
    </rule>

    <!-- Concordance error plural - A > AS -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDNCIA_DO_FEMININO_PLURAL_A_AS" name="Erro de concordância do feminino plural">
    <pattern>
        <marker>
            <token postag='D[AI]0FS0|NCFS000|AQ0FS0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN' postag_regexp='yes'></exception></token>
            <token postag='NCFP000|AQ0FP0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural:
    <suggestion><match no="1" postag="(D[AI]0FP0|NCFP000|AQ0FP0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCFS000|AQ0FS0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction='As vacas'><marker>A vacas</marker> são malhadas.</example>
    </rule>

    <!-- Concordance error plural - OS > O -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDNCIA_DO_MASCULINO_PLURAL_OS_O" name="Erro de concordância do masculino plural">
    <pattern>
        <marker>
            <token postag='D[AI]0MP0|NCMP000|AQ0MP0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
            <token postag='NCMS000|AQ0MS0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural.
    <suggestion><match no="1" postag="(D[AI]0MS0|NCMS000|AQ0MS0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCMP000|AQ0MP0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction='O cão'><marker>Os cão</marker> está no pasto.</example>
    </rule>

    <!-- Concordance error plural - O > OS -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDNCIA_DO_PLURAL_O_OS" name="Erro de concordância do plural O-OS">
    <pattern>
        <marker>
            <token postag='D[AI]0MS0|NCMS000|AQ0MS0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
            <token postag='NCMP000|AQ0MP0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural.
    <suggestion><match no="1" postag="(D[AI]0MP0|NCMP000|AQ0MP0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCMS000|AQ0MS0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction='Os cães'><marker>O cães</marker> estão no pasto.</example>
    </rule>

I included the credits as a comment in the rules, since I have not seen contributor credits in any other part of the file.
Please advise if there is a more streamlined way of doing it.

@tiagosantos

The rules give an error while testing them:

    <!-- Concordance error plural - AS > A -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDANCIA_DO_FEMININO_PLURAL-AS_A" name="Erro de concordância do feminino singular - AS > A">
    <pattern>
        <marker>
            <token postag='D[AI]0FP0|NCFP000|AQ0FP0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
            <token postag='NCFS000|AQ0FS0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural.
    <suggestion><match no="1" postag="(D[AI]0FS0|NCFS000|AQ0FS0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCFP000|AQ0FP0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction=''><marker>As vaca</marker> são malhadas.</example>
    </rule>
    
    
    
    <!-- Concordance error plural - A > AS -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDANCIA_DO_FEMININO_PLURAL-A_AS" name="Erro de concordância do feminino plural - A > AS">
    <pattern>
        <marker>
            <token postag='D[AI]0FS0|NCFS000|AQ0FS0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN' postag_regexp='yes'></exception></token>
            <token postag='NCFP000|AQ0FP0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural:
    <suggestion><match no="1" postag="(D[AI]0FP0|NCFP000|AQ0FP0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCFS000|AQ0FS0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction=''><marker>A vacas</marker> são malhadas.</example>
    </rule>     
    
    
    
    <!-- Concordance error plural - OS > O -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDANCIA_DO_MASCULINO_PLURAL-OS_O" name="Erro de concordância do masculino plural - OS > O">
    <pattern>
        <marker>
            <token postag='D[AI]0MP0|NCMP000|AQ0MP0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
            <token postag='NCMS000|AQ0MS0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural.
    <suggestion><match no="1" postag="(D[AI]0MS0|NCMS000|AQ0MS0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCMP000|AQ0MP0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction=''><marker>Os cão</marker> está no pasto.</example>
    </rule>    
    
    

    <!-- Concordance error plural - O > OS -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
    <rule id="ERRO_DE_CONCORDANCIA_DO_PLURAL-O_OS" name="Erro de concordância do plural - O > OS">
    <pattern>
        <marker>
            <token postag='D[AI]0MS0|NCMS000|AQ0MS0' postag_regexp='yes'>
            <exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
            <token postag='NCMP000|AQ0MP0' postag_regexp='yes'>
            <exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
        </marker>
    </pattern>
    <message>Erro de concordância do plural.
    <suggestion><match no="1" postag="(D[AI]0MP0|NCMP000|AQ0MP0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCMS000|AQ0MS0)" postag_regexp="yes"/></suggestion>.
    </message>
    <example correction=''><marker>O cães</marker> estão no pasto.</example>
    </rule>     

To verify it, download:
LanguageTool-20161017-snapshot.zip

Unzip it, and copy the grammar.xml with the rules above to:
\org\languagetool\rules\pt

Then, type: TESTRULES PT and it will give an error.

Could you have a look at it?

Thanks!

PS: Please make your changes to the rules embedded in this message, since I changed the names slightly to make them easier to find in the XML.

The rules that the rule editor generates need a slight adaptation to work: the correction attribute is never set, so it needs to be set manually if there is at least one suggestion.
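
A missing or empty correction attribute can be spotted before running the tests with a small script. The following is only a sketch in Python and not part of LanguageTool: the function name `rules_missing_correction` and the sample rules are made up for illustration, and it assumes the XML fragment can be parsed on its own (the real grammar.xml may rely on entity declarations that this parser does not resolve).

```python
import xml.etree.ElementTree as ET


def rules_missing_correction(xml_text):
    """Return ids of rules that offer a <suggestion> but have a marked
    example whose correction attribute is missing or empty."""
    root = ET.fromstring(xml_text)
    missing = []
    for rule in root.iter("rule"):
        # Only rules whose message contains a suggestion need a correction.
        if rule.find("message/suggestion") is None:
            continue
        for example in rule.findall("example"):
            if example.find("marker") is None:
                continue  # no marker: treated here as a correct-sentence example
            if not example.get("correction"):
                missing.append(rule.get("id"))
                break
    return missing


# Hypothetical sample: the second rule has the empty correction='' seen in the thread.
SAMPLE = """<rules>
  <rule id="WITH_CORRECTION" name="a">
    <pattern><token>x</token></pattern>
    <message>m <suggestion>y</suggestion></message>
    <example correction="y"><marker>x</marker> z.</example>
  </rule>
  <rule id="EMPTY_CORRECTION" name="b">
    <pattern><token>x</token></pattern>
    <message>m <suggestion>y</suggestion></message>
    <example correction=""><marker>x</marker> z.</example>
  </rule>
</rules>"""

print(rules_missing_correction(SAMPLE))
```

Any id it reports is a rule whose example still needs its correction filled in by hand.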

@dnaber

I thought the example parameter was irrelevant. I will add the correction straight away. Many thanks!

@marcoagpinto
I wrote the rules starting in the rule editor, then edited the XML by hand. I tested all of them on my local copy (LanguageTool 3.5 as a LibreOffice extension) along with many other rules (23 new rules in total).

Moments ago, after reading about the issue in Marco’s reply, I also tested them in the rules editor, using the “Parse existing XML” option.

All of them work fine in my copy, without ‘borking’ LanguageTool in any way, but I will add the correction attribute for automated rule testing.

I noticed while testing in the rules editor that the parsed rule output differs from the rule input.

For example, it removes the correction attribute and replaces <match no="#" ...> in suggestions with \#.

@tiagosantos

It still gives errors :frowning: when I type TESTRULES PT

@tiagosantos and guys,

You can check the file from my Dropbox:
https://dl.dropboxusercontent.com/u/30674540/grammar_v1_184.zip

using TESTRULES PT following the procedure I mentioned above.

@dnaber
Automated testing is very often inadequate, and these rules appear to be good examples of that. Here is another example that does not require Portuguese language skills (and can be ported to other languages):

    <!-- DOUBLE FINAL STOP -->
    <!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-17 -->
    <rule id="DOUBLE_FINAL_STOP" name="Ponto Final Duplo">
    <regexp type="exact">[.]</regexp>
    <regexp type="exact">[.]</regexp>
    <message>Pontuação duplicada? <suggestion>.</suggestion> ou <suggestion>...</suggestion></message>
    <example>A situação financeira era estável, e a fazenda garantia um bom rendimento<marker>..</marker></example>
    </rule>

This rule offers the option of fixing duplicated final stops with ‘.’ or ‘…’.
It works flawlessly. I believe that Daniel and others will be able to understand everything it does.

@marcoagpinto
Those rules use complex regexes for their suggestions. That is also why the Rules Editor XML parser clips some parts of the rule. Some of my rules do not even parse in the XML editor (like the example above), though they work on my local copy.

It still gives errors :frowning: when I type TESTRULES PT

Please read the code and test the rules as I have mentioned. It is less time-consuming for everybody to read and try to understand the code; copy-pasting it into the XML editor for checking helps a bit, as does testing in standalone LanguageTool or in LibreOffice.

Notice that my code posts have been edited following Daniel’s advice.

@dnaber
The only thing I have not checked is the file encoding. Can it be UTF-8, or does the testing tool require another encoding?

@Yakov

Once again, sorry to disturb you, but you are one of the greatest experts in the subject.

Could you make the rules Tiago posted work in TESTRULES PT without warnings?

Thanks!

Sorry to ask for your help all the time.

When I paste the “DOUBLE FINAL STOP” rule into the expert mode of the XML editor, I get this error: “Invalid content was found starting with element 'regexp'. One of '{filter, message}' is expected.” That’s because there can only be one regexp element per rule. However, when I fix that, a bug in the online rule editor shows up.

The non-expert mode is really just to get people started with rule writing. As soon as you know the basics, the best approach is to edit grammar.xml directly and then run testrules.sh (Linux) or testrules.bat (Windows). This will check both the XML syntax and the example sentences.
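
Since only one regexp element is allowed, both full stops have to be covered by a single pattern. The sketch below illustrates that idea with Python's `re` module, which is not the regex engine LanguageTool uses; the lookarounds that skip a real ellipsis and the helper name `suggestions` are assumptions added for illustration, not something taken from the rule above.

```python
import re

# One pattern for exactly two consecutive full stops.
# The lookbehind and lookahead keep "..." from being reported.
DOUBLE_STOP = re.compile(r"(?<!\.)\.\.(?!\.)")


def suggestions(sentence):
    """Return the sentence rewritten with '.' and with '...' in place of
    the first doubled full stop, or an empty list if none is found."""
    match = DOUBLE_STOP.search(sentence)
    if match is None:
        return []
    return [sentence[:match.start()] + fix + sentence[match.end():]
            for fix in (".", "...")]


print(suggestions("a fazenda garantia um bom rendimento.."))
print(suggestions("a fazenda garantia um bom rendimento..."))
```

The first call yields the two rewrites that mirror the rule's two suggestions; the second finds nothing because the text already ends in an ellipsis.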

This file contains 2 problems:

  1. duplicate rule id “ERRO_DE_CONCORDNCIA_DO_FEMININO_PLURAL”
  2. incorrect correction in some rules.
`	
<!-- Concordance error plural - AS > A -->
<!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
<rule id="ERRO_DE_CONCORDNCIA_DO_FEMININO_PLURAL1" name="Erro de concordância do feminino singular">
<pattern>
<marker>
<token postag='D[AI]0FP0|NCFP000|AQ0FP0' postag_regexp='yes'>
<exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
<token postag='NCFS000|AQ0FS0' postag_regexp='yes'>
<exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
</marker>
</pattern>
<message>Erro de concordância do plural.
<suggestion><match no="1" postag="(D[AI]0FS0|NCFS000|AQ0FS0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCFP000|AQ0FP0)" postag_regexp="yes"/></suggestion>.
</message>
<example correction='A vaca|As vacas'><marker>As vaca</marker> são malhadas.</example>
</rule>
	
	
	
<!-- Concordance error plural - A > AS -->
<!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
<rule id="ERRO_DE_CONCORDNCIA_DO_FEMININO_PLURAL" name="Erro de concordância do feminino plural">
<pattern>
<marker>
<token postag='D[AI]0FS0|NCFS000|AQ0FS0' postag_regexp='yes'>
<exception postag='CC|CS|RG|RN' postag_regexp='yes'></exception></token>
<token postag='NCFP000|AQ0FP0' postag_regexp='yes'>
<exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
</marker>
</pattern>
<message>Erro de concordância do plural:
<suggestion><match no="1" postag="(D[AI]0FP0|NCFP000|AQ0FP0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCFS000|AQ0FS0)" postag_regexp="yes"/></suggestion>.
</message>
<example correction='As vacas|A vaca'><marker>A vacas</marker> são malhadas.</example>
</rule> 
	
	
	
<!-- Concordance error plural - OS > O -->
<!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
<rule id="ERRO_DE_CONCORDNCIA_DO_MASCULINO_PLURAL" name="Erro de concordância do masculino plural">
<pattern>
<marker>
<token postag='D[AI]0MP0|NCMP000|AQ0MP0' postag_regexp='yes'>
<exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
<token postag='NCMS000|AQ0MS0' postag_regexp='yes'>
<exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
</marker>
</pattern>
<message>Erro de concordância do plural.
<suggestion><match no="1" postag="(D[AI]0MS0|NCMS000|AQ0MS0)" postag_regexp="yes"/> <match no="2"/></suggestion> ou <suggestion><match no="1"/> <match no="2" postag="(NCMP000|AQ0MP0)" postag_regexp="yes"/></suggestion>.
</message>
<example correction='O cão|Os cães|Os cãos'><marker>Os cão</marker> está no pasto.</example>
</rule>
	
	

<!-- Concordance error plural - O > OS -->
<!-- Created by Tiago F. Santos, Portuguese rule, 2016-10-15 -->
<rule id="ERRO_DE_CONCORDNCIA_DO_PLURAL_OOS" name="Erro de concordância do plural O-OS">
<pattern>
<marker>
<token postag='D[AI]0MS0|NCMS000|AQ0MS0' postag_regexp='yes'>
<exception postag='CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
<token postag='NCMP000|AQ0MP0' postag_regexp='yes'>
<exception postag='P[ID][0123][CFM][SP]000|CC|CS|RG|RN|SPS00' postag_regexp='yes'></exception></token>
</marker>
</pattern>
<message>Erro de concordância do plural.
<suggestion><match no="1" postag="(D[AI]0MP0|NCMP000|AQ0MP0)" postag_regexp="yes"/> <match no="2"/></suggestion></message>
<example correction='Oo cães|Os cães'><marker>O cães</marker> estão no pasto.</example>
</rule> 
	
`
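
The duplicate id problem mentioned above can also be caught with a few lines of Python before running the test script. This is a minimal sketch under assumptions: `duplicate_rule_ids` is a hypothetical helper, the sample ids are shortened stand-ins, and the input is assumed to be an XML fragment that parses without external entity declarations.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_rule_ids(xml_text):
    """Return a sorted list of rule ids that occur more than once."""
    root = ET.fromstring(xml_text)
    # Rules without an id attribute are ignored.
    counts = Counter(rule.get("id") for rule in root.iter("rule") if rule.get("id"))
    return sorted(rule_id for rule_id, n in counts.items() if n > 1)


# Hypothetical sample with one id used twice.
SAMPLE_IDS = """<rules>
  <rule id="ERRO_FEMININO_PLURAL" name="a"><message>m</message></rule>
  <rule id="ERRO_FEMININO_PLURAL" name="b"><message>m</message></rule>
  <rule id="ERRO_MASCULINO_PLURAL" name="c"><message>m</message></rule>
</rules>"""

print(duplicate_rule_ids(SAMPLE_IDS))
```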

Ok, so I will review these rules.

@Yakov

This set of rules triggers some false positives: fewer than 10 per 5000 words when tested on https://pt.wikipedia.org/wiki/Isaac_Newton.

Regarding the corrections, in some cases there is indeed a suggestion of “o” being changed to “oo” (not a Portuguese dictionary word) along with correct answers such as “o” and “os”. The rules should not do that.
Is it preferable to remove the suggestions for now?

Rule names fixed in the first posts.

@marcoagpinto
Found the daily build. I will test further rules on it, but let us deal with these ones first.

@Yakov
I’ve run ./testrules.sh PT now:

    Running pattern rule tests for Portuguese...
    Exception in thread "main" java.lang.AssertionError: Portuguese: Incorrect suggestions: [A vaca] != [A vaca, As vacas] for rule ERRO_DE_CONCORDNCIA_DO_FEMININO_PLURAL_AS_A[1] on input: As vaca são malhadas. expected:<[A vaca]> but was:<[A vaca, As vacas]>

Ok. So I have to mention all the viable suggestions. Changed accordingly.
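
The comparison behind that error message can be pictured roughly as follows. This is an illustrative Python sketch, not LanguageTool's actual test code: it assumes the correction attribute is split on `|` and compared as an ordered list against the suggestions the rule produces, which is what the message and the `A vaca|As vacas` form used earlier in the thread suggest. The function name `check_correction` is made up.

```python
def check_correction(correction_attr, produced):
    """Compare a pipe-separated correction attribute with the list of
    suggestions a rule produced. Return None if they agree, otherwise
    a short description of the mismatch."""
    expected = correction_attr.split("|")
    if expected != produced:
        return f"Incorrect suggestions: {expected} != {produced}"
    return None


# Suggestions reported for "As vaca" in the error output above.
produced = ["A vaca", "As vacas"]

print(check_correction("A vaca", produced))           # reports a mismatch
print(check_correction("A vaca|As vacas", produced))  # prints None
```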

@dnaber

It also finds “issues” in the second regex rule. But without the second <regexp type="exact">[.]</regexp> the rule does not work.

Is there a better way than to consider crippling a viable solution, considering that all rules I have posted work without issues “in the field”?