Errors in the use of the grave accent known as crase are not handled correctly. In sentences such as “Fui à farmácia”, the checker does not flag the accent when it is missing.
LanguageTool 5.0 will be released on Friday.
Until then I won't make any changes, to avoid taking risks.
I can’t fix this rule.
Here is the code:
<rule>
<antipattern>
<token spacebefore='no' regexp='yes'>&hifen;</token>
<token inflected='yes' spacebefore='no'>ir</token>
</antipattern>
<antipattern>
<token inflected='yes'>ir</token>
<token>a</token>
<token postag_regexp='yes' postag='A...P.+' min='0'/>
<token>eleições</token>
</antipattern>
<antipattern>
<token>assim</token>
<token inflected='yes' spacebefore='no'>ir</token>
<token>a</token>
<token>campanha</token>
</antipattern>
<pattern>
<marker>
<token inflected='yes' regexp='yes'>&requer_crase_verbos;
<exception inflected='yes'>ser</exception></token>
<token postag='R.' min='0' postag_regexp='yes'>
<exception postag='C.+' postag_regexp='yes'/></token>
<token regexp='yes'>as?</token>
</marker>
<token postag_regexp='yes' postag='N.F.+'>
<exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
</pattern>
<message>Esta palavra rege-se com a preposição "a".</message>
<suggestion>\1 <match no='2' include_skipped='all'/> <match no='3' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
<url>https://pt.wikipedia.org/wiki/Crase</url>
<short>Erro de crase</short>
<example correction='Vamos às'><marker>Vamos as</marker> compras no supermercado.</example>
<example correction='Iremos à'><marker>Iremos a</marker> escola, falar com os professores.</example>
<!--<example correction='foi à'>Ele <marker>foi a</marker> herdade.</example>--><!-- TODO Replace exception by 'ir' and 'ser' verbs disambiguation -->
<example correction='adere à'>A cola <marker>adere a</marker> folha.</example>
<example correction='Pertence provavelmente à'><marker>Pertence provavelmente a</marker> equipa dos seus amigos.</example>
<example>A causa de sua morte foi a pneumonia.</example>
<example>Os 6% restantes pertencem a outras nacionalidades.</example>
<example>…terminada esta seguir-se-ia a construção…</example>
<example>…viços de TV paga não se tornaram populares ou bem sucedidos enquanto as redes de televisão públicas ZDF e ARD oferecem um…</example>
</rule>
It works okay with “vamos”:
Vamos a farmácia.
Fomos a farmácia.
Fui a farmácia.
But with other forms of the verb “ir” (“fomos”, “fui”) it doesn’t trigger any error.
<!ENTITY requer_crase_verbos "(?:a(?:derir|gradar|ssistir)|comparecer|des(?:agradar|obedecer)|equivaler|ir|p(?:ertencer|roceder)|obedecer|re(?:agir|correr|sponder)|suceder)"><!-- accepts crase: avisar|limitar -->
As you can see, “ir” is there.
Why doesn’t it work?
Thanks!
“Fui” is also a form of the verb “ser”, which is an exception in the rule.
<S> fui[ir/VMIS1S0,ser/VMIS1S0] a[a/SPS00] farmácia[farmácia/NCFS000,</S>]<P/>
We can disambiguate “fui a” as “ir”, if it doesn’t cause other problems, or we can repeat the rule for “ir” (even if it matches also “ser”).
Is “fui (ser) a + feminine noun” (eu fui a mulher) a correct and frequent structure? If it is frequent, then it will be difficult to disambiguate. I would try this: repeat the same rule, but only with the verb “ir” and without the exception “ser”, with default=temp_off, and we will see in the tests how many false alarms it causes.
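A minimal sketch of that suggestion, assuming the rule is added inside the existing crase rule group (so it needs no id of its own) and reuses the message and suggestion of the original rule. The antipatterns of the original rule are left out for brevity, and the example sentence is only illustrative.

```xml
<!-- Sketch only: same pattern as the original rule, restricted to "ir",
     without the "ser" exception, and switched off by default for testing. -->
<rule default="temp_off">
  <pattern>
    <marker>
      <token inflected="yes">ir</token>
      <token regexp="yes">as?</token>
    </marker>
    <token postag_regexp="yes" postag="N.F.+">
      <exception postag="D..F.+|R.+" postag_regexp="yes"/></token>
  </pattern>
  <message>Esta palavra rege-se com a preposição "a".</message>
  <suggestion>\1 <match no="2" regexp_match="(a)(s?)" regexp_replace="à$2"/></suggestion>
  <url>https://pt.wikipedia.org/wiki/Crase</url>
  <short>Erro de crase</short>
  <example correction="Fui à"><marker>Fui a</marker> farmácia.</example>
</rule>
```

Because “fui” carries both the “ir” and the “ser” reading, this version would also flag sentences like “eu fui a mulher”, which is exactly what the test run is meant to measure.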
Thanks, I will repeat the same rule.
It is in my TO-DO list for this weekend.
I still want to enhance some other rules today.
Each Wikipedia Tool check takes 10 minutes for 200 000 sentences, and I always produce a “before.txt” and an “after.txt” whenever I improve antipatterns.
It produces warnings in TESTRULES PT:
<!-- MARCOAGPINTO 2020-06-28 *START* -->
<!-- "Fui a farmácia." -->
<!-- "Fui a praia." -->
<!-- "Fui a casa da Ana." -->
<rule>
<pattern>
<marker>
<token inflected='yes' regexp='no'>fui</token>
<token regexp='yes'>as?</token>
</marker>
<token postag_regexp='yes' postag='N.F.+'>
<exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
</pattern>
<message>Esta palavra rege-se com a preposição "a".</message>
<suggestion>\1 <match no='2' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
<url>https://pt.wikipedia.org/wiki/Crase</url>
<short>Erro de crase</short>
<example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
<example>De manhã fui a farmácia.</example>
<example>De manhã fui a praia.</example>
<example>De manhã fui a casa da Ana.</example>
</rule>
<!-- MARCOAGPINTO 2020-06-28 *END* -->
What it warns:
2528 rules tested.
Exception in thread “main” org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule CRASE_CONFUSION[8] in file /org/languagetool/rules/pt/grammar.xml: De manhã fui a praia."
Errors expected: 1
Errors found : 0
Message: Esta palavra rege-se com a preposição “a”.
Analyzed token readings: [/SENT_START*] De[De manhã/RG*] [ /null*] manhã[De manhã/RG] [ /null*] fui[ir/VMIS1S0,ser/VMIS1S0] [ /null*] a[a/SPS00] [ /null*] praia[praia/NCFS000] .[./SENT_END*,./_PUNCT*]
Matches:
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:396)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:318)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:169)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:152)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:683)
What could be wrong with the rule?
Thanks!
In <token inflected='yes' regexp='no'>fui</token>
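, you should remove “inflected”: with inflected='yes' the token text is compared against lemmas, and “fui” is an inflected form, not a lemma, so the token never matches.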
But I would use this, so that we see what happens with all forms that are “ser” and “ir” concurrently:
<and>
<token inflected="yes">ir</token>
<token inflected="yes">ser</token>
</and>
Where would I place the:
<and>
<token inflected="yes">ir</token>
<token inflected="yes">ser</token>
</and>
Thanks!
Instead of:
<token inflected='yes' regexp='no'>fui</token>
If the above fails, I was thinking about building a list of the “ser” verb forms from:
https://conjuga-me.net/verbo-ser
EDIT: The list would be added to the exceptions.
Maybe this is the solution?
It gives tons of warnings:
<!-- MARCOAGPINTO 2020-06-28 *START* -->
<!-- "Fui a farmácia." -->
<!-- "Fui a praia." -->
<!-- "Fui a casa da Ana." -->
<rule>
<pattern>
<marker>
<and>
<token inflected="yes">ir</token>
<token inflected="yes">ser</token>
</and>
<token regexp='yes'>as?</token>
</marker>
<token postag_regexp='yes' postag='N.F.+'>
<exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
</pattern>
<message>Esta palavra rege-se com a preposição "a".</message>
<suggestion>\1 <match no='2' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
<url>https://pt.wikipedia.org/wiki/Crase</url>
<short>Erro de crase</short>
<example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
<example>De manhã fui a farmácia.</example>
<example>De manhã fui a praia.</example>
<example>De manhã fui a casa da Ana.</example>
</rule>
<!-- MARCOAGPINTO 2020-06-28 *END* -->
I am going to try to manually add the verb forms to the exception.
I don’t understand your examples in this rule. “Fui a praia” is incorrect, but “fui a farmácia” is correct?
You also have “fui a praia” as both correct and incorrect. It is logically impossible to pass the tests.
I am stressed, so I can’t reason 100%.
Anyway, I will create this:
<!ENTITY requer_crase_verbo_ser "é|era|eram|éramos|eras|éreis|és|são|sede|seja|sejais|sejam|sejamos|sejas|ser|será|serão|serás|serdes|serei|sereis|serem|seremos|seres|seria|seriam|seríamos|serias|seríeis|sermos|sois|somos|sou">
And replace the <exception inflected='yes'>ser</exception> element with an exception built from this entity.
Is it a better approach?
Thanks!
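A sketch of what that replacement could look like, assuming the new entity is declared next to requer_crase_verbos in the DOCTYPE of grammar.xml and that the exception is matched against the surface form (no inflected attribute) with regexp='yes'. Only the first token of the original rule changes:

```xml
<!-- First token of the original rule. The exception now lists surface forms
     of "ser" that are not also forms of "ir", so "fui", "foi" and "fomos"
     are no longer excluded. Sketch only, not the committed rule. -->
<token inflected='yes' regexp='yes'>&requer_crase_verbos;
<exception regexp='yes'>&requer_crase_verbo_ser;</exception></token>
```

The trade-off is that every form shared by “ser” and “ir” is then treated as “ir”, so correct “ser” sentences with those forms can be flagged.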
The entity approach seems to have worked.
I am going to check with 200 000 sentences.
Well, it produces hundreds of false positives.
I am planning to give up on it.
EDIT:
@jaumeortola
Your suggestions passed TESTRULES PT:
<!-- MARCOAGPINTO 2020-06-28 *START* -->
<!-- "Fui a farmácia." -->
<!-- "Fui a praia." -->
<rule>
<pattern>
<marker>
<and>
<token inflected="yes">ir</token>
<token inflected="yes">ser</token>
</and>
<token regexp='yes'>as?</token>
</marker>
<token postag_regexp='yes' postag='N.F.+'>
<exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
</pattern>
<message>Esta palavra rege-se com a preposição "a".</message>
<suggestion>\1 <match no='2' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
<url>https://pt.wikipedia.org/wiki/Crase</url>
<short>Erro de crase</short>
<example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
<example>De manhã fui à farmácia.</example>
<example>De manhã fui à praia.</example>
</rule>
<!-- MARCOAGPINTO 2020-06-28 *END* -->
I am going to test it with a 200 000-sentence check right now.
A possible solution is to write a rule with a list of nouns (farmácia, praia, praça, rua, loja…) that need “à”. The results won’t be comprehensive, but perhaps they will be good enough.
It is a good idea, but I am not sure when I will have the chance to do it.
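The noun-list idea could look roughly like the sketch below. It assumes the rule lives inside the existing crase rule group, keeps the `<and>` token for forms shared by “ir” and “ser”, and replaces the generic feminine-noun token with a short list of lemmas. The list, the default="temp_off" setting and the example sentences are illustrative assumptions, not tested against the corpus.

```xml
<!-- Hypothetical sketch: the noun list is illustrative and far from complete. -->
<rule default="temp_off">
  <pattern>
    <marker>
      <!-- forms shared by "ir" and "ser": fui, foi, fomos, ... -->
      <and>
        <token inflected="yes">ir</token>
        <token inflected="yes">ser</token>
      </and>
      <token regexp="yes">as?</token>
    </marker>
    <!-- matched on the lemma, so plurals such as "praias" are covered too -->
    <token inflected="yes" regexp="yes">farmácia|praia|praça|rua|loja</token>
  </pattern>
  <message>Esta palavra rege-se com a preposição "a".</message>
  <suggestion>\1 <match no="2" regexp_match="(a)(s?)" regexp_replace="à$2"/></suggestion>
  <url>https://pt.wikipedia.org/wiki/Crase</url>
  <short>Erro de crase</short>
  <example correction="fui à">De manhã <marker>fui a</marker> praia.</example>
  <example>De manhã fui à praia.</example>
  <example>Eu fui a mulher mais feliz do mundo.</example>
</rule>
```

Restricting the third token to place nouns trades coverage for precision: “fui a mulher” is no longer a candidate, but any destination missing from the list goes undetected.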
Correction: *identify the article, not intensify