Suggestion: fix crase errors

The checker does not correctly handle errors in the grave accent known as crase. In a sentence such as “Fui a farmácia” (correct: “Fui à farmácia”), it does not flag the missing accent.

LanguageTool 5.0 will be released on Friday.

Until then I won’t make any changes, so as not to take any risks.


I can’t fix this rule.

Here is the code:

      <token spacebefore='no' regexp='yes'>&hifen;</token>
      <token inflected='yes' spacebefore='no'>ir</token>
      <token inflected='yes'>ir</token>
      <token postag_regexp='yes' postag='A...P.+' min='0'/>
      <token inflected='yes' spacebefore='no'>ir</token>
      <token inflected='yes' regexp='yes'>&requer_crase_verbos;
        <exception inflected='yes'>ser</exception></token>
      <token postag='R.' min='0' postag_regexp='yes'>
        <exception postag='C.+' postag_regexp='yes'/></token>
      <token regexp='yes'>as?</token>
      <token postag_regexp='yes' postag='N.F.+'>
        <exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
  <message>Esta palavra rege-se com a preposição "a".</message>
    <suggestion>\1 <match no='2' include_skipped='all'/> <match no='3' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
  <short>Erro de crase</short>
  <example correction='Vamos às'><marker>Vamos as</marker> compras no supermercado.</example>
  <example correction='Iremos à'><marker>Iremos a</marker> escola, falar com os professores.</example>
<!--<example correction='foi à'>Ele <marker>foi a</marker> herdade.</example>--><!-- TODO Replace exception by 'ir' and 'ser' verbs disambiguation -->
  <example correction='adere à'>A cola <marker>adere a</marker> folha.</example>
  <example correction='Pertence provavelmente à'><marker>Pertence provavelmente a</marker> equipa dos seus amigos.</example>
  <example>A causa de sua morte foi a pneumonia.</example>
  <example>Os 6% restantes pertencem a outras nacionalidades.</example>
  <example>…terminada esta seguir-se-ia a construção…</example>
  <example>…viços de TV paga não se tornaram populares ou bem sucedidos enquanto as redes de televisão públicas ZDF e ARD oferecem um…</example>
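
The suggestion turns the article into its crase form with regexp_match='(a)(s?)' and regexp_replace='à$2'. Here is a minimal Python sketch of that replacement, assuming it is applied to the matched article token only. LanguageTool uses Java regex syntax (`$2`), which Python’s `re.sub` writes as `\2`. Both helper names are hypothetical.

```python
import re

def crase_article(article: str) -> str:
    """Turn "a"/"as" into "à"/"às", like regexp_replace='à$2' in the rule."""
    # Group 1 is the bare article "a"; group 2 is the optional plural "s".
    return re.sub(r"(a)(s?)", r"à\2", article)

def build_suggestion(verb: str, adverb: str, article: str) -> str:
    """Rebuild the suggestion: verb + optional adverb + crase article."""
    words = [verb, adverb, crase_article(article)]
    # Drop the optional adverb slot when it is empty (min='0' in the rule).
    return " ".join(w for w in words if w)

print(build_suggestion("Vamos", "", "as"))                 # Vamos às
print(build_suggestion("Pertence", "provavelmente", "a"))  # Pertence provavelmente à
```

Both outputs agree with the correction attributes of the first and fourth examples in the rule.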

It works okay with “vamos”:

Vamos a farmácia.
Fomos a farmácia.
Fui a farmácia.

But with other forms of the verb “ir” it doesn’t trigger any error.

<!ENTITY requer_crase_verbos "(?:a(?:derir|gradar|ssistir)|comparecer|des(?:agradar|obedecer)|equivaler|ir|p(?:ertencer|roceder)|obedecer|re(?:agir|correr|sponder)|suceder)"><!-- accepts crase: avisar|limitar -->

As you can see, “ir” is there.

Why doesn’t it work?


“Fui” is also a form of the verb “ser”, which the rule lists as an exception:
<S> fui[ir/VMIS1S0,ser/VMIS1S0] a[a/SPS00] farmácia[farmácia/NCFS000,</S>]<P/>
We could disambiguate “fui a” as “ir”, if that doesn’t cause other problems, or we could repeat the rule for “ir” alone (even if the word also matches “ser”).
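
To make the reasoning concrete, here is a simplified Python model of how a verb token with inflected='yes' and an inflected “ser” exception treats a word with two readings. It rests on an assumption about the matching semantics (the token matches when any reading’s lemma fully matches the entity, and any reading whose lemma is the exception vetoes the match); it is not LanguageTool’s actual Java code, and `verb_token_matches` is a hypothetical name.

```python
import re

# Same alternation as the &requer_crase_verbos; entity in this thread.
REQUER_CRASE_VERBOS = re.compile(
    r"(?:a(?:derir|gradar|ssistir)|comparecer|des(?:agradar|obedecer)|equivaler"
    r"|ir|p(?:ertencer|roceder)|obedecer|re(?:agir|correr|sponder)|suceder)"
)

def verb_token_matches(lemmas, exception_lemma=None):
    """Simplified model of <token inflected='yes' regexp='yes'> plus <exception inflected='yes'>."""
    # Assumption: any reading whose lemma fully matches the entity is enough.
    if not any(REQUER_CRASE_VERBOS.fullmatch(lemma) for lemma in lemmas):
        return False
    # Assumption: any reading carrying the exception lemma vetoes the match.
    return exception_lemma is None or exception_lemma not in lemmas

# Readings from the analysis: fui[ir/VMIS1S0,ser/VMIS1S0]
print(verb_token_matches(["ir", "ser"], exception_lemma="ser"))  # False: vetoed by "ser"
print(verb_token_matches(["ir"], exception_lemma="ser"))         # True: only an "ir" reading
print(verb_token_matches(["ir", "ser"]))                         # True: rule repeated without the exception
```

The last call corresponds to the second option: a copy of the rule for “ir” without the “ser” exception.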


How can we disambiguate it?

Can you help?


Is “fui (ser) a + feminine noun” (e.g. “eu fui a mulher”) a correct and frequent structure? If it is frequent, it will be difficult to disambiguate. I would try this: repeat the same rule with only the verb “ir” and without the “ser” exception, set default=temp_off, and then see in the tests how many false alarms it causes.

Thanks, I will repeat the same rule.

It is on my to-do list for this weekend.

I still want to enhance some other rules today.

Each Wikipedia Tool check takes 10 minutes for 200 000 sentences, and whenever I improve antipatterns I always produce a “before.txt” and an “after.txt”.
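
Comparing the two files is essentially a set difference. A minimal Python sketch, assuming (hypothetically) that each non-empty line of “before.txt” and “after.txt” describes one reported match; the real output format of the check may differ, and `read_report` and `diff_reports` are hypothetical helpers.

```python
from pathlib import Path

def read_report(path):
    """Return the non-empty, stripped lines of a report file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]

def diff_reports(before, after):
    """Return (new matches, matches that disappeared) as sorted lists."""
    before_set, after_set = set(before), set(after)
    return sorted(after_set - before_set), sorted(before_set - after_set)

# Made-up lines for illustration; real usage would be
# diff_reports(read_report("before.txt"), read_report("after.txt")).
before = ["Vamos as compras no supermercado.", "Iremos a escola."]
after = ["Vamos as compras no supermercado.", "Eu fui a mulher."]
added, removed = diff_reports(before, after)
print(added)    # ['Eu fui a mulher.']
print(removed)  # ['Iremos a escola.']
```

New lines in `added` are the candidates to inspect for false alarms after a change.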


It produces warnings in TESTRULES PT:

<!-- MARCOAGPINTO 2020-06-28 *START* -->
<!-- "Fui a farmácia." -->
<!-- "Fui a praia." -->
<!-- "Fui a casa da Ana." -->
          <token inflected='yes' regexp='no'>fui</token>
          <token regexp='yes'>as?</token>
          <token postag_regexp='yes' postag='N.F.+'>
            <exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
      <message>Esta palavra rege-se com a preposição "a".</message>
        <suggestion>\1 <match no='2' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
      <short>Erro de crase</short>
      <example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
      <example>De manhã fui a farmácia.</example>
      <example>De manhã fui a praia.</example>
      <example>De manhã fui a casa da Ana.</example>
<!-- MARCOAGPINTO 2020-06-28 *END* -->

Here is the warning:

2528 rules tested.
Exception in thread “main” org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule CRASE_CONFUSION[8] in file /org/languagetool/rules/pt/grammar.xml: De manhã fui a praia."
Errors expected: 1
Errors found : 0
Message: Esta palavra rege-se com a preposição “a”.
Analyzed token readings: [/SENT_START*] De[De manhã/RG*] [ /null*] manhã[De manhã/RG] [ /null*] fui[ir/VMIS1S0,ser/VMIS1S0] [ /null*] a[a/SPS00] [ /null*] praia[praia/NCFS000] .[./SENT_END*,./_PUNCT*]
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(
at org.languagetool.rules.patterns.PatternRuleTest.main(

What could be wrong with the rule?


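In <token inflected='yes' regexp='no'>fui</token>, you should remove “inflected”: with inflected='yes' the text is compared with the lemma (“ir” or “ser”), not with the word form, so “fui” never matches.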
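
The difference can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions (plain, non-regexp tokens; case-insensitive comparison, as for tokens without case_sensitive='yes'); `text_matches` is a hypothetical helper, not part of LanguageTool.

```python
def text_matches(pattern_text, form, lemmas, inflected=False):
    """Simplified model of a plain (non-regexp) <token>.

    With inflected=True the pattern text is compared with the word's lemmas;
    otherwise it is compared with the word form itself. Case is ignored here,
    which is an assumption about the default behaviour.
    """
    wanted = pattern_text.lower()
    if inflected:
        return any(wanted == lemma.lower() for lemma in lemmas)
    return wanted == form.lower()

# "fui" with the readings ir/VMIS1S0 and ser/VMIS1S0
form, lemmas = "fui", ["ir", "ser"]
print(text_matches("fui", form, lemmas, inflected=True))  # False: "fui" is not a lemma
print(text_matches("fui", form, lemmas))                  # True: compared with the word form
print(text_matches("ir", form, lemmas, inflected=True))   # True: "ir" is one of the lemmas
```

The first call reproduces “Errors found : 0”: the token never matches, so the rule never fires.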

But I would use this instead, so that we can see what happens with all the forms that belong to both “ser” and “ir”:

<token inflected="yes">ir</token>
<token inflected="yes">ser</token>

Where would I place these lines:

<token inflected="yes">ir</token>
<token inflected="yes">ser</token>


Instead of:

<token inflected='yes' regexp='no'>fui</token>


If the above fails, I was thinking about building the list of “ser” verb forms from:

EDIT: to add them to the exceptions.

Maybe this is the solution?

It gives tons of warnings:

<!-- MARCOAGPINTO 2020-06-28 *START* -->
<!-- "Fui a farmácia." -->
<!-- "Fui a praia." -->
<!-- "Fui a casa da Ana." -->
<token inflected="yes">ir</token>
<token inflected="yes">ser</token>
          <token regexp='yes'>as?</token>
          <token postag_regexp='yes' postag='N.F.+'>
            <exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
      <message>Esta palavra rege-se com a preposição "a".</message>
        <suggestion>\1 <match no='2' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
      <short>Erro de crase</short>
      <example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
      <example>De manhã fui a farmácia.</example>
      <example>De manhã fui a praia.</example>
      <example>De manhã fui a casa da Ana.</example>
<!-- MARCOAGPINTO 2020-06-28 *END* -->

I am going to try adding the verb forms to the exception manually.

I don’t understand your examples in this rule. Is “fui a praia” incorrect but “fui a farmácia” correct?
You also list “fui a praia” as both correct and incorrect, so the tests cannot possibly pass.
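
Contradictions like this can be caught before running TESTRULES. A minimal Python sketch, assuming that an <example> containing <marker> is an error example and one without it is a correct example; `contradictory_examples` is a hypothetical helper, and the fragment is the one from the rule above.

```python
import xml.etree.ElementTree as ET

def contradictory_examples(examples_xml):
    """Return sentences used both as an error example and as a correct example.

    Assumption: an <example> containing a <marker> is an error example and one
    without a marker is a correct example. Comparison ignores case.
    """
    # Wrap the fragment so it parses as a single XML document.
    root = ET.fromstring(f"<rule>{examples_xml}</rule>")
    wrong, right = set(), set()
    for example in root.iter("example"):
        # itertext() joins the text inside and around <marker>.
        sentence = "".join(example.itertext()).strip().lower()
        if example.find("marker") is not None:
            wrong.add(sentence)
        else:
            right.add(sentence)
    return sorted(wrong & right)

EXAMPLES = """
<example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
<example>De manhã fui a farmácia.</example>
<example>De manhã fui a praia.</example>
<example>De manhã fui a casa da Ana.</example>
"""
print(contradictory_examples(EXAMPLES))  # ['de manhã fui a praia.']
```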


I am stressed, so I can’t think 100% clearly.

Anyway, I will create this:

<!ENTITY requer_crase_verbo_ser "é|era|eram|éramos|eras|éreis|és|são|sede|seja|sejais|sejam|sejamos|sejas|ser|será|serão|serás|serdes|serei|sereis|serem|seremos|seres|seria|seriam|seríamos|serias|seríeis|sermos|sois|somos|sou">

And replace the “ser” exception (<exception inflected='yes'>ser</exception>) with this entity.

Is this a better approach?
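
Because the entity lists word forms rather than a lemma, it would presumably be used as an exception on the word form (regexp='yes', without inflected='yes'); the thread does not show the final XML, so that is an assumption. The Python sketch below shows the effect: the preterite forms shared with “ir” (fui, foi, fomos, foram) are not in the list, so they are no longer vetoed, while pure “ser” forms still are. `vetoed_by_ser_exception` is a hypothetical helper.

```python
import re

# Word forms copied from the &requer_crase_verbo_ser; entity.
REQUER_CRASE_VERBO_SER = (
    "é|era|eram|éramos|eras|éreis|és|são|sede|seja|sejais|sejam|sejamos|sejas|"
    "ser|será|serão|serás|serdes|serei|sereis|serem|seremos|seres|seria|seriam|"
    "seríamos|serias|seríeis|sermos|sois|somos|sou"
)
SER_EXCEPTION = re.compile(f"(?:{REQUER_CRASE_VERBO_SER})", re.IGNORECASE)

def vetoed_by_ser_exception(word):
    """True if the whole word form is one of the listed "ser" forms."""
    return SER_EXCEPTION.fullmatch(word) is not None

for word in ["sou", "era", "fui", "foi", "fomos", "foram"]:
    print(word, vetoed_by_ser_exception(word))
```

Only “sou” and “era” print True here. The flip side is that “eu fui a mulher” is no longer excepted either, which is where false positives can come from.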


The entity approach seems to have worked.

I am going to check it against 200 000 sentences.

Well, it produces hundreds of false positives :frowning:

I am planning to give up on it. :frowning:


Your suggestions passed TESTRULES PT:

<!-- MARCOAGPINTO 2020-06-28 *START* -->
<!-- "Fui a farmácia." -->
<!-- "Fui a praia." -->
<token inflected="yes">ir</token>
<token inflected="yes">ser</token>
          <token regexp='yes'>as?</token>
          <token postag_regexp='yes' postag='N.F.+'>
            <exception postag='D..F.+|R.+' postag_regexp='yes'/></token>
      <message>Esta palavra rege-se com a preposição "a".</message>
        <suggestion>\1 <match no='2' regexp_match='(a)(s?)' regexp_replace='à$2'/></suggestion>
      <short>Erro de crase</short>
      <example correction='fui à'>De manhã <marker>fui a</marker> praia.</example>
      <example>De manhã fui à farmácia.</example>
      <example>De manhã fui à praia.</example>
<!-- MARCOAGPINTO 2020-06-28 *END* -->

I am going to test it with a 200 000-sentence check right now.

A possible solution is to write a rule with a list of nouns (farmácia, praia, praça, rua, loja…) that need “à”. The results won’t be comprehensive, but perhaps they will be good enough.
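
The noun-list idea can be prototyped in Python before writing the XML rule. This sketch is hypothetical: the noun set holds only the nouns named above, the verb forms (fui, foi, fomos, foram, the preterite forms shared with “ser”) are an assumption, and a real rule would rely on LanguageTool’s tokenizer and tags rather than a regex.

```python
import re

# Only the nouns named in the thread; deliberately incomplete.
NOUNS_REQUIRING_CRASE = {"farmácia", "praia", "praça", "rua", "loja"}

# Assumed verb forms: preterite forms of "ir" that are also forms of "ser".
PATTERN = re.compile(r"\b(fui|foi|fomos|foram)\s+(as?)\s+(\w+)", re.IGNORECASE)

def find_crase_errors(sentence):
    """Return (matched text, suggestion) pairs for "fui a praia"-style errors."""
    errors = []
    for match in PATTERN.finditer(sentence):
        verb, article, noun = match.groups()
        plural = article.lower() == "as"
        lowered = noun.lower()
        # Naive singularisation ("lojas" -> "loja"); enough for the listed nouns.
        singular = lowered[:-1] if plural and lowered.endswith("s") else lowered
        if singular in NOUNS_REQUIRING_CRASE:
            crase = "às" if plural else "à"
            errors.append((match.group(0), f"{verb} {crase} {noun}"))
    return errors

print(find_crase_errors("Fui a farmácia."))   # [('Fui a farmácia', 'Fui à farmácia')]
print(find_crase_errors("Eu fui a mulher."))  # []
```

Because only listed nouns trigger, “eu fui a mulher” stays silent, at the cost of missing every noun not in the list.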

It is a good idea, though I am not sure when I will have the chance to do it :slight_smile:

Correction: *“identify the article”, not “intensify”