[pt] improve E_NO_COMECO

Hi everyone @jaumeortola @dnaber @marcoagpinto

I’ve been trying to improve the rule E_NO_COMECO. Its goal is to flag sentences that start with “E|Mas|Ou|Porém” in picky mode. It currently looks like this:

    <rule id='E_NO_COMECO' name="Frase começa com 'E', 'Mas' ou 'Ou'" tags="picky">
      <antipattern>
        <token>ou</token>
        <token>seja</token>
      </antipattern>
      <pattern>
        <token postag='SENT_START'/>
        <marker>
          <token regexp="yes">E|Mas|Ou|Porém</token>
        </marker>
      </pattern>
      <message>Estilisticamente '\2' só deve ser utilizado no início da frase raramente, e para um efeito dramático.</message>
      <url>https://ciberduvidas.iscte-iul.pt/consultorio/perguntas/a-conjuncao-porem-no-inicio-de-frase/22597</url>
      <short>Problema de estilo</short>
      <example correction=""><marker>E</marker> sim, isto mostra problemas de estilo ao começar a frase com 'e'.</example>
      <example>Sim, mostra problemas de estilo começar a frase com 'e'.</example>
      <example correction=""><marker>Mas</marker> não é bom costume começar a frase com 'mas'.</example>
      <example>Não é um bom estilo de escrita começar uma frase com 'mas'.</example>
    </rule>

It has no suggestions, and I want to add some, but what I’ve tried so far hasn’t worked. Here’s what I came up with:

    <rule id='E_NO_COMECO' name="Frase começa com 'E', 'Mas' ou 'Ou'" tags="picky">
      <pattern>
        <marker>
          <token regexp='yes' spacebefore='no'>[.?!]</token>
          <token regexp="yes" case_sensitive='yes' spacebefore='yes'>E|Mas|Ou|Porém</token>
        </marker>
      </pattern>
      <message>Estilisticamente '\2' só deve ser utilizado no início da frase raramente, e para um efeito dramático.</message>
      <suggestion><match no='1' regexp_match='[.?!]' regexp_replace=';'/> <match no='2' case_conversion='startlower'/></suggestion>
      <suggestion><match no='1' regexp_match='[.?!]' regexp_replace='–'/> <match no='2' case_conversion='startlower'/></suggestion>
      <suggestion><match no='1' regexp_match='[.?!]' regexp_replace=','/> <match no='2' case_conversion='startlower'/></suggestion>
      <suggestion>APAGAR</suggestion>
      <url>https://ciberduvidas.iscte-iul.pt/consultorio/perguntas/a-conjuncao-porem-no-inicio-de-frase/22597</url>
      <short>Problema de estilo</short>
      <example correction="; e|– e|, e|APAGAR">É um problema estilístico<marker>. E</marker> sim, isto mostra problemas de estilo ao começar a frase com 'e'.</example>
      <example correction="; mas|– mas|, mas|APAGAR">É um problema estilístico<marker>. Mas</marker> não é bom costume começar a frase com 'mas'.</example>
    </rule>

I want to either delete “E|Mas|Ou|Porém” or join it to the previous sentence with “; e”, “– e”, or “, e”. That doesn’t seem possible with SENT_START, so I matched [.?!] instead. I’m sure the answer is right in front of me, but I’ve been racking my brain to find it. How do I fix this? I’m sure your tips will also be useful for future suggestions :smiling_face:

@jaumeortola

You always know the answer to everything.

:heart:


Here you have two sentences, split by the period. A pattern rule only ‘sees’ one sentence at a time, so to offer a suggestion that modifies two sentences, we need to write a text-level rule in Java.
Can you think of other suggestions that don’t change the period?
Anyway, you can open a GitHub issue and I will write the Java rule when I have time.

I had a call with Stacy and we came to this same conclusion. I suppose we can find solutions that don’t require writing two example sentences. I’m gonna work on that and see how the rule performs. If it’s still bad and we really need a Java rule, I’ll let you know. Thanks a lot for all your help :smiling_face:
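
For reference, here is an untested sketch of one single-sentence option: keep SENT_START, widen the marker to include the word after the conjunction, and suggest that word capitalized, which amounts to the “APAGAR” (delete) suggestion. The \p{L}+ regexp on the extra token and the no='3' reference are my assumptions (SENT_START counts as token 1, as the '\2' in the message implies); the message, URL, and examples are adapted from the rule above.

    <rule id='E_NO_COMECO' name="Frase começa com 'E', 'Mas' ou 'Ou'" tags="picky">
      <antipattern>
        <token>ou</token>
        <token>seja</token>
      </antipattern>
      <pattern>
        <token postag='SENT_START'/>
        <marker>
          <token regexp="yes">E|Mas|Ou|Porém</token>
          <!-- assumption: only match when a word (not punctuation) follows -->
          <token regexp="yes">\p{L}+</token>
        </marker>
      </pattern>
      <message>Estilisticamente '\2' só deve ser utilizado no início da frase raramente, e para um efeito dramático.</message>
      <!-- drops the conjunction and capitalizes the word that followed it -->
      <suggestion><match no='3' case_conversion='startupper'/></suggestion>
      <url>https://ciberduvidas.iscte-iul.pt/consultorio/perguntas/a-conjuncao-porem-no-inicio-de-frase/22597</url>
      <short>Problema de estilo</short>
      <example correction="Sim"><marker>E sim</marker>, isto mostra problemas de estilo ao começar a frase com 'e'.</example>
      <example>Sim, mostra problemas de estilo começar a frase com 'e'.</example>
      <example correction="Não"><marker>Mas não</marker> é bom costume começar a frase com 'mas'.</example>
    </rule>

Requiring a word after the conjunction means a sentence where punctuation follows it (a comma after “Porém”, for instance) is no longer flagged, so this trades some coverage for a suggestion that stays within one sentence.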