
[PT] Object Pronoun in the middle of the verb

Hi, everyone!

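I’ve been working on a rule that corrects the placement of a direct-object pronoun attached after a verb in the ‘future of the past’ (futuro do pretérito, i.e. the conditional) at the beginning of a sentence. In that position the pronoun has to go in the middle of the verb (mesoclisis), e.g. Amaria-o should be Amá-lo-ia. This is the rule: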

<rule id="MESOCLISE_FUTURO_DO_PRETERITO__OBJETO_DIRETO"  name="Mesoclise_futuro do preterito_2 conjugacao_objeto direto">
 <pattern>
  <token postag='SENT_START' postag_regexp='yes'></token>
  <marker>
    <token postag='VMIC(1|3)S.*' postag_regexp='yes'></token>
    <token>-</token>
    <token postag='PP3(M|F)(S|P)A.*' postag_regexp='yes'></token>
  </marker>
 </pattern>
 <message>Colocação pronominal inadequada. <suggestion><match no="2" regexp_match="(.+)[aeio]ria$" regexp_replace="$1"/>-l<match no="4"/>-ia</suggestion></message>
 <example correction='Amá-lo-ia'><marker>Amaria-o</marker>.</example>
 <example>Amá-lo-ia.</example>
 <example correction='Vendê-lo-ia'><marker>Venderia-o</marker>.</example>
 <example>Vendê-lo-ia.</example>
 <example correction='Parti-lo-ia'><marker>Partiria-o</marker>.</example>
 <example>Parti-lo-ia.</example>
 <example correction='Atribuí-lo-ia'><marker>Atribuiria-o</marker>.</example>
 <example>Atribuí-lo-ia.</example>
 <example correction='Pô-lo-ia'><marker>Poria-o</marker>.</example>
 <example>Pô-lo-ia.</example>
</rule>

The problem is the suggestion (the regexp). In general, when the verb stem ends in a, e, i or o, that vowel takes an accent in the mesoclitic form: á, ê, í, ô, respectively. I can’t get the suggestion to produce the accented vowels. Can anybody help me out, please?
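To see what the current suggestion produces, it can be reproduced outside LanguageTool. This is a minimal sketch that assumes the `regexp_match`/`regexp_replace` attributes of `<match>` behave like Java’s `String.replaceAll` (an assumption about LanguageTool’s internals); the class and method names are hypothetical. Since `$1` keeps only the text before the stem vowel, the vowel is dropped instead of accented.

```java
// Hypothetical reproduction of the rule's <suggestion>, assuming that
// regexp_replace works like String.replaceAll.
class CurrentSuggestion {

    // Mirrors: <match no="2" regexp_match="(.+)[aeio]ria$" regexp_replace="$1"/>-l<match no="4"/>-ia
    static String suggest(String verb, String pronoun) {
        // "(.+)" captures everything before the stem vowel, so "$1" loses that vowel.
        String stem = verb.replaceAll("(.+)[aeio]ria$", "$1");
        return stem + "-l" + pronoun + "-ia";
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"Amaria", "Venderia", "Partiria", "Atribuiria", "Poria"}) {
            System.out.println(verb + "-o -> " + suggest(verb, "o"));
        }
    }
}
```

Under that assumption, Amaria-o yields Am-lo-ia and Poria-o yields P-lo-ia, so even the plain vowel is missing from the suggestion.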

Are you familiar with Java? I don’t think what you need can be done with a single regex. You’d probably need a simple filter, implemented in Java, that takes the rule match and modifies its suggestion, replacing the plain vowel with the accented one where needed.
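The core of such a filter is plain string logic, where mapping one vowel to another is easy. The sketch below shows only that logic, not LanguageTool’s filter API (see its `RuleFilter` class for the real interface); `Mesoclisis` and `mesoclisis` are hypothetical names. Following the rule’s own examples, it accents i only after another vowel (atribuí-lo-ia vs. parti-lo-ia) and not after gu/qu (segui-lo-ia). This is a simplification, not a complete implementation of Portuguese accentuation.

```java
import java.util.Map;

// Hypothetical helper: builds the mesoclitic form from a conditional verb
// ("Amaria") and a direct-object pronoun ("o", "a", "os", "as").
class Mesoclisis {

    // Plain stem vowel -> vowel with acute/circumflex, written as Unicode escapes.
    private static final Map<Character, Character> ACCENTS = Map.of(
            'a', '\u00e1',
            'e', '\u00ea',
            'i', '\u00ed',
            'o', '\u00f4');

    static String mesoclisis(String verb, String pronoun) {
        if (verb.length() < 4 || !verb.endsWith("ria")) {
            throw new IllegalArgumentException("not a conditional form: " + verb);
        }
        // "Amaria" -> "Ama": strip the conditional ending "ria".
        String stem = verb.substring(0, verb.length() - 3);
        int last = stem.length() - 1;
        char vowel = stem.charAt(last);
        Character accented = ACCENTS.get(vowel);
        if (accented == null) {
            throw new IllegalArgumentException("unexpected stem vowel in: " + verb);
        }
        if (vowel == 'i' && !takesAcuteOnI(stem)) {
            accented = 'i'; // e.g. "Parti-lo-ia" keeps a plain i
        }
        return stem.substring(0, last) + accented + "-l" + pronoun + "-ia";
    }

    // Simplification: the final i gets an acute only in hiatus, i.e. right after
    // another vowel ("Atribui"), but not when that vowel is the u of "gu"/"qu" ("Segui").
    private static boolean takesAcuteOnI(String stem) {
        int i = stem.length() - 1;
        if (i < 1 || "aeiou".indexOf(stem.charAt(i - 1)) < 0) {
            return false;
        }
        boolean digraph = stem.charAt(i - 1) == 'u' && i >= 2
                && (stem.charAt(i - 2) == 'g' || stem.charAt(i - 2) == 'q');
        return !digraph;
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"Amaria", "Venderia", "Partiria", "Atribuiria", "Poria"}) {
            System.out.println(verb + "-o -> " + mesoclisis(verb, "o"));
        }
    }
}
```

In a real filter, this string would replace the rule’s suggestion, and the XML rule would point to the filter class with a `<filter>` element.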

What if I used other characters? For example, instead of ‘á’ I would use a plain ‘Z’. Do you think I would still have to use Java?

Maybe my understanding of regex is incomplete, but I think a single regex replacement can only substitute a fixed string, not map one character class onto another (such as [aei] onto [áêí]). A workaround might be to write one rule for each special character.
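The per-character workaround works because each rule then needs only a fixed replacement string. This sketch mimics it with one regex/replacement pair per vowel, assuming `regexp_replace` behaves like Java’s `String.replaceAll`; `PerVowelRules` and `suggest` are hypothetical names, and each pair stands for the suggestion of one separate rule. The i case needs two pairs, because atribuí-lo-ia is accented and parti-lo-ia is not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "one rule per special character" workaround.
// Each entry is a regexp_match/regexp_replace pair with a fixed replacement.
class PerVowelRules {

    private static final Map<String, String> RULES = new LinkedHashMap<>();

    static {
        RULES.put("(.+)aria$", "$1\u00e1");               // Amaria -> "Am" + a-acute
        RULES.put("(.+)eria$", "$1\u00ea");               // Venderia -> "Vend" + e-circumflex
        RULES.put("(.+)oria$", "$1\u00f4");               // Poria -> "P" + o-circumflex
        RULES.put("(.*[aeo]|.*[^gq]u)iria$", "$1\u00ed"); // Atribuiria -> "Atribu" + i-acute (hiatus)
        RULES.put("(.+)iria$", "$1i");                    // Partiria -> "Parti" (no accent)
    }

    // Applies the first pair whose pattern matches the whole verb,
    // as if only that rule had fired.
    static String suggest(String verb, String pronoun) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (verb.matches(rule.getKey())) {
                String stem = verb.replaceAll(rule.getKey(), rule.getValue());
                return stem + "-l" + pronoun + "-ia";
            }
        }
        return null; // no rule applies
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"Amaria", "Venderia", "Partiria", "Atribuiria", "Poria"}) {
            System.out.println(verb + "-o -> " + suggest(verb, "o"));
        }
    }
}
```

In the XML, each of these rules would also have to restrict its verb token to the matching ending (for example with a regexp on the token), so that only one of them fires for a given verb.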