I have been working on a rule that simplifies sentences:
Isto é para que saibam o que acontece. → Isto é para saberem o que acontece.
However, the suggestion lists every form of the verb instead of only the matching one.
Also, it messes up the verbs “ir” and “vir”:
Title: Apolo/-1782578262
1.) Line 1, column 86, Rule ID: PARA_QUE_VERB[1]
Message: Esta perífrase pode ser simplificada.
Suggestion: fossar; ir; irdes; irem; ires
Rule source: /org/languagetool/rules/pt/grammar.xml
Depois de ser incansavelmente perseguida por Apolo, Dafne suplicou para seu pai para que fosse transformada em um loureiro.
(In this example, the flagged verb is “fosse”.)
Here is the rule so far:
<!-- PARA QUE SAIBAM para saberem -->
<rule id='PARA_QUE_VERB' name="Para que + Verbo → Para + Verbo ">
<!-- Created by Marco A.G.Pinto, Portuguese rule 2021-04-19 (17-MAR-2021+) -->
<!--
Isto é para que saibam o que acontece. → Isto é para saberem o que acontece.
-->
<pattern>
<token>para</token>
<marker>
<token>que</token>
<token postag='VMM0..+' postag_regexp='yes'/>
</marker>
<and>
<token negate_pos="yes" postag='SPS.+' postag_regexp='yes'/>
<token negate="yes" inflected='yes'>ir</token>
</and>
</pattern>
<message>Esta perífrase pode ser simplificada.</message>
<suggestion><match no='3' postag='VMM0..+' postag_regexp="yes" postag_replace='VMN0..+'/></suggestion>
<example correction="saber|saberdes|saberem|saberes|sabermos">Isto é para <marker>que saibam</marker> o que acontece.</example>
</rule>
From what I see, this can be solved by being more specific. First of all, “saibam”, for example, is a subjunctive form (required by “para que”), not an imperative, so VMSP... should match those forms. And if you append the three characters matched by ... to the personal infinitive tag (infinitivo pessoal, VMN0), it appears to work precisely. Can you work with this?
<!-- PARA QUE SAIBAM para saberem -->
<rule id='PARA_QUE_VERB' name="Para que + Verbo → Para + Verbo ">
<!-- Created by Marco A.G.Pinto, Portuguese rule 2021-04-19 (17-MAR-2021+) -->
<!--
Isto é para que saibam o que acontece. → Isto é para saberem o que acontece.
-->
<pattern>
<token>para</token>
<marker>
<token>que</token>
<token postag='VMSP...' postag_regexp='yes'/>
</marker>
<and>
<token negate_pos="yes" postag='SPS.+' postag_regexp='yes'/>
<token negate="yes" inflected='yes'>ir</token>
</and>
</pattern>
<message>Esta perífrase pode ser simplificada.</message>
<suggestion suppress_misspelled="yes"><match no='3' postag='(VMSP)(...)' postag_regexp="yes" postag_replace='VMN0$2'/></suggestion>
<example correction="saberem">Isto é para <marker>que saibam</marker> o que acontece.</example>
</rule>
Yes, these are sometimes puzzles, but they are fun to solve.
I wasn’t sure in this case either. Plus, sometimes the tagger might not do what we expect it to do. What I did was check what the tagger actually did with those sentences (Análise de Texto - LanguageTool).
These capturing groups are super useful and easy to use, even though they look barbaric.
Dividing your regexp into capturing groups (using parentheses) and referring to them as $1, $2, etc. (starting with 1, for whatever reason) is very handy, both for replacing POS tags and for replacing tokens.
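To see the capturing-group idea outside of LanguageTool, here is a minimal Python sketch of the same tag rewrite that postag='(VMSP)(...)' with postag_replace='VMN0$2' describes. The helper name `to_personal_infinitive_tag` and the sample tag `VMSP3P0` are assumptions made for illustration, not something taken from the tagger output, and Python reads the second group with `match.group(2)` where the rule writes `$2`.

```python
import re

# Same shape as the rule's postag regexp: group 1 is the fixed prefix,
# group 2 is the three trailing characters we want to carry over.
SUBJUNCTIVE_TAG = re.compile(r"(VMSP)(...)")


def to_personal_infinitive_tag(postag: str) -> str:
    """Rewrite a present-subjunctive tag as a personal-infinitive tag.

    Hypothetical helper: this sketch only rewrites complete tags and
    returns anything else unchanged.
    """
    match = SUBJUNCTIVE_TAG.fullmatch(postag)
    if match is None:
        return postag
    # Equivalent of postag_replace='VMN0$2': new prefix + captured suffix.
    return "VMN0" + match.group(2)


if __name__ == "__main__":
    # "VMSP3P0" is an assumed example tag, not verified tagger output.
    print(to_personal_infinitive_tag("VMSP3P0"))
```

Because the suffix is copied from the matched tag, only one infinitive tag is produced per subjunctive tag, which is why the suggestion narrows down to a single form.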