[pt] Problems creating a rule - 2021-04-19

marcoagpinto · April 19, 2021, 8:35am

I have been working on a rule that simplifies sentences:

Isto é para que saibam o que acontece. → Isto é para saberem o que acontece.

However, the suggestion lists every infinitive form of the verb instead of a single one.

Also, it mishandles the verbs “ir” and “vir”:

Title: Apolo/-1782578262
1.) Line 1, column 86, Rule ID: PARA_QUE_VERB[1]
Message: Esta perífrase pode ser simplificada.
Suggestion: fossar; ir; irdes; irem; ires
Rule source: /org/languagetool/rules/pt/grammar.xml
Depois de ser incansavelmente perseguida por Apolo, Dafne suplicou para seu pai para que fosse transformada em um loureiro.

(in this example, “fosse”)

Here is the rule so far:

	<!-- PARA QUE SAIBAM para saberem -->
	<rule id="PARA_QUE_VERB" name="Para que + Verbo → Para + Verbo ">
		<!-- Created by Marco A.G.Pinto, Portuguese rule 2021-04-19 (17-MAR-2021+) -->
		<!--
		Isto é para que saibam o que acontece. → Isto é para saberem o que acontece.
		-->
		<pattern>
			<token>para</token>
			<marker>
				<token>que</token>
				<token postag="VMM0..+" postag_regexp="yes"/>
			</marker>
			<and>
				<token negate_pos="yes" postag="SPS.+" postag_regexp="yes"/>
				<token negate="yes" inflected="yes">ir</token>
			</and>
		</pattern>
		<message>Esta perífrase pode ser simplificada.</message>
		<suggestion><match no="3" postag="VMM0..+" postag_regexp="yes" postag_replace="VMN0..+"/></suggestion>
		<example correction="saber|saberdes|saberem|saberes|sabermos">Isto é para <marker>que saibam</marker> o que acontece.</example>
	</rule>

Could you give some tips?

Thanks!

udomai · April 19, 2021, 9:17am

Hi Marco! That’s a beautiful rule idea!

From what I see, this can be solved by being more specific. First of all, “saibam”, for example, is a subjunctive form (needed because of “para que”), not an imperative, so the postag regex VMSP... should match those forms. And if you capture the three characters matched by ... and append them to the personal infinitive (infinitivo pessoal) prefix VMN0, the suggestion appears to come out as exactly one form. Can you work with this?

	<!-- PARA QUE SAIBAM para saberem -->
	<rule id="PARA_QUE_VERB" name="Para que + Verbo → Para + Verbo ">
		<!-- Created by Marco A.G.Pinto, Portuguese rule 2021-04-19 (17-MAR-2021+) -->
		<!--
		Isto é para que saibam o que acontece. → Isto é para saberem o que acontece.
		-->
		<pattern>
			<token>para</token>
			<marker>
				<token>que</token>
				<token postag="VMSP..." postag_regexp="yes"/>
			</marker>
			<and>
				<token negate_pos="yes" postag="SPS.+" postag_regexp="yes"/>
				<token negate="yes" inflected="yes">ir</token>
			</and>
		</pattern>
		<message>Esta perífrase pode ser simplificada.</message>
		<suggestion suppress_misspelled="yes"><match no="3" postag="(VMSP)(...)" postag_regexp="yes" postag_replace="VMN0$2"/></suggestion>
		<example correction="saberem">Isto é para <marker>que saibam</marker> o que acontece.</example>
	</rule>
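
To see why the narrower postag regex helps, here is a minimal Python sketch that checks which tags each pattern accepts. The sample tags are hypothetical (made up for illustration, not real tagger output), and the sketch assumes the postag regex has to match the whole tag; neither is confirmed in this thread, so check the real tags in the text analysis page.

```python
import re

# Hypothetical tags for illustration only, not actual tagger output.
SAMPLE_TAGS = ["VMM03P0", "VMSP3P0", "VMN03P0", "SPS00"]


def matching_tags(postag_regex, tags):
    """Return the tags that the regex matches in full."""
    return [tag for tag in tags if re.fullmatch(postag_regex, tag)]


# The original pattern only accepts tags that start with VMM0.
print(matching_tags(r"VMM0..+", SAMPLE_TAGS))
# The revised pattern only accepts tags that start with VMSP.
print(matching_tags(r"VMSP...", SAMPLE_TAGS))
```

Under these assumptions each pattern picks out a single tag family, which is what keeps the rule from firing on readings it was not meant for.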

marcoagpinto · April 19, 2021, 10:00am

@udomai

Thank you!

It is still hard for me to use such complex postags, but in time I will learn them.

udomai · April 19, 2021, 10:10am

Yes, those are sometimes puzzles, but they are fun to solve.

I wasn’t sure in this case either. Plus, sometimes the tagger might not do what we expect it to do. What I did was check what the tagger actually did with those sentences (Análise de Texto - LanguageTool).

marcoagpinto · April 19, 2021, 10:23am

Yes, I use that URL all the time to get the POS tags.

But I am still not an expert on complex regular expressions and such.

udomai · April 19, 2021, 10:43am

These capturing groups are super useful and easy to use, even though they look barbaric.

Dividing your regexp up into capturing groups (using parentheses) and referring to them as $1, $2, etc. (starting with 1, for whatever reason) is very handy, both for regexp-replacing postags and for tokens.
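
As a rough illustration of the capturing-group idea, the same substitution can be tried outside LanguageTool. This is only a Python sketch of the regex mechanics, not of how LanguageTool applies postag_replace internally; the function name is made up for the example, and the tag VMSP3P0 is an assumed example tag. Note that Python writes the back-reference as \g<2> where the rule writes $2.

```python
import re

# Group 1 captures the literal prefix "VMSP"; group 2 captures the
# three characters that follow it.
PATTERN = r"(VMSP)(...)"


def to_personal_infinitive_tag(postag):
    """Swap the VMSP prefix for VMN0 and keep the three trailing characters.

    Hypothetical helper, named for this example only.
    """
    # \g<2> is Python's spelling of the rule's $2 back-reference.
    return re.sub(PATTERN, r"VMN0\g<2>", postag)


print(to_personal_infinitive_tag("VMSP3P0"))  # prints VMN03P0
```

A tag that does not match the pattern comes back unchanged, so the replacement only affects the forms the pattern was written for.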