[pt] Easier/simpler way to write rule

marcoagpinto · March 31, 2021, 10:25am

I have been improving this rule.

“usa os termos”
“usa termos”
“usa-se os termos”
“usa-se termos”

Is there an easier way to allow for an optional word before “termos?”?

Thanks!

<!-- USAR empregar -->
<rulegroup id='EMPREGAR_TERMO' name="usar → empregar" type="style" tags="picky">
  <!-- Created by Marco A.G.Pinto, Portuguese rule 2021-03-31 (Enhanced) (17-MAR-2021+) *START* -->
  <rule>
    <pattern>
      <marker>
        <token regexp='no' inflected='yes'>usar</token>
      </marker>
      <token/>
      <token regexp='yes'>termos?</token>
    </pattern>
    <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
    <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
    <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> o termo “terrorismo”.</example>
  </rule>
  <rule>
    <pattern>
      <marker>
        <token regexp='no' inflected='yes'>usar</token>
      </marker>
      <token regexp='yes'>termos?</token>
    </pattern>
    <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
    <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
    <example correction="emprega|empregues">A nossa investigação <marker>usa</marker> termos científicos.</example>
  </rule>
  <rule>
    <pattern>
      <marker>
        <token regexp='no' inflected='yes'>usar</token>
        <token regexp='yes' spacebefore='no'>&hifen;</token>
        <token regexp='no' spacebefore='no'>se</token>
      </marker>
      <token/>
      <token regexp='yes'>termos?</token>
    </pattern>
    <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
    <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
    <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> o termo “terrorismo”.</example>
  </rule>
  <rule>
    <pattern>
      <marker>
        <token regexp='no' inflected='yes'>usar</token>
        <token regexp='yes' spacebefore='no'>&hifen;</token>
        <token regexp='no' spacebefore='no'>se</token>
      </marker>
      <token regexp='yes'>termos?</token>
    </pattern>
    <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
    <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
    <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> termos científicos.</example>
  </rule>
</rulegroup>

Ruud_Baars · March 31, 2021, 10:39am

It is not clear to me exactly what you want, but I guess you could use a token with skip=5.
If you need to check for any token, just <token/> will do that.

marcoagpinto · March 31, 2021, 10:55am

@Ruud_Baars

The rule I am trying to improve may or may not have a word before “termo”/“termos”.

“Vamos usar os termos do livro.”
“Vamos usar vários termos do livro.”
“Vamos usar termos caros.”

To code it, I created 4 rules, which also cover the case where “-se” follows the verb:

“Em Portugal usa-se termos caros.”

I wondered whether, instead of 4 rules, I could do the same with 2 just by making the word before “termo”/“termos” optional.

Thanks!
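
One possible way to express an optional word, sketched here for the plain-verb case only: LanguageTool pattern tokens accept a `min` attribute, and `min='0'` marks a token as optional. The rule below reuses the tokens, message and examples from the rules posted earlier. It is an untested sketch and assumes that `min='0'` on an empty `<token/>` behaves as the LanguageTool documentation describes.

```xml
<rule>
  <pattern>
    <marker>
      <token inflected='yes'>usar</token>
    </marker>
    <!-- Optional: zero or one arbitrary token (e.g. "o", "os", "vários") -->
    <token min='0'/>
    <token regexp='yes'>termos?</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
  <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> o termo “terrorismo”.</example>
  <example correction="emprega|empregues">A nossa investigação <marker>usa</marker> termos científicos.</example>
</rule>
```

Applying the same change to the “-se” rule would bring the group down from 4 rules to 2.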

marcoagpinto · March 31, 2021, 12:36pm

@Ruud_Baars

I tried:

<!-- USAR empregar -->
<rulegroup id='EMPREGAR_TERMO' name="usar → empregar" type="style" > <!-- tags="picky"> -->
  <!-- Created by Marco A.G.Pinto, Portuguese rule 2021-03-31 (Enhanced) (17-MAR-2021+) *START* -->
  <rule>
    <pattern>
      <marker>
        <token regexp='no' inflected='yes'>usar</token>
      </marker>
      <token skip='1'>
        <exception scope="next" regexp='yes'>termos?</exception>
      </token>
    </pattern>
    <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
    <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
    <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> o termo “terrorismo”.</example>
    <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> termos caros.</example>
  </rule>
  <rule>
    <pattern>
      <marker>
        <token regexp='no' inflected='yes'>usar</token>
        <token regexp='yes' spacebefore='no'>&hifen;</token>
        <token regexp='no' spacebefore='no'>se</token>
      </marker>
      <token skip='1'>
        <exception scope="next" regexp='yes'>termos?</exception>
      </token>
    </pattern>
    <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
    <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
    <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> o termo “terrorismo”.</example>
    <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> termos caros.</example>
  </rule>
</rulegroup>

But the standalone tool doesn’t flag all the sentences I tried it with:

O Rui usou o termo bom!
A tese usa termos bons!
Na tese usou-se o termo bom!
Na tese usou-se termos bons!

Also, TESTRULES PT shows a lot of warnings:

Testing rule 2600…
Skipped 0 rules for variant language to avoid checking rules more than once
2688 rules tested.
Exception in thread “main” org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule EMPREGAR_TERMO[2] in file /org/languagetool/rules/pt/grammar.xml: Na nossa investigaç?o usa-se o termo ?terrorismo?."
Errors expected: 1
Errors found : 0
Message: Num contexto formal ou académico, é preferível escrever ‘empregar um termo’.
Analyzed token readings: [/SENT_START*] Na[em+a/SPS00+DA*] [ /null*] nossa[nosso/DP1FSP] [ /null*] investigaç?o[investigaç?o/NCFS000] [ /null*] usa[usar/VMIP3S0,usar/VMM02S0] -[-/_PUNCT*] se[se/PP3CN000*] [ /null*] o[o/DA0MS0] [ /null*] termo[termo/AQ0CN0,termo/NCMS000] [ /null*] ?[?/_PUNCT] terrorismo[terrorismo/NCMS000*] ?[?/_PUNCT*] .[./SENT_END*,./_PUNCT*]
Matches:
at org.languagetool.rules.patterns.PatternRuleTest.addError(PatternRuleTest.java:313)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:433)
at org.languagetool.rules.patterns.PatternRuleTest.lambda$testGrammarRulesFromXML$1(PatternRuleTest.java:342)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule EMPREGAR_TERMO[1] in file /org/languagetool/rules/pt/grammar.xml: Na nossa investigaç?o usamos o termo ?terrorismo?."
Errors expected: 1
Errors found : 0
Message: Num contexto formal ou académico, é preferível escrever ‘empregar um termo’.
Analyzed token readings: [/SENT_START*] Na[em+a/SPS00+DA*] [ /null*] nossa[nosso/DP1FSP] [ /null*] investigaç?o[investigaç?o/NCFS000] [ /null*] usamos[usar/VMIP1P0] [ /null*] o[o/DA0MS0] [ /null*] termo[termo/AQ0CN0,termo/NCMS000] [ /null*] ?[?/_PUNCT] terrorismo[terrorismo/NCMS000*] ?[?/_PUNCT*] .[./SENT_END*,./_PUNCT*]
Matches:
at org.languagetool.rules.patterns.PatternRuleTest.addError(PatternRuleTest.java:313)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:433)
at org.languagetool.rules.patterns.PatternRuleTest.lambda$testGrammarRulesFromXML$1(PatternRuleTest.java:342)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Running disambiguator rule tests…
Running disambiguation tests for Portuguese…
100…
200…
290 rules tested (387ms)
Disambiguator tests successful.
Running XML bitext pattern tests…
Bitext pattern tests successful.
Validating false-friends.xml…
Validation successfully finished.

Can you help me fix it?

Anyone?

Thanks!

Ruud_Baars · March 31, 2021, 2:44pm

Hard to see on my phone, but I think there is no token in the pattern to catch termos?
Why is that in an exception?
Token inflected usar
Token skip 2
Token termos?

(Very rough, I know.)
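
Written out as rule XML, that outline might look like the sketch below for the “-se” variant. It assumes `skip` goes on the token before the gap (here `se`), so that up to that many tokens may sit between it and `termos?`. It uses `skip='1'` rather than 2, because the stated goal is at most one intervening word. Unlike the attempt above, `termos?` is an ordinary token that has to match, not an exception, since exceptions only exclude tokens. The sketch is untested.

```xml
<rule>
  <pattern>
    <marker>
      <token inflected='yes'>usar</token>
      <token regexp='yes' spacebefore='no'>&hifen;</token>
      <!-- skip='1': allow at most one token between "se" and "termo(s)" -->
      <token spacebefore='no' skip='1'>se</token>
    </marker>
    <!-- Required token: the rule only fires when "termo"/"termos" is found -->
    <token regexp='yes'>termos?</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
  <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> o termo “terrorismo”.</example>
  <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> termos caros.</example>
</rule>
```

The plain-verb rule would follow the same shape, with `skip='1'` on the `usar` token instead.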

marcoagpinto · March 31, 2021, 2:54pm

Hello!

I have been able to solve it myself after some thinking:

Thank you all for your time!