[pt] Easier/simpler way to write rule

Hello @udomai and @jaumeortola

I have been improving this rule.

“usa os termos”
“usa termos”
“usa-se os termos”
“usa-se termos”

Is there an easier way to allow for an optional word before “termos?”, without writing a separate rule for each case?

Thanks!

<!-- USAR empregar -->
<rulegroup id='EMPREGAR_TERMO' name="usar → empregar" type="style" tags="picky">
<!--      Created by Marco A.G.Pinto, Portuguese rule 2021-03-31 (Enhanced) (17-MAR-2021+) *START* -->
 <rule>
  <pattern>
	<marker>
		<token regexp='no' inflected='yes'>usar</token>	
	</marker>
	<token/>
	<token regexp='yes'>termos?</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
  <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> o termo “terrorismo”.</example>
 </rule>
 <rule>
  <pattern>
	<marker>
		<token regexp='no' inflected='yes'>usar</token>	
	</marker>
	<token regexp='yes'>termos?</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
  <example correction="emprega|empregues">A nossa investigação <marker>usa</marker> termos científicos.</example>
 </rule>	 
 
 <rule>
  <pattern>
	<marker>
		<token regexp='no' inflected='yes'>usar</token>	
		<token regexp='yes' spacebefore='no'>&hifen;</token>
		<token regexp='no' spacebefore='no'>se</token>			
	</marker>
	<token/>
	<token regexp='yes'>termos?</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
  <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> o termo “terrorismo”.</example>
 </rule>
 <rule>
  <pattern>
	<marker>
		<token regexp='no' inflected='yes'>usar</token>	
		<token regexp='yes' spacebefore='no'>&hifen;</token>
		<token regexp='no' spacebefore='no'>se</token>			
	</marker>
	<token regexp='yes'>termos?</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
  <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> termos científicos.</example>
 </rule>	 	 
</rulegroup>

It is not clear to me exactly what you want, but I guess you could use
Token skip=5.
If you need to check for any token, just an empty token will do that.
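
A minimal, untested sketch of how the skip suggestion might look for the first case in this thread. It assumes the documented meaning of the `skip` attribute in LanguageTool pattern rules (the maximum number of tokens that may be skipped before the next token is matched); the value `1` is an assumption chosen to allow at most one word before “termo”/“termos”, while a larger value such as the 5 mentioned above would widen the window. Only the pattern is shown; the message, suggestion and examples would stay as in the original rule.

```xml
<pattern>
	<marker>
		<!-- skip='1': zero or one token may appear between this token and the next one -->
		<token inflected='yes' skip='1'>usar</token>
	</marker>
	<!-- the word that must follow, directly or after the skipped token -->
	<token regexp='yes'>termos?</token>
</pattern>
```

Note that `skip` is set on the token before the gap, not on the token after it.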

@Ruud_Baars

The rule I am trying to improve can have or not a word before “termo”/“termos”.

“Vamos usar os termos do livro.”
“Vamos usar vários termos do livro.”
“Vamos usar termos caros.”

To code it I created 4 rules also to cope with the fact that one can use a “-se” after the verb:

“Em Portugal usa-se termos caros.”

I wondered if instead of 4 rules I could do the same with 2 just by accepting or not a word before “termo”/“termos”.

Thanks!
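
One possible way to express “a word may or may not come before termo/termos” in a single pattern is an optional token. The sketch below is untested and assumes the `min` attribute of pattern tokens, where `min='0'` makes a token optional; whether an unrestricted optional token behaves well in this rule file would need to be confirmed with the rule tests.

```xml
<pattern>
	<marker>
		<token inflected='yes'>usar</token>
	</marker>
	<!-- min='0': this any-word token may be absent ("usa termos") or present ("usa os termos") -->
	<token min='0'/>
	<token regexp='yes'>termos?</token>
</pattern>
```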

@Ruud_Baars

I tried:

<!-- USAR empregar -->
<rulegroup id='EMPREGAR_TERMO' name="usar → empregar" type="style" > <!-- tags="picky"> -->
<!--      Created by Marco A.G.Pinto, Portuguese rule 2021-03-31 (Enhanced) (17-MAR-2021+) *START* -->
 <rule>
  <pattern>
	<marker>
		<token regexp='no' inflected='yes'>usar</token>	
	</marker>
    <token skip='1'>
        <exception scope="next" regexp='yes'>termos?</exception>
	</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>
  <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> o termo “terrorismo”.</example>
  <example correction="empregamo|empregamos">Na nossa investigação <marker>usamos</marker> termos caros.</example>
 </rule> 
 <rule>
  <pattern>
	<marker>
		<token regexp='no' inflected='yes'>usar</token>	
		<token regexp='yes' spacebefore='no'>&hifen;</token>
		<token regexp='no' spacebefore='no'>se</token>			
	</marker>
    <token skip='1'>
        <exception scope="next" regexp='yes'>termos?</exception>
	</token>
  </pattern>
  <message>Num contexto formal ou académico, é preferível escrever 'empregar um termo'.</message>
  <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match>-se</suggestion>
  <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> o termo “terrorismo”.</example>
  <example correction="emprega-se|empregues-se">Na nossa investigação <marker>usa-se</marker> termos caros.</example>
 </rule> 
</rulegroup>

But the standalone tool doesn’t flag all the sentences I tried it with:

O Rui usou o termo bom!
A tese usa termos bons!
Na tese usou-se o termo bom!
Na tese usou-se termos bons!

Also, TESTRULES PT reports test failures:

Testing rule 2600…
Skipped 0 rules for variant language to avoid checking rules more than once
2688 rules tested.
Exception in thread “main” org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule EMPREGAR_TERMO[2] in file /org/languagetool/rules/pt/grammar.xml: Na nossa investigaç?o usa-se o termo ?terrorismo?."
Errors expected: 1
Errors found : 0
Message: Num contexto formal ou académico, é preferível escrever ‘empregar um termo’.
Analyzed token readings: [/SENT_START*] Na[em+a/SPS00+DA*] [ /null*] nossa[nosso/DP1FSP] [ /null*] investigaç?o[investigaç?o/NCFS000] [ /null*] usa[usar/VMIP3S0,usar/VMM02S0] -[-/_PUNCT*] se[se/PP3CN000*] [ /null*] o[o/DA0MS0] [ /null*] termo[termo/AQ0CN0,termo/NCMS000] [ /null*] ?[?/_PUNCT] terrorismo[terrorismo/NCMS000*] ?[?/_PUNCT*] .[./SENT_END*,./_PUNCT*]
Matches: []
at org.languagetool.rules.patterns.PatternRuleTest.addError(PatternRuleTest.java:313)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:433)
at org.languagetool.rules.patterns.PatternRuleTest.lambda$testGrammarRulesFromXML$1(PatternRuleTest.java:342)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule EMPREGAR_TERMO[1] in file /org/languagetool/rules/pt/grammar.xml: Na nossa investigaç?o usamos o termo ?terrorismo?."
Errors expected: 1
Errors found : 0
Message: Num contexto formal ou académico, é preferível escrever ‘empregar um termo’.
Analyzed token readings: [/SENT_START*] Na[em+a/SPS00+DA*] [ /null*] nossa[nosso/DP1FSP] [ /null*] investigaç?o[investigaç?o/NCFS000] [ /null*] usamos[usar/VMIP1P0] [ /null*] o[o/DA0MS0] [ /null*] termo[termo/AQ0CN0,termo/NCMS000] [ /null*] ?[?/_PUNCT] terrorismo[terrorismo/NCMS000*] ?[?/_PUNCT*] .[./SENT_END*,./_PUNCT*]
Matches: []
at org.languagetool.rules.patterns.PatternRuleTest.addError(PatternRuleTest.java:313)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:433)
at org.languagetool.rules.patterns.PatternRuleTest.lambda$testGrammarRulesFromXML$1(PatternRuleTest.java:342)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Running disambiguator rule tests…
Running disambiguation tests for Portuguese…
100…
200…
290 rules tested (387ms)
Disambiguator tests successful.
Running XML bitext pattern tests…
Bitext pattern tests successful.
Validating false-friends.xml…
Validation successfully finished.

Can you help me fix it?

Anyone?

Thanks!

Hard to see on my phone, but I think the pattern has no token that actually catches “termos?”.
Why is that inside an exception? I would try:
Token inflected usar
Token skip 2
Token termos?

(Very rough, I know.)
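
Written out as XML for the “-se” variant, the rough outline above might look like the untested sketch below. It reuses the `&hifen;` entity and the token attributes from the rules posted earlier in the thread. The `skip` value of `1` (rather than 2) is an assumption, chosen because the stated goal is at most one word before “termo”/“termos”. The key difference from the failing attempt is that “termos?” is a real token in the pattern instead of appearing only inside an `<exception>`.

```xml
<pattern>
	<marker>
		<token inflected='yes'>usar</token>
		<token regexp='yes' spacebefore='no'>&hifen;</token>
		<!-- skip goes on the last token before the gap: zero or one token may follow "se" -->
		<token spacebefore='no' skip='1'>se</token>
	</marker>
	<!-- positive match for the noun, so the rule only fires when it is present -->
	<token regexp='yes'>termos?</token>
</pattern>
```

The message, suggestion and example lines of the rule would stay as posted and should be re-run through TESTRULES PT to confirm the match.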

Hello!

I have been able to solve it myself after some thinking:

Thank you all for your time!