[pt] Help to create rules "às/por" with comma

marcoagpinto · December 13, 2020, 7:32am

I have been trying to implement a comma rule but it gives an error with TESTRULES PT.

I am brain toasted and can’t spot the error:

Os números funcionam às vezes podem é estar em baixo.

I am placing it inside:
<rulegroup id='VERB_COMMA_CONJUNCTION' name="Locuções entre vírgulas: portanto, por exemplo, na verdade">

<!-- ÀS VEZES às vezes -->
<!--      Created by Marco A.G.Pinto, Portuguese rule 2020-12-13 (21-OCT-2020+)  *START*   -->
<rule>
  <pattern>
	<marker>
		<token postag='V.+' postag_regexp='yes'/>
		<token negate="yes" regexp='yes' spacebefore='no'>[,]</token>
	</marker>	
	<token regexp='yes'>às|por</token>
	<token postag='RG' postag_regexp='no'/>
	<token postag='V.+' postag_regexp='yes'/>
  </pattern>
  <message>Esta locução deve ser separada por vírgulas.</message>
    <suggestion>\1,</suggestion>
  <example correction="funcionam,">Os números <marker>funcionam</marker> às vezes podem é estar em baixo.</example>
  <example type='correct'>Os números <marker>funcionam,</marker> às vezes podem é estar em baixo.</example>
</rule>

It gives the errors:

Testing rule 2600…
Skipped 0 rules for variant language to avoid checking rules more than once
2645 rules tested.
Exception in thread "main" org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule VERB_COMMA_CONJUNCTION[1] in file /org/languagetool/rules/pt/grammar.xml: Os números funcionam às vezes podem é estar em baixo."
Errors expected: 1
Errors found : 0
Message: Esta locução deve ser separada por vírgulas.
Analyzed token readings: [/SENT_START*] Os[o/DA0MP0*] [ /null*] números[número/NCMP000] [ /null*] funcionam[funcionar/VMIP3P0] [ /null*] às[às vezes/RG] [ /null*] vezes[às vezes/RG] [ /null*] podem[poder/VMIP3P0,podar/VMM03P0,podar/VMSP3P0] [ /null*] é[ser/VMIP3S0] [ /null*] estar[estar/VMN0000,estar/VMN01S0,estar/VMN03S0] [ /null*] em[em baixo/RG] [ /null*] baixo[em baixo/RG] .[./SENT_END*,./_PUNCT*]
Matches:
at org.languagetool.rules.patterns.PatternRuleTest.addError(PatternRuleTest.java:310)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:430)
at org.languagetool.rules.patterns.PatternRuleTest.lambda$testGrammarRulesFromXML$1(PatternRuleTest.java:339)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Do you know what is wrong with it?

Thanks!

Ruud_Baars · December 13, 2020, 8:59am

There are two tokens inside the marker in the pattern, but just one is marked in the example.

jaumeortola · December 13, 2020, 9:14am

Just remove this token:

<token negate="yes" regexp='yes' spacebefore='no'>[,]</token>

The next token <token regexp='yes'>às|por</token> already entails that it is not a comma.
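
For reference, here is a sketch of how the rule might look with that token removed. It is the rule from the first post with only that one line deleted, and it has not been tested beyond what is reported in this thread. As far as I understand the matching, a negated token still occupies one position in the pattern, so the original version required an extra non-comma token between the verb and "às|por". In the example sentence "às" follows the verb directly, which would explain why no match was found. Removing the token also leaves a single token inside the marker, in line with the single word marked in the example.

```xml
<!-- Sketch only: the original rule minus the negated comma token -->
<rule>
  <pattern>
    <marker>
      <!-- the verb is now the only marked token, matching the example's <marker> -->
      <token postag='V.+' postag_regexp='yes'/>
    </marker>
    <!-- matching "às" or "por" here already excludes a comma in this position -->
    <token regexp='yes'>às|por</token>
    <token postag='RG' postag_regexp='no'/>
    <token postag='V.+' postag_regexp='yes'/>
  </pattern>
  <message>Esta locução deve ser separada por vírgulas.</message>
  <suggestion>\1,</suggestion>
  <example correction="funcionam,">Os números <marker>funcionam</marker> às vezes podem é estar em baixo.</example>
  <example type='correct'>Os números <marker>funcionam,</marker> às vezes podem é estar em baixo.</example>
</rule>
```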

marcoagpinto · December 13, 2020, 9:18am

It worked!

Thanks to both of you!

I am about to run it against a 200 000 corpus.

marcoagpinto · December 13, 2020, 10:22am

Added rule:

Thank you again!