[pt] Problem creating rule regarding foreign words

@tiff

Hello Christopher,

I have been improving the rules that detect foreign words.

For example:
“bullying” triggers a suggestion to put the word in italics or between quotation marks.

However, this gives rather ugly results with:
“e-bullying”.

So, I created an entity:

<!ENTITY barbarismosE
	"bullying|manuals?">

And an antipattern for Tiago’s rule:

  <antipattern>
    <token>e</token>
    <token>-</token>
    <token regexp='yes'>(?:&barbarismosE;)s?</token>
  </antipattern>	  

Then I tried to flag the “e-” words, such as “e-bullying”, by adding a new rule to the rule group below:

  <rulegroup id='BARBARISMS' name='2. Estrageirismos sem tradução'>
    <!--      Created by Tiago F. Santos, Portuguese rule, 2017-02-09      -->
      <url>https://ciberduvidas.iscte-iul.pt/consultorio/perguntas/estrangeirismos-em-italico-ou-entre-aspas/7785</url>
      <short>Estrangeirismo</short>

The rule is:

<rule>
  <pattern>
    <marker>
      <token>e</token>
      <token>-</token>
      <token regexp='yes'>&barbarismosE;</token>
    </marker>
  </pattern>
  <message>Os estrangeirismos devem estar entre aspas ou ser italizados.</message>
  <suggestion>‘\1\2\3’</suggestion>
  <suggestion>“\1\2\3”</suggestion>
  <example correction='‘e-bullying’|“e-bullying"'>O <marker>e-bullying</marker> é cada vez mais comum.</example>
</rule>

The rule works well in the standalone tool, but when I run TESTRULES PT to check it, I get several errors.

What is wrong with it?

Thanks!

@marcoagpinto what errors do you see?

One of the quotes in this line is not the one from the suggestion:

<example correction='‘e-bullying’|“e-bullying"'>O <marker>e-bullying</marker> é cada vez mais comum.</example>

Ahhhh… well spotted!

Let me try the rule.

Ahhhh… on the stand-alone tool it works as expected, but on TESTRULES PT it gives tons of warnings.

<!-- MARCOAGPINTO 2020-05-12 *START* -->
<rule>
  <pattern>
    <marker>
      <token>e</token>
      <token>-</token>
      <token regexp='yes'>&barbarismosE;</token>
    </marker>
  </pattern>
  <message>Os estrangeirismos devem estar entre aspas ou ser italizados.</message>
  <suggestion>‘\1\2\3’</suggestion>
  <suggestion>“\1\2\3”</suggestion>
  <example correction='‘e-bullying’|“e-bullying”'>O <marker>e-bullying</marker> é cada vez mais comum.</example>
</rule>
<!-- MARCOAGPINTO 2020-05-12 *END* -->

It gives tons of warnings:

Testing rule 2500…
Skipped 0 rules for variant language to avoid checking rules more than once
2524 rules tested.
Exception in thread "main" org.junit.internal.runners.model.MultipleFailureException: There were 5 errors:
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure(Test failure for rule BARBARISMS[1] in file /org/languagetool/rules/pt/grammar.xml: Incorrect input:
O e-bullying é cada vez mais comum.
Corrected sentence:
O ?e-bullying? é cada vez mais comum.
The correction triggered an error itself:
BARBARISMS[1]:3-13:Os estrangeirismos devem estar entre aspas ou ser italizados.
)
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure(Test failure for rule BARBARISMS[1] in file /org/languagetool/rules/pt/grammar.xml: Incorrect input:
O e-bullying é cada vez mais comum.
Corrected sentence:
O ?e-bullying? é cada vez mais comum.
The correction triggered an error itself:
BARBARISMS[1]:3-13:Os estrangeirismos devem estar entre aspas ou ser italizados.
)
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure(Test failure for rule BARBARISMS[4] in file /org/languagetool/rules/pt/grammar.xml: Incorrect suggestions: ?e-bullying?|?e-bullying" != ?e-bullying?|?e-bullying? on input: O e-bullying é cada vez mais comum.)
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure(Test failure for rule BARBARISMS[4] in file /org/languagetool/rules/pt/grammar.xml: Incorrect input:
O e-bullying é cada vez mais comum.
Corrected sentence:
O ?e-bullying? é cada vez mais comum.
The correction triggered an error itself:
BARBARISMS[4]:3-13:Os estrangeirismos devem estar entre aspas ou ser italizados.
)
org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure(Test failure for rule BARBARISMS[4] in file /org/languagetool/rules/pt/grammar.xml: Incorrect input:
O e-bullying é cada vez mais comum.
Corrected sentence:
O ?e-bullying? é cada vez mais comum.
The correction triggered an error itself:
BARBARISMS[4]:3-13:Os estrangeirismos devem estar entre aspas ou ser italizados.
)
at org.junit.runners.model.MultipleFailureException.assertEmpty(MultipleFailureException.java:67)
at org.junit.rules.ErrorCollector.verify(ErrorCollector.java:39)
at org.languagetool.rules.patterns.PatternRuleTest$PatternRuleErrorCollector.check(PatternRuleTest.java:74)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:673)
Running disambiguator rule tests…
Running disambiguation tests for Portuguese…
371 rules tested (261ms)
Tests successful.
Running XML bitext pattern tests…
Tests successful.
Validating false-friends.xml…
Validation successfully finished.

Yes. It says “The correction triggered an error itself”, which means that the corrected sentence still triggers the same rule. That is probably because there is no antipattern for the word in quotes.
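
A minimal sketch of what such an antipattern could look like, added to the rule itself. It assumes the quotation marks are tokenized as separate tokens and reuses the &barbarismosE; entity defined earlier in the thread; the extra correct example is an addition for illustration, and none of this has been run against TESTRULES PT:

```xml
<rule>
  <!-- Assumption: skip text already wrapped in one of the suggested quote pairs -->
  <antipattern>
    <token regexp='yes'>[‘“]</token>
    <token>e</token>
    <token>-</token>
    <token regexp='yes'>&barbarismosE;</token>
    <token regexp='yes'>[’”]</token>
  </antipattern>
  <pattern>
    <marker>
      <token>e</token>
      <token>-</token>
      <token regexp='yes'>&barbarismosE;</token>
    </marker>
  </pattern>
  <message>Os estrangeirismos devem estar entre aspas ou ser italizados.</message>
  <suggestion>‘\1\2\3’</suggestion>
  <suggestion>“\1\2\3”</suggestion>
  <example correction='‘e-bullying’|“e-bullying”'>O <marker>e-bullying</marker> é cada vez mais comum.</example>
  <!-- Assumption: a correct sentence that should no longer trigger the rule -->
  <example>O ‘e-bullying’ é cada vez mais comum.</example>
</rule>
```

If the antipattern matches, the quoted form is skipped, so applying either suggestion should no longer produce a new match.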

@tiff

I have made some slight improvements to the XML and moved the rule to a different position:

Now:

<!-- MARCOAGPINTO 2020-05-13 *START* -->
<rule>
  <pattern>
    <marker>
      <token>e</token>
      <token>-</token>
      <token regexp='yes'>&barbarismosE;</token>
    </marker>
  </pattern>
  <message>Os estrangeirismos devem estar entre aspas ou ser italizados.</message>
  <suggestion>‘\1\2\3’</suggestion>
  <suggestion>“\1\2\3”</suggestion>
  <example correction='‘e-bullying’|“e-bullying”'>O <marker>e-bullying</marker> é cada vez mais comum.</example>
</rule>
<!-- MARCOAGPINTO 2020-05-13 *END* -->

TESTRULES PT now gives this error:

Testing rule 2500…
Skipped 0 rules for variant language to avoid checking rules more than once
2523 rules tested.
Exception in thread "main" org.languagetool.rules.patterns.PatternRuleTest$PatternRuleTestFailure: Test failure for rule BARBARISMS[3] in file /org/languagetool/rules/pt/grammar.xml: O e-bullying é cada vez mais comum."
Errors expected: 1
Errors found : 0
Message: Os estrangeirismos devem estar entre aspas ou ser italizados.
Analyzed token readings: [/SENT_START*] O[o/DA0MS0*,o/PD0MS000*,o/PP3MSA00*] [ /null*] e[e/CC] -[-/_PUNCT*] bullying[bullying/NCMS000*] [ /null*] é[ser/VMIP3S0] [ /null*] cada[cada/RG] [ /null*] vez[vez/RG] [ /null*] mais[mais/NCMS000,mais/NCMN000,mais/RG,cada vez mais/] [ /null*] comum[comum/AQ0CS0,comum/NCMS000] .[./SENT_END*,./_PUNCT*]
Matches:
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:384)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:306)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:160)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:143)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:671)
Running disambiguator rule tests…
Running disambiguation tests for Portuguese…
371 rules tested (372ms)
Tests successful.
Running XML bitext pattern tests…
Tests successful.
Validating false-friends.xml…
Validation successfully finished.

Any idea on how to fix it?

Thanks!

@tiff

Even this way it doesn’t work:

<!-- MARCOAGPINTO 2020-05-13 *START* -->
<rule>
  <pattern>
    <token><exception negate="yes" regexp='yes'>‘|“</exception></token>
    <marker>
      <token>e</token>
      <token>-</token>
      <token regexp='yes'>&barbarismosE;</token>
    </marker>
    <token><exception negate="yes" regexp='yes'>’|”</exception></token>
  </pattern>
  <message>Os estrangeirismos devem estar entre aspas ou ser italizados.</message>
  <suggestion>‘\1\2\3’</suggestion>
  <suggestion>“\1\2\3”</suggestion>
  <example correction='‘e-bullying’|“e-bullying”'>O <marker>e-bullying</marker> é cada vez mais comum.</example>
</rule>
<!-- MARCOAGPINTO 2020-05-13 *END* -->
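
A note on the exception-based variant: as far as I understand the pattern syntax, negate="yes" inverts an exception, so the two outer tokens may end up matching only quotation marks instead of excluding them. Backreferences such as \1 also appear to count tokens from the start of the pattern rather than from the marker, so a leading token would shift them by one. A minimal, untested sketch under those assumptions, again assuming the quotes are separate tokens:

```xml
<rule>
  <pattern>
    <!-- Assumption: any token except an opening quote -->
    <token><exception regexp='yes'>‘|“</exception></token>
    <marker>
      <token>e</token>
      <token>-</token>
      <token regexp='yes'>&barbarismosE;</token>
    </marker>
    <!-- Assumption: any token except a closing quote -->
    <token><exception regexp='yes'>’|”</exception></token>
  </pattern>
  <message>Os estrangeirismos devem estar entre aspas ou ser italizados.</message>
  <!-- Assumption: tokens 2 to 4 are the marked ones, since token 1 is the leading context token -->
  <suggestion>‘\2\3\4’</suggestion>
  <suggestion>“\2\3\4”</suggestion>
  <example correction='‘e-bullying’|“e-bullying”'>O <marker>e-bullying</marker> é cada vez mais comum.</example>
</rule>
```

Whether this is what finally fixed the rule is not stated in the thread.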

What’s the error you are seeing? Is the “&barbarismosE;” entity available where you are checking it?

Maybe using spacebefore="no" on the "-" token as well as on the next token helps a little?
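
A minimal sketch of that suggestion, assuming the word is split into the tokens e, - and bullying as shown in the analyzed token readings above. It is untested and only restricts the match to the unspaced form:

```xml
<pattern>
  <marker>
    <token>e</token>
    <!-- Assumption: require that no space precedes the hyphen -->
    <token spacebefore='no'>-</token>
    <!-- Assumption: require that no space precedes the foreign word either -->
    <token spacebefore='no' regexp='yes'>&barbarismosE;</token>
  </marker>
</pattern>
```

This keeps the rule from matching sequences such as “e - bullying”, where “e” is more likely the Portuguese conjunction.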

I have been able to fix it.

Thank you for the help!

What errors do you see?

As in my last comment: “I have been able to fix it”.

It is fixed.