[pt] Problem with antipattern – 2022-01-12

Hello @udomai @jaumeortola @tiff

Why doesn’t my antipattern work here?

  <rulegroup id="SPACE_BEFORE_PUNCTUATION" name="Espaços antes da pontuação">
    <!-- Based on German grammar.xml, by Tiago F. Santos, 2017-07-08 -->
	
<!-- MARCOAGPINTO 2022-01-12 (1-JAN-2022+) *START* -->
<!--

HITS AGAINST A 600 000 CORPORA:
BEFORE:xxxx
 AFTER:xxxx
-->
      <antipattern>
		<token regexp='yes'>extensão|extensões|ficheiros?</token>
		<token spacebefore='yes' regexp='yes'>[.]</token>
		<token spacebefore='no' postag='NP.+|AQ.+|NC.+' postag_regexp='yes'/>
      </antipattern>
<!-- MARCOAGPINTO 2022-01-12 (1-JAN-2022+) *END* -->
	
    <rule>
      <regexp>\b([\p{L}\d]+) ([!?»”’,….])</regexp>
      <message>Remova o espaço antes deste sinal de pontuação.</message>
        <suggestion>\1\2</suggestion>
      <example correction="escapou!">Como é que isto me <marker>escapou !</marker></example>
    <!--example correction="escapou!">Como é que isto me <marker>escapou   !</marker></example-->
      <example correction="roda.">Existem duas estratégias possíveis: aproveitar o que existe ou reinventar a <marker>roda .</marker></example>
    </rule>
    <rule>
      <regexp>\b([\p{L}\d]+) ([:;])(?![\-o]?(?:[()/]|[DSP]\b))</regexp>
      <message>Remova o espaço antes deste sinal de pontuação.</message>
        <suggestion>\1\2</suggestion>
      <example correction="possíveis:">Existem duas estratégias <marker>possíveis :</marker> aproveitar o que existe ou reinventar a roda.</example>
      <example>Um sorriso :-)</example>
      <example>Um sorriso :)</example>
      <example>Um sorriso :(</example>
      <example>Um sorriso :-/</example>
      <example>Um sorriso :/</example>
      <example>Um sorriso :D</example>
      <example correction="Brasil;">Site de Instituto Ludwig von Mises <marker>Brasil ;</marker>Principais portais web</example>
    </rule>
  </rulegroup>

TESTRULES PT throws a lot of errors:

Running pattern rule tests for Portuguese (org.languagetool.language.Portuguese)…
Exception in thread "main" java.lang.RuntimeException: Could not activate rules
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:334)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:293)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:94)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:84)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:67)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:51)
at org.languagetool.rules.patterns.PatternRuleTest.createToolForTesting(PatternRuleTest.java:175)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:160)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:153)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:737)
Caused by: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/pt/grammar.xml'
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:80)
at org.languagetool.Language.getPatternRules(Language.java:641)
at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:662)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:327)
... 9 more
Caused by: java.lang.RuntimeException: <regexp> rules currently cannot be used together with <antipattern>. Rule id: SPACE_BEFORE_PUNCTUATION[1]
at org.languagetool.rules.patterns.PatternRuleHandler.createRules(PatternRuleHandler.java:648)
at org.languagetool.rules.patterns.PatternRuleHandler.endElement(PatternRuleHandler.java:408)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:77)
… 12 more
Running disambiguator rule tests…
Running disambiguation tests for Portuguese…
Exception in thread "main" java.lang.RuntimeException: Could not activate rules
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:334)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:293)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:353)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:259)
at org.languagetool.tagging.disambiguation.rules.DisambiguationRuleTest.testDisambiguationRulesFromXML(DisambiguationRuleTest.java:70)
at org.languagetool.tagging.disambiguation.rules.DisambiguationRuleTest.main(DisambiguationRuleTest.java:238)
Caused by: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/pt/grammar.xml'
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:80)
at org.languagetool.Language.getPatternRules(Language.java:641)
at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:662)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:327)
... 5 more
Caused by: java.lang.RuntimeException: <regexp> rules currently cannot be used together with <antipattern>. Rule id: SPACE_BEFORE_PUNCTUATION[1]
at org.languagetool.rules.patterns.PatternRuleHandler.createRules(PatternRuleHandler.java:648)
at org.languagetool.rules.patterns.PatternRuleHandler.endElement(PatternRuleHandler.java:408)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:77)
… 8 more
Running XML bitext pattern tests…

What is wrong with it?

Thanks!

As a last resort, I can rewrite the rule.

Isn’t it simply a rule to detect spaces before _PUNCT and _QUOT?

After pasting the rule into the form on the website, this error appears:
<regexp> rules currently cannot be used together with <antipattern>.
This means that an <antipattern> cannot be applied to rules that are written with the <regexp> element.

To use an <antipattern>, you need to rewrite the expression inside the <regexp> element as a <pattern> made of <token> elements.

See also: "Allow regexp in antipatterns", issue #5493 in languagetool-org/languagetool on GitHub.
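
To illustrate the rewrite suggested above, here is a rough sketch of the first rule of the group expressed with <token> elements, so that the antipattern from the question can sit next to it. This is only an assumption about how the conversion could look: the antipattern is moved inside the rule, the character classes are copied from the original <regexp>, and nothing here has been run through the rule tests, so tokenization of characters such as "…" may need adjusting.

```xml
<rule>
  <!-- Antipattern copied from the question; placing it inside the rule
       (instead of at rulegroup level) is a choice made for this sketch. -->
  <antipattern>
    <token regexp="yes">extensão|extensões|ficheiros?</token>
    <token spacebefore="yes">.</token>
    <token spacebefore="no" postag="NP.+|AQ.+|NC.+" postag_regexp="yes"/>
  </antipattern>
  <pattern>
    <!-- Token 1: a word made of letters and digits, as in the original regexp group 1 -->
    <token regexp="yes">[\p{L}\d]+</token>
    <!-- Token 2: a punctuation mark preceded by whitespace, as in the original regexp group 2 -->
    <token spacebefore="yes" regexp="yes">[!?»”’,….]</token>
  </pattern>
  <message>Remova o espaço antes deste sinal de pontuação.</message>
  <!-- In a token-based rule, \1 and \2 refer to the matched tokens, not to regexp groups -->
  <suggestion>\1\2</suggestion>
  <example correction="escapou!">Como é que isto me <marker>escapou !</marker></example>
  <example correction="roda.">Existem duas estratégias possíveis: aproveitar o que existe ou reinventar a <marker>roda .</marker></example>
</rule>
```

The second rule of the group would need the same treatment. Its negative lookahead for emoticons such as ":-)" and ":D" has no direct token equivalent, so it would probably have to become further antipatterns or <exception> elements on the tokens.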