Hello @udomai @jaumeortola @tiff
Why doesn’t my antipattern work here?
<rulegroup id="SPACE_BEFORE_PUNCTUATION" name="Espaços antes da pontuação">
<!-- Based on German grammar.xml, by Tiago F. Santos, 2017-07-08 -->
<!-- MARCOAGPINTO 2022-01-12 (1-JAN-2022+) *START* -->
<!--
HITS AGAINST A 600 000 CORPORA:
BEFORE:xxxx
AFTER:xxxx
-->
<antipattern>
<token regexp='yes'>extensão|extensões|ficheiros?</token>
<token spacebefore='yes' regexp='yes'>[.]</token>
<token spacebefore='no' postag='NP.+|AQ.+|NC.+' postag_regexp='yes'/>
</antipattern>
<!-- MARCOAGPINTO 2022-01-12 (1-JAN-2022+) *END* -->
<rule>
<regexp>\b([\p{L}\d]+) ([!?»”’,….])</regexp>
<message>Remova o espaço antes deste sinal de pontuação.</message>
<suggestion>\1\2</suggestion>
<example correction="escapou!">Como é que isto me <marker>escapou !</marker></example>
<!--example correction="escapou!">Como é que isto me <marker>escapou !</marker></example-->
<example correction="roda.">Existem duas estratégias possíveis: aproveitar o que existe ou reinventar a <marker>roda .</marker></example>
</rule>
<rule>
<regexp>\b([\p{L}\d]+) ([:;])(?![\-o]?(?:[()/]|[DSP]\b))</regexp>
<message>Remova o espaço antes deste sinal de pontuação.</message>
<suggestion>\1\2</suggestion>
<example correction="possíveis:">Existem duas estratégias <marker>possíveis :</marker> aproveitar o que existe ou reinventar a roda.</example>
<example>Um sorriso :-)</example>
<example>Um sorriso :)</example>
<example>Um sorriso :(</example>
<example>Um sorriso :-/</example>
<example>Um sorriso :/</example>
<example>Um sorriso :D</example>
<example correction="Brasil;">Site de Instituto Ludwig von Mises <marker>Brasil ;</marker>Principais portais web</example>
</rule>
</rulegroup>
When I run TESTRULES PT, it throws a lot of errors:
Running pattern rule tests for Portuguese (org.languagetool.language.Portuguese)…
Exception in thread "main" java.lang.RuntimeException: Could not activate rules
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:334)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:293)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:94)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:84)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:67)
at org.languagetool.MultiThreadedJLanguageTool.<init>(MultiThreadedJLanguageTool.java:51)
at org.languagetool.rules.patterns.PatternRuleTest.createToolForTesting(PatternRuleTest.java:175)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:160)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:153)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:737)
Caused by: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/pt/grammar.xml'
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:80)
at org.languagetool.Language.getPatternRules(Language.java:641)
at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:662)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:327)
… 9 more
Caused by: java.lang.RuntimeException: <regexp> rules currently cannot be used together with <antipattern>. Rule id: SPACE_BEFORE_PUNCTUATION[1]
at org.languagetool.rules.patterns.PatternRuleHandler.createRules(PatternRuleHandler.java:648)
at org.languagetool.rules.patterns.PatternRuleHandler.endElement(PatternRuleHandler.java:408)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:77)
… 12 more
Running disambiguator rule tests…
Running disambiguation tests for Portuguese…
Exception in thread "main" java.lang.RuntimeException: Could not activate rules
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:334)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:293)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:353)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:259)
at org.languagetool.tagging.disambiguation.rules.DisambiguationRuleTest.testDisambiguationRulesFromXML(DisambiguationRuleTest.java:70)
at org.languagetool.tagging.disambiguation.rules.DisambiguationRuleTest.main(DisambiguationRuleTest.java:238)
Caused by: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/pt/grammar.xml'
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:80)
at org.languagetool.Language.getPatternRules(Language.java:641)
at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:662)
at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:327)
… 5 more
Caused by: java.lang.RuntimeException: <regexp> rules currently cannot be used together with <antipattern>. Rule id: SPACE_BEFORE_PUNCTUATION[1]
at org.languagetool.rules.patterns.PatternRuleHandler.createRules(PatternRuleHandler.java:648)
at org.languagetool.rules.patterns.PatternRuleHandler.endElement(PatternRuleHandler.java:408)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:77)
… 8 more
Running XML bitext pattern tests…
What is wrong with it?
Thanks!