Hello Ricardo,
I have just coded a rule that checks for derogatory (depreciative) terms:
<!ENTITY depreciativo "chinoca|ciganada|ciganagem|monhé|panasca|pretalhada">
<!-- TERMOS DEPRECIATIVOS -->
<rule id="DEPRECIATIVO" name="5. Termos depreciativos" type="style">
    <!-- Created by Marco A.G.Pinto with Ricardo Joseh Lima suggestions, Portuguese rule 2022-04-18 (1-JAN-2022+) -->
    <!--
    Vou à loja do monhé. → Vou à loja do indiano.
    -->
    <pattern>
        <token regexp="yes" inflected="yes">&depreciativo;</token>
    </pattern>
    <message>Este termo é depreciativo, pondere empregar um termo alternativo.</message>
    <short>Termo depreciativo</short>
    <example type="incorrect">Vou à loja do <marker>monhé</marker>.</example>
    <example type="correct">Vou à loja do <marker>indiano</marker>.</example>
</rule>
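Notice that in the rule I use inflected="yes", so that the pattern matches on the lemma and catches all inflected forms of each word (plurals, for example), not just the forms listed in the entity.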
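As a sketch of what that attribute should catch, an extra example with a plural form could be added to the rule. This assumes the Portuguese tagger dictionary lists "monhés" under the lemma "monhé", which is not confirmed here:

```xml
<!-- Hypothetical extra example: the plural is expected to match through its lemma "monhé" -->
<example type="incorrect">Vou às lojas dos <marker>monhés</marker>.</example>
```

If the rule tests reject this example, the plural is probably not in the dictionary as a form of that lemma, and it would have to be added to the entity explicitly.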
The commit is here:
I have attached the results of checking the rule against 600 000 sentences there. It produced only one hit, which is to be expected because the entity contains only a few words so far.
Do you have any suggestions for more words to add to the entity?
Thanks!