[pt] Help create rule - 2021-07-08

Hello @udomai @jaumeortola @Ruud_Baars

I have been thinking about a way to fix, with an antipattern, the false positives in headings in LibreOffice.

For example:

2.5. Teoria da Relatividade
2.5.1.1. Equações

The first one triggers an error at the end of the line, suggesting adding a space.

The second one suggests a comma (it is read as a floating-point number).

My idea is to match:

Z0CN0
. (dot with space_after="yes")

and then skip until the end of the paragraph and check whether the last token is:

SENT_END
and, with a negate, not:
_PUNCT

What is the best way of doing it?

My main problem is how to skip until the SENT_END.

Thanks!

In which environment does this happen? Not the web interfaces…

In LibreOffice.

In the web environment and the stand-alone tool, it only triggers the floating-point comma suggestion, not the one at the end of the paragraph.

Maybe this should work?

<antipattern>
   <token postag="SENT_START" skip="-1"/>
   <token postag="SENT_END" spacebefore="yes" regexp="yes">[^.?!]</token>
</antipattern>
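
A variation on the same idea, starting from the number and dot described in the first post, might look roughly like the sketch below. It is untested: Z0CN0 and _PUNCT are simply the tags named in the first post, and "not punctuation" is expressed here with an <exception> on the last token rather than with a regexp.

```xml
<antipattern>
    <!-- the heading number, e.g. "2.5" in "2.5. Teoria da Relatividade" -->
    <token postag="Z0CN0"/>
    <!-- the dot after the number; skip="-1" allows any number of tokens before the next match -->
    <token skip="-1">.</token>
    <!-- the last token of the sentence, unless it is tagged as punctuation -->
    <token postag="SENT_END"><exception postag="_PUNCT"/></token>
</antipattern>
```

Since skip="-1" can run across a long sentence, a small positive value may be the safer choice for short headings.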

Thanks!

@Ruud_Baars

What about this approach?:

<antipattern>
   <token spaceafter="no" postag="Z0CN0" postag_regexp="no"/>
   <token spaceafter="no" regexp="yes">[.]</token>
   <token spaceafter="no" postag="Z0CN0" postag_regexp="no"/>
   <token spaceafter="yes" regexp="yes">[.]</token>
   <token/>
</antipattern>

EDIT:
Added:

<token spaceafter="no" postag='Z0CN0' postag_regexp='no'/>

EDIT2:
Added:

<token/>

So that it has a word after the space.

EDIT3:
The first two tokens are for cases such as “2.5.1. Equation”
(it gets “2.5”, then a dot, then “1”).

Sorry, I lost you.

You could try to keep it to fewer tokens, so not skip=-1 but, for example, skip=4.
Headings tend to be short.
But the . in the numbers could get in the way.
For Dutch, having a . between digits does not mean end of sentence. This can be tweaked in segment.srx.
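
For illustration, a no-break rule of that kind in SRX syntax could look like the sketch below. The two regular expressions are assumptions made up for this example, not copied from any existing segment.srx, and the rule would have to sit inside the <languagerule> block for the language in question. Whether the Portuguese section needs such a rule, or already has one, is not settled in this thread.

```xml
<!-- Hypothetical rule: do not split a sentence at a dot that sits between two digits -->
<rule break="no">
    <!-- text just before the candidate break: a digit followed by a dot -->
    <beforebreak>\d\.</beforebreak>
    <!-- text just after the candidate break: another digit -->
    <afterbreak>\d</afterbreak>
</rule>
```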

I have just created the antipattern, and it removes tons of false positives. However, the space at the end of the paragraph isn’t fixed, since that rule isn’t in grammar.xml; it is a Java rule.