[pt] Rule: Simplificar: O que + V. → V. Gerúndio

Hello @rjlima

I came up with a shorter name for the rule we talked about last time.

I have spent the whole day removing false positives and testing against 600,000 sentences.

BEFORE:

Portuguese (Portugal): 3780 total matches
Portuguese (Portugal): ø0.01 rule matches per sentence
Portuguese (Portugal): 17323 input lines ignored (e.g. not between 10 and 300 chars or at least 4 tokens)

CURRENTLY:

Portuguese (Portugal): 525 total matches
Portuguese (Portugal): ø0.00 rule matches per sentence
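
To put the two runs side by side, here is a rough sketch of the arithmetic behind these figures. It assumes the average is simply total matches divided by about 600,000 sentences; the exact denominator the test tool uses (for example, whether ignored lines are excluded) is not stated here, so treat the per-sentence values as approximate.

```python
# Figures copied from the BEFORE and CURRENTLY runs above.
before_matches = 3780
after_matches = 525
# Assumption: the corpus size is the ~600,000 sentences mentioned earlier.
sentences = 600_000


def matches_per_sentence(matches: int, total_sentences: int) -> float:
    """Average number of rule matches per sentence."""
    return matches / total_sentences


# Fraction of the original matches that no longer fire.
reduction = 1 - after_matches / before_matches

print(f"before: {matches_per_sentence(before_matches, sentences):.2f} per sentence")
print(f"after:  {matches_per_sentence(after_matches, sentences):.2f} per sentence")
print(f"matches removed: {reduction:.1%}")
```

With two decimals both averages look almost identical, which is why the total match count is the more useful number to compare.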

It will take two or three more days to remove all the false positives from this rule.

@rjlima

Hello!

I have improved the accuracy of the rule a lot.

You can see the generated results on GitHub (attached at the bottom there):

Hi,
I have seen some problems when the verb is ‘ser’ or ‘estar’. With ‘ser’, cases like ‘o que é…?’ are interrogative and shouldn’t be replaced. With ‘estar’, ‘o que está’ is being replaced by ‘sendo’ when it should be ‘estando’.
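
To make the two cases concrete, here is a minimal Python sketch of the behaviour I would expect. It is not how the rule is actually implemented in LanguageTool; the function name `suggest_gerund`, the small lookup table, and the sample sentences are all made up for illustration.

```python
from typing import Optional

# Hypothetical lookup: a few forms of 'ser' and 'estar' mapped to the gerund
# of their own verb, so 'está' never ends up with 'sendo'.
GERUND_BY_FORM = {
    "é": "sendo", "são": "sendo", "era": "sendo",
    "está": "estando", "estão": "estando", "estava": "estando",
}


def suggest_gerund(sentence: str) -> Optional[str]:
    """Return the gerund to suggest for 'o que' + verb, or None for no match."""
    text = sentence.strip()
    # Interrogatives such as 'O que é...?' should not be replaced at all.
    if text.endswith("?"):
        return None
    words = text.lower().rstrip(".!").split()
    for i in range(len(words) - 2):
        if words[i] == "o" and words[i + 1] == "que":
            # Pick the gerund from the verb form that follows 'o que'.
            return GERUND_BY_FORM.get(words[i + 2])
    return None


print(suggest_gerund("O que é isto?"))
print(suggest_gerund("Disse o que está certo."))
```

The point is only that the suggestion has to depend on which verb follows ‘o que’, and that questions have to be skipped before any suggestion is made.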

@rjlima

Hello Ricardo,

Last night and today, I have fixed tons of false positives.

Here is the original state:
https://internal1.languagetool.org/regression-tests/via-http/2022-02-14/pt-PT/index.html

Check again tomorrow afternoon (I can’t remember whether these results come out at 5, 6 or 7 am, so it is safer to check later rather than earlier):
https://internal1.languagetool.org/regression-tests/via-http/2022-02-15/pt-PT/index.html

Dozens of false positives should be gone by then.

Thanks!

@rjlima

A few more false positives removed:

Was there supposed to be a txt file attached for me to look at? Or is it the code?

Ahhhhhhh… I didn’t attach anything 🙂

I was waiting for the results in the morning.

@rjlima

I have removed tons of false positives:

I have also attached the results .txt file there.
