[pt] Rule: Simplificar: O que + V. → V. Gerúndio

Hello @rjlima

I came up with a shorter name for the rule we talked about last time.

But I have spent the whole day removing false positives and testing against 600 000 sentences.


Portuguese (Portugal): 3780 total matches
Portuguese (Portugal): ø0.01 rule matches per sentence
Portuguese (Portugal): 17323 input lines ignored (e.g. not between 10 and 300 chars or at least 4 tokens)


Portuguese (Portugal): 525 total matches
Portuguese (Portugal): ø0.00 rule matches per sentence
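As a rough sanity check on the two averages above, here is a minimal sketch of the arithmetic. It assumes the corpus size is the approximate 600 000 sentences mentioned earlier, and that the two result blocks are the runs before and after the false-positive cleanup. Neither is confirmed by the tool output, which may also use a slightly different denominator once the ignored lines are excluded.

```python
# Assumption: approximate corpus size taken from the post, not from the tool output.
SENTENCES = 600_000


def matches_per_sentence(total_matches, sentences=SENTENCES):
    """Average number of rule matches per sentence."""
    return total_matches / sentences


# Assumption: 3780 is the earlier run and 525 the later one.
before = matches_per_sentence(3780)
after = matches_per_sentence(525)

# Share of matches that disappeared between the two runs.
reduction = (3780 - 525) / 3780

print(f"{before:.2f}")     # 0.01
print(f"{after:.2f}")      # 0.00
print(f"{reduction:.0%}")  # 86%
```

With two decimals the second average rounds down to 0.00, so the total match count is the more informative number to compare.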

It will take two or three more days to remove all the false positives from this rule.

I have improved the accuracy of the rule a lot.

You can see the generated results on GitHub (attached there at the bottom):

I have seen some problems when the verb is ‘ser’ or ‘estar’. One case with ‘ser’ is ‘o que é…?’, which is an interrogative and shouldn’t be replaced. Also, ‘o que está’ is being replaced by ‘sendo’ when it should be ‘estando’.
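To make the two cases concrete, here is a minimal Python sketch of the expected behaviour. It is not the actual rule: the `GERUNDS` table, the regex, and the `suggest_gerund` helper are all hypothetical, and a real rule would presumably take the gerund from the verb's lemma via the tagger and synthesizer rather than from a hand-written lookup.

```python
import re

# Hypothetical lookup from finite verb form to gerund, for illustration only.
GERUNDS = {
    "é": "sendo",       # ser
    "está": "estando",  # estar
}

# "o que" followed by one word, which is assumed here to be the verb.
PATTERN = re.compile(r"\bo que (\w+)", re.IGNORECASE)


def suggest_gerund(sentence):
    """Return the gerund to suggest for 'o que + verb', or None for no match."""
    # Interrogatives such as 'O que é isto?' must not be rewritten.
    if sentence.rstrip().endswith("?"):
        return None
    match = PATTERN.search(sentence)
    if match is None:
        return None
    # Unknown verb forms yield None instead of a guessed gerund.
    return GERUNDS.get(match.group(1).lower())
```

For example, `suggest_gerund("Choveu muito, o que está a causar problemas.")` gives `"estando"`, while `suggest_gerund("O que é isto?")` gives `None`.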


Hello Ricardo,

Last night and today, I have fixed tons of false positives.

Here is the original state:

Tomorrow afternoon (I can’t remember if these results come out at 5, 6 or 7 am, so it is better to check later rather than too early), check again:

It should have removed dozens of false positives.



A few more false positives removed:

Was there supposed to be a txt file attached for me to look at? Or is it the code?

ahhhhhhh… I didn’t attach anything :slight_smile:

I was waiting for the results in the morning.


I have removed tons of false positives:

I have also attached the results .txt there.

