[pt] Status of rule creation — 2022-07-04

Hello @rjlima and team,

I am attaching an .odt file listing all the current rules in the main grammar.xml, colour-coded.

Orange means that a rule has been improved, or that there isn’t much left to improve in it, and that it has been sorted alphabetically.

Green means that a rule uses the new tagger dictionary (SPS00:blah blah blah) and all my current knowledge, and has also been sorted alphabetically.

I still need to rewrite or improve some orange rules and turn them green: all rules must use the new tagger and knowledge.

Thanks!

Kind regards from,
>Marco A.G.Pinto
----------------------

Rules Structure 20220704.odt (63.9 KB)


Ahhhh… @rjlima

https://internal1.languagetool.org/regression-tests/via-http

If you want to see which rules produce more or fewer hits, choose to show all of pt-PT; the results will be sorted from most to fewest hits.

That’s a lot of work! Where should I start contributing?
I’ve seen some orange rules (such as acentuação vogal enclise and tera/terá) that could perhaps already be green; these two already work correctly in the online LT.

I have been writing down ideas for rules.

I wanted to ask for your suggestions on two rules that may be very useful when writing theses and other scientific papers:

O bit é o nível mais baixo existente nos computadores, por isso existe em maior quantidade.
por isso existe=existindo
por isso existem=existindo

This works for any verb: “por isso há” = “havendo”, etc. Only the verb changes.

por + isso + verb = verb_gerund
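A minimal sketch of how the “por + isso + verb” idea might be written in grammar.xml. The rule ID, name and tag patterns are hypothetical: it assumes the Portuguese tagger uses FreeLing-style tags (as `SPS00` suggests), where `V.I.*` matches indicative verb forms and `V.G.*` matches gerunds, and that `<match>` can synthesize the gerund from the matched verb's lemma. It does not check whether the verb shares the subject of the main clause, which, as rjlima points out later in the thread, decides whether the gerund fits.

```xml
<!-- Hypothetical rule ID and name; tag patterns assume FreeLing-style POS tags. -->
<rule id="POR_ISSO_VERBO_GERUNDIO" name="Simplificar: por isso + verbo">
    <pattern>
        <token>por</token>
        <token>isso</token>
        <!-- Any indicative verb form, e.g. "existe", "existem", "há" (assumed tag pattern). -->
        <token postag="V.I.*" postag_regexp="yes"/>
    </pattern>
    <!-- The match element asks the synthesizer for the gerund of token 3's lemma. -->
    <message>Esta perífrase pode ser simplificada: <suggestion><match no="3" postag="V.G.*" postag_regexp="yes"/></suggestion>.</message>
    <example correction="existindo">O bit é o nível mais baixo existente nos computadores, <marker>por isso existe</marker> em maior quantidade.</example>
    <example>O bit é o nível mais baixo existente nos computadores, existindo em maior quantidade.</example>
</rule>
```

Exceptions (for instance, a different subject after the verb) would have to be added as antipatterns or extra token conditions once the regression tests show where it misfires.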

Other rule:

e não sabiam nada=e nada sabiam/nada sabia
O Rui não sabia nada de matemática=o Rui nada sabia de matemática
não + verb + nada = nada + verb
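A minimal sketch of the “não + verb + nada” idea as a grammar.xml rule, reusing the ID mentioned later in the thread. The name, the message (taken from the list of candidate messages in this post) and the `V.+` verb tag pattern are assumptions; the bare three-token pattern will also fire on cases that need exceptions, as the regression numbers later in the thread suggest.

```xml
<!-- Sketch only: ID from the thread; name and tag pattern are assumptions. -->
<rule id="SIMPLIFICAR_NÃO_VERBO_NADA_NADA_VERBO" name="Simplificar: não + verbo + nada">
    <pattern>
        <token>não</token>
        <!-- Any verb form, assuming verb tags start with "V". -->
        <token postag="V.+" postag_regexp="yes"/>
        <token>nada</token>
    </pattern>
    <!-- \2 is replaced by the verb exactly as written in the text. -->
    <message>Esta perífrase pode ser simplificada: <suggestion>nada \2</suggestion>.</message>
    <example correction="nada sabia">O Rui <marker>não sabia nada</marker> de matemática.</example>
    <example>O Rui nada sabia de matemática.</example>
</rule>
```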

I have been writing them down and am very eager to test them on my thesis… I hope they will get plenty of hits.

@rjlima
Do you have good IDs, names, descriptions and suggestion messages for the two rules above?

Look at the list of possible messages I have written down so far (for all rules):


```xml
<message>Em certos contextos, esta perífrase pode ser simplificada.</message>
<message>Enriqueça a linguagem para causar mais impacto ao leitor.</message>
<message>Esta perífrase pode ser simplificada.</message>
<message>Esta perífrase poderá ser simplificada.</message>
<message>Expressão vulgar, pondere empregar:</message>
<message>Possível confusão de termos.</message>
<message>Se for um texto académico, pondere melhorar a linguagem.</message>
<message>Se for um texto académico/científico, pondere melhorar a linguagem.</message>
<message>Se for um texto académico/científico, pondere empregar o termo 'imprecisão'.</message>
<message>Se for um texto académico/científico, pondere empregar o termo 'exato'.</message>
<message>Se for uma tese de doutoramento, verifique se o 'tom' de redação é o apropriado.</message>
<message>Se estiver a referir-se a fármacos ou afins, empregue o termo 'embalagem'.</message>
<message>Se estiver a referir-se a fármacos ou afins, empregue o termo 'tomar'.</message>
```

I was revising my thesis with the latest nightly, and LibreOffice hangs most of the time when I right-click on marked words… then the Lenovo app started crashing… then I rebooted, and Windows 10 kept detecting my USB-connected printer as connecting and disconnecting (McAfee: “an external drive has been connected, scan the disk blah blah blah?”; the printer has a USB slot).

Tomorrow I can implement the two rules above; I hope tomorrow’s nightly won’t crash LibreOffice, so that I can test them across my whole text.

:smile: :smile: :smile:

Thanks!

Hi @marcoagpinto,
On the ‘existindo’ rule, take a look at:
É proibido fazer isso, por isso existe uma multa muito grande → É proibido fazer isso, existindo uma multa muito grande.
I’m not sure ‘existindo’ fits here. The difference is that in your example the subject of ‘existe’ is the same as that of the main clause, whereas in my example it is not.
As for the ‘nada’ rule, at first glance it seems a nice idea!

@rjlima

Hello!

Thanks, I will start with the “nada” rule tomorrow at 5am.

Then I will run some tests with the other rule idea, just to see what happens… maybe exceptions can be added for certain cases.

Do you have good IDs, rule names and suggestion messages for them?

Thanks!

:smile:

:smile: :smile: :smile: :smile: :smile: :smile: :smile:

Ahhhh… the notes I have written down:

<rule id='DEVE_DE' name="Deve">

It produces tons of false positives… I have been too lazy to read the article that explains the use of “deve de”. I guess that if you have “deve de ser”, we change it to “deve ser”, but does that apply to any verb form or just the infinitive? I guess it shouldn’t accept any POS other than the infinitive (nouns, adjectives, etc.)?

Sorry for being a lazy arse on this rule :smile: :smile: :smile:
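A minimal sketch of the restriction asked about above: only fire when an infinitive follows “deve de”, so nouns and adjectives after “de” are ignored. The infinitive tag pattern `V.N.*` (FreeLing-style tags assumed) and the message wording are assumptions, and the sketch does not settle the usage question the article covers, i.e. whether “deve de” should be flagged at all. To cover other forms of “dever”, the first token could be written as `<token inflected="yes">dever</token>`.

```xml
<!-- Sketch: keeps the ID and name from the note; tag pattern and message are assumptions. -->
<rule id="DEVE_DE" name="Deve">
    <pattern>
        <marker>
            <token>deve</token>
            <token>de</token>
        </marker>
        <!-- Only an infinitive may follow (assumed tag pattern for infinitives). -->
        <token postag="V.N.*" postag_regexp="yes"/>
    </pattern>
    <!-- \1 is "deve" as written, so the suggestion simply drops "de". -->
    <message>Pondere retirar a preposição: <suggestion>\1</suggestion>.</message>
    <example correction="deve">Isso <marker>deve de</marker> ser verdade.</example>
    <example>Isso deve ser verdade.</example>
    <example>Ele deve de facto ser o culpado.</example>
</rule>
```

The last example is there to check that a noun after “de” (here “facto”) does not trigger the rule.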

I have been writing down most of the problems that appear while I use Thunderbird with the LanguageTool add-on, and while checking posts on Facebook.

:smile:

@rjlima

I am still working on the rule: SIMPLIFICAR_NÃO_VERBO_NADA_NADA_VERBO

What seemed like a very simple rule turned out to be very complicated:
Before:

Portuguese (Portugal): 2550 total matches
Portuguese (Portugal): ø0.00 rule matches per sentence
Portuguese (Portugal): 17349 input lines ignored (e.g. not between 10 and 300 chars or at least 4 tokens)

Current:

Portuguese (Portugal): 272 total matches
Portuguese (Portugal): ø0.00 rule matches per sentence
Portuguese (Portugal): 17349 input lines ignored (e.g. not between 10 and 300 chars or at least 4 tokens)

@rjlima

Hello!

I believe it is okay now, but if you find any false positives, please let me know.

Only at 5am will I be calm enough to fix them.

I have attached the 18.txt results file to the commit.

Thanks!

@rjlima

Tons of fixes in the NADA rule: