[pt] Portuguese rule contribution/discussion

@tiagosantos

Hello!

Did you get the funding to improve the disambiguator?

If so, can I start checking my thesis from the first page in order to see the issues?

Thanks!

Using any computer assistance requires critical thinking, as does anything else, I would add.
This is no exception, and you will need it to evaluate the suggestions.

The Portuguese Wikipedia committee/bureaucrats prefer “minimal corrections”, and that grant was foiled. They only use the same replacement scripts they have been using for years (“dumb replacements”). They are similar to this:

Regardless, a great deal of work has been done in that area, although I am not working on a pseudo-chunker via disambiguation.

Other teams, after a decade of development, are still removing false positives. If you need extra precision, it may be better to check back 10 years from now.

This can be done regardless.

@tiagosantos

Yesterday I spent hours working on my medical records.

Here are two possible false positives there:

  1. Todos os documentos que passaram pela minha mão foram digitalizados e constam neste “processo médico”.

  2. Tenho ao longo dos anos comprado livros e equipamentos para realizar esse sonho, mas agora passou apenas a ser uma ilusão.

Are they easy to fix?

Thanks!

Kind regards from your friend,

Todos os documentos que passaram pela minha mão foram digitalizados e constam neste “processo médico”.

This needs a chunker or pseudo-chunker to recognize ‘que passaram pela minha mão’ as a relative clause.

Tenho ao longo dos anos comprado livros e equipamentos para realizar esse sonho, mas agora passou apenas a ser uma ilusão.

Passives (comprado) are already disambiguated, but the adverbial phrase ‘ao longo dos anos’ needs to be tagged, or handled by a pseudo-chunker. This specific case is trivial to solve, and the tagging will help other rules as well, so I will push a fix this week.
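As a narrower stopgap than the tagging described above, an antipattern along these lines could be added to whichever rule raises this false positive (the thread does not name it). This is a hypothetical sketch: it assumes the Portuguese tagset marks past participles with P as the third character (hence V.P.+), it only covers ‘ter’ + ‘ao longo d…’ + a time noun + participle, and it only helps if the flagged span overlaps the tokens it matches.

```xml
<antipattern>
    <!-- Hypothetical: compound tense split by a time adverbial,
         e.g. "Tenho ao longo dos anos comprado livros" -->
    <token inflected='yes'>ter</token>
    <token>ao</token>
    <token>longo</token>
    <token regexp='yes'>d[oa]s?</token>
    <token regexp='yes'>anos|meses|semanas|dias|tempo</token>
    <!-- assumed participle tag pattern (third character P) -->
    <token postag_regexp='yes' postag='V.P.+'/>
</antipattern>
```

Tagging the phrase itself, as planned above, would let other rules benefit too; an antipattern only silences one rule.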

@tiagosantos

I believe I have found a false positive.

I am not sure if it can be fixed, since we are about to enter feature freeze, but here it is:
“O pai diz que vai ser operado amanhã e que não se pode constipar.”

It says: “Possível gerundismo”

Could you have a look at it?

Thanks!

Kind regards,

@@ -22627,7 +22627,7 @@ TODO Write better examples
   <rulegroup id='IR_AS_FUTURE_AUXILIARY' name="IR (Presente/Futuro/Condicional) + Infinitivo" default='on'>
     <!--    Created by Tiago F. Santos, 2017-04-12     -->
       <url>https://ciberduvidas.iscte-iul.pt/consultorio/perguntas/o-verbo-ir-como-auxiliar-de-si-proprio/5679</url>
-      <short>Possível gerundismo.</short>
+      <!--short>Possível gerundismo.</short-->
     <rule>
       <pattern>
           <token postag_regexp='yes' postag='V..[CF].+' inflected='yes'>ir</token>

This will ‘fix’ it.

How’s that going?

@tiagosantos

:frowning:

I haven’t had the chance to do it yet… so many things going on… I have only partially checked my thesis and found a few false positives, but I didn’t write them down.

I guess my report won’t be posted before this release of LanguageTool.

I have been redoing all the PhD project simulations and updating the thesis with the new data, but that is mostly graphics and data.

I still have to do a full grammar check from top to bottom, but I am not sure when :frowning:

@tiagosantos

Hello!

“O pai diz que vai ser operado amanhã e que não se pode constipar.”

still says “possível gerundismo” with the nightly :frowning:

@tiagosantos

Hello!

I found a false positive:
“Por em quanto é o que importa.”
It suggests “pôr”.

Could you have a look at it?

Thanks!

Kind regards,

EDIT:
@tiagosantos

Sorry, what I wanted to do was to create the rule:
“por em quanto” > “por enquanto”

Maybe you should add a pattern?

Thanks!
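A minimal sketch of such a rule, written in the same grammar.xml style as the rules quoted in this thread. The id, name and message are hypothetical, it should be checked with TESTRULES PT before committing, and the existing ‘por em’ rule in CONFUSION_POR would still fire on the same words unless it is given an antipattern.

```xml
<rule id='POR_EM_QUANTO' name="Confusão: por em quanto - por enquanto">
    <!-- Hypothetical rule: "por enquanto" wrongly split into three words -->
    <pattern>
        <!-- tokens are case-insensitive by default, so "Por em quanto" also matches -->
        <token>por</token>
        <token>em</token>
        <token>quanto</token>
    </pattern>
    <message>Substitua por <suggestion>por enquanto</suggestion>.</message>
    <example correction="por enquanto">Fica assim <marker>por em quanto</marker>.</example>
</rule>
```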

  <rulegroup id='CONFUSION_POR' name="Confusão: por - pôr">
    <!--      Created by Marco A.G.Pinto, 2017-07-08      -->
    <rule>
      <antipattern>
          <token>por</token>
          <token regexp='yes'>&adverbios_lugar;</token>
          <token postag_regexp='yes' postag='V.+'/>
      </antipattern>
      <pattern>
          <token>de</token>
          <token>por</token>
      </pattern>
      <message>Se se refere ao verbo, substitua por <suggestion>de pôr</suggestion>.</message>
      <example correction="de pôr">Temos <marker>de por</marker> os pratos na mesa!</example>
      <example>A sua importância é atestada pelo facto de por lá passar uma das vias romanas.</example>
    </rule>
    <rule>
      <pattern>
          <token>por</token>
          <token>em</token>
      </pattern>
      <message>Substitua por <suggestion>pôr em</suggestion>.</message>
      <example correction="pôr em">Temos de <marker>por em</marker> prática tudo o que aprendemos!</example>
    </rule>
  </rulegroup>
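One way to stop the second rule of CONFUSION_POR from suggesting ‘pôr em’ for “Por em quanto é o que importa.” is an antipattern covering the three-word sequence, with the reported sentence kept as a negative example so that TESTRULES PT catches regressions. This is a sketch, not a committed fix.

```xml
<rule>
    <antipattern>
        <!-- "por em quanto" is a misspelling of "por enquanto", not of "pôr em" -->
        <token>por</token>
        <token>em</token>
        <token>quanto</token>
    </antipattern>
    <pattern>
        <token>por</token>
        <token>em</token>
    </pattern>
    <message>Substitua por <suggestion>pôr em</suggestion>.</message>
    <example correction="pôr em">Temos de <marker>por em</marker> prática tudo o que aprendemos!</example>
    <!-- the reported false positive, kept as a negative example -->
    <example>Por em quanto é o que importa.</example>
</rule>
```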

Last time I checked, you were a LanguageTool member.
I have already fixed many of your rules. From now on, whenever I have to remodel one of your rules, I will disable it and create my own version.
Want me to fix it?

Yes, it is okay.

Could you just add a comment such as “It is a fix of Marco A.G.Pinto rule”?

I am not greedy, I don’t mind losing rules if that helps the users.

After all, I am doing it to help the community.

This is not about greediness. It is about laziness.

And what exactly is that?

Anyone can see false positives. If you look at the git log, you will see that I fix false positives every day, many of them in your rules.
So, if you are a maintainer, do what a maintainer is supposed to do. You have commit permissions.

:frowning: :frowning: :frowning:

@tiagosantos

I found some incorrect suggestions for the sentence:
“Tu precisas dos tuas queridas amigas.”

It should suggest “das tuas”.

:slight_smile:

I will also try to be less lazy and add more suggestions for 4.0. Note, too, that I have been involved in LibreOffice’s autocorrect, adding tons of words by hand, and in the British English dictionary, adding an average of about 400 words per month.

On Sunday I bought another grammar book.

Every year, at the start of the school year, the supermarket where I work sells a new grammar book, so I have been buying one almost every year.

But you are right, I am a lazy ass :frowning:

Too much work for too little effect. The user is informed correctly, and half of the time he gets a valid suggestion.
These agreement rules are rigid because conjunctions need a better tokenization mechanism, which I haven’t implemented yet.

Do what you can. Nobody asks more than that. Just change your attitude from “What can you do?” to “What can I do?” and then do it.

@tiagosantos

Help!

I am trying to create this rule, but TESTRULES PT gives errors:

    <rule id='PÔR_UMA_PERGUNTA' name="fazer uma(s) pergunta(s)">
    <!--      Created by Marco A.G.Pinto, Portuguese rule      -->
      <pattern>
          <token inflected='yes'>pôr</token>
          <token regexp="yes">umas?</token>
          <token regexp="yes">perguntas?</token>
      </pattern>
      <message>Substitua por <suggestion><match no='1' postag='V.+'>fazer</match></suggestion>.</message>
      <suggestion><match no='1' postag='V.+'>fazer</match> \2 \3</suggestion>
      <example correction="fazer uma pergunta">Vou <marker>pôr uma pergunta</marker> pertinente.</example>
    </rule>

What is wrong with it?

Thank you!

Kind regards,

PS-> Maybe I should also replace <token regexp="yes">umas?</token> with postags?

<message>Substitua por <suggestion><match no='1' postag='V.+'>fazer</match></suggestion>.</message>

needs to be:

<message>Substitua por <match no='1' postag='V.+'>fazer</match>.</message>

For the determiner token, you can use postag=‘D.+’ postag_regexp=‘yes’ min=‘0’. I think that would generalize without causing false positives.
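Putting both suggestions together, the rule might look like the sketch below (not necessarily the version that was committed). The message no longer wraps the match in a suggestion, so the separate suggestion element is the only one and agrees with the example's correction; the optional determiner follows the postag='D.+' min='0' idea. When no determiner is present, \2 is empty, so it is worth confirming with TESTRULES PT that the suggestion comes out correctly spaced.

```xml
<rule id='PÔR_UMA_PERGUNTA' name="fazer uma(s) pergunta(s)">
    <pattern>
        <token inflected='yes'>pôr</token>
        <!-- optional determiner (uma, umas, a, essa, ...) instead of the umas? regexp -->
        <token postag='D.+' postag_regexp='yes' min='0'/>
        <token regexp='yes'>perguntas?</token>
    </pattern>
    <!-- no <suggestion> inside the message: the only suggestion is the one below -->
    <message>Substitua por <match no='1' postag='V.+'>fazer</match>.</message>
    <suggestion><match no='1' postag='V.+'>fazer</match> \2 \3</suggestion>
    <example correction="fazer uma pergunta">Vou <marker>pôr uma pergunta</marker> pertinente.</example>
</rule>
```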

@tiagosantos

I have committed the rule!

Thank you very much for the help!

PS-> I know I have been a lazy ass, but since 3.9 came out I have already created three or four rules. I will try to keep doing it.