[pt] Portuguese rule contribution/discussion

@tiagosantos

What if I remove the “o que” and just check for the pronoun and the verb?

Is it a better approach?
“Eu quero ganhar o Euromilhões” -> “Quero ganhar o Euromilhões”
“Eu tenho um grande carro” -> “Tenho um grande carro”
?

Thank you!

Hi Marco,

Looks good, but this is better regarded as a style rule. It would be enough to add it to the Style category and to start the message with 'Para melhorar o estilo de escrita, …'
It would be better with some URL or reference attached to it.
You can generalize by removing ‘o que’, and by adding wildcards to the postag. For example:

<token postag_regexp="yes" postag="V...1S.+"></token>

To avoid false positives, you may wish to exclude all ambiguous words. Try this:

<token postag_regexp="yes" postag="V...1S.+"><exception negate_pos='yes' postag_regexp="yes" postag="V...1S.+"/></token>
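
Putting those pieces together, a minimal sketch of the generalized rule might look like the one below. The rule id, the name and the example sentence are placeholders made up for illustration, the message keeps the '…' for you to complete, and it has not been run through the rule tests.

```xml
<!-- Sketch only: id, name and example are placeholders, not an existing rule. -->
<!-- Intended to live inside the Style category. -->
<rule id="EU_VERBO_EXEMPLO" name="Pronome eu dispensável (exemplo)">
  <pattern>
    <marker>
      <!-- the explicit subject pronoun -->
      <token>eu</token>
      <!-- first person singular verb; the exception drops words that also carry another tag -->
      <token postag_regexp="yes" postag="V...1S.+"><exception negate_pos="yes" postag_regexp="yes" postag="V...1S.+"/></token>
    </marker>
  </pattern>
  <!-- \2 is the second token of the pattern, i.e. the verb on its own -->
  <message>Para melhorar o estilo de escrita, … <suggestion>\2</suggestion>.</message>
  <example correction="quero">Acho que <marker>eu quero</marker> ganhar o Euromilhões.</example>
</rule>
```

The marker covers both tokens so that the suggestion replaces the pronoun and the verb with the verb alone.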

Hello Tiago,

Here is the commit:

I have no URL for it since the idea came while I was writing an e-mail or a document (I can’t remember).

:+1:

@tiagosantos

I was watching the news and came up with an idea for a new rule:
“o mesmo período do ano anterior” -> “o período homólogo”

What is the best approach to do it? Shall I create a rule or add to redundancy?

Thanks!

Nice. I believe this fits in wordiness.txt.

Done:

@tiagosantos

Hello!

On Sunday I bought another grammar book at the supermarket.

I had a quick look through it, and one page had an example about professions (but I can’t find that page any more).

I was wondering whether professions have been implemented in LT and, if not, whether you know where I can get a list of them.

Example:
“o médico do coração” -> “o cardiologista”

If there is such a page, in which file should I add the rules?

Thank you!

Kind regards,

Hello Marco,

Try this:
http://pt.conscienciopedia.org/index.php/Lista_de_profissões

You will only have to type in the definitions like this:

médico do coração=cardiologista
médicos do coração=cardiologistas
médico dos olhos=oftalmologista

There are probably better lists, but this was the first decent one I found.
It can be added to wordiness.txt, but please add it as a separate section; later I’ll make a specific Java rule for this list, with a better link and example.

@tiagosantos

Hello!

Is this a good starting approach?:

Looks good. I haven’t found a good way to describe this type of error. It is a mix between an eggcorn and a wordy expression. Do you have a good definition for this type of error?

Ahhhhh… sorry Tiago, only now I noticed this post.

Maybe something like: “Incorrect names of professions” or “names of professions” or “addressing names of professions”?

This is the closest I can come up with.

Edit: or just “professions.txt”?
Edit2: “Incorrect usage of professions”?

@tiagosantos

I was reading the grammar book.

It says there that some adjectives don’t admit degree.

It gives an example with:
o MAIS principal/prévio/semanal.

So, I went to the text analysis page and I got:
‘AQ0CS0|AQ0MS0’

Just wondering if I could commit this rule:

<rule id='ADJETIVO_SEM_GRAU' name="Adjetivo sem grau">
<!--      Created by Marco A.G.Pinto, Portuguese rule      -->
  <pattern>
    <token>o</token>
    <token>mais</token>
    <token postag='AQ0CS0|AQ0MS0' postag_regexp='yes'></token>
  </pattern>
  <message>Este adjetivo não admite grau: <suggestion>\1 \3</suggestion>.</message>
<suggestion>\1 \3</suggestion>
  <example correction="o principal">Isto é <marker>o mais principal</marker> a ter em conta.</example>
</rule>

Does it recognise all the adjectives without degree?

Thank you!

Kind regards,

No problem.

Makes sense and fits, but given that we are trying to follow technical terms used by linguists, it would be better to identify the type of error better. It is a barbarism, but that applies to any type of non-standard Portuguese expression. When I find something appropriate, I’ll get back to you.

This would detect regular adjectives as well. There is no POS information for this type of word, so you have to name them individually, like: <token regexp='yes'>principal|prévio|semanal</token>
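
As a rough sketch, your rule with the word list in place of the POS tag could look like this. It only covers the three adjectives from your example, and it has not been run through the rule tests.

```xml
<rule id='ADJETIVO_SEM_GRAU' name="Adjetivo sem grau">
  <pattern>
    <token>o</token>
    <token>mais</token>
    <!-- explicit word list instead of a POS tag; extend it as more adjectives are confirmed -->
    <token regexp='yes'>principal|prévio|semanal</token>
  </pattern>
  <!-- \1 is the article and \3 the adjective, so "mais" is dropped -->
  <message>Este adjetivo não admite grau: <suggestion>\1 \3</suggestion>.</message>
  <example correction="o principal">Isto é <marker>o mais principal</marker> a ter em conta.</example>
</rule>
```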

@tiagosantos

Hello!

I was attempting to create a rule but TESTRULES PT gives several errors:

<rule id='TEM_TÊM' name="Confusão: tem - têm">
  <pattern>
    <token postag='DI0MP0|DI0FP0' postag_regexp='yes'></token>
    <marker>
      <token>tem</token>
    </marker>
  </pattern>
  <message>Substitua por <suggestion>têm</suggestion>.</message>
  <example correction="têm">Uns <marker>tem</marker> sorte no jogo.</example>
</rule>	

This was supposed to work with:
uns
umas
outros
outras
alguns
algumas

I must be stressed as I look at it and can’t find any typo.

Could you help?

Thank you!

Hi Marco,

The rule is perfect. No worries.
This is the disambiguation getting in the way. In this situation Uns acts as the subject (a noun NCMP000) or a pronoun (PI0MP000) when next to a verb.
Opening exceptions for DIs on this rule (P_V) would create too many false positives in other rules. I believe you named all or almost all of them so may I suggest enumerating the options in this situation?
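
A minimal sketch of the enumerated version, using only the six words you listed and keeping the rest of your rule unchanged (not run through the rule tests here):

```xml
<rule id='TEM_TÊM' name="Confusão: tem - têm">
  <pattern>
    <!-- words matched directly, so the tag chosen by the disambiguator no longer matters -->
    <token regexp='yes'>uns|umas|outros|outras|alguns|algumas</token>
    <marker>
      <token>tem</token>
    </marker>
  </pattern>
  <message>Substitua por <suggestion>têm</suggestion>.</message>
  <example correction="têm">Uns <marker>tem</marker> sorte no jogo.</example>
</rule>
```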

Best regards,

Tiago Santos

Thanks!

I have added it:

Hello @tiagosantos

I was trying to create a rule:
“por em quanto” -> “por enquanto”

But “por em” triggers a rule:
“Vamos fazer isto por em quanto.”

Could you add an exception so that I may code the new rule?

Thank you!

Kind regards,

Hi Marco,
you can push the rule and I’ll then change the priority for that one.
Ideally, though, that rule should be a sub-rule of the main “por em” rule group. If you do it that way, create an antipattern on the relevant rule.
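
For illustration, an antipattern on the existing rule could be sketched as below. The rule id, name and pattern are hypothetical stand-ins, since the real “por em” rule is not quoted in this thread.

```xml
<!-- Hypothetical stand-in for the existing "por em" rule; id, name and pattern are assumptions. -->
<rule id="POR_EM_EXEMPLO" name="por em (exemplo)">
  <!-- skip "por em quanto" here so that the dedicated rule can handle it -->
  <antipattern>
    <token>por</token>
    <token>em</token>
    <token>quanto</token>
  </antipattern>
  <pattern>
    <token>por</token>
    <token>em</token>
  </pattern>
  <!-- message, suggestion and examples of the existing rule stay as they are -->
</rule>
```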

@tiagosantos

Could you improve it and insert it into the XML?

<rule id='POR_EM_QUANTO' name="Por enquanto">
<!--      Created by Marco A.G.Pinto and improved by Tiago F. Santos, Portuguese rule      -->
  <pattern>
    <token>por</token>
    <marker>
      <token>em</token>
      <token>quanto</token>
    </marker>
  </pattern>
  <message>Substitua por <suggestion>enquanto</suggestion>.</message>
  <example correction="enquanto">Por <marker>em quanto</marker> está a correr bem.</example>
</rule>

Thanks!

Kind regards,