[pt] Sentence without a verb

marcoagpinto · October 28, 2019, 9:55am

Hello!

Tiago’s rule sometimes reports that certain sentences lack a verb.

However, the Priberam dictionary describes some feminine nouns as nouns derived from a verb.

For example, if I write:
“Assim uma boa compensação.”
It says the sentence has no verb, but “compensação” is a noun derived from a verb:
https://dicionario.priberam.org/compensação

How do I change Tiago’s rule so that it treats NCFS000 nouns ending in “ção” as verbs?

The rule from grammar.xml is this:

<rule id="NO_VERB" name="Síntaxe: ausência de verbo">
<!-- Created by Tiago F. Santos, Portuguese rule, 2017-07-16 -->
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token max='2'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token max='3'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token max='2'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- simple holophrases -->
      <token postag='SENT_START'/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token max='2'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token max='4'/>
      <token>!</token>
  </antipattern>
  <antipattern><!-- Short questions e.g. A querida Maria? -->
      <token postag='SENT_START'/>
      <token postag_regexp='yes' postag="D.+"/>
      <token postag_regexp='yes' postag="[DAN].+"/>
      <token postag_regexp='yes' postag="[AN].+"/>
      <token>?</token>
  </antipattern>
  <antipattern><!-- XXX LibreOffice specific antipattern -->
<!-- TODO Libreoffice plug-in tokenizes enumerations differently, so, it triggers an error that does not exist in the LT-server, nor LT-standalone. Verify. -->
      <token postag='SENT_START'/>
    <marker>
      <token regexp='yes'>\d{1,2}|\p{L}</token>
      <token regexp='yes'>[\.\)]</token>
      <token max='3'/>
      <token regexp='yes'>[.!?]</token>
    </marker>
  </antipattern>
  <pattern>
      <token postag='SENT_START'/>
    <marker>
      <token min='2'>
        <exception postag='V.+|UNKNOWN' postag_regexp='yes'/></token>
      <token skip='-1'>
        <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
        <exception scope="next" postag='V.+|UNKNOWN' postag_regexp='yes'/></token>
    <and>
      <token regexp='yes'>[.?!…”»]|&quot;</token>
      <token postag='SENT_END'/>
    </and>
    </marker>
  </pattern>
  <message>Esta frase não tem verbo. Confirme que não é uma holófrase.</message>
  <url>https://letratura.blogspot.pt/2007/03/apostila-ao-ciberdvidas-frase-sem-verbo.html</url><!--_XXX_artigo_do_Ciberdúvidas_está_offline_-->
  <short>Frase sem verbo</short>
  <example type="incorrect"><marker>Isto um exemplo.</marker></example>
<!--example type="incorrect"><marker>Isto um exemplo!</marker></example-->
  <example type="correct"><marker>Isto é um exemplo.</marker></example>
  <example><marker>Isto -  Um exemplo</marker></example><!-- XXX Possível título -->
  <example>Tem marcação com o Dr. Mário?</example>
  <example type="incorrect"><marker>Não marcação com o Dr. Mário.</marker></example>
  <example type="correct">Não tem marcação com o Dr. Mário.</example>
  <example>Temos muito tempo.</example>
  <example>Sim, senhor.</example>
  <example>— Sim, senhor Bastos.</example>
  <example>Basta!</example>
  <example>Ó Meu Deus!</example>
  <example>Ó Deus do Céu!</example>
  <example type="incorrect"><marker>Este exemplo também.</marker></example>
  <example type="correct"><marker>Este exemplo também está.</marker></example>
  <example>Tu tinhas um bocado de tempo.</example>
  <example>Quem compra este tipo de arte?</example>
  <example>Esses dois campos representam a união de disciplinas de astronomia e química.</example>
  <example>Não vale a pena.</example>
  <example>Certo, isto está bem. Este exemplo está correto. Certo que este também.</example>
  <example>Este exemplo está correto. Este segundo também. Este terceiro exemplo não.</example>
  <example>Os melhores cumprimentos,</example>
  <example><marker>A.1.</marker> Primeiro Anexo</example>
  <example><marker>9.7.1.</marker> Subcapítulo</example>
  <example>A) Primeiro Anexo.</example>
  <example>a) Primeiro Anexo.</example>
  <example>Cuide da sua alimentação.</example>
  <example>Junte a cebola.</example>
  <example>Mande lembranças minhas para seus pais.</example>
  <example>A guerra trouxe a ruína para o país.</example>
</rule>

Thank you!

tiff · October 28, 2019, 12:38pm

To match a word that has the POS tag “NCFS000” and ends with “ção”:

<token postag="NCFS000" regexp="yes">.+ção</token>

Does this help?
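Note that, used as a plain `<token>`, the line above *matches* such nouns. In the NO_VERB pattern the goal is the opposite: each pattern token already carries `<exception>` elements that stop verbs (and UNKNOWN words) from being counted as part of a verbless sentence. One way to apply the suggestion, sketched here and not tested, is to add the same condition as an extra `<exception>` on both pattern tokens, mirroring the existing ones. My understanding is that `scope="next"` on the `skip='-1'` token applies the exception to the tokens it skips over; that is worth confirming in the LanguageTool rule documentation.

```xml
<pattern>
    <token postag='SENT_START'/>
  <marker>
    <token min='2'>
      <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
      <!-- added: treat NCFS000 nouns ending in "ção" like verbs -->
      <exception postag="NCFS000" regexp="yes">.+ção</exception>
    </token>
    <token skip='-1'>
      <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
      <exception scope="next" postag='V.+|UNKNOWN' postag_regexp='yes'/>
      <!-- added: same condition for this token and for the tokens skipped after it -->
      <exception postag="NCFS000" regexp="yes">.+ção</exception>
      <exception scope="next" postag="NCFS000" regexp="yes">.+ção</exception>
    </token>
  <and>
    <token regexp='yes'>[.?!…”»]|&quot;</token>
    <token postag='SENT_END'/>
  </and>
  </marker>
</pattern>
```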

marcoagpinto · October 28, 2019, 1:52pm

@tiff

Sorry for the silly question:
Exactly where do I place that line in the code above?

It is too complex for me.

Thank you!

tiff · October 29, 2019, 3:49pm

Sure. I will have a look in the next few days.

marcoagpinto · October 29, 2019, 5:54pm

@tiff

I believe the term is “verbal noun”.

Another user used it the other day in a different topic.

So the feminine nouns ending in “ção” are verbal nouns.

marcoagpinto · November 2, 2019, 4:24pm

@tiff

I believe I have managed to do it:

It was a lot easier than I thought.

marcoagpinto · November 3, 2019, 10:04am

@tiff

Hello!

Looking at the nightly diff, I think I could ignore a few dozen feminine nouns that are not actually used as verbal nouns.

I noticed that Tiago defined several word lists at the start of grammar.xml.

How can I create such a list of nouns for this rule to ignore, and how do I insert it into the code from my previous comment?

Thanks!
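As far as I know, the word lists at the top of grammar.xml are XML entities declared in the file's internal DTD subset (`<!DOCTYPE rules [ ... ]>`). The XML parser replaces `&name;` with the entity's text, so a list written as a `|`-separated alternation can be used inside any `regexp='yes'` token or exception. A minimal sketch assuming that layout: the new `<!ENTITY>` line goes inside the existing DOCTYPE block (a document can only have one), the `<rules>` attributes and contents are abbreviated, and the three nouns are placeholders, not a vetted list.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rules [
  <!-- ...existing entities... -->
  <!-- Hypothetical list: feminine "-ção" nouns that should NOT count as verbal nouns.
       Placeholder entries; replace them with the words found in the nightly diff. -->
  <!ENTITY substantivos_nao_verbais "nação|porção|canção">
]>
<rules lang="pt">
  <!-- ...existing categories and rules; the list is then referenced as
       &substantivos_nao_verbais; inside a regexp='yes' token or exception... -->
</rules>
```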

marcoagpinto · November 3, 2019, 11:21am

@tiff

I tried this:

<marker>
  <token min='2'>
    <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
    <exception postag="NCFS000" regexp="yes">.+ção</exception>
    <exception negate="yes" regexp='yes'>&substantivos_nao_verbais;</exception>
  </token>
  <token skip='-1'>
    <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
    <exception scope="next" postag='V.+|UNKNOWN' postag_regexp='yes'/>
    <exception postag="NCFS000" regexp="yes">.+ção</exception>
    <exception scope="next" postag="NCFS000" regexp="yes">.+ção</exception>
    <exception negate="yes" regexp='yes'>&substantivos_nao_verbais;</exception>
    <exception scope="next" negate="yes" regexp='yes'>&substantivos_nao_verbais;</exception>
  </token>
  <and>
    <token regexp='yes'>[.?!…”»]|&quot;</token>
    <token postag='SENT_END'/>
  </and>
</marker>

But TESTRULES PT reports warnings for the rule's examples.
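A likely cause, based on my reading of how LanguageTool treats exceptions (worth double-checking against the documentation): the `<exception>` elements of a token are alternatives, so the token is rejected as soon as any one of them matches. An exception with `negate="yes"` matches every word that is *not* in `&substantivos_nao_verbais;`, which is nearly every word, so the pattern can no longer match sentences such as "Isto um exemplo." and the `incorrect` examples fail. What is wanted is a single exception that combines three conditions: NCFS000, ends in "ção", and not in the list. One way to express that, assuming the token regex is passed to Java's regex engine unchanged and must match the whole word, is a negative lookahead in front of `.+ção`:

```xml
<marker>
  <token min='2'>
    <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
    <!-- NCFS000 noun ending in "ção", unless it is exactly one of the listed nouns -->
    <exception postag="NCFS000" regexp="yes">(?!(?:&substantivos_nao_verbais;)$).+ção</exception>
  </token>
  <token skip='-1'>
    <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
    <exception scope="next" postag='V.+|UNKNOWN' postag_regexp='yes'/>
    <exception postag="NCFS000" regexp="yes">(?!(?:&substantivos_nao_verbais;)$).+ção</exception>
    <exception scope="next" postag="NCFS000" regexp="yes">(?!(?:&substantivos_nao_verbais;)$).+ção</exception>
  </token>
  <and>
    <token regexp='yes'>[.?!…”»]|&quot;</token>
    <token postag='SENT_END'/>
  </and>
</marker>
```

The `(?:...)` group keeps the `|` alternatives of the list inside the lookahead, and the `$` makes it reject only exact matches, so a longer word that merely starts with a listed noun is still treated as a verbal noun. The entity `substantivos_nao_verbais` still has to be declared at the top of grammar.xml for this to parse.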

marcoagpinto · November 5, 2019, 5:34am

@Yakov
@dnaber

Do you know the answer?

Thanks!

tiff · November 6, 2019, 7:31am

Can you post the entire rule and the warning that you see?
Sorry for the late response.

marcoagpinto · November 6, 2019, 1:41pm

@tiff

Hello!

The original rule didn’t account for verbal nouns (feminine nouns ending in “ção”), so I added them.

However, some feminine nouns ending in “ção” aren’t verbal nouns, so I created a list of them from the hits in the nightly diff.

This is the current rule:

<rule id="NO_VERB" name="Síntaxe: ausência de verbo">
<!-- Created by Tiago F. Santos, Portuguese rule, 2017-07-16 -->
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token max='2'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token max='3'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token max='2'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- simple holophrases -->
      <token postag='SENT_START'/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token/>
      <token postag_regexp='yes' postag="_PUNCT|''"/>
      <token max='2'/>
      <token regexp='yes'>[.!?]</token>
  </antipattern>
  <antipattern><!-- long holophrases -->
      <token postag='SENT_START'/>
      <token max='4'/>
      <token>!</token>
  </antipattern>
  <antipattern><!-- Short questions e.g. A querida Maria? -->
      <token postag='SENT_START'/>
      <token postag_regexp='yes' postag="D.+"/>
      <token postag_regexp='yes' postag="[DAN].+"/>
      <token postag_regexp='yes' postag="[AN].+"/>
      <token>?</token>
  </antipattern>
  <antipattern><!-- XXX LibreOffice specific antipattern -->
<!-- TODO Libreoffice plug-in tokenizes enumerations differently, so, it triggers an error that does not exist in the LT-server, nor LT-standalone. Verify. -->
      <token postag='SENT_START'/>
    <marker>
      <token regexp='yes'>\d{1,2}|\p{L}</token>
      <token regexp='yes'>[\.\)]</token>
      <token max='3'/>
      <token regexp='yes'>[.!?]</token>
    </marker>
  </antipattern>
  <pattern>
      <token postag='SENT_START'/>
    <marker>
      <token min='2'>
        <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
        <exception postag="NCFS000" regexp="yes">.+ção</exception>
      </token>
      <token skip='-1'>
        <exception postag='V.+|UNKNOWN' postag_regexp='yes'/>
        <exception scope="next" postag='V.+|UNKNOWN' postag_regexp='yes'/>
        <exception postag="NCFS000" regexp="yes">.+ção</exception>
        <exception scope="next" postag="NCFS000" regexp="yes">.+ção</exception>
      </token>
    <and>
      <token regexp='yes'>[.?!…”»]|&quot;</token>
      <token postag='SENT_END'/>
    </and>
    </marker>
  </pattern>
  <message>Esta frase não tem verbo. Confirme que não é uma holófrase.</message>
  <url>https://letratura.blogspot.pt/2007/03/apostila-ao-ciberdvidas-frase-sem-verbo.html</url><!--_XXX_artigo_do_Ciberdúvidas_está_offline_-->
  <short>Frase sem verbo</short>
  <example type="incorrect"><marker>Isto um exemplo.</marker></example>
<!--example type="incorrect"><marker>Isto um exemplo!</marker></example-->
  <example type="correct"><marker>Isto é um exemplo.</marker></example>
  <example><marker>Isto -  Um exemplo</marker></example><!-- XXX Possível título -->
  <example>Tem marcação com o Dr. Mário?</example>
<!--      <example type="incorrect"><marker>Não marcação com o Dr. Mário.</marker></example>-->
  <example type="correct">Não tem marcação com o Dr. Mário.</example>
  <example>Temos muito tempo.</example>
  <example>Sim, senhor.</example>
  <example>— Sim, senhor Bastos.</example>
  <example>Basta!</example>
  <example>Ó Meu Deus!</example>
  <example>Ó Deus do Céu!</example>
  <example type="incorrect"><marker>Este exemplo também.</marker></example>
  <example type="correct"><marker>Este exemplo também está.</marker></example>
  <example>Tu tinhas um bocado de tempo.</example>
  <example>Quem compra este tipo de arte?</example>
  <example>Esses dois campos representam a união de disciplinas de astronomia e química.</example>
  <example>Não vale a pena.</example>
  <example>Certo, isto está bem. Este exemplo está correto. Certo que este também.</example>
  <example>Este exemplo está correto. Este segundo também. Este terceiro exemplo não.</example>
  <example>Os melhores cumprimentos,</example>
  <example><marker>A.1.</marker> Primeiro Anexo</example>
  <example><marker>9.7.1.</marker> Subcapítulo</example>
  <example>A) Primeiro Anexo.</example>
  <example>a) Primeiro Anexo.</example>
  <example>Cuide da sua alimentação.</example>
  <example>Junte a cebola.</example>
  <example>Mande lembranças minhas para seus pais.</example>
  <example>A guerra trouxe a ruína para o país.</example>
</rule>
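Whichever form the exception takes, TESTRULES PT can only check what the rule's `<example>` elements cover. A hedged suggestion is to add one example per case: a verbal noun that should no longer trigger the rule, and a listed noun that still should. The sentences below are illustrative only; the second assumes a version of the rule that uses an ignore list containing "nação" (a hypothetical entry), and the POS tagging of both sentences has not been verified.

```xml
<!-- "compensação" counts as a verbal noun, so no error is expected here -->
<example>Assim uma boa compensação.</example>
<!-- "nação" is assumed to be in the substantivos_nao_verbais list, so the rule should still fire -->
<example type="incorrect"><marker>Isto uma nação.</marker></example>
```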

However, when I tried to add the list of nouns to ignore, it gave errors.

How do I do it?

There is also another improvement this rule still needs, but for now I will stick with this one.

Thanks!