POS of numbers with % and º

@tiff @jaumeortola

Hello!

As you may have noticed, I have been fixing lots of false positives in the Portuguese number agreement rules.

This morning I ran some tests with % and º, and they failed because the online morphological analysis tool
https://community.languagetool.org/analysis/index?lang=pt
tags them as nouns.

If I type 25% or 10º there, the tool tags each one as a single noun token and does not split off the symbol.

Is there a way to make the split work?

I need this to fix rules that deal with sentences like these:

O arquipélago situa-se no nordeste do Oceano Atlântico entre os 36º e os 43º de latitude Norte e os 25º e os 31º de longitude Oeste.

A ancestralidade africana foi de 80,4%, a europeia 10,8% e a indígena 8,8%.

Thanks

There are two ways to handle this:

  1. split the number and the sign in the word tokenizer
  2. leave it joined and update the tagger to tag it as a number
    In the Ukrainian module we use the second approach (this lets the rules that react to numbers also react to percentages and degrees with no additional logic)
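To make approach 1 concrete, here is a minimal Python sketch of the splitting step, assuming the input is already a list of tokens. The regex, the name `split_number_sign`, and the symbol set are hypothetical illustrations, not LanguageTool's tokenizer API (LanguageTool is written in Java and its Portuguese word tokenizer may work differently).

```python
import re

# Hypothetical pattern: digits, an optional decimal part with comma or dot,
# then a single %, º (ordinal indicator) or ° (degree sign) glued to the end,
# e.g. "25%", "80,4%", "36º".
NUM_WITH_SIGN = re.compile(r"(\d+(?:[.,]\d+)?)([%º°])")

def split_number_sign(tokens):
    """Approach 1: emit the number and the symbol as two separate tokens."""
    out = []
    for tok in tokens:
        m = NUM_WITH_SIGN.fullmatch(tok)
        if m:
            out.extend(m.groups())  # e.g. "36º" -> "36", "º"
        else:
            out.append(tok)         # anything else passes through unchanged
    return out

print(split_number_sign(["entre", "os", "36º", "e", "os", "43º"]))
# → ['entre', 'os', '36', 'º', 'e', 'os', '43', 'º']
```

Keep in mind that º is also the Portuguese ordinal indicator (10º = "décimo"), so a purely character-based split cannot tell degrees from ordinals; the rules downstream still need context for that.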

How do I do 1)?

Thanks!

Or 2)

You could do this in the disambiguator: use a regexp to match the special characters, then remove the existing POS tag and/or add the new one. The disambiguator wiki page explains exactly how to do it.
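As a rough model of what such a disambiguation rule does, the Python sketch below assumes each token carries a set of POS tags: tokens matching the regexp lose the noun tag and gain a number tag. The tag strings `NOUN` and `NUM` and the function `retag_numbers` are hypothetical; in LanguageTool this is usually written declaratively as a rule in the language's disambiguation file described on the wiki, not as code, and the Portuguese tagset uses its own tag names.

```python
import re

# Hypothetical regexp a disambiguation rule might use: a number, optionally
# with a decimal part, glued to %, º or °.
NUM_WITH_SIGN = re.compile(r"\d+(?:[.,]\d+)?[%º°]")

def retag_numbers(readings, remove_tag="NOUN", add_tag="NUM"):
    """Return a copy of readings, a list of (token, tags) pairs, where
    matching tokens have remove_tag dropped and add_tag added."""
    result = []
    for token, tags in readings:
        new_tags = set(tags)
        if NUM_WITH_SIGN.fullmatch(token):
            new_tags.discard(remove_tag)  # "remove the postag"
            new_tags.add(add_tag)         # "add the new one"
        result.append((token, new_tags))
    return result

sentence = [("a", {"DET"}), ("europeia", {"ADJ"}), ("10,8%", {"NOUN"})]
for token, tags in retag_numbers(sentence):
    print(token, sorted(tags))
```

Because the token stays joined, rules that already react to the number tag would also fire on "10,8%" and "36º" without extra logic, which matches the motivation given for approach 2.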

Ahhh… thank you, I've figured out how to do it.

🙂