POS of numbers with % and º

marcoagpinto · June 3, 2020, 11:53am

Hello!

As you noticed, I have been fixing tons of false positives in the Portuguese number agreement rules.

This morning I ran some tests with % and º, and they failed because the online morphological analysis tool:
https://community.languagetool.org/analysis/index?lang=pt
tags them as nouns.

So, if I type 25% or 10º there, the tool tags each one as a noun and doesn’t separate the symbol from the number.

Is there a way of making the separation work?

I need this to fix rules that have to handle sentences such as:

O arquipélago situa-se no nordeste do Oceano Atlântico entre os 36º e os 43º de latitude Norte e os 25º e os 31º de longitude Oeste.

A ancestralidade africana foi de 80,4%, a europeia 10,8% e a indígena 8,8%.

Thanks

arysin · June 5, 2020, 4:07pm

There are two ways to handle this:

1) split the number and the sign in the word tokenizer
2) leave it joined and update the tagger to tag it as a number

In the Ukrainian module we use the 2nd approach (this allows the rules that react to numbers to also react to percent and degree with no additional logic).
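
The two options above can be illustrated with a minimal Python sketch. This is not LanguageTool code: the function names `split_number_sign` and `tag_joined` and the tag string `NUMBER` are made up for illustration, and the real tokenizer and tagger have their own classes and tagsets.

```python
import re

# Assumed pattern: digits, an optional decimal part, then % or º.
NUMBER_WITH_SIGN = re.compile(r"(\d+(?:[.,]\d+)?)([%º])")

def split_number_sign(token):
    """Approach 1: split '25%' into ['25', '%']; other tokens pass through."""
    match = NUMBER_WITH_SIGN.fullmatch(token)
    if match:
        return [match.group(1), match.group(2)]
    return [token]

def tag_joined(token, fallback_tag):
    """Approach 2: keep '25%' as one token but tag it as a number."""
    if NUMBER_WITH_SIGN.fullmatch(token):
        return (token, "NUMBER")  # placeholder tag, not a real tagset value
    return (token, fallback_tag)  # whatever the tagger would otherwise return
```

With the second function, a rule that already matches the number tag would also match `80,4%` or `43º` without extra logic, which is the benefit described above.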

marcoagpinto · June 5, 2020, 10:34pm

How do I do 1)?

Thanks!

marcoagpinto · June 5, 2020, 10:35pm

Or 2)

Ruud_Baars · June 6, 2020, 5:46am

You could do this in the disambiguator. Use a regexp to match the special characters, then remove the existing POS tag and/or add the new one. The disambiguator wiki explains exactly how to do it.
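
The logic of such a rule can be sketched in Python. This only shows the idea; the actual rule is written in the language's disambiguation rule file using the syntax documented in the wiki. The function name `disambiguate` and the tag strings `NOUN` and `NUMBER` are hypothetical placeholders, not the Portuguese tagset.

```python
import re

# Assumed pattern for a number followed directly by % or º.
SPECIAL_CHAR_NUMBER = re.compile(r"\d+(?:[.,]\d+)?[%º]")

def disambiguate(token, postags):
    """Return the POS tags for `token` after applying the regexp-based rule."""
    if not SPECIAL_CHAR_NUMBER.fullmatch(token):
        return list(postags)  # rule does not apply, tags stay unchanged
    # Remove the unwanted tag (placeholder name).
    kept = [tag for tag in postags if tag != "NOUN"]
    # Add the new tag (placeholder name) if it is not already present.
    if "NUMBER" not in kept:
        kept.append("NUMBER")
    return kept
```

For example, `disambiguate("10º", ["NOUN"])` drops the noun reading and leaves only the number reading, while an ordinary word keeps its tags.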

marcoagpinto · June 6, 2020, 2:44pm

Ahhh… thank you, I have found out how to do it.