POSTAGs (part-of-speech tags) are a feature of the morphological dictionary, which catalogues words by their grammatical use and tags each one in relation to the others or to a common base form (the lemma). Some rules are written against these tags instead of literal words, so a single rule applies automatically to every word (or word relation) in a group.
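To make the idea of a tag-based rule concrete, here is a minimal Python sketch over a tiny, made-up lexicon in the "form lemma tag" layout used elsewhere in this thread. The entries, the `LEXICON` name and the `forms_with_tag` helper are hypothetical illustrations, and whole-tag regex matching (modelled with `re.fullmatch`) is an assumption about how tag patterns are applied.

```python
import re

# Hypothetical, illustrative entries in "form lemma tag" layout.
# They are NOT taken from the real dictionary.
LEXICON = [
    ("para", "para", "SPS00"),
    ("até", "até", "SPS00"),
    ("pôr", "pôr", "VMN0000"),
    ("pôs", "pôr", "VMIS3S0"),
    ("casa", "casa", "NCFS000"),
]

def forms_with_tag(pattern: str) -> list[str]:
    """Return every word form whose tag matches `pattern` as a whole (assumed semantics)."""
    regex = re.compile(pattern)
    return [form for form, _lemma, tag in LEXICON if regex.fullmatch(tag)]

# A word-based rule would have to list "para" and "até" one by one;
# a tag-based rule reaches every word carrying the tag at once.
print(forms_with_tag("SPS00"))  # ['para', 'até']
print(forms_with_tag("N.*"))    # ['casa']
```

The point is only that one pattern such as SPS00 covers every word with that tag, including words added to the dictionary later.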
All datasets have errors, and this one is no exception. The most common POSTAG errors have been found and fixed, but there are still a few hundred left to find. That is not much, considering there are hundreds of thousands of POSTAGs, but systematically hunting down all these misbehaving items is too daunting a task, so any error report of this kind is welcome.
If you find one, or you have a rule that should work but doesn't:
Great find, Marco.
puseste pôr VMIS2P0 (just the plural, right?)
I will add the fix later in the daily dictionary-fix commit, along with other improvements to yesterday's rules commit.
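A quick way to read an entry such as "puseste pôr VMIS2P0" is to decode the tag one position at a time. The sketch below assumes the EAGLES-style seven-position layout that tags like VMIP1S0 and VMN0000 suggest; the position names, the partial value tables and the `decode_verb_tag` helper are illustrative assumptions, not the dictionary's documented tagset.

```python
# Assumed meaning of each position in a seven-character verb tag
# (EAGLES-style layout). Only a few values per position are listed;
# this table is an illustration, not official tagset documentation.
POSITIONS = [
    ("category", {"V": "verb"}),
    ("type", {"M": "main", "A": "auxiliary"}),
    ("mood", {"I": "indicative", "S": "subjunctive", "N": "infinitive"}),
    ("tense", {"P": "present", "S": "past", "0": "-"}),
    ("person", {"1": "1st", "2": "2nd", "3": "3rd", "0": "-"}),
    ("number", {"S": "singular", "P": "plural", "0": "-"}),
    ("gender", {"0": "-"}),
]

def decode_verb_tag(tag: str) -> dict[str, str]:
    """Map each character of a verb tag to its assumed meaning."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {tag!r}")
    # Characters missing from the partial tables are reported, not guessed.
    return {name: values.get(ch, f"unknown ({ch})")
            for ch, (name, values) in zip(tag, POSITIONS)}

# The entry from this thread, in "form lemma tag" layout.
form, lemma, tag = "puseste pôr VMIS2P0".split()
print(form, lemma, decode_verb_tag(tag))
```

Under these assumptions the sixth character carries number, so it is the position to check when an entry looks singular or plural by mistake.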
Tiago, how do you convert:
SPS00|VMN0000|VMIP1S0|VMIP2S0|VMIP3S0|VMIP1P0|VMIP2P0|VMIP3P0|VMIS1S0|VMIS2S0|VMIS3S0|VMIS1P0|VMIS2P0|VMIS3P0
into something compact like [blah blah][blah blah]?
I tried several approaches, but the stand-alone tool doesn't recognise them:
SPS00|VMN0000|VM[IP1][IP2][IP3]S0|VM[IP1][IP2][IP3]P0|VM[IS1][IS2][IS3]S0|VM[IS1][IS2][IS3]P0
The verb is “pôr”.
The first tag, SPS00, covers “para”, “até”, etc.
The second, VMN0000, is the infinitive of a verb.
Rule: “ai” > “aí”
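When you write [ ], each character inside the brackets is an alternative for a single position, so [IP1] matches one I, P or 1, not the sequence “IP1”. While porting Spanish rules I also found a great simplification: the “.” (dot), which matches any single character.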
For example, if you want all indicative verb forms in the singular, you can write something like:
V.I..S. (one dot per tag position), and that would work.
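The bracket and dot behaviour can be checked outside the stand-alone tool with a short Python sketch. It assumes that a tag pattern must match the whole tag (modelled with `re.fullmatch`), which is how the examples in this thread read; the `matches` helper and the variable names are made up for the illustration.

```python
import re

def matches(pattern: str, tag: str) -> bool:
    """True if `pattern` matches the entire tag (assumed matching semantics)."""
    return re.fullmatch(pattern, tag) is not None

# The twelve indicative tags from the question (VMIP1S0 ... VMIS3P0).
TAGS = ("VMIP1S0|VMIP2S0|VMIP3S0|VMIP1P0|VMIP2P0|VMIP3P0|"
        "VMIS1S0|VMIS2S0|VMIS3S0|VMIS1P0|VMIS2P0|VMIS3P0").split("|")

# Each [...] is a character class that consumes ONE character, so
# [IP1][IP2][IP3] reads three single positions, not the strings "IP1",
# "IP2", "IP3". Of the twelve tags, only VMIP3S0 happens to fit.
attempt = "VM[IP1][IP2][IP3]S0"

# One class per varying position: tense P or S, person 1-3, number S or P.
compact = "VMI[PS][123][SP]0"

# Each "." matches any single character, i.e. one tag position:
# V, any type, I (indicative), any tense, any person, S (singular), any gender.
indicative_singular = "V.I..S."

print("attempt matches:", [t for t in TAGS if matches(attempt, t)])
print("compact covers all:", all(matches(compact, t) for t in TAGS))
print("singular only:", [t for t in TAGS if matches(indicative_singular, t)])
```

Under the same assumption, SPS00|VMN0000|VMI[PS][123][SP]0 should match exactly the fourteen tags from the question, since the bracketed part expands to 2 × 3 × 2 = 12 indicative tags.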
From your example, you want all prepositions and all verb forms (many tags are missing from your list, but I believe that is the intent).
That would be just SPS00|V.*
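The shorter pattern trades precision for brevity. This sketch, under the same assumption of whole-tag matching via `re.fullmatch`, checks that SPS00|V.* accepts every tag in the original question and shows that it also accepts verb tags the question did not list. VMSP1S0 (a subjunctive form) and NCMS000 (a noun) are assumed example tags.

```python
import re

# The alternation from the original question.
QUESTION = ("SPS00|VMN0000|VMIP1S0|VMIP2S0|VMIP3S0|VMIP1P0|VMIP2P0|VMIP3P0|"
            "VMIS1S0|VMIS2S0|VMIS3S0|VMIS1P0|VMIS2P0|VMIS3P0")
BROAD = "SPS00|V.*"

# Every tag listed in the question is accepted by the short pattern.
covered = all(re.fullmatch(BROAD, tag) for tag in QUESTION.split("|"))

# V.* also accepts verb tags the question did not list (e.g. a
# subjunctive form) while still rejecting a non-verb tag such as a noun.
extra = [tag for tag in ("VMSP1S0", "NCMS000") if re.fullmatch(BROAD, tag)]
print(covered, extra)
```

Whether that broader match is acceptable depends on the rule; when it is not, listing the allowed values in brackets keeps the pattern short without widening it.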
If somebody notices something wrong in my interpretation, just correct me. I am still figuring out the syntax, and all new tricks are welcome.