[pt] Portuguese tagset file

@tiagosantos

Hello Tiago,

As I have been adding POS tags to words, I have been building a list of the tag codes, based on words of the same kind.

For example, if the Priberam speller says that “gato” is an “s.m.” and “gato” has no POS tag, I look in the LT morphological database for a word of the same kind, such as “copo”, and copy its POS tag.

Can I (or you) add the list of current POS tags to the file tagset_PT.txt?

Also, should it state that it is based on Priberam?

Here is what I have gathered so far:

adj. 2 g.
informacional | adj. 2 g.
AQ0CS0
masc. e fem. pl. de informacional
AQ0CP0


adj. 2 g. 2 núm.
unissexo | adj. 2 g. 2 núm.
AQ0CN0


adj. 2 g. s. 2 g.
budista | adj. 2 g. s. 2 g.
AQ0CS0
NCCS000


adj. s. f.
tomadora | adj. s. f.
AQ0FS0
NCFS000
tomadoras | fem. pl. de tomador
AQ0FP0
NCFP000


adj. s. m.
tomador | adj. s. m.
AQ0MS0
NCMS000
tomadores | masc. pl. de tomador
AQ0MP0
NCMP000


adv.
sintaticamente | adv.
RG


gerúndio de verbo transitivo
bebendo | gerúndio de beber
VMG0000


prep.
por | prep.
SPS00


s. f.
garrafa | s. f.
NCFS000
garrafas | fem. pl. de garrafa
NCFP000


s. m.
frasco | s. m.
NCMS000
frascos | masc. pl. de frasco
NCMP000


v. tr.
beber | v. tr.
VMN0000
VMN01S0
VMN03S0
VMSF1S0
VMSF3S0


v. tr. | v. pron.
reduzir | v. tr. | v. pron.
VMN0000
VMSF1S0
VMSF3S0
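
To make the list easier to check, the base-form entries above could be kept as a simple lookup table. This is only a rough Python sketch: the names PRIBERAM_TO_LT_TAGS, normalize_label and tags_for are made up for illustration and are not part of LT, and the table contains nothing beyond the examples listed above.

```python
# Hypothetical lookup table, built only from the examples in this post.
# Keys are Priberam labels; values are the tags copied for the base form.
PRIBERAM_TO_LT_TAGS = {
    "adj. 2 g.": ["AQ0CS0"],
    "adj. 2 g. 2 núm.": ["AQ0CN0"],
    "adj. 2 g. s. 2 g.": ["AQ0CS0", "NCCS000"],
    "adj. s. f.": ["AQ0FS0", "NCFS000"],
    "adj. s. m.": ["AQ0MS0", "NCMS000"],
    "adv.": ["RG"],
    "prep.": ["SPS00"],
    "s. f.": ["NCFS000"],
    "s. m.": ["NCMS000"],
    "v. tr.": ["VMN0000", "VMN01S0", "VMN03S0", "VMSF1S0", "VMSF3S0"],
}


def normalize_label(label: str) -> str:
    """Collapse runs of whitespace so that 's.  m. ' matches 's. m.'."""
    return " ".join(label.split())


def tags_for(label: str) -> list[str]:
    """Return the tags recorded for a Priberam label, or [] if none yet."""
    return list(PRIBERAM_TO_LT_TAGS.get(normalize_label(label), []))


if __name__ == "__main__":
    for label in ("s. m.", "adj. 2 g. s. 2 g.", "interj."):
        print(label, "->", tags_for(label) or "no example yet")
```

A label with no entry returns an empty list, which marks the cases where an example word still has to be found.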


################################################
################################################
################################################
################################################

There is already a tagset that details how POS are encoded. No need for a second one.

For word identification and classification, please use the ‘Vocabulário Ortográfico Comum da Língua Portuguesa’, which is open for use and is the reference that other dictionaries have to abide by.

If you wish to cite it, use:
Ferreira, José Pedro; Correia, Margarita; Almeida, Gladis de Barcellos (eds.) (2017). Vocabulário Ortográfico Comum da Língua Portuguesa. Praia: Instituto Internacional da Língua Portuguesa / Comunidade dos Países de Língua Portuguesa.

Don’t add words to the speller before cross-checking and confirming that they exist in all variants.
For example:
http://voc.cplp.org/index.php?action=lemma&id=26482