I’m thinking maybe we should add multi-word conjunctions to the added.txt file so that expressions like these are detected as conjunctions:
sendo assim CC
com isso CC
por tudo isso CC
nesse sentido CC
logo CC
em suma CC
em síntese CC
diante do exposto CC
diante disso CC
diante disto CC
desse modo CC
deste modo CC
por isso CC
por isto CC
por conseguinte CC
por consequência CC
isto é CC
ou seja CC
no entanto CC
não obstante CC
The question is: can we assign a single tag to a sequence of more than one word? What do you think?
Yes, this can be done in the multiwords.txt file. Note that this tagging removes all previous tags on the matched words, so it is not a good solution if the expression is ambiguous and can be tagged in different ways. For ambiguous expressions, you need to write rules in disambiguation.xml instead.
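To make that behaviour concrete, here is a minimal sketch of longest-match multiword tagging in Python. It is not LanguageTool's implementation: it assumes each entry is an expression followed by its tag, as in the list above, and the names `load_multiwords` and `tag_multiwords` (plus the placeholder tags `V`, `ADV`, `PUNCT`) are hypothetical. As described above, a match replaces whatever tags the words had before, which is exactly why an ambiguous entry such as 'logo' is risky.

```python
from typing import Dict, List, Tuple

# (surface form, candidate tags); tag names here are hypothetical placeholders
Token = Tuple[str, List[str]]


def load_multiwords(lines: List[str]) -> Dict[Tuple[str, ...], str]:
    """Parse entries like 'sendo assim CC': every field but the last is the
    expression, the last field is the tag (assumed format)."""
    table: Dict[Tuple[str, ...], str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words, tag = parts[:-1], parts[-1]
        table[tuple(w.lower() for w in words)] = tag
    return table


def tag_multiwords(tokens: List[Token],
                   table: Dict[Tuple[str, ...], str]) -> List[Token]:
    """Greedy longest-match tagging: every token inside a matched expression
    loses its previous tags and keeps only the multiword tag."""
    max_len = max((len(k) for k in table), default=0)
    result = list(tokens)
    i = 0
    while i < len(result):
        matched = 0
        # Try the longest span first so 'por tudo isso' wins over shorter entries.
        for n in range(min(max_len, len(result) - i), 0, -1):
            key = tuple(word.lower() for word, _ in result[i:i + n])
            if key in table:
                for j in range(i, i + n):
                    result[j] = (result[j][0], [table[key]])
                matched = n
                break
        i += matched if matched else 1
    return result


table = load_multiwords(["sendo assim CC", "por tudo isso CC", "logo CC"])
print(tag_multiwords([("vou", ["V"]), ("embora", ["ADV"]), ("logo", ["ADV"])], table))
# [('vou', ['V']), ('embora', ['ADV']), ('logo', ['CC'])]  <- adverb reading lost
```

The usage line shows the problem raised in this thread: the single-word entry overwrites the adverbial 'logo' in 'vou embora logo'.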
@susanabatto, that’s a good idea! I would pay attention to a few situations:
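‘logo’ can be an adverb, as in ‘vou embora logo’ (‘I’m leaving soon’)
‘isto é’ can be something else, as in ‘gosto de feijão e isto é bom para a saúde’ (‘I like beans, and this is good for your health’), where it is a pronoun followed by a verb
even ‘ou seja’ deserves attention, as in ‘seja amarelo ou seja azul eu gosto do boneco’ (‘whether it is yellow or blue, I like the doll’)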
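The three cases above can be illustrated with a rough heuristic sketch in Python. It is not a disambiguation.xml rule; it only shows the kind of context checks such rules would need. The cues (clause-initial 'logo', 'isto é' followed by a comma or colon, 'ou seja' with no earlier 'seja' in the sentence) and the function name `is_conjunction` are illustrative assumptions, not validated rules.

```python
from typing import List

CLAUSE_BREAKS = {",", ";", ":", "."}
# Words after which 'logo' is usually not a coordinating conjunction,
# e.g. 'logo depois', 'logo que' (illustrative, not exhaustive).
LOGO_NON_CC_NEXT = {"depois", "após", "antes", "que"}


def is_conjunction(tokens: List[str], i: int, n: int) -> bool:
    """Heuristic guess: should tokens[i:i+n] get the CC reading?
    Assumes punctuation is already split into separate tokens."""
    words = [t.lower() for t in tokens]
    expr = " ".join(words[i:i + n])
    prev = words[i - 1] if i > 0 else None
    nxt = words[i + n] if i + n < len(words) else None

    if expr == "logo":
        # 'Penso, logo existo' is clause-initial; 'vou embora logo' is clause-final.
        clause_initial = prev is None or prev in CLAUSE_BREAKS
        return (clause_initial and nxt is not None
                and nxt not in (CLAUSE_BREAKS | LOGO_NON_CC_NEXT))
    if expr == "isto é":
        # Explanatory 'isto é' is usually set off: '..., isto é, às dez'.
        return nxt in {",", ":"}
    if expr == "ou seja":
        # 'seja amarelo ou seja azul' is correlative 'seja ... ou seja'.
        return "seja" not in words[:i]
    return True  # other expressions: accept the multiword tag
```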
I guess that analyzing the hits and false positives could give us a way to handle the cases above.
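One way to do that analysis is to run the candidate tagging over a sample of sentences, mark each hit by hand as correct or not, and compute precision per expression. The sketch below assumes a hypothetical list of (expression, is_correct) judgments; `precision_by_expression` is an illustrative name and does not call any LanguageTool tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def precision_by_expression(
        judgments: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {expression: (hits, false_positives, precision)} from manual judgments."""
    hits: Dict[str, int] = defaultdict(int)
    false_pos: Dict[str, int] = defaultdict(int)
    for expr, correct in judgments:
        if correct:
            hits[expr] += 1
        else:
            false_pos[expr] += 1
    report: Dict[str, Tuple[int, int, float]] = {}
    for expr in set(hits) | set(false_pos):
        total = hits[expr] + false_pos[expr]
        report[expr] = (hits[expr], false_pos[expr], hits[expr] / total)
    return report


sample = [("logo", True), ("logo", False), ("logo", False), ("em suma", True)]
for expr, (ok, bad, prec) in sorted(precision_by_expression(sample).items()):
    print(f"{expr}: {ok} hits, {bad} false positives, precision {prec:.2f}")
```

Expressions with low precision would be candidates for rules in disambiguation.xml rather than plain multiwords.txt entries; any cut-off for "low" is a judgment call.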