[pt] Coordinating conjunctions

Hi @marcoagpinto and @rjlima

I think we should add multi-word conjunctions to added.txt so that expressions like these are detected:

sendo assim CC
com isso CC
por tudo isso CC
nesse sentido CC
logo CC
em suma CC
em síntese CC
diante do exposto CC
diante disso CC
diante disto CC
desse modo CC
deste modo CC
por isso CC
por isto CC
por conseguinte CC
por consequência CC
isto é CC
ou seja CC
no entanto CC
não obstante CC

The question is: can we give a single tag to an expression of more than one word? What do you think?

It can easily be done, but expressions of more than one word should be added to this file instead:
multiwords.txt

It is in the same folder as “pre-reform-compounds.txt” (search for that file in Windows Explorer and you will find the folder).


Susana,

You don’t need to sort them alphabetically if that is too much work.

I will sort the file myself in a few days using an app.

Yes. This can be done in the multiwords.txt file. Tagging an expression there replaces all the tags its words previously had. It is not a good solution if the expression is ambiguous and can be tagged in different ways. If there is ambiguity, you need to write rules in disambiguation.xml.

@susanabatto that’s a good idea! I would pay attention to some situations:

  • ‘logo’ can be an adverb, as in ‘vou embora logo’ (“I’m leaving soon”)
  • ‘isto é’ can mean something else (‘gosto de feijão e isto é bom para a saúde’, “I like beans and this is good for your health”)
  • even ‘ou seja’ deserves attention (‘seja amarelo ou seja azul eu gosto do boneco’, “whether it is yellow or blue, I like the doll”)

I guess that by analyzing the hits and false positives we can find a way around the cases above.
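
One way to review hits and false positives before adding an entry is to pull every sentence containing the candidate expression out of a sample and read them by hand. The Python snippet below is only a rough sketch of that idea, not part of LanguageTool: `find_hits` is a hypothetical helper, and the second sample sentence is made up for illustration.

```python
import re


def find_hits(expression, sentences):
    """Return the sentences that contain `expression` as whole words."""
    # (?<!\w) and (?!\w) stop the expression from matching inside a longer
    # word, e.g. 'logo' inside 'catálogo'.
    pattern = re.compile(
        r"(?<!\w)" + re.escape(expression) + r"(?!\w)", re.IGNORECASE
    )
    return [s for s in sentences if pattern.search(s)]


# Sample sentences: the first is from this thread, the second is invented.
sample = [
    "vou embora logo",
    "o catálogo chegou",
]

for sentence in find_hits("logo", sample):
    print(sentence)
```

Each printed sentence still has to be judged by a person: the script only finds the candidates, it cannot tell a conjunction from an adverb.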

But will it also remove the tags of the individual words? For example, the current tags of “sendo” and “assim” when they do not appear together?

Here are the entries, ready to insert into multiwords.txt:
susana_multiwords_20220718.txt (324 Bytes)

You are right, “logo” is actually already tagged as “CS”. And it’s not a double-worded conjunction, so I wouldn’t add it there anyway :smile:

I’ll try to add disambiguation rules for the ones with a double meaning; if that doesn’t work, I’ll leave them out.


@susanabatto

The entries added to multiwords.txt need a TAB character between the expression and the POS tag.

The easiest way to do it is to use my tool:
https://proofingtoolgui.org/#downloads
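
For anyone who prefers a script, the TAB requirement can also be handled in a few lines of Python. This is a minimal sketch, assuming each input line is an expression followed by a single POS tag separated by spaces, as in the list at the top of the thread. The function names are hypothetical and nothing here comes from LanguageTool itself.

```python
def to_tab_separated(line):
    """Turn 'sendo assim CC' into 'sendo assim<TAB>CC'."""
    # Split on the last run of whitespace: everything before it is the
    # expression, the final token is the POS tag.
    expression, tag = line.strip().rsplit(None, 1)
    return f"{expression}\t{tag}"


def convert(lines):
    """Convert all non-blank lines, dropping repeated entries."""
    seen = set()
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = to_tab_separated(line)
        if entry not in seen:  # keep only the first occurrence
            seen.add(entry)
            result.append(entry)
    return result


for entry in convert(["sendo assim CC", "", "por tudo isso CC", "sendo assim CC"]):
    print(entry)
```

A line that contains only one token (no tag) will raise a `ValueError` in this sketch, which is a cheap way to spot malformed entries.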

Thanks :heart:

multiwords.txt will replace the tags only when “sendo” and “assim” appear together.
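
To make that behaviour concrete, here is a toy Python model of it. It is not LanguageTool's actual implementation, just a sketch of the idea: the single-word tags `VERB` and `ADV` are placeholders rather than the real Portuguese tagset, and `tag` is a hypothetical function.

```python
# Toy data: one multiword entry and made-up single-word tags.
MULTIWORDS = {("sendo", "assim"): "CC"}
SINGLE_WORDS = {"sendo": "VERB", "assim": "ADV"}  # placeholder tags


def tag(tokens):
    """Tag tokens, letting a multiword match override the single-word tags."""
    longest = max(len(key) for key in MULTIWORDS)
    result = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest multiword first, down to two tokens.
        for n in range(longest, 1, -1):
            chunk = tuple(t.lower() for t in tokens[i:i + n])
            if len(chunk) == n and chunk in MULTIWORDS:
                # Every word of the expression receives the multiword tag.
                result.extend((t, MULTIWORDS[chunk]) for t in tokens[i:i + n])
                i += n
                matched = True
                break
        if not matched:
            # No multiword starts here: fall back to the word's own tag.
            word = tokens[i]
            result.append((word, SINGLE_WORDS.get(word.lower(), "UNK")))
            i += 1
    return result


print(tag(["Sendo", "assim", ",", "vamos"]))
print(tag(["assim", "sendo"]))
```

In the first call the two words are adjacent and in order, so both get `CC`; in the second they are not, so each keeps its own tag.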
