Hyphen in foreign words

Hello @jaumeortola @tiff @danielnaber

<!ENTITY barbarismos3 "aliens?|applets?|backbones?|bits?|bluetooth|bookmarks?|bullying|burnouts?|carjacking|crackers?|cracking|cyberbullying|exabits?|exabytes?|feeders?|führers?|gigabits?|gigabytes?|homebanking|homepages?|hosts?|idem|kilobits?|kilobytes?|loops?|maydays?|megabits?|megabytes?|milkshakes?|newsgroups?|nicknames?|overflows?|petabits?|petabytes?|roadmaps?|royalty|royalties|screenshots?|smartphones?|sockets?|stalking|strings?|tablets?|tags?|terabits?|terabytes?|toolkits?|tweeters?|Unicode|webpages?|wireless|woofers?|yottabits?|yottabytes?|zettabits?|zettabytes?">

If I add [-], will it accept hyphenated words, for example “e[-]learning”?

Thanks!

If the word is already in the Portuguese tagger dictionary (like ‘e-learning’: <S> e-learning[e-learning/NCMS000,</S>]<P/>), yes. Otherwise, no, because it will be split into three tokens (e-book: <S> e[e/CC]-[-/_PUNCT]book[</S>]<P/>), and the regex is matched against one token at a time. You will need to add “e-book” to added.txt, which I think is a good idea. Afterwards, you can do whatever you want with “e-book” (allow the word or suggest alternatives).
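To make the tokenization point concrete, here is a rough Python sketch. It is not LanguageTool's real tokenizer or rule engine; `KNOWN_HYPHENATED`, `tokenize`, `PATTERN`, and `flagged` are hypothetical names invented for illustration, and the set only stands in for the tagger dictionary plus added.txt. It just shows why a per-token pattern such as `e[-]books?` cannot match until the hyphenated word survives as a single token.

```python
import re

# Hypothetical stand-in for the tagger dictionary plus entries from added.txt.
KNOWN_HYPHENATED = {"e-learning"}


def tokenize(text, known=KNOWN_HYPHENATED):
    """Toy tokenizer: keep a hyphenated word whole only if it is known."""
    tokens = []
    for chunk in text.split():
        if "-" in chunk and chunk.lower() not in known:
            # Unknown hyphenated word: split around each hyphen, keeping the hyphen.
            tokens.extend(t for t in re.split(r"(-)", chunk) if t)
        else:
            tokens.append(chunk)
    return tokens


# Per-token pattern in the style of the entity above (illustrative entries only).
PATTERN = re.compile(r"e[-]learning|e[-]books?|bookmarks?", re.IGNORECASE)


def flagged(text, known=KNOWN_HYPHENATED):
    """Return the tokens that the pattern matches as a whole token."""
    return [t for t in tokenize(text, known) if PATTERN.fullmatch(t)]


if __name__ == "__main__":
    print(tokenize("e-learning"))
    print(tokenize("e-book"))
    print(flagged("um e-book"))
    print(flagged("um e-book", known={"e-learning", "e-book"}))
```

In this sketch, “e-book” only gets flagged in the last call, where it has been added to the known set, which mirrors the effect of adding it to added.txt.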

@jaumeortola

Thanks!

Tomorrow I will resume adding foreign words to LT :slight_smile:

Months ago I skipped a lot of them because they had hyphens.