Spellchecking and postagging combined

Ruud_Baars · August 29, 2017, 12:41pm

As far as I understand, spellchecking and postagging are quite different processes in LT now.
Suppose we integrated the two, so that a single lookup in the word list returned both a POS tag and a spelling status.
That way, an uncompounder built into the lookup routine could also deliver both.

What about this idea?
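A minimal sketch of the combined lookup proposed above. It assumes a single in-memory word list that maps each word form to its POS tags. `CombinedLexicon`, `Analysis`, and the tag names are hypothetical and are not LanguageTool classes. The uncompounder is a naive recursive splitter, not jWordSplitter. One call returns the spelling verdict and the tags together, and a compound that is not in the list is accepted if it splits fully into known parts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical result of one lookup: spelling status plus POS tags."""
    word: str
    correct: bool
    tags: tuple[str, ...]
    parts: tuple[str, ...] = ()  # non-empty only when found via uncompounding


class CombinedLexicon:
    """Hypothetical word list serving both the speller and the tagger."""

    def __init__(self, entries: dict[str, set[str]], min_part: int = 3):
        # entries: word form -> set of POS tags (tagset is an assumption)
        self.entries = {w.lower(): frozenset(t) for w, t in entries.items()}
        self.min_part = min_part  # shortest allowed compound part

    def _split(self, word: str) -> list[str] | None:
        """Return known parts covering the whole word, or None."""
        if word in self.entries:
            return [word]
        # Try every prefix that leaves a remainder of at least min_part chars.
        for i in range(self.min_part, len(word) - self.min_part + 1):
            prefix = word[:i]
            if prefix in self.entries:
                rest = self._split(word[i:])
                if rest is not None:
                    return [prefix] + rest
        return None

    def analyze(self, word: str) -> Analysis:
        w = word.lower()
        if w in self.entries:
            return Analysis(word, True, tuple(sorted(self.entries[w])))
        parts = self._split(w)
        if parts is not None and len(parts) > 1:
            # Assumption: the last part is the head and supplies the POS tags.
            head_tags = tuple(sorted(self.entries[parts[-1]]))
            return Analysis(word, True, head_tags, tuple(parts))
        # Unknown and not decomposable: a spelling error with no tags.
        return Analysis(word, False, ())


if __name__ == "__main__":
    lex = CombinedLexicon({"tafel": {"NOUN"}, "poot": {"NOUN"}})
    for w in ("Tafel", "tafelpoot", "tafelpoit"):
        print(lex.analyze(w))
```

Taking the tags from the last part follows the head-final structure of Dutch and German compounds. This sketch ignores linking elements such as the Dutch "-s-" or "-en-" and the German Fugen-s, which a real uncompounder would have to handle.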

dnaber · August 29, 2017, 3:13pm

Most (all?) languages in LT don’t have their own spell checking. They simply use hunspell in one way or another, sometimes with extensions to the dictionary. Thus, integrating this into the POS tagger isn’t so easy, I think, as we still want to be able to update to more recent hunspell dictionaries in the future. Also, the uncompounder isn’t relevant to that many languages, and German already has one (jWordSplitter) that works well enough. It’s indeed unfortunate that jWordSplitter has yet another dictionary, but I’m not sure it makes sense to spend time on changing that.

Ruud_Baars · August 29, 2017, 3:45pm

I think I will integrate things in my database and generate what is needed for the various functions.