Any progress on multi-word spell checking?

Ruud_Baars · March 9, 2019, 6:46am

Some time ago @danielnaber reported someone was working on this. Any news?

dnaber · March 9, 2019, 8:12am

I’m not sure what exactly you are referring to, could you explain?

Ruud_Baars · March 9, 2019, 9:06am

I was suggesting that we start collecting data for ML on groups of words, to estimate the probability of corrections in cases where each word is correctly spelled in itself, but the group as a whole is not.
Then you remarked that someone was already working on this, and that it could take a week or two…

dnaber · March 9, 2019, 9:18am

We’re currently looking at pre-trained language models. If a pre-trained model can be used, no data needs to be collected, or the amount of data we need is smaller. If you’d like to collect data, that’s great, but I currently cannot make any promises about how or when we’d use that data.

Ruud_Baars · March 10, 2019, 2:01pm

I am collecting language data anyway. So if there is a use for it, data for training language models is available for languages that have none yet.