That’s interesting, but pure spelling errors are errors that LT can already detect, so it’s only interesting if the suggestions are better than what we have now. We also don’t want to be limited to English: would this work for other languages with less data? Are the examples from some prototype, or are they just examples?
Currently, the detection of errors is based on Hunspell dictionaries. As this is a simple and easily maintainable approach, we should stick to it. For suggestions, I’m not sure I’ve understood how your approach works. Will it simply suggest the most probable sequence of characters, given an input? What data does the billion-word corpus contain? Will it, for example, also work for colloquial style?
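To make the question concrete: one simple way to interpret “suggest the most probable sequence of characters” is a noisy-channel corrector that generates candidates within one edit and ranks them by corpus frequency. This is only an illustrative sketch (the toy corpus and function names are my own, not from the proposal), but it shows what a frequency-ranked suggestion would look like:

```python
from collections import Counter

# Toy stand-in for the billion-word corpus; in practice this would be
# word counts computed from the real training data.
corpus = "the quick brown fox jumps over the lazy dog the dog sleeps".split()
WORDS = Counter(corpus)
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away from `word` (delete, swap, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    swaps = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def suggest(word, n=3):
    """Rank in-vocabulary candidates within one edit by corpus frequency."""
    candidates = [w for w in edits1(word) | {word} if w in WORDS]
    return sorted(candidates, key=lambda w: -WORDS[w])[:n]

print(suggest("teh"))  # → ['the']
```

Whether the proposed approach works like this, or instead predicts characters directly with a learned model, is exactly what the question above is trying to clarify.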
The dataset is based on news articles, but getting it to work for colloquial style won’t be a big task: I can combine data from a few datasets in different domains (news, Reddit, the Cornell Movie-Dialogs Corpus, etc.) to make it more generalised, so that it works for both Grammatical Error Correction and Spelling Correction.
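The combining step could be as simple as interleaving sentences from each domain corpus, optionally oversampling the smaller colloquial sources so the model isn’t dominated by news text. A minimal sketch, with made-up sample sentences and a hypothetical `build_training_set` helper:

```python
# Hypothetical per-domain corpora; real data would be loaded from files.
domains = {
    "news": ["the minister announced the new policy"],
    "reddit": ["lol that is so true tbh"],
    "movie_dialogs": ["i cannot believe you did that"],
}

def build_training_set(domains, weights=None):
    """Concatenate sentences from each domain, repeating a domain's
    sentences `weights[domain]` times to oversample smaller corpora."""
    weights = weights or {name: 1 for name in domains}
    combined = []
    for name, sentences in domains.items():
        combined.extend(sentences * weights[name])
    return combined

# Oversample the colloquial domains relative to news.
training = build_training_set(
    domains, weights={"news": 1, "reddit": 2, "movie_dialogs": 2}
)
print(len(training))  # → 5
```

The weights here are illustrative; in practice they would be tuned against a mixed-domain evaluation set.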