Multi-word 'spellchecker'

Ruud_Baars · May 20, 2019, 6:44am

Since texts contain far too many different mistakes, there is a limit to the errors one can detect with specific checks.
So I thought of a different way.
Let’s collect all groups of words (space separated) and give each a search key (the text without spaces). Collect all word groups that lead to the same key, and count how often each occurs in a (huge) corpus.
Keep only the apparently common ones in the data file, together with their key.
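
To make the idea concrete, here is a rough Python sketch of building such a data file. The names `word_groups` and `build_index`, the maximum group length of 3 and the `min_count` threshold for "common" are all my own assumptions, not part of any existing tool.

```python
from collections import Counter


def word_groups(tokens, max_len=3):
    """Yield every run of 1..max_len consecutive tokens as one string."""
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length <= len(tokens):
                yield " ".join(tokens[start:start + length])


def build_index(corpus_lines, max_len=3, min_count=2):
    """Map key (text without spaces) -> {word group: count}."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(word_groups(line.split(), max_len))

    index = {}
    for group, n in counts.items():
        if n >= min_count:  # assumed meaning of "apparently common"
            index.setdefault(group.replace(" ", ""), {})[group] = n
    return index


# Toy corpus; a real one would be huge.
corpus = [
    "this is a spell checker",
    "a spell checker helps",
    "the spellchecker works",
    "the spellchecker is fast",
    "the spellchecker is good",
]
index = build_index(corpus)
print(index["spellchecker"])
```

Both 'spell checker' and 'spellchecker' end up under the same key here, each with its own count.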

When checking a text, generate the word groups again, calculate the key of each, and look it up in the data file. If the key is not there, it is not a common word group (no alarm; this is okay). If the key is there, it could be a case of run-on (words wrongly joined) or run-off (a word wrongly split), etc. Offer the common alternatives in order of commonness.
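
The checking step could look roughly like this. The sketch assumes the data file is already loaded as a dict of key -> {word group: count}; the `INDEX` contents and the function name `check` are made up for illustration. It also assumes that a word group is only reported when the way it was written is not itself one of the common forms; that rule is my reading, and it could be loosened.

```python
# Hypothetical, hand-made data file contents: key -> {word group: count}.
INDEX = {
    "alot": {"a lot": 50},
    "nowadays": {"nowadays": 30},
}


def check(text, index, max_len=3):
    """Return (written form, alternatives ordered by commonness) pairs."""
    tokens = text.split()
    findings = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            group = tokens[start:start + length]
            if len(group) < length:
                break  # ran past the end of the text
            written = " ".join(group)
            known = index.get("".join(group))
            if not known:
                continue  # not a common word group: no alarm
            if written in known:
                continue  # assumption: a listed form is accepted as is
            ranked = sorted(known, key=known.get, reverse=True)
            findings.append((written, ranked))
    return findings


print(check("alot has changed now a days", INDEX))
# [('alot', ['a lot']), ('now a days', ['nowadays'])]
```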

The same data file could also be used by the spell checker to order its suggestions: replace the wrong word with each suggestion, and sort the results by commonness.
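
A possible sketch of that ordering, again in Python. The context `window` of one word on each side, the function names and the counts in `INDEX` are assumptions of mine; the suggestions themselves would come from whatever spell checker is in use.

```python
# Hypothetical data file contents: key -> {word group: count}.
INDEX = {
    "apieceof": {"a piece of": 120},
    "apeaceof": {"a peace of": 3},
}


def group_count(index, group):
    """Look up how common a word group is; 0 if it is not in the data file."""
    return index.get(group.replace(" ", ""), {}).get(group, 0)


def rank_suggestions(tokens, position, suggestions, index, window=1):
    """Order suggestions for tokens[position] by the commonness of the
    word group formed with `window` neighbouring words on each side."""
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]

    def score(candidate):
        return group_count(index, " ".join(left + [candidate] + right))

    # sorted() is stable, so suggestions with equal counts keep the
    # order the spell checker gave them.
    return sorted(suggestions, key=score, reverse=True)


tokens = "a peice of cake".split()
ranked = rank_suggestions(tokens, 1, ["peace", "piece", "pierce"], INDEX)
print(ranked)
# ['piece', 'peace', 'pierce']
```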

The data file should be editable, because some erroneous word groups are used more often than the correct spelling (in Dutch, e.g., ‘man/vrouw relatie’ instead of ‘man-vrouwrelatie’).
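
One way to keep it editable would be a small list of manual corrections applied on top of the counted data. This is only a sketch under my own assumptions: `apply_edits` is a made-up name, the counts are invented, and `None` is used here to mean "remove this word group".

```python
def apply_edits(index, edits):
    """Apply manual corrections to the index.

    `edits` maps a word group to a new count, or to None to remove it.
    """
    for group, count in edits.items():
        key = group.replace(" ", "")
        forms = index.setdefault(key, {})
        if count is None:
            forms.pop(group, None)  # drop a common but wrong form
        else:
            forms[group] = count    # add or boost a correct form
        if not forms:
            del index[key]          # no forms left under this key
    return index


# Invented example: the wrong form is more common in the corpus.
index = {"alot": {"alot": 900, "a lot": 400}}
apply_edits(index, {"alot": None})
print(index)
# {'alot': {'a lot': 400}}
```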