
ML: wrong word in context

Would it be possible to have a machine learning rule for wrong word in context? It could be trained on sentences or even paragraphs to determine the relation strength of word pairs.
Or could it even decide, in the same way, whether a word is unexpected in a sentence?
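
To make "word relation strength" concrete, here is a minimal sketch that scores word pairs by pointwise mutual information (PMI) over sentence-level co-occurrence. PMI is only one possible measure and is my assumption, not something the thread settles on; the function name `pair_strengths` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def pair_strengths(sentences):
    """Return the PMI (log base 2) of every word pair that co-occurs in a sentence."""
    word_counts = Counter()   # number of sentences containing each word
    pair_counts = Counter()   # number of sentences containing both words of a pair
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    n = len(sentences)
    pmi = {}
    for pair, joint in pair_counts.items():
        a, b = tuple(pair)
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), with probabilities as sentence frequencies
        pmi[pair] = math.log2(joint * n / (word_counts[a] * word_counts[b]))
    return pmi

# Toy corpus (hypothetical); a real run would need a large corpus.
corpus = [
    "the bank raised the interest rate",
    "the river bank was muddy",
    "she opened a bank account",
    "the river was deep",
]
scores = pair_strengths(corpus)
river_muddy = scores[frozenset({"river", "muddy"})]
the_bank = scores[frozenset({"the", "bank"})]
```

A positive score means the pair co-occurs more often than chance would predict, a negative one less often. On a corpus this small the numbers mean little; they only show the mechanics.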

I think this idea can be very useful: combining simple rules like “wrong word in context” (or similar ones) with data extracted from a corpus.
I have done it, but for it to be useful in other languages I need to clean up the code and make it more user-friendly. It could be a simple Java command-line program that takes a corpus and a few parameters as input. I'll add it to my to-do list.
Has anyone worked on similar ideas, @dnaber?

Isn’t that basically what the ngram rule does with its confusion pairs?

It has more or less the same goal, but using other means.

There are differences. “Wrong word in context” looks at every word in the sentence. In fact, we try to determine the “topic” of the sentence with a simple list of words.
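
A rough sketch of that idea, under my own assumptions: each word of a confusion pair gets a small hand-written list of topic words, and the whole sentence is scanned for them. The pair desert/dessert, the word lists, and the name `check_sentence` are hypothetical and only illustrate the approach; this is not the actual rule code.

```python
import re

# Hypothetical confusion pair with hand-written topic words (illustration only).
TOPIC_WORDS = {
    "desert": {"sand", "camel", "dry", "dune"},
    "dessert": {"cake", "sweet", "chocolate", "menu"},
}
CONFUSABLE = {"desert": "dessert", "dessert": "desert"}

def check_sentence(sentence):
    """Return (found_word, suggested_word) pairs for words that clash with the sentence topic."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    suggestions = []
    for word in tokens:
        other = CONFUSABLE.get(word)
        if other is None:
            continue
        # Every other word in the sentence counts as context, not just the neighbours.
        context = set(tokens) - {word}
        own_hits = len(context & TOPIC_WORDS[word])
        other_hits = len(context & TOPIC_WORDS[other])
        # Flag only when the context supports the other word and not this one.
        if other_hits > 0 and own_hits == 0:
            suggestions.append((word, other))
    return suggestions

flagged = check_sentence("We ordered chocolate cake for desert.")
clean = check_sentence("The camel crossed the desert.")
```

Here `flagged` contains the pair ("desert", "dessert") and `clean` is empty. The word lists can be tweaked by hand, or filled from corpus statistics.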

On the other hand, the n-gram rule looks only at the 3-5 words before or after the word, and it needs huge data files at runtime.
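
To illustrate the difference in scope, here is a small sketch of a fixed local window. The helper `local_window` and the window size of 3 are my assumptions, chosen only to show the effect; the real n-gram rule works on n-gram frequency data, which is not modelled here.

```python
def local_window(tokens, index, size):
    """Return the tokens within `size` positions of tokens[index], excluding the token itself."""
    start = max(0, index - size)
    return tokens[start:index] + tokens[index + 1:index + 1 + size]

# Hypothetical example sentence; "desert" should be "dessert".
tokens = "the chocolate cake on the menu looked great so we ordered it for desert".split()
target = tokens.index("desert")
window = local_window(tokens, target, 3)
```

With a window of 3, `window` holds only "ordered", "it" and "for". The telling words "chocolate", "cake" and "menu" fall outside it, while a rule that scans the whole sentence still sees them.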

In general, I think that very sophisticated ML or AI approaches are not so promising for grammar checking. Combining ML and simple rules (that can be manually tweaked) will give better results. I will try to develop these ideas further.

It could work on longer sentences, but it might work better at the paragraph or even document level …