Grammar Checking using Naive Bayes Algorithm [further vs farther]

Hi Mility,

I would first go through a document and apply a model only if a sentence contains one of its keywords (e.g. the sentence must contain farther or further before the farther-further model is used). Then I test the sentence with the model: if the model suggests farther where I wrote further, the program underlines the word in red to mark it as wrong. However, if I wrote farther and the model also suggests farther, the program does nothing.
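To make that concrete, here is a minimal sketch of the decision rule, not real code from my models. The names check_sentence and predict are made up for illustration; predict stands in for whatever trained model you plug in, and I am assuming it takes the sentence and returns one of the two keywords.

```python
import re

# The confusion pair this (hypothetical) model handles.
KEYWORDS = ("farther", "further")

def check_sentence(sentence, predict):
    """Return the model's suggestion if it disagrees with the writer, else None.

    `predict` is a placeholder for a trained model: it is assumed to take
    the sentence and return one of KEYWORDS.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    written = next((w for w in words if w in KEYWORDS), None)
    if written is None:
        return None  # no keyword in the sentence, so the model is never applied
    suggested = predict(sentence)
    # Only a disagreement counts as an error worth underlining.
    return suggested if suggested != written else None
```

A return value of None means "do nothing"; anything else is the word the program would suggest in place of the underlined one.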

I haven’t coded any of this for LanguageTool yet; I’m just working on building models for individual rules. This way the models can be used in LanguageTool or by anyone who wants to create their own grammar checker.

btw, I’m planning on using this format to create a bigotry spelling checker. The model will underline sexist, racist, and homophobic sentences. I reckon it’ll be very easy to build.

Hi Troy,
Thanks for your explanation.
Maybe my question was not clear.
For example, take the sentence below (from your test data):


  "he didn't want to talk about it any", further

We know that this sentence should use further, but how do we decide where to insert it?
further could go at any position in the sentence, such as:


 he further didn't want to talk about it any
 he didn't further want to talk about it any
 he didn't want further to talk about it any
 ....

How do we determine which of the above sentences is the one we want? That is the problem.

Ahh, yeah that could be a problem. I guess this could be fixed by logging the position of the keyword before putting it through the model. So a program could first (1) look through the sentences for a keyword, (2) if it finds a keyword, split up the sentence into words and log the position of the keyword, (3) put the words through the model without the keyword, and (4) find out whether the keyword is an error or not.
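Roughly, steps (1) to (4) could look like the sketch below. This is only an illustration under a few assumptions: the function names are invented, the tokenizer is a plain whitespace split, and predict is a stand-in for the model that I am assuming accepts the list of context words and returns farther or further.

```python
KEYWORDS = {"farther", "further"}
PUNCTUATION = ".,;:!?\"'"

def find_keyword(words):
    """Steps 1 and 2: return the index of the first keyword, or None."""
    for i, word in enumerate(words):
        if word.strip(PUNCTUATION).lower() in KEYWORDS:
            return i
    return None

def check(sentence, predict):
    """Return (position, suggestion) if the keyword looks wrong, else None.

    `predict` is a hypothetical model: it receives the words of the
    sentence with the keyword removed and returns one of KEYWORDS.
    """
    words = sentence.split()
    pos = find_keyword(words)
    if pos is None:
        return None  # nothing to check
    written = words[pos].strip(PUNCTUATION).lower()
    context = words[:pos] + words[pos + 1:]  # step 3: drop the keyword
    suggested = predict(context)
    if suggested == written:
        return None  # step 4: model agrees, no error
    return pos, suggested  # the logged position says which word to underline
```

Because the position is logged before the keyword is removed, the program never has to guess where the word goes; it only decides whether the word already at that position is the right one.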

However, it may be difficult if there are two keywords in a sentence, such as:

how can we further run farther at the marathon?

This could be another problem.
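One possible way around it, which I have not tested, is to log every keyword position and check each one separately, leaving the other keyword in the context. The sketch below assumes exactly that; check_all and predict are hypothetical names, and predict is again a stand-in model that takes the context words and returns farther or further. Whether keeping the other keyword in the context helps or hurts the model is an open question.

```python
KEYWORDS = {"farther", "further"}
PUNCTUATION = ".,;:!?\"'"

def normalize(word):
    """Lowercase a token and strip surrounding punctuation."""
    return word.strip(PUNCTUATION).lower()

def check_all(sentence, predict):
    """Check every keyword occurrence on its own.

    Returns a list of (position, suggestion) pairs, one for each keyword
    the hypothetical `predict` model disagrees with.
    """
    words = sentence.split()
    positions = [i for i, w in enumerate(words) if normalize(w) in KEYWORDS]
    errors = []
    for pos in positions:
        # Remove only the keyword being tested; the other one stays as context.
        context = words[:pos] + words[pos + 1:]
        suggested = predict(context)
        if suggested != normalize(words[pos]):
            errors.append((pos, suggested))
    return errors
```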