[GSoC] Extending AI approach

@drex
When you train the seq2seq model, what does the parallel corpus look like?
Is it [incorrect sentence], [correct sentence]?

I tried something similar to this with moderate success.
Character-level encoding was able to correct short sentences and phrases.

For longer sentences, it hasn't been very successful.
Again, the main issue I am struggling with is getting a large, good-quality dataset.
Have you tried the NUCLE and Lang-8 datasets?

Hey, I am referring to the n-gram model. More specifically, "module 3" in my proposal uses the method described here.

Hey,
No, my model doesn’t try to convert incorrect sentences to correct ones. It is trained to detect errors only. So, the parallel corpus would look like this :
[“What is you’re name” ] -> [0 0 1 0]
If the ith word in the output is '1', the ith word in the input is flagged as an error.
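As a rough sketch, one way such a training pair could be built is to pair each token of the input sentence with a binary label. The function name `make_pair` and the use of whitespace tokenization are my own assumptions for illustration, not the actual implementation:

```python
def make_pair(sentence, error_indices):
    """Build one (tokens, labels) pair for token-level error detection.

    Hypothetical helper: tokenizes by whitespace and marks the tokens
    at the given positions with label 1 (error), all others with 0.
    """
    tokens = sentence.split()
    labels = [1 if i in error_indices else 0 for i in range(len(tokens))]
    return tokens, labels

# The example from the post: "you're" (index 2) is the error word.
tokens, labels = make_pair("What is you're name", {2})
print(tokens)  # ['What', 'is', "you're", 'name']
print(labels)  # [0, 0, 1, 0]
```

A real pipeline would likely use a proper tokenizer rather than `str.split`, but the label format stays the same: one 0/1 per input token.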