Performance of Stanford POS tagger

Hi,

We are building a website to help Indian students write better English essays, and we currently use LanguageTool for it. Later on, we plan to move on to other Indian languages.

We are looking into whatever improvements we can make to the current system. Stanford's POS tagger claims 97.24% accuracy on the Penn Treebank WSJ corpus (see http://nlp.stanford.edu/~manning/papers/tagging.pdf). Since this figure is slightly higher than OpenNLP's, we wanted to try the Stanford tagger. However, when we tested both taggers on our own manually tagged text, they performed about the same.
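For reference, the comparison we ran boils down to a token-level accuracy count. Below is a minimal sketch of it using the Stanford tagger's Java API; the gold file name, its word_TAG line format, and the model path are placeholders for what we actually used:

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TaggerAccuracy {
    public static void main(String[] args) throws Exception {
        // Model path is a placeholder; use whichever English model you downloaded.
        MaxentTagger tagger = new MaxentTagger("models/english-left3words-distsim.tagger");

        long correct = 0, total = 0;
        // "gold.txt" is a placeholder: one sentence per line, tokens as word_TAG.
        for (String line : Files.readAllLines(Paths.get("gold.txt"))) {
            if (line.trim().isEmpty()) continue;
            String[] goldTokens = line.trim().split("\\s+");
            List<HasWord> words = new ArrayList<>();
            String[] goldTags = new String[goldTokens.length];
            for (int i = 0; i < goldTokens.length; i++) {
                int sep = goldTokens[i].lastIndexOf('_'); // split word from its gold tag
                words.add(new Word(goldTokens[i].substring(0, sep)));
                goldTags[i] = goldTokens[i].substring(sep + 1);
            }
            // tagSentence() keeps our tokenization, so predicted tags
            // align one-to-one with the gold tags.
            List<TaggedWord> tagged = tagger.tagSentence(words);
            for (int i = 0; i < goldTags.length; i++) {
                if (tagged.get(i).tag().equals(goldTags[i])) correct++;
                total++;
            }
        }
        System.out.printf("Token-level accuracy: %.2f%%%n", 100.0 * correct / total);
    }
}

The OpenNLP side was the same loop, just with POSTaggerME.tag() applied to the token array instead.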

Does anybody know why this might be? (Perhaps the published figure only holds for in-domain WSJ test data?) Has anybody independently verified the performance of Stanford's tagger?

Regards,
Ilyeech Kishore