[GSOC] New feature for LanguageTool


Sometimes you need to check the grammar of a text that exists only on paper. Being able to check whether a document is grammatically correct just by sending a picture of it would be a very useful feature, and I would really like to work on it.
Would this functionality be of interest to you? If so, I will be glad to provide more details in a draft proposal.
Hi, thanks for your suggestion. How would that work? Would you use an existing API for OCR, or would you develop your own?


Yes, I would use Tesseract OCR or something similar. Its results are often not very precise. Existing OCR engines sometimes don’t support certain fonts and need additional training depending on the kind of text to be recognized. So the main goals here are improving the results it gives and adding the ability to save the text in different formats (and maybe trying to preserve document structure such as spacing).

Mhhh, I’m not sure. Is this really something that belongs in LT? To me, it sounds more like a project for Tesseract.


It is just a multipurpose tool that can be used for different needs. I do believe it is very useful here, as a person can see the mistakes made in the paper version of a text. After saving the text to a document, you can see the places that should be corrected highlighted, and it is convenient to save the document and print or send it afterwards.


Please let me know if you are interested in adding this feature.

Personally, I think it’s better to do it the other way around:
Add LT to an OCR package. That way the OCR can get a grammatical/logical probability for each possible reading of an ambiguous (‘multi-choice’) glyph.
E.g. (exaggerated): “She had a p@t of gold with her.” Substituting ‘a’ for ‘@’ would render the sentence nonsensical, but substituting ‘o’ would not.
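
To make the idea a bit more concrete, here is a rough Python sketch of how an OCR step could pick between candidate glyphs by asking a checker which reading is most plausible. Everything in it is an assumption for illustration: `PLAUSIBLE_PAIRS` and `plausibility` are hypothetical stand-ins for whatever score LT could provide, and `expand` and `best_reading` are made-up helper names, not part of LT or Tesseract.

```python
from itertools import product

# Hypothetical stand-in for a grammar/plausibility check such as LT:
# a tiny hand-written set of word pairs that "make sense" together.
PLAUSIBLE_PAIRS = {("a", "pot"), ("pot", "of"), ("of", "gold")}


def plausibility(sentence):
    """Count adjacent word pairs that the toy checker considers plausible."""
    words = sentence.lower().strip(".").split()
    return sum((a, b) in PLAUSIBLE_PAIRS for a, b in zip(words, words[1:]))


def expand(text, options):
    """Yield every reading of text, replacing each ambiguous glyph
    with each of its candidate characters."""
    slots = [options.get(ch, [ch]) for ch in text]
    for combo in product(*slots):
        yield "".join(combo)


def best_reading(text, options):
    """Return the candidate reading with the highest plausibility score."""
    return max(expand(text, options), key=plausibility)


# '@' marks a glyph the OCR could not decide on; 'a' and 'o' are its candidates.
print(best_reading("She had a p@t of gold with her.", {"@": ["a", "o"]}))
```

In a real integration the toy score would be replaced by something derived from LT's output (for example, the number of rule matches per candidate sentence), and trying every combination would need pruning once a line contains more than a handful of ambiguous glyphs.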