[GSOC] new feature for Language Tool

elenadonts · March 23, 2018, 2:18pm

Hi, Language Tool team!
Sometimes it’s necessary to check the grammar of a paper version of some text. The ability to check whether a document is grammatically valid just by sending a picture of it would be a very useful feature, and I am really interested in working on it.
Would such functionality be interesting to you? If so, I will be glad to provide more detailed information in a draft proposal.
Thank you!

dnaber · March 23, 2018, 2:29pm

Hi, thanks for your suggestion. How would that work, would you use an existing API for OCR, or would you develop your own?

elenadonts · March 23, 2018, 2:41pm

Yes, I would use Tesseract OCR, or something similar. It often doesn’t give very precise results. Sometimes existing OCR engines don’t support certain fonts and need additional training, depending on the kind of text that has to be recognized. So the main goal here is improving the results it can give, plus implementing the ability to save the text in different formats (and maybe trying to keep the document structure, such as spacing).

dnaber · March 23, 2018, 3:05pm

Mhhh, I’m not sure, is this really something that belongs to LT? To me, it sounds more like a project for Tesseract.

elenadonts · March 23, 2018, 3:16pm

It is just a multipurpose tool that can be used for different needs. I do believe it is very useful here, as a person can see the mistakes that were made in the paper version of some text. After saving the text to a document, it is possible to see the highlighted places that should be corrected, which makes it very convenient to save it and print or send it afterwards.
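
To make the "highlighted places" idea concrete, here is a minimal sketch in Python. It assumes the OCR step has already produced plain text and that the grammar check returns a list of matches, each with an `offset` and a `length` (the shape LanguageTool's HTTP API uses for its matches, as far as I know). The `highlight` function and the `[[ ]]` markers are hypothetical, chosen only for illustration.

```python
def highlight(text, matches):
    """Wrap each flagged span in [[ ]] so it stands out in the saved text.

    `matches` is assumed to be a list of dicts with "offset" and "length"
    keys; this helper is a hypothetical illustration, not part of LT.
    """
    out = []
    pos = 0
    for m in sorted(matches, key=lambda m: m["offset"]):
        start = m["offset"]
        end = start + m["length"]
        if start < pos:
            # Skip a match that overlaps one already highlighted.
            continue
        out.append(text[pos:start])
        out.append("[[" + text[start:end] + "]]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


# Hand-written example input standing in for OCR output and checker matches.
ocr_text = "This are a test."
example_matches = [{"offset": 5, "length": 3}]
print(highlight(ocr_text, example_matches))
```

The marked-up text could then be written to whatever output format is wanted before printing or sending.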

elenadonts · March 23, 2018, 4:08pm

Please let me know if you are interested in adding this feature.

SkyCharger001 · March 23, 2018, 5:12pm

Personally, I think it’s better to do it the other way around:
Add LT to an OCR package. (That way the OCR can get a grammatical/logical probability for the possible spellings of ‘multi-choice’ glyphs.
E.g. (exaggerated): “She had a p@t of gold with her.” Substituting ‘@’ with ‘a’ would render the sentence nonsensical, but substituting ‘o’ would not.)
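
A rough sketch of that idea in Python, under stated assumptions: the `AMBIGUOUS` table, the tiny `KNOWN_WORDS` list, and the `score` function are all hypothetical stand-ins. A real integration would ask LT (or a language model) how plausible each candidate sentence is instead of counting known words.

```python
from itertools import product

# Hypothetical table of glyphs the OCR engine could not decide on,
# mapped to the characters each one might stand for.
AMBIGUOUS = {"@": ["a", "o"]}

# Toy stand-in for a grammar/plausibility checker such as LT:
# a candidate scores one point per word found in this list.
KNOWN_WORDS = {"she", "had", "a", "pot", "of", "gold", "with", "her"}


def candidates(text):
    """Yield every reading of `text` with the ambiguous glyphs substituted."""
    options = [AMBIGUOUS.get(ch, [ch]) for ch in text]
    for combo in product(*options):
        yield "".join(combo)


def score(sentence):
    """Count how many words of the sentence are in the toy word list."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return sum(w in KNOWN_WORDS for w in words)


def best_reading(text):
    """Pick the candidate reading that the toy checker rates highest."""
    return max(candidates(text), key=score)


print(best_reading("She had a p@t of gold with her."))
```

With this word list, the reading with ‘o’ scores higher than the one with ‘a’, which is the kind of feedback the OCR step could use to pick between glyphs.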

jusore · March 26, 2019, 11:45am

Along the lines of what @SkyCharger001 suggests, I’d like to use LT in the Firefox window that the Image Reader add-on opens when I run OCR on images. I usually use it for light novel (Japanese) and manga translation, where both the OCR and the volunteers who do the translations make lots of mistakes.

Of course, it would be a nice feature to have Image Reader integrated into the tabs, so that when you are reading a manga and see a mistake, you have a text area under it to suggest a correction or a better translation, but I suppose that is a task for the Image Reader support team. This could help collaborative fan translation.