Why doesn’t LanguageTool use the OpenCorpora dictionary? Is it too big, would converting and integrating it take too long, or is there some other reason?
At the time the dictionary was integrated, the OpenCorpora project did not exist yet.
Maybe add some words and their forms from the OpenCorpora dictionary (for example, кабмин, каннабис, канцерогенез, кетамин, кинодебют, лагман) to the file
I wrote a Python script that extracts the base forms (lemmas) of words from the OpenCorpora dictionary and checks them with LanguageTool. About 4 thousand words from the OpenCorpora dictionary are flagged by LanguageTool as errors. So far I have not figured out in what form I should submit them for inclusion in LanguageTool. words.zip (20.0 KB)
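For anyone who wants to reproduce this, here is a minimal sketch of the idea, not the script from the attachment. It assumes the plain-text OpenCorpora export, where each entry starts with a numeric ID line, is followed by tab-separated `FORM<TAB>TAGS` lines with the lemma first, and ends with a blank line. It also assumes a LanguageTool server running locally on port 8081 and that spelling matches carry `issueType` equal to `misspelling` in the `/v2/check` response. The function names and the file name in the usage note are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def iter_lemmas(lines):
    """Yield the first word form (assumed to be the lemma) of each entry, lowercased."""
    expect_lemma = False
    for raw in lines:
        line = raw.strip()
        if not line:
            expect_lemma = False          # blank line closes the entry
        elif line.isdigit():
            expect_lemma = True           # numeric ID line opens a new entry
        elif expect_lemma:
            yield line.split("\t", 1)[0].lower()
            expect_lemma = False          # skip the remaining forms of this entry


def languagetool_knows(word, url="http://localhost:8081/v2/check"):
    """Return True if the (assumed local) LanguageTool server reports no spelling match."""
    data = urllib.parse.urlencode({"language": "ru-RU", "text": word}).encode("utf-8")
    with urllib.request.urlopen(url, data=data) as response:
        matches = json.load(response)["matches"]
    # Assumption: spelling errors are reported with issueType "misspelling".
    return not any(m["rule"].get("issueType") == "misspelling" for m in matches)


def unknown_words(lemmas, is_known):
    """Return the distinct lemmas that is_known() rejects, in first-seen order."""
    seen = set()
    result = []
    for lemma in lemmas:
        if lemma not in seen:
            seen.add(lemma)
            if not is_known(lemma):
                result.append(lemma)
    return result
```

With a server running, something like `unknown_words(iter_lemmas(open("dict.opcorpora.txt", encoding="utf-8")), languagetool_knows)` would give the candidate list. Sending one request per word is slow for a full dictionary, so batching several words per request would be a sensible refinement.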
It looks like the format is described here: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ru/src/main/resources/org/languagetool/resource/ru/tagset.txt. However, abbreviations (ABR) do not seem to be able to change their form, although you can write:
Поступил в вуз.
Поступил в вузах.
It may be useful to store inflectable abbreviations as nouns.
Is this a suitable form?
a.zip (21.5 KB)