[Ru] OpenCorpora dictionary

Andrey · April 30, 2021, 5:38pm

Why doesn't LanguageTool use the OpenCorpora dictionary? Is it too big, would it take too long to convert and integrate, or is there some other reason?

Yakov · May 1, 2021, 9:44pm

When the dictionary was integrated, the OpenCorpora project did not exist yet.

Andrey · May 14, 2021, 10:36pm

Maybe some words and their forms from the OpenCorpora dictionary (for example, кабмин, каннабис, канцерогенез, кетамин, кинодебют, лагман) could be added to this file:

https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ru/src/main/resources/org/languagetool/resource/ru/added.txt

# A part-of-speech dictionary that's used additionally to the binary dictionary (*.dict).
# This does not add words to the spell checker, see hunspell/spelling.txt for that.
# File Encoding: UTF-8
# Format: three tab-separated fields: fullform baseform postags
#
мадам	мадам	NN:Name:Fem:PL
полу	пол	NN:Inanim:Masc:Sin:2R
попозже	поздний	ADJ:Comp
# todo - add other forms *Inanim*
обозреватель	обозреватель	NN:Inanim:Masc:Sin:V
#
кили	кили	NN:Name:Masc
нурсултан	нурсултан	NN:Name:Masc:Sin:V
по-американски	по-американски	ADV
по-ангельски	по-ангельски	ADV
по-армянски	по-армянски	ADV
по-болгарски	по-болгарски	ADV
по-венгерски	по-венгерски	ADV
по-вьетнамски	по-вьетнамски	ADV
по-геройски	по-геройски	ADV

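For anyone preparing entries for this file, here is a minimal Python sketch of reading and writing the format described in the header comment above (three tab-separated fields: fullform, baseform, postags). The names `Entry`, `parse_added` and `format_entry` are made up for illustration and are not part of LanguageTool.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One dictionary line: inflected form, base form, part-of-speech tag."""
    fullform: str
    baseform: str
    postag: str


def parse_added(text):
    """Parse added.txt-style content, skipping blank lines and '#' comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields: {line!r}")
        entries.append(Entry(*fields))
    return entries


def format_entry(entry):
    """Render an entry back into a single tab-separated line."""
    return "\t".join(entry)
```

Rejecting lines that do not have exactly three fields is a deliberate choice here: a stray space used instead of a tab is easy to miss by eye.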

Andrey · May 18, 2021, 9:52am

I wrote a Python script that extracts the base forms of words from the OpenCorpora dictionary and checks them with LanguageTool. LanguageTool marks these 4 thousand words from the OpenCorpora dictionary as errors. So far I have not figured out in what format I should submit them for inclusion in LanguageTool.
words.zip (20.0 KB)
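A rough sketch of how such a check can be done, not the actual script. It assumes the OpenCorpora XML export stores each base form in the `t` attribute of an `<l>` element inside `<lemma>`, and that a LanguageTool server is reachable at `http://localhost:8081/v2/check` (the URL and port are assumptions about a local setup). The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a LanguageTool HTTP server is running locally on this port.
LT_URL = "http://localhost:8081/v2/check"


def extract_lemmas(xml_text):
    """Return the base form of each lemma (the 't' attribute of <l> elements)."""
    # For the full dictionary file, ET.iterparse would use less memory.
    root = ET.fromstring(xml_text)
    return [node.get("t") for node in root.iter("l") if node.get("t")]


def flagged_words(text, response):
    """Return the substrings of text covered by matches in a check response."""
    return [text[m["offset"]:m["offset"] + m["length"]]
            for m in response.get("matches", [])]


def check_text(text, language="ru-RU"):
    """POST text to the LanguageTool HTTP API and return the parsed JSON."""
    data = urllib.parse.urlencode({"text": text, "language": language}).encode()
    with urllib.request.urlopen(LT_URL, data=data) as resp:
        return json.load(resp)
```

With a server running, the lemmas can be joined with newlines, sent in batches through `check_text`, and the result passed to `flagged_words`. Note that this collects every match, not only spelling errors, so the output may need filtering by rule.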

Andrey · May 23, 2021, 9:30pm

It looks like the format is described here: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ru/src/main/resources/org/languagetool/resource/ru/tagset.txt. However, abbreviations (ABR) do not seem to be able to change their form. You can write:
Поступил в вуз.
Поступил в вузах.
It may be useful to store inflectable abbreviations as nouns.

Andrey · May 27, 2021, 12:21pm

Is this a suitable format?
a.zip (21.5 KB)