Asturian spell checking update

esbardu · January 10, 2024, 8:09am

Hello everyone! After a few years of inactivity, I’ve just resumed work on improving Asturian support in LanguageTool. I’m finishing a POS-tagger file to make the rules more powerful, but meanwhile I’d like to provide you with the latest Asturian spell checker update, since the one currently built into LanguageTool is very old and contains a lot of errors. The thing is that I’m a newbie with GitHub and I don’t know how to work with it. And I can’t attach any files here, as I am a new user. How could I transfer the files to you? I’d also like to provide an update to the Asturian grammar.xml with some new rules.
Many thanks in advance!

dnaber · January 10, 2024, 8:16am

Hi Xesús, that’s good news! But I think there’s no way around learning git and GitHub. It would be too much effort in the long run for someone else to commit the files for you. I suggest you make a fork of LT on GitHub and play around with that, learning how to make a pull request etc. There are many tutorials out there and the GitHub docs are also quite good, I think.

esbardu · January 10, 2024, 11:34pm

Thanks for the response, @dnaber. I’ll try to learn the process. I’ve been looking through the project tree and found that the path for the Asturian spelling dictionary data is languagetool-language-modules/ast/src/main/resources/org/languagetool/resource/ast/hunspell. Is this correct?
Thanks a lot for your support!

dnaber · January 11, 2024, 9:02am

Yes, that’s the correct path.

esbardu · January 22, 2024, 10:59pm

Hi again. I’ve been trying to learn the whole Git and GitHub process so I can make pull requests to the project. I think I’ve got the idea, but I still have a couple of questions:

I’ve forked the main LT repository into my personal GitHub account. If I want to make a pull request to add, for example, new rules to the Asturian version of the grammar.xml file, should I create a branch in my GitHub repository with the updated file and make the pull request from that branch, or should I make the change directly on the master branch of my fork? After going through several tutorials, I think master should remain unchanged, right? I don’t dare take the step in case I mess something up.
I’ve been trying to build a Morfologik spelling ast.dict file (from the Hunspell files) following the instructions here, using a modified create_dict.sh (attached here as a text file). The result was a raw file of 25×10^6 words (about 360 MB) with its corresponding binary (about 600 KB), but it is not complete. Word forms built with multiple flags are missing (for example, those formed with enclitic personal pronouns attached to verbs). Could I have done something wrong, or is this possibly due to a limitation of the script? Have other people had the same problem (especially with Galician, Catalan, Portuguese or Spanish, which also have enclitic pronouns)?
Thanks in advance.
create_dict.sh.txt (3.5 KB)
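
On the first question, the usual GitHub convention is indeed to leave the fork's master untouched and to open the pull request from a separate topic branch. Below is a minimal sketch of that workflow, not the project's official procedure. To keep it runnable without network access, it uses a local bare repository as a stand-in for the GitHub fork; the branch name ast-new-rules, the placeholder grammar.xml content and the commit messages are all hypothetical.

```shell
#!/bin/sh
set -eu

# Stand-in for the GitHub fork: a local bare repository (assumption for
# this sketch). With a real fork, "origin" would be the fork's GitHub URL.
demo=$(mktemp -d)
git init --quiet --bare "$demo/fork.git"

# Working copy with a single initial commit on master.
git init --quiet "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master   # force the branch name "master"
git config user.name "Demo User"
git config user.email "demo@example.org"
git remote add origin "$demo/fork.git"
echo "<rules/>" > grammar.xml             # placeholder, not the real file
git add grammar.xml
git commit --quiet -m "Initial state"
git push --quiet origin master

# Do the actual work on a topic branch, so master stays unchanged.
git checkout --quiet -b ast-new-rules     # hypothetical branch name
echo "<!-- new rule -->" >> grammar.xml
git commit --quiet -am "Add new Asturian rules"
git push --quiet -u origin ast-new-rules

# master still has one commit; the topic branch has two.
master_commits=$(git rev-list --count master)
branch_commits=$(git rev-list --count ast-new-rules)
echo "master: $master_commits commit(s), ast-new-rules: $branch_commits commit(s)"
```

With a real fork, the pull request would then be opened on the GitHub website from the pushed topic branch against the upstream master. If something goes wrong, the branch can simply be deleted and master is still clean.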