As an example, the Persian text on the home page of https://languagetool.org/ has several spelling mistakes that the tool does not recognize:
لطفا متن خود را اینجا قرار دهید . یا بررسی کنید که این متن را برای دیدن بعضی بعضی از اشکال هایی که ابزار زبان توانسته تشخیسهدد. درباره ی نرم افزارهای بررسی کننده های گرامر چه فکر می کنید؟ لطفا در نظر داشته باشید که آنها بی نقص نمی باشند.
The two bolded words in the example are incorrect and the correct words are: تشخیص دهد
It seems that, for Persian, the tool recognizes only grammar mistakes, not spelling mistakes.
Is it possible to add a true spell checker for Persian as well?
There’s currently no maintainer for Persian in LanguageTool, thus basically no work is done for it. If you’d like to become the maintainer, or know someone who would, please let us know.
What exactly should the maintainer do?
I’m interested in helping.
Meanwhile, isn’t there any open dictionary available for Persian that the spell checker could use? What do Google Chrome and Microsoft Word use to spell check Persian text?
The maintainer’s task is to improve error detection rules, write new rules, and generally take care of that language in LT. Here’s a more detailed description. Maintainers don’t need to be software developers, although it helps.
The file is attached. I also added words to the “Replace.txt” file in the same directory, because they are correct but do not appear in spell checking.
The rules that I have added are under the CAT5 category. (I created this category, and its type is “misspelling”.) This category has more than 350 rules.
I also fixed one or two rules in the previous work.
I created a dictionary in Excel with two columns, one for the misspellings and one for the corrections, and then generated rules from them.
I tested the rules in the standalone software using a long article from Wikipedia. For now everything is fine. I will be in touch if there is any problem, and I will send more rules in the future.
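The two-column approach described above (misspelling, correction) could be automated roughly as in the sketch below. It is only an illustration under assumptions: the rule IDs (`FA_TYPO_n`), the message wording, and the function names are hypothetical, and the XML layout follows the general shape of LanguageTool pattern rules rather than the actual attached file. The sample row uses the misspelling from the first post.

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def make_rule(index, wrong, right):
    """Build one <rule> element (as text) for a misspelling/correction pair."""
    # One <token> per whitespace-separated word, so multi-word entries work too.
    tokens = "".join(f"<token>{escape(t)}</token>" for t in wrong.split())
    rule = (
        f'<rule id="FA_TYPO_{index}" name={quoteattr(wrong)}>\n'  # hypothetical ID scheme
        f"  <pattern>{tokens}</pattern>\n"
        f"  <message>Did you mean <suggestion>{escape(right)}</suggestion>?</message>\n"
        f"  <example correction={quoteattr(right)}><marker>{escape(wrong)}</marker></example>\n"
        f"</rule>"
    )
    ET.fromstring(rule)  # raises ParseError if the generated XML is not well-formed
    return rule


def rules_from_csv(text):
    """Turn CSV text with 'misspelling,correction' rows into a list of rules."""
    rules = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        wrong, right = row[0].strip(), row[1].strip()
        if wrong and right:
            rules.append(make_rule(len(rules) + 1, wrong, right))
    return rules


if __name__ == "__main__":
    sample = "تشخیسهدد,تشخیص دهد\n"
    print("\n".join(rules_from_csv(sample)))
```

The generated rules would still need to be wrapped in a category element inside the language's grammar file, and each rule only ever catches the one misspelling it was written for.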
Thanks, but I think there’s a problem with your approach: there can be any number of spelling errors, and we cannot write rules for all of them. So I think it would be better to search for an Open Source Persian spell checker / dictionary (probably hunspell-based) and see if we can use that. Technical details are documented in the wiki.
The errors I have added are the most common errors in Persian (some were collected by one of the universities in Iran at this page, and others come from this wiki page and this page).
Errors that follow a particular pattern were already covered by rules. The ones I have added are mostly the most common typos in Persian, and they share no common pattern.
Thank you so much. So how exactly should I use them to modify LT spell checking for Persian?
I mean, creating rules from a list of incorrect and correct words is easy for me, since the process is automated.
Should I use these dictionaries to create rules?
Find a recent hunspell dictionary and affix file first, maybe from one of the links. Then maybe one of the programming contributors would be nice enough to add the spell checker to LT.
In the meantime, I could teach you some Hunspell tricks. Just contact me directly.
The effort will be to find words that are wrong, but accepted, as well as words that are correct, but are not. And check if the suggestions provided are good enough. In all 3 cases, the solution is relatively simple.
The word frequency list I have for you has 1.7 million entries.
For now, you could download www.taaltik.nl/Persian/Persian.zip. This file contains the speller’s .dic and .aff files and a frequency list. It also has the list of words from the frequency list that this speller accepts, as well as the list of words it refuses.
This could give you a good start finding words missing in the speller, as well as words that are accepted, but which you consider incorrect.
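As a rough illustration of how such accepted and refused lists could be reproduced, the sketch below compares a frequency list against the stems in a Hunspell .dic file. It is a simplification under assumptions: it ignores the affix rules in the .aff file entirely (so inflected forms that Hunspell itself would accept land in the refused list), it assumes the frequency list has one `word count` pair per line, and the function names are made up for this example.

```python
def load_dic_words(dic_text):
    """Collect the stems from Hunspell .dic content, dropping affix flags."""
    lines = dic_text.splitlines()
    # The first line of a .dic file normally holds the approximate entry count.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    words = set()
    for line in lines:
        entry = line.strip()
        if entry:
            # "stem/FLAGS" -> "stem"; also drop any tab-separated extra fields.
            words.add(entry.split("/", 1)[0].split("\t", 1)[0])
    return words


def split_by_dictionary(freq_text, known):
    """Split frequency-list words into (accepted, refused), keeping their order."""
    accepted, refused = [], []
    for line in freq_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        word = parts[0]  # assumed layout: the word first, then its count
        (accepted if word in known else refused).append(word)
    return accepted, refused


if __name__ == "__main__":
    dic = "3\nکتاب/a\nمتن\nزبان/b\n"
    freq = "متن 120\nکتاب 90\nتشخیسهدد 2\n"
    accepted, refused = split_by_dictionary(freq, load_dic_words(dic))
    print("accepted:", accepted)
    print("refused:", refused)
```

If the frequency list is sorted by count, the top of the refused list is a natural place to look for correct words the speller is missing, while frequent misspellings that show up in the accepted list point to entries that should be removed.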
Are there any updates on this topic?
Any updates for the Persian language?
There was also a topic in private messages about this, which has not been replied to for a long time.
It would be nice if you could help update the tool for Persian (Farsi).