I’m seeing that Finnish isn’t supported as a language in LanguageTool, but the wiki does link to Voikko. Are there plans to support Finnish in the future? Is it something that the community would like to add but has no plans on the radar now? Or is “use Voikko” the preferred solution for the foreseeable future?
There are no concrete plans, but contributions are very welcome. We might also add a Finnish spell checker as a first step, but that would require a hunspell dictionary for Finnish, and a quick web search didn’t turn one up. Do you know of one?
Not really, no. I’m researching options for adding Finnish language spellchecking/grammarchecking to my company’s software, which currently uses the JLanguageTool library. Has anyone looked at bringing in a dictionary from the list at https://www.puimula.org/htp/testing/voikko-snapshot-v5/ ? Is it not possible due to incompatible licenses maybe?
Those dictionaries seem to be binary files that only work with Voikko? Voikko is under GPL it seems, which would make integration into LT impossible/difficult. Of course, locally you can do whatever you want (i.e. as long as you don’t publish the result) and you could integrate Voikko the same way hunspell is integrated into LT. Requires some development with C++/Java integration though. Or integrate it by having Voikko running as a server separately, which would also avoid license issues.
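The separate-server option could look roughly like the following. This is a minimal sketch, assuming the speller is reachable through two callables, `spell(word)` and `suggest(word)`. Those names, the `/check` path and the JSON shape are made up for illustration and are not Voikko's or LanguageTool's API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote, urlparse
from urllib.request import urlopen


def make_handler(spell, suggest):
    """Build a handler around two callables (hypothetical interface):
    spell(word) -> bool and suggest(word) -> list of str."""

    class SpellHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Read the word from the query string, e.g. /check?word=kissa
            query = parse_qs(urlparse(self.path).query)
            word = query.get("word", [""])[0]
            ok = bool(spell(word))
            body = json.dumps({
                "word": word,
                "correct": ok,
                "suggestions": [] if ok else list(suggest(word)),
            }).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return SpellHandler


def start_server(spell, suggest, port=0):
    """Run the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(spell, suggest))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def check_remote(base_url, word):
    """Client side: the host application calls this instead of linking the speller."""
    with urlopen(f"{base_url}/check?word={quote(word)}") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The host application only ever sees HTTP and JSON. Wiring the two callables to an actual Voikko installation is left out here, since it depends on which bindings are available.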
Since Voikko is of no use to us, we would need either a complete word list or information on how Finnish spelling works. Unfortunately, there is not much information about that in non-Finnish sources.
I do have a Finnish word list that could be a start, based on Wiktionary as well as Finnish word counts. But we would need a good editor to decide what is really Finnish, because there seem to be a lot of dialects and influences.
There is also the plug-in for Mozilla, which is a hunspell version, though the author claims compound words are impossible to do (also claimed by Voikko, but I sincerely doubt that; it is a challenge, but there is enough in Hunspell to make it work…)
Just for the record, it’s this one: https://addons.mozilla.org/firefox/addon/finnish-spellchecker-dict/
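To make the compounding question concrete, here is a minimal sketch of the simplest possible mechanism: accept a word if it is a known stem or a plain concatenation of known stems, each with a minimum length. That is roughly what a Hunspell affix file expresses with `COMPOUNDFLAG` and `COMPOUNDMIN`. The function name and the toy stem list are assumptions for illustration, and the sketch does not model inflected parts or any other Finnish-specific rule.

```python
from functools import lru_cache


def make_compound_splitter(stems, min_part=3):
    """Return a function mapping a word to a tuple of known stems,
    or None if the word cannot be built from them."""
    stems = frozenset(s.lower() for s in stems)

    @lru_cache(maxsize=None)
    def split(word):
        if word in stems:
            return (word,)
        # Try every split point that leaves at least min_part
        # characters on both sides.
        for i in range(min_part, len(word) - min_part + 1):
            head = word[:i]
            if head in stems:
                rest = split(word[i:])
                if rest is not None:
                    return (head,) + rest
        return None

    return lambda word: split(word.lower())


# Toy stem list, for illustration only.
split = make_compound_splitter(["kirja", "kauppa", "auto", "talli"])
print(split("kirjakauppa"))
print(split("kirjakaupa"))
```

A real dictionary would need per-stem flags saying which words may appear in which position, which is where most of the actual work lies.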
What I could easily do is to use my Finnish frequency data and this speller to list words that are not accepted and the alternatives given by it.
If a native Finn could check these words for missing ones, a good enough speller could be made. I would be willing to discuss compounding mechanisms to see if it is really impossible.
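The comparison of frequency data against the speller can be sketched as follows. This is a minimal sketch, assuming the frequency data is a word-to-count mapping and the speller is available as two callables, `check(word)` and `suggest(word)`. Both names are hypothetical stand-ins for whatever speller is actually used.

```python
def list_rejected(freq, check, suggest, min_count=1, max_suggestions=5):
    """Return (word, count, suggestions) for every word the speller
    rejects, most frequent first."""
    rows = []
    for word, count in freq.items():
        if count >= min_count and not check(word):
            rows.append((word, count, list(suggest(word))[:max_suggestions]))
    # Highest count first; ties broken alphabetically for stable output.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


def to_tsv(rows):
    """Format the rows as tab-separated text for a human reviewer."""
    return "\n".join(f"{w}\t{c}\t{', '.join(s)}" for w, c, s in rows)
```

Sorting by frequency puts the words most worth a reviewer's time at the top of the list.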
The first thing I see is that at least 100,000 word forms I found are well documented in a dictionary but not accepted by the spell checker, some of them with a very high frequency. These need a manual check.
The second thing I see is that the colon (:) is not a word character in the speller, although it clearly occurs inside some (spelled) words.
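To illustrate the colon issue: Finnish attaches case endings to abbreviations with a colon, as in "EU:n" or "USA:ssa", so a tokenizer that splits on ":" hands the speller fragments. Below is a minimal sketch of a tokenizer that keeps a colon only when it sits between letters. The pattern is an assumption for illustration, not what any existing speller does; in Hunspell the corresponding setting would be the `WORDCHARS` line of the affix file.

```python
import re

# One or more letters, optionally followed by ":" plus more letters.
# A colon before a space or at the end of the text is left out,
# so ordinary punctuation is still dropped.
TOKEN = re.compile(r"[^\W\d_]+(?::[^\W\d_]+)*")


def tokenize(text):
    """Split text into word tokens, keeping word-internal colons."""
    return TOKEN.findall(text)


print(tokenize("EU:n jäsenmaat: Suomi ja USA:ssa"))
```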
@aaron-hall: I might be able to help your company in some way or other.