Create a dictionary for French language and use it for spell checking

Hello all,
I’m new to LT and would like to process several files, checking them for spelling errors and applying the corrections automatically.
I’m using Windows and Ubuntu, with Java 8.
My example file is a UTF-8 text file containing the sentence “Remmember how it goes”. Obviously I would like to get “Remember how it goes”.

This is what happens:

C:\Users\KP\Desktop\LanguageTool-3.3>java -Dfile.encoding=UTF-8 -jar languagetool-commandline.jar -adl -a C:/Users/KP/Desktop/test.txt
Working on C:/Users/KP/Desktop/test.txt…
Using English for file C:/Users/KP/Desktop/test.txt
Remmember how it goes.

It seems that it didn’t correct the sentence… Why?

-adl will only detect the language like “English”, but not its variant, like “English (US)”. This means no spell checking can be activated. So you’ll need to specify the language like -l en-US. Anyway, I don’t think automatically applying suggestions is a good idea. The risk of introducing new errors is high, and several errors don’t have a suggestion and will be ignored.
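To illustrate the point about automatic correction, here is a minimal Python sketch of what applying the first suggestion of every match amounts to. The `Match` type and the function name are hypothetical and are not LanguageTool's API; the sketch only assumes that each reported error carries an offset, a length, and a possibly empty list of suggestions.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical stand-in for one checker result (not LanguageTool's API)."""
    offset: int              # start of the flagged span in the text
    length: int              # length of the flagged span
    replacements: List[str]  # suggested corrections, possibly empty


def apply_first_suggestions(text: str, matches: List[Match]) -> str:
    # Work from the end of the text so that earlier offsets stay valid
    # even when a replacement changes the length of the text.
    for m in sorted(matches, key=lambda m: m.offset, reverse=True):
        if not m.replacements:
            continue  # no suggestion available: the error stays in the text
        text = text[:m.offset] + m.replacements[0] + text[m.offset + m.length:]
    return text


if __name__ == "__main__":
    print(apply_first_suggestions("Remmember how it goes",
                                  [Match(0, 9, ["Remember"])]))
```

Note that the first suggestion is taken blindly and matches without suggestions are skipped silently, which is exactly where new errors can slip in or old ones remain.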

I also tried it with “fr” on a French sentence, but it doesn’t work. I would just like to correct a list of French words where, for example, the accent is missing. Is that possible with LT?

French doesn’t offer corrections for spelling errors (see LanguageTool - Supported Languages).

Oh, I see. Can I improve the spell checker by creating a list of correctly spelled words, as described in Development Overview - LanguageTool Wiki?

The process for spell checking is independent of the grammar rules described on that page. How spell checking works is documented at Spell check - LanguageTool Wiki.

Perfect! Are the Java libraries used for spell checking all up to date, or have some of them changed or been deprecated?

Using the hunspell native code (which comes with LT) is kind of deprecated, although I don’t see it being removed any time soon. If you stick to the morfologik code (which is pure Java) as described in the Wiki, everything should be fine.

Can’t find / load main class org.languagetool.dev.SpellDictionaryBuilder

I’ve updated the Wiki page with the latest class name. The parameters might be different, but the class will show its usage when you call it without parameters.

I looked through my directories and found org/languagetool, but there is no “dev” directory. Maybe that is why it doesn’t work.

The classes are in the JAR files, so you won’t usually find them directly as files in the file system. The class as given in the wiki should work now.
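Since a JAR file is a ZIP archive, one way to check whether a given class is inside a particular JAR is to look for its entry name. Below is a minimal Python sketch using only the standard library; the function name is made up for illustration, and which JAR actually holds the class is not assumed here.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the JAR at jar_path contains the given class."""
    # A class a.b.C is stored in the archive as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `jar_contains_class("languagetool.jar", "org.languagetool.tools.SpellDictionaryBuilder")` would tell you whether that JAR alone is enough for the class name to resolve.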

I’m not a Java expert, but this is what happens:

C:\Users\KP\Desktop\LanguageTool-3.3>java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR C:/Users/KP/Desktop/LanguageTool-3.3/french_dict.txt C:/Users/KP/Desktop/LanguageTool-3.3/org/languagetool/resource/fr/french.info - -o C:/Users/KP/Desktop/LanguageTool-3.3/output.dict
Error: Could not find or load main class org.languagetool.tools.SpellDictionaryBuilder

You also need languagetool-tools.jar in your classpath, something like
java -cp languagetool.jar;libs/morfologik-tools.jar …

Please try a recent snapshot from Index of /snapshots/.
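A side note on the classpath, since both Windows and Ubuntu are in use: Java separates classpath entries with `;` on Windows and with `:` on Linux. A small Python sketch that builds the string for the current platform follows; the helper name is hypothetical, and the JAR names are simply the ones mentioned in this thread.

```python
import os
from typing import Sequence


def build_classpath(jars: Sequence[str], sep: str = os.pathsep) -> str:
    """Join JAR paths with the path separator of the current platform."""
    # os.pathsep is ';' on Windows and ':' on Linux, matching what java -cp expects
    return sep.join(jars)


if __name__ == "__main__":
    print(build_classpath(["languagetool.jar", "libs/morfologik-tools.jar"]))
```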


Now it’s working, but I still can’t create the dictionary file. I think I need to add an option and a parameter, but where?

C:\Users\KP\Desktop\LanguageTool>java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR -i C:/Users/KP/Desktop/LanguageTool/french_dict.txt -info org/languagetool/resource/fr/french.info -o C:/Users/KP/Desktop/LanguageTool/output.dict
Running Morfologik FSACompile.main with these options: [--exit, false, -i, C:\Users\KP\AppData\Local\Temp\SpellDictionaryBuilder257816875475185246.txt, -o, C:\Users\KP\Desktop\LanguageTool\output.dict, -f, CFSA2, --overwrite]
Invalid argument: Unknown option: --overwrite

Usage: fsa_compile [options]
  Options:
    --accept-bom
       Accept leading BOM bytes (UTF-8).
       Default: false
    --accept-cr
       Accept CR bytes in input sequences (\r).
       Default: false
    -f, --format
       Automaton serialization format.
       Default: FSA5
       Possible Values: [FSA5, CFSA2]
    --ignore-empty
       Ignore empty lines in the input.
       Default: false
  * -i, --input
       The input sequences (one sequence per \n-delimited line).
  * -o, --output
       The output automaton file.
**Done. The binary dictionary has been written to C:\Users\KP\Desktop\LanguageTool\output.dict**

@jaumeortola This looks like a bug. Is this something you could fix?

It looks like the dictionary was actually created. I see this message when I create my dictionary and it succeeds.
But I agree we need to remove the message. :)

Yes, it looks like the dict was created, but it wasn’t. I ran the command on both Windows 7 and Ubuntu with Java 8, and nothing changed!
Maybe the .txt file containing the word list should have a different layout?
In my file (UTF-8), the words are listed like this:

a
b
c
d
...etc 

It looks like FSACompile doesn’t work.

Invalid argument: Unknown option: --overwrite

I also noticed that the file that LT (I suppose) creates, C:\Users\KP\AppData\Local\Temp\SpellDictionaryBuilder257816875475185246.txt, disappears a few seconds after the command runs. I thought my antivirus was responsible and disabled it, but again nothing changed.

Any suggestions?
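On the layout question: the fsa_compile usage text quoted earlier says the input is one sequence per \n-delimited line, and that a leading BOM, CR bytes, and empty lines are not accepted or ignored by default. Whether SpellDictionaryBuilder already cleans the word list before passing it on is not something this thread confirms, so the following Python sketch is only a precaution, not a fix for the `--overwrite` error. The function name is hypothetical.

```python
def normalize_word_list(raw: bytes) -> bytes:
    """Return a UTF-8 word list with one word per \\n-terminated line."""
    text = raw.decode("utf-8-sig")  # drops a leading BOM if present
    seen = set()
    words = []
    for line in text.splitlines():  # splits on \n, \r\n and \r alike
        word = line.strip()
        if word and word not in seen:  # skip empty lines and duplicates
            seen.add(word)
            words.append(word)
    return "".join(w + "\n" for w in words).encode("utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python normalize.py input.txt output.txt
    with open(sys.argv[1], "rb") as src:
        cleaned = normalize_word_list(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(cleaned)
```

Reading and writing in binary mode keeps Python from translating line endings back to \r\n on Windows.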

It was a bug that should be fixed now. The fix will be in the next daily build, to be published later tonight at Index of /snapshots/
