
Create a dictionary for French language and use it for spell checking


(SP) #1

Hello all,
I'm new to LT and would like to process several files, checking for spelling errors and applying the corrections automatically.
I'm using Windows and Ubuntu with Java 8.
My example file is a UTF-8 text file containing the sentence "Remmember how it goes". Obviously I would like to get "Remember how it goes".

This is what happens:

C:\Users\KP\Desktop\LanguageTool-3.3>java -Dfile.encoding=UTF-8 -jar languagetool-commandline.jar -adl -a C:/Users/KP/Desktop/test.txt
Working on C:/Users/KP/Desktop/test.txt...
Using English for file C:/Users/KP/Desktop/test.txt
Remmember how it goes.

It seems that it didn't correct the sentence... Why?


(Daniel Naber) #2

-adl will only detect the language like "English", but not its variant, like "English (US)". This means no spell checking can be activated. So you'll need to specify the language like -l en-US. Anyway, I don't think automatically applying suggestions is a good idea. The risk of introducing new errors is high, and several errors don't have a suggestion and will be ignored.
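
To make the difference concrete, here is a minimal sketch of the two invocations with an explicit variant. The function names `lt_check` and `lt_apply` are made up for illustration, and the jar is assumed to be in the current directory. `-l` and `-a` are the same command-line options used above; `-adl` is simply left out.

```shell
# Hypothetical wrappers; assumes languagetool-commandline.jar is in the current directory.

# Report problems only. Naming the variant (e.g. en-US) activates spell checking.
lt_check() {
  java -Dfile.encoding=UTF-8 -jar languagetool-commandline.jar -l "$1" "$2"
}

# Same, but with -a so the suggestions are applied automatically.
# Review the result: errors without a suggestion are left as they are.
lt_apply() {
  java -Dfile.encoding=UTF-8 -jar languagetool-commandline.jar -l "$1" -a "$2"
}

# Usage:
#   lt_check en-US test.txt
#   lt_apply en-US test.txt
```

Running `lt_check` first and reading its report before using `lt_apply` keeps the risk of introducing new errors visible.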


(SP) #3

I also tried it with "fr" on a French sentence, but it doesn't work. I would just like to correct a list of French words where, for example, the accent is missing. Is that possible with LT?


(Daniel Naber) #4

French doesn't offer corrections for spelling errors (see https://languagetool.org/languages/).


(SP) #5

Oh, I see. Can I improve the spell checker by creating a list of correctly spelled words, as described at http://wiki.languagetool.org/development-overview ?


(Daniel Naber) #6

The process for spell checking is independent of the grammar rules described on that page. How spell checking works is documented at http://wiki.languagetool.org/hunspell-support#toc1.


(SP) #7

Perfect! Are the Java libraries used for spelling all up to date? Or have some of them changed or been deprecated?


(Daniel Naber) #8

Using the hunspell native code (which comes with LT) is kind of deprecated, although I don't see it being removed any time soon. If you stick to the morfologik code (which is pure Java) as described in the Wiki, everything should be fine.


(SP) #9

Can't find / load main class org.languagetool.dev.SpellDictionaryBuilder


(Daniel Naber) #10

I've updated the Wiki page with the latest class name. The parameters might be different, but the class will show its usage when you call it without parameters.


(SP) #11

I looked through my directories. I found org/languagetool, but there isn't a "dev" directory. Maybe that is why it doesn't work.


(Daniel Naber) #12

The classes are in the JAR files, so you won't usually find them directly as files in the file system. The class as given in the wiki should work now.


(SP) #13

I'm not a Java expert, but this is what happens:

C:\Users\KP\Desktop\LanguageTool-3.3>java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR C:/Users/KP/Desktop/LanguageTool-3.3/french_dict.txt C:/Users/KP/Desktop/LanguageTool-3.3/org/languagetool/resource/fr/french.info - -o C:/Users/KP/Desktop/LanguageTool-3.3/output.dict
Error: Could not find or load main class org.languagetool.tools.SpellDictionaryBuilder

(Andriy) #14

You also need languagetool-tools.jar in your classpath, something like
java -cp languagetool.jar;libs/morfologik-tools.jar ...


(Daniel Naber) #15

Please try a recent snapshot from https://languagetool.org/download/snapshots/?C=M;O=D.


(SP) #17

Now it is working, but I still can't create the dictionary file. I think I need to add an option and a parameter, but where?

C:\Users\KP\Desktop\LanguageTool>java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR -i C:/Users/KP/Desktop/LanguageTool/french_dict.txt -info org/languagetool/resource/fr/french.info -o C:/Users/KP/Desktop/LanguageTool/output.dict
Running Morfologik FSACompile.main with these options: [--exit, false, -i, C:\Users\KP\AppData\Local\Temp\SpellDictionaryBuilder257816875475185246.txt, -o, C:\Users\KP\Desktop\LanguageTool\output.dict, -f, CFSA2, --overwrite]
Invalid argument: Unknown option: --overwrite

Usage: fsa_compile [options]
  Options:
    --accept-bom
       Accept leading BOM bytes (UTF-8).
       Default: false
    --accept-cr
       Accept CR bytes in input sequences (\r).
       Default: false
    -f, --format
       Automaton serialization format.
       Default: FSA5
       Possible Values: [FSA5, CFSA2]
    --ignore-empty
       Ignore empty lines in the input.
       Default: false
  * -i, --input
       The input sequences (one sequence per \n-delimited line).
  * -o, --output
       The output automaton file.
**Done. The binary dictionary has been written to C:\Users\KP\Desktop\LanguageTool\output.dict**

(Daniel Naber) #18

@jaumeortola This looks like a bug. Is this something you could fix?


(Andriy) #19

It looks like the dictionary was actually created. I see this message when I create my dictionary and it succeeds.
But I agree we need to remove the message. :)


(SP) #20

Yes, it looks like the dictionary was created, but it wasn't. I ran the command on both Windows 7 and Ubuntu with Java 8, and nothing changed!
Maybe the .txt file containing the word list should have a different layout?
In my file, the words are listed like this, in UTF-8:

a
b
c
d
...etc

It looks like FSACompile doesn't work.

Invalid argument: Unknown option: --overwrite

Moreover, I noticed that the file C:\Users\KP\AppData\Local\Temp\SpellDictionaryBuilder257816875475185246.txt (created by LT, I suppose) disappears a few seconds after the command is run. I thought it was my antivirus, so I disabled it, but again nothing changed.

Any suggestions?


(Daniel Naber) #21

It was a bug that should be fixed now. The fix will be in the next daily build, to be published later tonight at https://languagetool.org/download/snapshots/?C=M;O=D
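
Separately from that bug, on the word list layout asked about in #20: one word per line in UTF-8 matches what the fsa_compile usage text in #17 describes. The same usage text lists `--accept-bom`, `--accept-cr` and `--ignore-empty` as off by default, so a list saved on Windows may be worth normalising before building. The thread does not say whether SpellDictionaryBuilder does this itself. The following is a minimal sketch, assuming GNU sed (as on Ubuntu) and hypothetical file names, with a small demo input so it can be run as-is.

```shell
# Hypothetical file names; adjust to your own paths.
in=french_dict.txt
out=french_dict.clean.txt

# Demo input only: a UTF-8 BOM, CRLF line endings, an empty line and a duplicate.
printf '\357\273\277été\r\n\r\nça\r\nété\r\n' > "$in"

# 1. strip a leading UTF-8 BOM (bytes EF BB BF) from the first line
# 2. remove CR bytes (Windows line endings)
# 3. drop empty lines
# 4. sort bytewise; sorting is used here only to remove duplicates
LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$in" \
  | tr -d '\r' \
  | sed '/^$/d' \
  | LC_ALL=C sort -u > "$out"
```

The cleaned file can then be passed to the builder with `-i` in place of the original list.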