How to add new words to the lexical dictionary

(leny) #1

Hi all,

To use LanguageTool in a technical domain, I need to add technical words to the lexical dictionary.
Does anyone know how to add a new word and its lexical information (lemma, noun/verb/adjective, gender, conjugation, ...)?

I have read some posts about adding new rules, but none about adding new words to the lexicon.


[EDIT] Note that I would like the POS tagger to be able to use these new words when tagging.

(Daniel Naber) #2

Only a few languages allow adding words to the dictionary directly:


Note that this is independent of the spell checker. To add words to the spell checker, add them to the ignore.txt file, e.g. org/languagetool/resource/en/hunspell/ignore.txt for English.
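As a rough illustration of the spell-checker side, the sketch below appends words to an ignore file. It assumes the file holds one word per line with `#` starting a comment line, which is an assumption about the format and not something stated in this thread. The function name `add_ignored_words` and the path in the usage note are hypothetical.

```python
from pathlib import Path


def add_ignored_words(ignore_file, words):
    """Append words that are not yet listed; return the words actually added.

    Assumes one word per line and '#' comment lines (an assumption about
    the ignore.txt format, not confirmed by the thread).
    """
    path = Path(ignore_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""

    # Collect the entries already present, skipping blanks and comments.
    existing = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            existing.add(entry)

    added = []
    for word in words:
        word = word.strip()
        if word and word not in existing:
            existing.add(word)
            added.append(word)

    if added:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "".join(w + "\n" for w in added))
    return added
```

A call such as `add_ignored_words("ignore.txt", ["magnétoscopie"])` would then add the word once, however often it is run. This only affects spell checking, not the POS tagger.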

(leny) #3

Thank you for your answer.

Bad news for me: it seems that French does not support that (I do not have added.txt or manual-tagger.txt files).

When you say "adding words to the dictionary directly", do you mean that there is an indirect or alternative way to add custom words for the POS tagger?


[EDIT] I have just found here that we can export a dictionary and build it again.
So I exported the French dictionary from resource/fr/french.dict to a new text file, french.txt. Then I tried to build the dictionary again (even with no changes to the text file) using the POSDictionaryBuilder class, but it failed.

The error is:

Exception in thread "main" java.lang.NullPointerException


(Daniel Naber) #4

I think you need to make sure the *.info file is in the current directory when calling POSDictionaryBuilder. You can find the *.info file in the same directory as the *.dict file.

(leny) #5

I checked for the info file. It is in the right place.

(Daniel Naber) #6

What is the exact command you use for POSDictionaryBuilder? Does the *.info file contain at least these items?



(leny) #7

OK, the problem was between the chair and the desk...
I used:

java -cp languagetool.jar ~/french.dict.txt org/languagetool/resource/fr/french.dict

instead of:

java -cp languagetool.jar ~/french.dict.txt org/languagetool/resource/fr/

But the word that I added does not seem to be present in the built dictionary, because the POS tagger tags my word as NULL. If I export the binary dictionary again, my word is not there.
Here is my custom line in my text dictionary:

magnétoscopie	magnétoscopie	N f s

My corpus is:

Si la matériau est ferrique, j'utilise la magnétoscopie.

The tagged corpus is:

Expected text language: French

Working on /home/leny/corpus.txt...
<S> Si[si/A,] la[le/D f s,] matériau[matériau/N m s,] est[être/V etre ind pres 3 s,] ferrique[ferrique/J e s,],[,/M nonfin,] j[je/R pers suj 1 s,]'[j'/</JE>,]utilise[utiliser/V sub pres 1 s,utiliser/V ind pres 1 s,] la[le/D f s,] magnétoscopie[magnétoscopie/null,].[./M fin,</S>,]
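To spot untagged words in longer tagger output, a small script can scan for readings ending in `/null`. This is a minimal sketch that assumes the `token[lemma/TAG,...]` layout seen in the output above; the function name `untagged_tokens` and the regular expression are illustrative, not part of LanguageTool.

```python
import re

# A token is any run of characters without brackets or whitespace,
# followed directly by its bracketed list of readings.
TOKEN_RE = re.compile(r"([^\[\]\s]+)\[([^\]]*)\]")


def untagged_tokens(tagged_line):
    """Return the tokens that have at least one reading ending in '/null'."""
    missing = []
    for token, readings in TOKEN_RE.findall(tagged_line):
        if any(reading.endswith("/null") for reading in readings.split(",")):
            missing.append(token)
    return missing


sample = "la[le/D f s,] magnétoscopie[magnétoscopie/null,].[./M fin,</S>,]"
print(untagged_tokens(sample))  # ['magnétoscopie']
```

Running this over a whole tagged corpus gives a quick list of the words still missing from the dictionary.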

(Daniel Naber) #8

Sorry, I cannot reproduce this problem. Did you make sure your text is in the correct format (tab separated)? Did you notice that POSDictionaryBuilder writes to a temp file, not to the original *.dict file?
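The format check can be automated before rebuilding. The sketch below flags lines that do not consist of exactly three non-empty tab-separated fields, as in the example line earlier in the thread (reading the fields as word form, lemma, and tag is an assumption based on that single example). The function name `find_malformed_lines` is hypothetical.

```python
def find_malformed_lines(lines):
    """Return (line_number, line) pairs that lack three tab-separated fields."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        fields = line.split("\t")
        # Expect exactly three fields, none of them empty.
        if len(fields) != 3 or any(not field.strip() for field in fields):
            problems.append((number, line))
    return problems


sample = [
    "magnétoscopie\tmagnétoscopie\tN f s\n",   # tab separated: accepted
    "magnétoscopie magnétoscopie N f s\n",     # spaces only: reported
]
print(find_malformed_lines(sample))  # [(2, 'magnétoscopie magnétoscopie N f s')]
```

For a real file, pass an open file object, for example `find_malformed_lines(open("french.txt", encoding="utf-8"))`; the encoding is an assumption and should match the exported dictionary.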

(leny) #9

I had not noticed that the builder writes a temporary dictionary. I was expecting the french.dict file to be overwritten.
So it works when I replace the french.dict file with the temporary file that was created.

Thank you so much for your help.
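For anyone repeating the last step, here is a minimal sketch of replacing the dictionary with the newly built file while keeping a backup of the original. Both paths are placeholders supplied by the caller, since the temp file location is not given in the thread; `install_dictionary` is a hypothetical helper, not a LanguageTool function.

```python
import shutil
from pathlib import Path


def install_dictionary(built_file, target_dict):
    """Copy the newly built dictionary over target_dict, keeping a .bak copy.

    Returns the path of the backup file (created only if the target existed).
    """
    built = Path(built_file)
    target = Path(target_dict)
    if not built.is_file():
        raise FileNotFoundError(f"built dictionary not found: {built}")

    backup = target.with_name(target.name + ".bak")
    if target.exists():
        shutil.copy2(target, backup)  # preserve the original dictionary
    shutil.copy2(built, target)       # put the rebuilt dictionary in place
    return backup
```

Keeping the backup makes it easy to restore the original french.dict if the rebuilt file turns out to be faulty.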