
How to add a new word to the lexical dictionary


(leny) #1

Hi all,

I would like to use LanguageTool in a technical domain, so I need to add technical words to the lexical dictionary.
Does anyone know how to add a new word and its lexical information (lemma, noun/verb/adjective, gender, conjugation, ...)?

I have read some posts about adding new rules, but none about adding new words to the lexicon.

Thanks
Leny

[EDIT] Note that I would like the POS tagger to be able to use these new words when tagging.


(Daniel Naber) #2

Only a few languages allow adding words to the dictionary directly:

org/languagetool/resource/de/added.txt
org/languagetool/resource/ro/added.txt
org/languagetool/resource/ca/manual-tagger.txt
org/languagetool/resource/eo/manual-tagger.txt

Note that this is independent of the spell checker. To add words to the spell checker, add them to the ignore.txt file, e.g. org/languagetool/resource/en/hunspell/ignore.txt for English.


(leny) #3

Thank you for your answer.

Bad news for me: it seems that French does not support this (I do not have added.txt or manual-tagger.txt files).

When you say "adding words to the dictionary directly", do you mean that there is an indirect or alternative way to add custom words for the POS tagger?

Thanks

[EDIT] I have just found here that we can export a dictionary and build it again.
So I exported the French dictionary from resource/fr/french.dict to a new text file, french.txt. Then I tried to build the dictionary again (even with no changes to the text file) using the POSDictionaryBuilder class, but it fails.

The error is:

Exception in thread "main" java.lang.NullPointerException
	at org.languagetool.dev.DictionaryBuilder.getOption(DictionaryBuilder.java:130)
	at org.languagetool.dev.DictionaryBuilder.getTab2MorphOptions(DictionaryBuilder.java:78)
	at org.languagetool.dev.POSDictionaryBuilder.build(POSDictionaryBuilder.java:50)
	at org.languagetool.dev.POSDictionaryBuilder.main(POSDictionaryBuilder.java:43)

(Daniel Naber) #4

I think you need to make sure the *.info file is in the current directory when calling POSDictionaryBuilder. You can find the *.info file in the same directory as the *.dict file.


(leny) #5

I checked for the info file. It is in the right place.


(Daniel Naber) #6

What is the exact command you use for POSDictionaryBuilder? Does the *.info file contain at least these items?

fsa.dict.separator
fsa.dict.encoding
fsa.dict.encoder
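
A quick way to run this check is sketched below. It assumes the *.info file is a plain key=value properties file; the function name missing_info_keys and the sample values are hypothetical and are not taken from the real french.info.

```python
# Keys named in the post above as the minimum the *.info file should contain.
REQUIRED_KEYS = ("fsa.dict.separator", "fsa.dict.encoding", "fsa.dict.encoder")


def missing_info_keys(info_text):
    """Return the required keys that do not appear in the text of a *.info file."""
    present = set()
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or '!' in properties files).
        if not line or line[0] in "#!":
            continue
        # The key is everything before the first '=' or ':'.
        for i, ch in enumerate(line):
            if ch in "=:":
                present.add(line[:i].strip())
                break
    return [key for key in REQUIRED_KEYS if key not in present]


# Illustrative contents only; the values are placeholders.
sample = "fsa.dict.separator=+\nfsa.dict.encoding=utf-8\n"
print(missing_info_keys(sample))  # ['fsa.dict.encoder']
```

To check a real file, read it first (for example with open(path, encoding="utf-8").read()) and pass the text to the function. An empty list means all three keys are present.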

(leny) #7

OK, the problem exists between chair and desk...
I used:

java -cp languagetool.jar org.languagetool.dev.POSDictionaryBuilder ~/french.dict.txt org/languagetool/resource/fr/french.dict

instead of:

java -cp languagetool.jar org.languagetool.dev.POSDictionaryBuilder ~/french.dict.txt org/languagetool/resource/fr/french.info

But the word that I added does not seem to be present in the built dictionary, because the POS tagger tags my word as NULL. If I export the binary dictionary again, my word is not present.
Here is my custom line in my text dictionary:

magnétoscopie	magnétoscopie	N f s

My corpus is:

Si la matériau est ferrique, j'utilise la magnétoscopie.

The tagged corpus is:

Expected text language: French

Working on /home/leny/corpus.txt...
<S> Si[si/A,] la[le/D f s,] matériau[matériau/N m s,] est[être/V etre ind pres 3 s,] ferrique[ferrique/J e s,],[,/M nonfin,] j[je/R pers suj 1 s,]'[j'/</JE>,]utilise[utiliser/V sub pres 1 s,utiliser/V ind pres 1 s,] la[le/D f s,] magnétoscopie[magnétoscopie/null,].[./M fin,</S>,]
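
For a longer corpus, the tokens the tagger could not tag can be listed automatically. The sketch below assumes the output format is exactly what the sample above shows, that is token[reading,reading,...] with unknown words carrying a /null reading. The function name untagged_tokens is hypothetical and not part of LanguageTool.

```python
import re

# A token is a run of characters without whitespace or brackets,
# immediately followed by its readings in square brackets.
TOKEN = re.compile(r"([^\s\[\]]+)\[([^\]]*)\]")


def untagged_tokens(tagged_line):
    """Return the tokens whose readings include a '/null' POS tag."""
    return [
        match.group(1)
        for match in TOKEN.finditer(tagged_line)
        if "/null," in match.group(2)
    ]


# Excerpt from the tagged output above.
line = "la[le/D f s,] magnétoscopie[magnétoscopie/null,].[./M fin,</S>,]"
print(untagged_tokens(line))  # ['magnétoscopie']
```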

(Daniel Naber) #8

Sorry, I cannot reproduce this problem. Did you make sure your text is in the correct format (tab separated)? Did you notice that POSDictionaryBuilder writes to a temp file, not to the original *.dict file?
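
The format question can be checked mechanically. The sketch below assumes each entry has exactly three tab-separated fields (inflected form, lemma, POS tag), as in the custom line quoted earlier in the thread. The function name find_malformed_lines is hypothetical.

```python
def find_malformed_lines(lines):
    """Return the 1-based numbers of lines that are not three tab-separated fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Ignore completely empty lines.
        if not text.strip():
            continue
        fields = text.split("\t")
        # Expect exactly three non-empty fields: form, lemma, POS tag.
        if len(fields) != 3 or any(not field.strip() for field in fields):
            bad.append(number)
    return bad


entries = [
    "magnétoscopie\tmagnétoscopie\tN f s\n",  # tab separated: accepted
    "magnétoscopie magnétoscopie N f s\n",    # spaces instead of tabs: rejected
]
print(find_malformed_lines(entries))  # [2]
```

Editors that silently convert tabs to spaces are a common cause of the second kind of line, so it is worth running the check on the file as saved rather than as typed.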


(leny) #9

I had not noticed that the builder writes a temporary dictionary. I was expecting the french.dict file to be overwritten.
So it works when I replace the french.dict file with the temporary file that was created.

Thank you so much for your help.

Leny
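
The final replacement step can be done with a backup, so that the original dictionary can be restored if the rebuilt one misbehaves. This is only a sketch: install_dictionary is a hypothetical helper, and the demo uses throwaway files in a temporary directory rather than real LanguageTool paths.

```python
import shutil
import tempfile
from pathlib import Path


def install_dictionary(built_path, target_path):
    """Copy the freshly built dictionary over the target, keeping a .bak copy."""
    built = Path(built_path)
    target = Path(target_path)
    backup = target.with_name(target.name + ".bak")
    # Keep the previous dictionary so the change can be undone.
    if target.exists():
        shutil.copy2(target, backup)
    shutil.copy2(built, target)
    return backup


# Demo with placeholder files; real use would pass the temp file written by
# POSDictionaryBuilder and the path of french.dict.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "french.dict"
    built = Path(tmp) / "built.dict"
    target.write_bytes(b"old")
    built.write_bytes(b"new")
    backup = install_dictionary(built, target)
    print(target.read_bytes(), backup.read_bytes())  # b'new' b'old'
```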