Problem while creating synthesizer dictionary


(Aafreen) #1

I am trying to modify the English synthesizer dictionary as described on the page below.

http://wiki.languagetool.org/developing-a-tagger-dictionary#toc4

I created the input in three-column, tab-separated format, and the dictionary was built successfully. However, I cannot use it: I get no replacements when I use the newly built dictionary. These are the commands I ran:

java -cp languagetool.jar org.languagetool.tools.DictionaryExporter -i english_synth.dict -info english_synth.info -o dictionary.dump

java -cp languagetool.jar org.languagetool.tools.SynthDictionaryBuilder -i dictionary.dump -info org/languagetool/resource/en/english_synth.info -o result.dict

Please let me know if I did anything incorrectly while building the dictionary.


(Yakov) #2

You need to export the words from the standard dictionary english.dict, not from the synthesizer dictionary english_synth.dict.


(Aafreen) #3

Thank you for your response, Yakov. But I am not trying to modify the spell checker dictionary. I am trying to modify the synthesizer dictionary. Please let me know if you have any suggestions for this.


(Yakov) #4

For the synthesizer dictionary, you need to export the POS tag dictionary english.dict from /languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en:

java -cp languagetool.jar org.languagetool.tools.DictionaryExporter -i org/languagetool/resource/en/english.dict -info org/languagetool/resource/en/english.info -o dictionary.dump

and then build the synthesizer dictionary:

java -cp languagetool.jar org.languagetool.tools.SynthDictionaryBuilder -i dictionary.dump -info org/languagetool/resource/en/english_synth.info -o result_synth.dict

Format of dictionary.dump (three tab-separated columns):

boyar boyar NN
boyard boyard NN

If dictionary.dump is in a different format, the resulting dictionary will be broken.
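
Because a malformed dump produces a broken dictionary, it may be worth checking the file before running SynthDictionaryBuilder. The following is only a sketch: it assumes that every line must have exactly three non-empty, tab-separated fields, and check_dump is a hypothetical helper name, not part of LanguageTool.

```shell
# check_dump is a hypothetical helper, not part of LanguageTool.
# It prints every line that does not consist of exactly three
# non-empty, tab-separated fields, and returns 1 if any such line exists.
check_dump() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NF != 3 || $1 == "" || $2 == "" || $3 == "" {
      printf "line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

It could be used as a guard in front of the build step, for example: check_dump dictionary.dump && java -cp languagetool.jar org.languagetool.tools.SynthDictionaryBuilder -i dictionary.dump -info org/languagetool/resource/en/english_synth.info -o result_synth.dict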


(Aafreen) #5

Thank you very much Yakov. I will try this immediately.


(Aafreen) #6

This method works fine. Thank you.
I have seen that there is also an option to export the spell checker dictionary.
But I am facing a problem while trying to export en_US.dict.
I hope you can help me with this too.


(Yakov) #7

For a spell checker dictionary with frequency data, it is impossible to export the word list due to a bug.
The English spell checker dictionaries contain frequency data.
But you can get the word list from the source hunspell dictionary using the hunspell utility unmunch:

./unmunch en_US.dic en_US.aff > en_US1.txt

You can get the source en_US dictionary from:

https://github.com/marcoagpinto/aoo-mozilla-en-dict/tree/master/en_US%20(Kevin%20Atkinson)
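
The list written by unmunch may contain empty lines or repeated words, so a sorted, de-duplicated copy can be easier to work with. This is a minimal sketch that assumes one word per line in the unmunch output; dedupe_words is a hypothetical helper name.

```shell
# dedupe_words is a hypothetical helper: it drops empty lines,
# then sorts the remaining words bytewise and removes duplicates.
dedupe_words() {
  awk 'NF' "$1" | LC_ALL=C sort -u
}
```

For example: dedupe_words en_US1.txt > en_US_words.txt (the output file name is arbitrary).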