
Using a custom spelling dictionary

(Ida) #1


I'd like to use my own English frequency dictionary for spell checking. Specifically, I need better, more domain-relevant suggestions than the general dictionary can provide. I've extracted a list of words together with their raw and relative frequency counts from a domain corpus. However, I'm not much of a Java programmer and I'm a bit at a loss as to how to proceed.
I gather that I need to use SpellDictionaryBuilder. What should the input format be? Can I use raw word frequencies, or should I group them into some fixed number of ranges/bins?

Any help is greatly appreciated!

(Daniel Naber) #2

Hi, the frequency input file can look like this:

<w f="54754" flags="x">the</w>
<w f="31204" flags="x">to</w>
<w f="29830" flags="x">of</w>
<w f="26769" flags="x">and</w>

That is, you can use absolute frequency values. The "flags" attribute is ignored, but it must be included.
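Because absolute counts can be used directly, a word list like the one described in the question only needs to be rewritten into this one-entry-per-line format; no binning step is required. Below is a minimal Python sketch of that conversion. It assumes the list is a tab-separated file with the word in the first column, the raw count in the second and the relative frequency in the third. That column layout, the script name `make_freq.py` and the function names are hypothetical. The sketch uses only the raw count, writes `flags="x"` as in the example above, and sorts entries from most to least frequent to mirror the example. Words containing `<`, `>` or `&` are skipped, since this thread does not say whether the builder expects them to be escaped.

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def parse_counts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, raw_count) pairs from tab-separated lines.

    Assumed (hypothetical) layout: word<TAB>raw_count<TAB>relative_freq.
    Only the first two columns are used; the relative frequency is ignored.
    Blank lines and lines with fewer than two columns are skipped.
    """
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        yield fields[0], int(fields[1])


def to_freq_entries(pairs: Iterable[Tuple[str, int]]) -> List[str]:
    """Format pairs as <w f="..." flags="x">word</w> lines, most frequent first."""
    entries = []
    for word, count in sorted(pairs, key=lambda p: p[1], reverse=True):
        # Skip words this simple unescaped format might not handle safely.
        if any(ch in word for ch in "<>&"):
            continue
        # The flags attribute is ignored by the builder but must be present.
        entries.append(f'<w f="{count}" flags="x">{word}</w>')
    return entries


if __name__ == "__main__":
    # Usage: python make_freq.py counts.tsv > freq.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        for entry in to_freq_entries(parse_counts(f)):
            print(entry)
```

The resulting file is what you would pass to SpellDictionaryBuilder as its frequency input; check the builder's command-line help for the exact option that takes it. If your list has a header row, remove it first, because the sketch expects every non-blank line to start with a word and an integer count.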