How to add a new dictionary


(Kapil) #1

Hi,

How do I add a new dictionary to my existing (Java-based) LT installation? For example, I would like to use a new medical dictionary (English US), but I do not want to override the existing English dictionary.

Similarly, let's say I have a dictionary for a language that is not yet supported by LT, and I would like to use it for spell checking only (no grammar check).
Do I need to clone the LT source and build the whole project?

Thanks

Kapil


(Daniel Naber) #2

You can add your words to org/languagetool/resource/en/added.txt (for English, for other languages use their code instead of en).


(Kapil) #3

Hi Daniel,

I want to add a new language; I followed this link and implemented the Java files.
I have a Hunspell dictionary in .dic format, which I tried to convert into .dict format using the SpellDictionaryBuilder.
I am receiving the following output after running the command:

The input starts with UTF-8 BOM bytes which is most likely not what you want. Use header-less UTF-8 file or override with --accept-bom.

Done. The binary dictionary has been written to C:\Users\kagupta\Downloads\output.dict

However, no file is generated.
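
The warning in the output above refers to the UTF-8 byte order mark, the three bytes EF BB BF that some editors write at the start of a file. One way to get a header-less file is to copy it without those bytes. The following is a minimal sketch, not part of LanguageTool: the class name BomStripper and its command-line arguments are made up for illustration, and it only applies when the word list really is UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not a LanguageTool class.
public class BomStripper {

    /** Returns the input without a leading UTF-8 BOM (EF BB BF); other input is returned unchanged. */
    public static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java BomStripper <input file> <output file>");
            System.exit(1);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Reads the whole file into memory, which is fine for typical word lists.
        Files.write(out, stripBom(Files.readAllBytes(in)));
    }
}
```

Re-saving the file as "UTF-8 without BOM" in a text editor should have the same effect.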


(Ruud Baars) #4

You could also just try to use unmunch from the Hunspell tools. For most simple dictionaries, this works well enough.


(Kapil) #5

I was able to generate the .dict file by saving the .dic file with the proper encoding.
However, when I try to create an instance of LanguageTool with my language, it throws an error saying that the grammar.xml file is not available.
I have specified only one rule, for spell checking, and do not wish to check grammar.

Is it mandatory to provide a grammar.xml file? I do not see one in some of the supported languages.


(Kapil) #6

I added an empty grammar.xml file to prevent this error and continue with spell checking only.
But I am not getting any misspelled words reported, even though there are spelling errors in my text string.

Any idea what is wrong with my setup?

Thanks

Kapil


(Daniel Naber) #7

That’s hard to tell without further information. Have you used a debugger to step through the code?


(Kapil) #8

Hi Daniel,

I am not building LT’s source code along with my language-specific project, as we do not wish to build LT just to add a new dictionary (later on) for spell checking only. I was hoping to add LT’s compiled output to the classpath and add my own Language and SpellCheckRule implementations.
LT shows my new language in the list of available languages, but somehow the spell checking is not working.

Is it mandatory to build the whole source code to add a new language, even just for spell checking?

Thanks

Kapil


(Daniel Naber) #9

I see - but if you attach the source, your IDE should still be able to debug your code as well as LT’s code.

I’m not sure, we rarely add a new language, but I think it should work without re-building LT.


(Kapil) #10

Hi Daniel,

I was able to attach the source and found that the spellchecker is not returning true for an invalid word.
I was trying to add a new medical English dictionary under a new language, Medical English. I have performed the following steps.

  1. Converted Hunspell based .dic file to .dict format

  2. Created a new Java file MedicalEnglish and a new SpellCheckerRule extending the AbstractEnglishSpellerRule and specifying the file path of my dictionary file

  3. Added my language entry in language-module.properties file

  4. Placed the .dic and .info file at relevant locations

Now, when I call the JLanguageTool API to check a paragraph, it does not return any suggestions or misspelled words.


(Daniel Naber) #11

That sounds as if you’ll need to debug deeper, into the morfologik component (source code).


(Kapil) #12

On further debugging, I found that the issue was not with the spellchecker but with my project structure and classpath settings.
I had placed my .dict file on the classpath, but the ResourceDataBroker class assumes it is placed under the org/languagetool folder.
Secondly, it throws an error if spelling.txt is not available in the same folder, though it accepts an empty file.

Now my spell checking is working perfectly.

Thanks

Kapil
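

The classpath problem described in #12 can be caught before starting LanguageTool by asking the class loader whether the expected files are visible. The sketch below uses only the JDK; the class name ResourceCheck, the language code "xx" and the file names are placeholders for illustration, and the exact folder layout a given LanguageTool version expects is an assumption that should be checked against its ResourceDataBroker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a LanguageTool class.
public class ResourceCheck {

    /** Returns those of the given classpath-relative paths that the class loader cannot find. */
    public static List<String> missing(ClassLoader loader, List<String> paths) {
        List<String> notFound = new ArrayList<>();
        for (String path : paths) {
            // ClassLoader.getResource expects paths without a leading '/'.
            if (loader.getResource(path) == null) {
                notFound.add(path);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Assumed layout: resources live below org/languagetool/resource/<language code>/.
        // "xx" and the file names are placeholders, not values taken from LanguageTool.
        String base = "org/languagetool/resource/xx/";
        List<String> expected = Arrays.asList(
                base + "xx.dict",
                base + "xx.info",
                base + "spelling.txt"); // may be empty, but has to exist
        for (String path : missing(ResourceCheck.class.getClassLoader(), expected)) {
            System.out.println("Not on classpath: " + path);
        }
    }
}
```

Running the check with the same classpath as the application shows which files the class loader cannot see at the assumed locations.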