
Complete integration of a new language


(edkinsgael) #1

Hi all,
I'd like to know how to fully integrate a new language into LanguageTool. I'm from Cameroon and would like to add a local language called "Eton". I have already followed the short tutorial at http://wiki.languagetool.org/adding-a-new-language, but I would like to know how to add a new dictionary to the module I created, how to get LanguageTool to use it, and how to integrate conjugation rules, agreement rules, and all the basic grammatical rules, which I suppose cannot go into the grammar.xml file. PS: I'm a Java developer.


(Daniel Naber) #2

Hi, thanks for your interest in LanguageTool.

For spell checking, the best way is to convert an existing Hunspell dictionary, as documented at http://wiki.languagetool.org/hunspell-support#toc4

How to build a dictionary for part-of-speech tags so you can add agreement rules depends on the language. If it's quite regular, you might want to write a Java class that assigns part-of-speech tags programmatically (that class would extend BaseTagger). If the language is too irregular, you might need to write a dictionary as a large text file in the format "inflectedForm baseForm POSTAGS", e.g. for English it would be "children child NNS" (with NNS being the tag for plural nouns). Maybe https://bitbucket.org/janek37/lexeme_forge could be used.
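To make the "inflectedForm baseForm POSTAGS" format concrete, here is a minimal sketch that reads such lines into an in-memory map and looks up all readings of a word. The class `PosDictionarySketch` and its methods are hypothetical and not part of LanguageTool's API. Splitting on any run of whitespace is an assumption made so that the space-separated example above parses. As far as I know, LanguageTool itself compiles such text files into a binary dictionary rather than loading them like this, so treat this only as a way to check your data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of LanguageTool): parses lines in the
 * "inflectedForm baseForm POSTAGS" format and looks up analyses for a word.
 */
public class PosDictionarySketch {

    /** One reading of an inflected form: its lemma and part-of-speech tag. */
    public record Analysis(String baseForm, String posTag) {}

    /** Parses dictionary lines; one inflected form may have several readings. */
    public static Map<String, List<Analysis>> parse(List<String> lines) {
        Map<String, List<Analysis>> dict = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumption: fields are separated by any run of tabs or spaces.
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            dict.computeIfAbsent(fields[0], k -> new ArrayList<>())
                .add(new Analysis(fields[1], fields[2]));
        }
        return dict;
    }

    /** Returns all readings of a word, or an empty list if it is unknown. */
    public static List<Analysis> lookup(Map<String, List<Analysis>> dict, String word) {
        return dict.getOrDefault(word, List.of());
    }

    public static void main(String[] args) {
        Map<String, List<Analysis>> dict = parse(List.of(
            "children child NNS",
            "child child NN"));
        // prints [Analysis[baseForm=child, posTag=NNS]]
        System.out.println(lookup(dict, "children"));
    }
}
```

Keeping a list of readings per form matters because ambiguous words (one spelling, several lemmas or tags) are common, and agreement rules later need to see all of them.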

About getting a new language integrated into the official version of LT: I suggest you fork LT on GitHub and make all your changes there. We will accept new languages once their maintainers have shown that they're willing to care about the language in the long term. I'd also suggest building a small community around that language so you're not the only one who cares about it in LT.

Regards
Daniel