
Developing a tagger dictionary for Icelandic

(starkadur) #1

I am trying to follow the instructions on the wiki page, but I run into problems already with the first step.

I have created a test file with two lines, each line containing the inflected form, the base form and the POS tag, separated by tabs. I have tried saving it as ANSI and as UTF-8, and I have also converted it using dos2unix. But I always get the same error when I try to export the data by running the command
java -cp languagetool.jar org/languagetool/resource/is/icelandic.dict >dictionary.dump

The error I get is:

Unhandled program error occurred. Invalid file header magic bytes.
at 41)

Do you know what could be the problem?

Many thanks,

(Daniel Naber) #2

Could you attach your files here, preferably zipped so the software doesn't change them? (you can use "More -> Upload file" for that).

(starkadur) #3

Thanks for your quick reply. Here is one version of the file (UTF-8): icelandic.7z (156 Bytes)

(Daniel Naber) #4

You shouldn't call the input *.dict, as that's already the name of the binary file that needs to be generated (using the POSDictionaryBuilder command on the wiki page). However, the next issue will be that Icelandic never had a tagger, so even if you have a *.dict file, LanguageTool will ignore it. Are you familiar with Java? The file will need to be adapted to use (which needs to be created, but can almost be a copy of
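To illustrate why a plain-text word list fails with "Invalid file header magic bytes", here is a minimal sketch that checks whether a file looks like a binary dictionary at all. It assumes that the binary files start with the four bytes `\fsa` (my understanding of the Morfologik FSA header, not something stated in this thread); the class and method names are made up for this example.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

class DictHeaderCheck {

  // Assumption: binary FSA dictionaries begin with the bytes '\', 'f', 's', 'a'.
  static final byte[] FSA_MAGIC = {'\\', 'f', 's', 'a'};

  /** Returns true if the data starts with the assumed magic bytes. */
  static boolean looksLikeBinaryDict(byte[] data) {
    if (data.length < FSA_MAGIC.length) {
      return false;
    }
    for (int i = 0; i < FSA_MAGIC.length; i++) {
      if (data[i] != FSA_MAGIC[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("usage: java DictHeaderCheck <file>");
      return;
    }
    // Reading the whole file is fine for a sketch; only the first bytes matter.
    byte[] data = Files.readAllBytes(Paths.get(args[0]));
    System.out.println(looksLikeBinaryDict(data)
        ? "looks like a binary dictionary"
        : "not a binary dictionary (probably the plain-text input)");
  }
}
```

A tab-separated text file will fail this check regardless of its name or encoding, which matches the error above.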

(starkadur) #5

I am a bit confused about this command line that is given as an example on the wiki page:

java -cp languagetool.jar org/languagetool/resource/en/english.dict >dictionary.dump

Here it seems that a file called english.dict is used to create dictionary.dump. The next step seems to use dictionary.dump to create a temporary file, using POSDictionaryBuilder, which is then renamed to *.dict.

It doesn't seem to matter what I call the file; I always get the same error ("Invalid file header..."). If, on the other hand, I run the command using the file english.dict in resource/en, then it works.

I don't have a lot of experience with Java. But if I want to look at, then where do I find it (I have found FrenchTagger.class)?

(Daniel Naber) #6

The first command on the wiki is for exporting an existing tagger dictionary. As LT doesn't have such a dictionary for Icelandic, you won't be able to call that command unless you've built one yourself (using the second command on the Wiki, POSDictionaryBuilder).
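Before running the build step, it can help to verify that the plain-text input really has the shape described earlier in this thread: inflected form, base form and POS tag, separated by tabs. The following is only a sketch under that assumption; the class name, method name and the sample entries in the usage note are hypothetical and not part of LanguageTool.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class DictInputCheck {

  /** Returns the 1-based numbers of lines that lack exactly three non-empty tab-separated fields. */
  static List<Integer> badLines(List<String> lines) {
    List<Integer> bad = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      // Limit -1 keeps trailing empty fields so "a\tb\t" is detected as broken.
      String[] fields = lines.get(i).split("\t", -1);
      boolean ok = fields.length == 3;
      for (String field : fields) {
        if (field.trim().isEmpty()) {
          ok = false;
        }
      }
      if (!ok) {
        bad.add(i + 1);
      }
    }
    return bad;
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("usage: java DictInputCheck <text-file>");
      return;
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
    System.out.println("Malformed lines: " + badLines(lines));
  }
}
```

For example, a line such as `hestar<TAB>hestur<TAB>NOUN` (a made-up entry with a made-up tag) passes, while the same three words separated by spaces are reported as malformed.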

You can look at at

(starkadur) #7

I managed to create the *.dict file. I have taken the and modified it:


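import java.util.Locale;

import org.languagetool.tagging.BaseTagger;

// Tagger for Icelandic, backed by the binary dictionary /is/icelandic.dict.
public class IcelandicTagger extends BaseTagger {

  // Optional plain-text file with manually added entries.
  @Override
  public String getManualAdditionsFileName() {
    return "/is/added.txt";
  }

  public IcelandicTagger() {
    // The constant is Locale.ENGLISH; Locale.English does not exist.
    super("/is/icelandic.dict", Locale.ENGLISH, false);
  }
}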

The three errors I get when compiling are all related to BaseTagger, which comes as no surprise since I haven't been able to locate it. I looked in the folder org.languagetool.tagging but did not find it. As I said, I am not a very advanced programmer, so simple things are probably getting in my way. If there is a simple answer to my problem, it would be much appreciated.

(Daniel Naber) #8

How exactly did you try to compile the code? Here's some documentation: - basically it's just calling "mvn clean package".