Building n-gram data for a new language

Hi Daniel,

I now have a large amount of text (~2 GB).
I saw your reply in the discussion below: En n-gram data


Does that mean I can use Lucene 5.2.1 to create the index?
Also, could you tell me the next steps for creating the n-grams myself?

Thanks a lot.