I am experimenting with ngrams for the Latvian language. There is no ready-made Google ngram data for Latvian, so I’m trying to build my own, but I’m having trouble creating the Lucene index.
How do I build the index? I have tried using Luke to create it, but Luke either crashes or produces an empty index. Are there perhaps scripts or programs that were used to build the indexes for the languages that are already supported? I have no experience programming in Java, and I couldn’t find any information on what LT (LanguageTool) expects to find in that index.