Adding languages at runtime/dynamically

The wiki page “Java Spell Checker - LanguageTool Wiki” seems to suggest that the language files for every language you want to offer must be available when the JVM starts. More specifically, the language files need to be on the classpath.

Is it possible to add a supported language dynamically when that language isn’t present on the classpath? For example, I would like to offer a certain number of languages in my desktop application, but I don’t want them present in the application until the user requests or needs them. For a language not already available in the application, I would download the language files from my server, save them to a user directory, and then make the language available for the spell check. Is this possible?

Another way of putting this: can I specify the folders where an instance of org.languagetool.language.BritishLanguage should look for its configuration files?

The language-module.properties file tells LT what language classes there are (on the classpath). Adding languages without restarting LT is probably tricky/impossible without further code changes.
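To make the point about language-module.properties concrete, here is a minimal sketch of how a comma-separated list of language class names could be read from such a file. The class LanguageModuleReader is hypothetical and not part of LT, and the key name "languageClasses" is an assumption; check the properties file shipped with your language module for the actual key.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of LanguageTool.
final class LanguageModuleReader {

    // Assumed key name; verify it against the real properties file.
    static final String KEY = "languageClasses";

    /** Returns the class names listed under KEY, in the order they appear. */
    static List<String> readClassNames(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        for (String part : props.getProperty(KEY, "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "languageClasses=org.languagetool.language.English, "
                + "org.languagetool.language.AmericanEnglish\n";
        System.out.println(readClassNames(new StringReader(sample)));
    }
}
```

Because the list is read from whatever happens to be on the classpath, a language whose properties file arrives later is simply never seen, which is why adding one at runtime is awkward.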

Doing some more digging I think using a custom org.languagetool.databroker.ResourceDataBroker may do the job. If I get it working I’ll update this thread.

As far as I can tell it is in the current codebase. I am currently reworking LT to decouple it from the classpath and to decouple rules from format/file locations. It’s nearly done but the changes won’t be completely compatible with LT and so I’ll have it as a separate project on github.

In which way will it not be compatible? Maybe we could merge those changes into LT?

The changes I’m making remove the assumptions behind the current LT implementation. Most of the rules (in my changed version) now work using data rather than hard-coded file names. There is no assumption that things will be found on the classpath; resource lookup is deferred to a resource broker, which LT sort of does now but not really. I’ve also modified things so that a language can be loaded at runtime, and the files/data it needs don’t have to be on the classpath.
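For anyone who wants to stay on stock LT in the meantime, one generic Java technique for making downloaded files visible without touching the startup classpath is a URLClassLoader rooted at the download directory. This is only a sketch of that general technique, not what my reworked code does. The class RuntimeLanguageFiles and its method names are made up for illustration, and whether LT would actually pick up a language this way is untested. It needs Java 9+ for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of LanguageTool.
final class RuntimeLanguageFiles {

    /**
     * Creates a class loader that searches the given directory.
     * The directory must already exist so that its URL ends in "/",
     * which URLClassLoader needs in order to treat it as a directory root.
     */
    static URLClassLoader loaderFor(Path languageDir) {
        try {
            URL root = languageDir.toUri().toURL();
            return new URLClassLoader(new URL[] {root},
                    RuntimeLanguageFiles.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad directory: " + languageDir, e);
        }
    }

    /** Reads a resource as UTF-8 text, or returns null if it is not found. */
    static String readResource(ClassLoader loader, String name) {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The loader delegates to its parent first, so anything already on the classpath still wins; only names the parent cannot find are looked up in the download directory.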

I’ve also made some changes to decouple rules from the language and decouple the rules from JLanguageTool. So I can create and test a rule by itself.

An example (in my local codebase) would be:

// A default broker is used that knows how to look up resources from the classpath. The user can set their own broker.
AmericanEnglish en = new AmericanEnglish();
JLanguageTool lt = new JLanguageTool(en);
MorfologikAmericanSpellerRule rule = en.createMorfologikSpellerRule(messages, userConfig);
lt.check(rule, "This is my test sentence.");

The constructor for MorfologikSpellerRule is now:

public MorfologikSpellerRule(ResourceBundle messages, Language language, UserConfig userConfig, Set dictionaries, List ignoreWords, List prohibitedWords);

The language knows how to create its own rules (as it does now), but you can pick and choose the one(s) you are interested in. Resource lookup is requested from a ResourceDataBroker, which knows how to look up dictionaries, the ignore words and the prohibited words. Where the broker gets the actual data from is its own business; my default broker uses Path objects but can resolve those paths against the classpath if required.
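To give a rough idea of the “Path first, classpath if required” lookup, here is a simplified sketch. It is not my actual broker and it is not LT’s ResourceDataBroker interface; SimpleResourceBroker and PathFirstBroker are hypothetical names used only to show the shape of the idea.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical interface for illustration; not LanguageTool's ResourceDataBroker.
interface SimpleResourceBroker {
    Optional<InputStream> open(String relativePath);
}

// Looks in a base directory first and falls back to a class loader.
final class PathFirstBroker implements SimpleResourceBroker {
    private final Path baseDir;
    private final ClassLoader fallback;

    PathFirstBroker(Path baseDir, ClassLoader fallback) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
        this.fallback = fallback;
    }

    @Override
    public Optional<InputStream> open(String relativePath) {
        Path candidate = baseDir.resolve(relativePath).normalize();
        // Only accept files that really live under the base directory.
        if (candidate.startsWith(baseDir) && Files.isRegularFile(candidate)) {
            try {
                return Optional.of(Files.newInputStream(candidate));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Not found on disk: try the classpath instead.
        return Optional.ofNullable(fallback.getResourceAsStream(relativePath));
    }
}
```

A rule that takes its dictionaries from a broker like this never needs to know whether the data came from a user directory or from a jar.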

Obviously my changes run a lot deeper than this (for instance, the assumption that RuleFilter constructors have no args no longer applies). I’ve also removed things like the short code for languages and replaced it with a Locale, which simplifies a lot of things. I’ve tried as much as possible to keep compatibility with the existing codebase, but a lot has changed. I’m happy for the changes to be merged with the current LT implementation, but I think the volume of changes may be too much for some people. Despite the increased flexibility, a lot of people won’t need it because LT already works for them. For my use case, though, I really don’t want to mess with the classpath and don’t want the overhead of all the languages the end user may ever want to use. So I’ll leave that decision up to you guys; I’m happy to have a parallel project where I merge changes from LT into my local project. I don’t want to fragment your user base, but I really want to use LT and can’t in its current form, hence my changes.

I’ve currently got the core and English modules working (and all tests pass). The rest of the languages will be easy to modify since I now have all the components necessary to make it work.

This approach could also help with setting up a conlang that is more than “hceeps sdrawkcab”, as you won’t need to restart LT completely every time you improve it.

On the side: how well does LT deal with Klingon?

Could you post a link to your version?

I’ll try to in a few days. I’m just finishing the changes to French now but I’ve got nearly 400 changed files and it will take some time to submit them all. I also want to get German changed first because it has some nuances that may affect some of the things I’ve done.

I’ve finally got German finished. As I suspected, there were a number of nuances that needed careful attention.

You can find the code here:

Currently core, en, de, fr and fa are done, including the tests. It’s still very much a WIP. There are a number of GTODO tags in the code comments; these are notes to myself for things to do or look at later.