Programmatically Loading Ignored Words For Each Spelling Check

jacobjohnson-wf · June 19, 2018, 7:09pm

I’m interested in learning more about programmatically loading a list of words that the spell checker should not flag as mistakes. I have read that LanguageTool already supports using the ‘ignore.txt’ file to load a list of ignored words into a JLanguageTool instance during construction. However, I’m looking for a way to use a different set of ignored words for every request.

My situation is this: I’m running multiple JLanguageTool instances in a microservice architecture, as a proofreading service that accepts and corrects text for our larger client-side text editor web application. I want to support many users, each with their own list of ignored words. I already have a plan for persisting this data. The last piece of the puzzle is how to programmatically load a user’s ignored words when a request comes in from that user. It would look something like this:

1. User A sends text to my service to be checked.
2. I programmatically load User A’s ignored words into JLanguageTool.
3. I check the text with JLanguageTool.
4. I programmatically unload User A’s ignored words from JLanguageTool.
5. User B sends text to my service to be checked.
6. I programmatically load User B’s ignored words into JLanguageTool.
7. I check the text with JLanguageTool.
8. I programmatically unload User B’s ignored words from JLanguageTool.
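Done literally on one shared instance, a load/check/unload cycle like this needs the unload in a `finally` block and the requests serialized, or one user’s words could leak into another user’s check. The toy sketch below shows that pattern; `SharedChecker` and its dictionary-lookup check are hypothetical stand-ins, not LanguageTool classes, and all words are assumed to be lowercase.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for one shared, long-lived checker (not a LanguageTool class). */
class SharedChecker {
    private final Set<String> dictionary;                 // known-good words, assumed lowercase
    private final Set<String> ignored = new HashSet<>();  // current request's ignore list
    private final ReentrantLock lock = new ReentrantLock();

    SharedChecker(Collection<String> dictionary) {
        this.dictionary = new HashSet<>(dictionary);
    }

    /** Returns the words in the text that are neither known nor ignored. */
    private List<String> check(String text) {
        List<String> misspelled = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !dictionary.contains(word) && !ignored.contains(word)) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }

    /** Load, check, unload for one user; the lock serializes concurrent requests. */
    List<String> checkForUser(String text, Collection<String> userIgnoredWords) {
        lock.lock();
        try {
            ignored.addAll(userIgnoredWords); // load
            return check(text);               // check
        } finally {
            ignored.clear();                  // unload, even if check() throws
            lock.unlock();
        }
    }
}
```

The lock is what keeps this correct, and it is also the drawback: concurrent requests for different users wait on each other.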

Is there a way to load and unload ignored words programmatically without constructing a new JLanguageTool instance each time? If not, does LanguageTool support this kind of “multi-user concurrent usage” with regard to user-specific ignored words? If JLanguageTool doesn’t support loading ignored words as a “pre-process” step, I might have to filter the returned results against the ignored word list as a “post-process” step, which seems less ideal. Any help would be greatly appreciated; thank you in advance!
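The “post-process” fallback could look roughly like this: drop any spelling match whose covered text is on the user’s list. `SpellingMatch` and `IgnoreFilter` are hypothetical; LanguageTool’s `RuleMatch` reports similar character offsets (`getFromPos()`/`getToPos()`), but this sketch does not use LanguageTool at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a spelling match: character offsets into the checked text. */
final class SpellingMatch {
    final int fromPos; // inclusive
    final int toPos;   // exclusive

    SpellingMatch(int fromPos, int toPos) {
        this.fromPos = fromPos;
        this.toPos = toPos;
    }
}

final class IgnoreFilter {
    /** Drops matches whose covered text is in the user's ignore list (case-insensitive). */
    static List<SpellingMatch> filter(String text, List<SpellingMatch> matches, Set<String> ignoredWords) {
        Set<String> ignored = ignoredWords.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        List<SpellingMatch> kept = new ArrayList<>();
        for (SpellingMatch m : matches) {
            String covered = text.substring(m.fromPos, m.toPos).toLowerCase(Locale.ROOT);
            if (!ignored.contains(covered)) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```

In a real integration you would probably apply this only to matches from spelling rules, so that a grammar match covering the same text is not dropped by accident.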

Update: It looks like it’s possible to iterate over all the spelling rules and add ignore tokens to each one. That may get me halfway to what I’m looking for. However, there doesn’t appear to be a way to remove ignore tokens. Does the SpellingCheckRule class have something like a ‘removeIgnoreTokens()’ method?
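The iteration from the update is typically written against `getAllActiveRules()`, calling `addIgnoreTokens(...)` on each `SpellingCheckRule`. The toy model below mimics only that shape with hypothetical `Rule`, `SpellRule` and `Checker` classes, to show why an add-only method is a problem on a shared instance: user A’s tokens are still present when user B’s text is checked.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical rule abstraction (not LanguageTool's Rule class). */
interface Rule {
    boolean flags(String word);
}

/** Hypothetical spelling rule with an add-only ignore list and no remove counterpart. */
class SpellRule implements Rule {
    private final Set<String> known;
    private final Set<String> ignoreTokens = new HashSet<>();

    SpellRule(Set<String> known) { this.known = known; }

    void addIgnoreTokens(List<String> tokens) { ignoreTokens.addAll(tokens); }

    public boolean flags(String word) {
        return !known.contains(word) && !ignoreTokens.contains(word);
    }
}

/** Hypothetical checker holding a list of rules. */
class Checker {
    final List<Rule> rules = new ArrayList<>();

    /** Adds the words to every spelling rule, mirroring the loop described in the update. */
    void addIgnoreTokensToSpellingRules(List<String> words) {
        for (Rule rule : rules) {
            if (rule instanceof SpellRule) {
                ((SpellRule) rule).addIgnoreTokens(words);
            }
        }
    }

    boolean isFlagged(String word) {
        for (Rule rule : rules) {
            if (rule.flags(word)) return true;
        }
        return false;
    }
}
```

A fresh `Checker` per request avoids the leak, since the tokens are discarded along with the object.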

dnaber · June 19, 2018, 7:58pm

There’s now a UserConfig object that can be passed as a parameter when creating a new JLanguageTool. It can contain a word list. It is in the nightly snapshots and will be in next week’s release. Don’t worry about re-creating the JLanguageTool objects; they are lightweight (unlike the language objects such as English).
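The approach described here can be modelled as: build the expensive language data once, share it read-only, and construct a cheap checker per request that carries only that user’s words (in LanguageTool, via the UserConfig passed when creating the JLanguageTool). `LanguageData` and `RequestChecker` are hypothetical stand-ins, not LanguageTool classes, and all words are assumed to be lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the expensive, shared language object (built once, never mutated). */
final class LanguageData {
    private final Set<String> dictionary;

    LanguageData(Set<String> dictionary) { this.dictionary = Set.copyOf(dictionary); }

    boolean knows(String word) { return dictionary.contains(word); }
}

/** Hypothetical stand-in for a per-request checker configured with one user's words. */
final class RequestChecker {
    private final LanguageData language; // shared, read-only
    private final Set<String> userWords; // per request, discarded with this object

    RequestChecker(LanguageData language, List<String> userWords) {
        this.language = language;
        this.userWords = Set.copyOf(userWords);
    }

    /** Returns the words that are neither in the language data nor in the user's list. */
    List<String> check(String text) {
        List<String> misspelled = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !language.knows(word) && !userWords.contains(word)) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }
}
```

Because nothing mutable is shared, concurrent requests cannot see each other’s words and there is nothing to unload.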

jacobjohnson-wf · June 19, 2018, 8:09pm

So you would recommend creating a new JLanguageTool instance for each request (which should be fairly lightweight as long as I don’t create a new language object) and injecting the user’s ignored word list through the UserConfig object when constructing that instance?

I’m implementing this user-specific ignored-words story now and would like to make some progress before the release next week. In the meantime, could I accomplish something similar by iterating over the SpellingCheckRules and adding ignore tokens after constructing the new JLanguageTool object?

Thanks for the quick feedback; I greatly appreciate it! For what it’s worth, we are really enjoying the value LanguageTool adds to our application.

dnaber · June 19, 2018, 8:20pm

Yes, that’s what the embedded server does (it’s used, e.g., for languagetool.org).

Do you need LT to be on Maven Central for that? If not, I’d suggest using the snapshot; it already has all the required classes. Giving ignore tokens to the rules might work, but the ignored words would then probably not be used to create suggestions when a mistyped word is close to an ignored word (I don’t have time to check that now).
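The caveat about suggestions comes down to whether the user’s words are only an ignore filter or also a source of suggestions. `ToySuggester` below is a hypothetical illustration based on plain Levenshtein distance; it is not how LanguageTool generates suggestions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical toy suggester; LanguageTool's real suggestion logic is different. */
final class ToySuggester {
    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes "prev" for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Candidates within maxDistance edits of the misspelled word, in source order. */
    static List<String> suggest(String word, Collection<String> source, int maxDistance) {
        List<String> out = new ArrayList<>();
        for (String candidate : source) {
            if (distance(word, candidate) <= maxDistance) out.add(candidate);
        }
        return out;
    }
}
```

If the user’s words only feed an ignore filter, a typo such as “kubectk” gets no suggestion from them; if they are also added to the suggestion source, “kubectl” is suggested.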

jacobjohnson-wf · June 19, 2018, 8:37pm

Glad to hear!

Yes, we pull in dependencies via Maven, so I’ll have to wait for the release. Ignored words not being used to generate suggestions should be fine for our use case, so I’m not worried about that for now. As long as the ignored words aren’t reported as mistakes, that should work. How can I find out when the next release goes out? Are release notes posted anywhere on the LT website?

Thanks for the help!

dnaber · June 19, 2018, 8:46pm

Releases are announced here on the forum and on Twitter.