
Programmatically Loading Ignored Words For Each Spelling Check


(Jacob Johnson) #1

I’m interested in learning more about programmatically loading a list of words that the spell checker should ignore. I have read that LanguageTool already supports an ‘ignore.txt’ file for loading a list of ignored words into a JLanguageTool instance during construction. However, I’m trying to find a way to support a different set of ignored words for every request.

My situation is something like this: I’m using multiple instances of JLanguageTool in a microservice architecture to accept and correct text, as a proofreading service for our larger client-side text editor web application. I want to support many users, each of whom can have their own list of ignored words. I already have a plan for persisting this data. The last piece of the puzzle is how to programmatically load a user’s list of ignored words when a request comes in from that user. It would look something like this…

1. User A sends text through to my service to be checked.
2. I programmatically load User A’s ignored words into JLanguageTool.
3. I check the text with JLanguageTool.
4. I programmatically unload these ignored words from JLanguageTool.
5. User B sends text through to my service to be checked.
6. I programmatically load User B’s ignored words into JLanguageTool.
7. I check the text with JLanguageTool.
8. I programmatically unload these ignored words from JLanguageTool.

Is there a way to support this programmatic loading and unloading of ignored words without having to construct a new JLanguageTool instance each time? If not, does LanguageTool support this “multi-user concurrent usage” with regard to user-specific ignored words? If JLanguageTool doesn’t support this “pre-process” loading of ignored words, I might have to filter the returned results against the ignored word list in a “post-process” step, which seems less than ideal. Any help would be greatly appreciated; thank you in advance!
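For comparison, the “post-process” fallback could look roughly like this minimal sketch. It is self-contained so it runs without LanguageTool on the classpath: the `Match` record is a hypothetical stand-in for LanguageTool's `RuleMatch`, whose offsets would come from `getFromPos()` and `getToPos()` in real code, and the `spelling` flag stands in for checking whether the match came from a spelling rule. It assumes only spelling matches should be suppressed and that words are compared case-sensitively.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class IgnoredWordFilter {

    // Hypothetical stand-in for RuleMatch: character offsets of the flagged
    // text and whether the match came from a spelling rule.
    record Match(int fromPos, int toPos, boolean spelling) {}

    // Drops spelling matches whose covered text is on the user's ignore list;
    // all other matches (grammar, style) are kept unchanged.
    static List<Match> filter(String text, List<Match> matches, Set<String> ignored) {
        return matches.stream()
                .filter(m -> !(m.spelling()
                        && ignored.contains(text.substring(m.fromPos(), m.toPos()))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Teh Acme widgt ships today.";
        List<Match> matches = List.of(
                new Match(0, 3, true),   // "Teh"
                new Match(4, 8, true),   // "Acme"
                new Match(9, 14, true)); // "widgt"
        List<Match> kept = filter(text, matches, Set.of("Acme"));
        System.out.println(kept.size()); // prints 2
    }
}
```

The drawback of this design is that the checker still spends time flagging the user's words, and the user's list has no influence on the suggestions produced for other misspellings.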

Update: It looks like it’s possible to iterate over all the spelling rules and add ignore tokens to each. This may accomplish half of what I’m looking for. However, there does not appear to be a way to remove ignore tokens. Does the SpellingCheckRule class have something like a ‘removeIgnoreTokens()’ method?


(Daniel Naber) #2

There’s now a UserConfig object that can be given as a parameter when creating a new JLanguageTool. It can contain a word list. It is in the nightly snapshots and will be in next week’s release. Don’t worry about re-creating the JLanguageTool objects: they’re lightweight (unlike language objects such as English).
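Under this approach the expensive language object is built once and shared, while a cheap checker carrying the user's word list is created per request, so nothing has to be unloaded afterwards. This self-contained sketch models that split; `SharedLanguage` and `Checker` are hypothetical stand-ins for a language such as `English` and for a `JLanguageTool` constructed with a `UserConfig`. The exact constructor overload that accepts a `UserConfig` may differ between LanguageTool versions, so check the Javadoc of the release you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerRequestChecker {

    // Hypothetical stand-in for an expensive, shared language object.
    // Built once at startup and only read afterwards.
    static final class SharedLanguage {
        private final Set<String> dictionary;
        SharedLanguage(Set<String> dictionary) { this.dictionary = Set.copyOf(dictionary); }
        boolean isKnown(String word) { return dictionary.contains(word.toLowerCase()); }
    }

    // Hypothetical stand-in for a lightweight checker created per request;
    // the user's word list is fixed at construction, like a UserConfig.
    static final class Checker {
        private static final Pattern WORD = Pattern.compile("\\p{L}+");
        private final SharedLanguage language;
        private final Set<String> userWords;

        Checker(SharedLanguage language, Set<String> userWords) {
            this.language = language;
            this.userWords = Set.copyOf(userWords);
        }

        // Returns words found neither in the shared dictionary nor in the user's list.
        List<String> unknownWords(String text) {
            List<String> unknown = new ArrayList<>();
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                String w = m.group();
                if (!language.isKnown(w) && !userWords.contains(w)) {
                    unknown.add(w);
                }
            }
            return unknown;
        }
    }

    public static void main(String[] args) {
        SharedLanguage english = new SharedLanguage(Set.of("the", "ships", "today"));
        // A new checker per request; the shared language object is reused.
        Checker forUserA = new Checker(english, Set.of("Acme"));
        Checker forUserB = new Checker(english, Set.of());
        System.out.println(forUserA.unknownWords("The Acme ships today")); // prints []
        System.out.println(forUserB.unknownWords("The Acme ships today")); // prints [Acme]
    }
}
```

Because the shared object is only read after startup and each checker holds an immutable copy of its word list, concurrent requests from different users cannot see each other's words in this model.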


(Jacob Johnson) #3

So you would recommend creating a new JLanguageTool instance for each request (which should be fairly lightweight as long as I don’t create a new language object) and injecting the user’s ignored word list through the UserConfig object when constructing that instance?

I’m working on implementing this user-specific ignored-words story now and would like to make some progress before the release next week. Could I accomplish something similar in the meantime by iterating over the SpellingCheckRules and adding ignore tokens after I construct the new JLanguageTool object?
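The interim approach asked about here would be: construct a fresh instance, loop over its rules, and add the user's words to every spelling rule. In LanguageTool's Java API that loop is, as far as I know, written against `getAllRules()`, a cast to `SpellingCheckRule`, and `addIgnoreTokens(List<String>)`; this sketch mirrors that shape with hypothetical `Checker`, `Rule` and `SpellingRule` types so it runs on its own. It assumes that each instance owns its own rule objects, which is why no `removeIgnoreTokens()` would be needed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IgnoreTokensPerInstance {

    // Hypothetical, simplified rule hierarchy: only spelling rules accept ignore tokens.
    interface Rule { }

    static final class GrammarRule implements Rule { }

    static final class SpellingRule implements Rule {
        private final Set<String> ignored = new HashSet<>();
        void addIgnoreTokens(List<String> tokens) { ignored.addAll(tokens); }
        boolean isIgnored(String word) { return ignored.contains(word); }
    }

    // Hypothetical lightweight checker: every instance creates its own rule
    // objects, so tokens added to one instance are invisible to the others.
    static final class Checker {
        private final List<Rule> rules = List.of(new GrammarRule(), new SpellingRule());
        List<Rule> getAllRules() { return rules; }
    }

    // Interim approach: fresh checker per request, then push the user's
    // words into each of its spelling rules.
    static Checker checkerFor(List<String> userWords) {
        Checker checker = new Checker();
        for (Rule rule : checker.getAllRules()) {
            if (rule instanceof SpellingRule spelling) {
                spelling.addIgnoreTokens(userWords);
            }
        }
        return checker;
    }

    // True if any spelling rule of this checker ignores the word.
    static boolean ignores(Checker checker, String word) {
        for (Rule rule : checker.getAllRules()) {
            if (rule instanceof SpellingRule spelling && spelling.isIgnored(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Checker a = checkerFor(List.of("Acme"));
        Checker b = checkerFor(List.of());
        System.out.println(ignores(a, "Acme")); // prints true
        System.out.println(ignores(b, "Acme")); // prints false
    }
}
```

Once the request is answered, the per-request checker is simply discarded together with its ignore tokens.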

Thanks for the quick feedback; I greatly appreciate it! For what it is worth we are really enjoying the value LanguageTool is adding to our application.


(Daniel Naber) #4

Yes, that’s what the embedded server does (used e.g. for languagetool.org).

Do you need LT to be on Maven Central for that? Otherwise I’d suggest using the snapshot; it has all the required classes already. Giving ignore tokens to the rules might work, but the ignored words would then probably not be used to create suggestions when a mistyped word is close to an ignored word (I don’t have time to check that now).


(Jacob Johnson) #5

Glad to hear!

Yes, we pull LanguageTool in via Maven, so I’ll have to wait for the release. Ignored words not being used to generate suggestions should be fine for our use case, so I’m not worried about that at this time. As long as the ignored words don’t show up as mistakes in the output, that should work. How can I find out when the next release goes out? Are release notes posted anywhere on the LT website?

Thanks for the help!


(Daniel Naber) #6

Releases are announced here on the forum and on Twitter.