For English, German, Spanish, and French, LanguageTool uses ngram data to detect wrong use of homophones and some typos. Details are documented at Finding errors using n-gram data - LanguageTool Wiki. However, the data is huge (several GB), so it’s not part of LT, and most users of the LT downloads will not use it. On languagetool.org, the ngram data is always used.
We should make it more transparent to the users that the download versions of LT don’t support these rules. I’m not sure what exactly can be done, but one idea is to rename the download versions to something like “LanguageTool Light” and explain the difference on the homepage. Any ideas on what to call LanguageTool without ngram rules? Any ideas for better approaches?
For me, “LanguageTool Light” sounds as if there were a (paid) full version I can upgrade to. “Special offer: Get your LanguageTool Premium membership today and save 20 %!”
Also, I’d rather choose the opposite path and keep the name “LanguageTool” for the normal version and call the version with the ngram data something like “LanguageTool+”, or perhaps “LanguageTool n-hanced” (trying to be funny here, which I am not good at). That suggests that the ngram data is an extra feature on top of the standard version, and this is pretty much how I perceive the situation.
And yes, an explanation that the download version does not include the ngram data, and why, would be nice.
Some first feedback: It took me several minutes to process all the information presented in that table. And I am a user who already has, basically, all the information presented there. At first sight, I was like, “Gahh, that looks way too complicated!”
It might help if you avoided complex double and triple negations. This is obviously subjective, but for me “Works offline” seems unnecessarily complicated: “It works if I don’t have an Internet connection.” I would rather use “Requires live Internet connection” or “Internet access required”: no.
Maybe I’m the only person on the planet who feels like that, but for me, if an app needs Internet access, it should say, “Yes, I need Internet access”, rather than “No, I can work offline”. But maybe it’s only me.
My interpretation is that a green check mark has a connotation of “good”, and a red cancel icon means “bad”. So if we have a green check mark at “Requires internet”, all the green check marks are on the side of the add-ons, and none on the side of the stand-alone version. That looks as if there’s no reason to use the stand-alone version at all.
About the page looking complicated: all I could think of is to move the explanation texts of the check mark/cancel icons to a popup.
“Also, I’d rather choose the opposite path and keep the name “LanguageTool” for the normal version”. I agree with Jan.
“About the page looking complicated: all I could think of is to move the explanation texts of the check mark/cancel icons to a popup.” I disagree. If you put the explanations in popups, the information is not simpler. It is only more difficult to find.
LT stand-alone
But, Software That Supports LanguageTool As A Plug-In Or Add-On is a list of all software that LT integrates with. It includes, for example, LT for Chrome, which on the comparison page is different to LT standalone. Change the organisation on the Software that Supports… page to be the same as on the Comparison of LT Editions.
“My interpretation is that a green check mark has a connotation of “good”, and a red cancel icon means “bad”.” My interpretation is that a green check mark means ‘yes’ and a red cross means ‘no’. Possibly, remove the icons and use the words ‘yes’ and ‘no’. Or, use the words and the icons.
For the line about additional homophone rules, the red cross for LT stand-alone and LT LO/OO is confusing because those versions of LT have the rules, but only after a manual installation of the rules. Possibly, use a third icon to mean ‘yes, but’.
As an alternative option, isn’t it possible to prune the Lucene Index to only contain NGram data for tokens within the confusion sets? Not sure how much space this would save, but it could make the standalone version viable with the NGram data.
Yes, but I guess that the pruned data would still be larger than the complete LT download is today. Also, we’d lose the flexibility of adding confusion pairs - the data would need to be re-generated every time.
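To make the pruning idea and its trade-off a bit more concrete, here is a rough sketch in Python. It does not touch Lucene or the real LT data format: the ngram counts are a plain in-memory dict, and the names `confusion_vocabulary` and `prune_ngrams` are made up for illustration. It also assumes that an ngram is only worth keeping if it contains at least one confusion-set token, which may not be all that the real rules need.

```python
from typing import Dict, Iterable, Set, Tuple

Ngram = Tuple[str, ...]


def confusion_vocabulary(confusion_sets: Iterable[Set[str]]) -> Set[str]:
    """Collect every word that appears in any confusion set (lowercased)."""
    vocab: Set[str] = set()
    for words in confusion_sets:
        vocab.update(word.lower() for word in words)
    return vocab


def prune_ngrams(counts: Dict[Ngram, int],
                 confusion_sets: Iterable[Set[str]]) -> Dict[Ngram, int]:
    """Keep only ngrams that contain at least one confusion-set token.

    Assumption: ngrams without such a token are never looked up.
    """
    vocab = confusion_vocabulary(confusion_sets)
    return {
        ngram: count
        for ngram, count in counts.items()
        if any(token.lower() in vocab for token in ngram)
    }


# Toy data, invented for this example (not real ngram counts).
full_counts = {
    ("their", "house"): 120,
    ("there", "is"): 300,
    ("the", "house"): 500,
    ("too", "late"): 80,
}

pruned = prune_ngrams(full_counts, [{"their", "there"}])

# Adding a new confusion pair means pruning again from the FULL data;
# the already-pruned data no longer contains the "too" ngram.
repruned_from_full = prune_ngrams(full_counts, [{"their", "there"}, {"to", "too"}])
repruned_from_pruned = prune_ngrams(pruned, [{"their", "there"}, {"to", "too"}])
```

In this toy case `pruned` keeps two of the four entries. `repruned_from_pruned` cannot recover `("too", "late")`, which is the flexibility problem mentioned above: the full data has to stay available somewhere to re-generate the pruned set whenever confusion pairs change.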