LanguageTool without ngram data =?

dnaber · November 3, 2016, 9:27am

For English, German, Spanish, and French, LanguageTool uses ngram data to detect the wrong use of homophones and some typos. Details are documented on the LanguageTool Wiki page “Finding errors using n-gram data”. However, the data is huge (several GB), so it’s not part of the LT download, and most users of the downloadable versions will not use it. On languagetool.org, the ngram data is always used.
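To make the homophone idea concrete, here is a heavily simplified Python sketch. It is not LanguageTool’s implementation. The function name `check_confusion`, the trigram-only context, the add-one smoothing, the `factor` threshold and the toy counts are all assumptions made for illustration. The idea is to compare how often the written word occurs in its context with how often the other member of the confusion pair occurs in that same context.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory ngram table: word tuple -> corpus frequency.
# (The real data is several GB and is not stored like this.)
NgramCounts = Dict[Tuple[str, ...], int]


def check_confusion(
    tokens: List[str],
    pair: Tuple[str, str],
    counts: NgramCounts,
    factor: float = 10.0,
) -> List[Tuple[int, str, str]]:
    """Return (token index, word found, suggestion) for suspicious uses of `pair`.

    A word is flagged when the trigram built with the other member of the pair
    is at least `factor` times as frequent as the trigram actually written.
    """
    hits = []
    # Sentence boundary markers give the first and last token a context too.
    padded = ["<s>"] + tokens + ["</s>"]
    for i in range(1, len(padded) - 1):
        word = padded[i]
        if word not in pair:
            continue
        other = pair[1] if word == pair[0] else pair[0]
        left, right = padded[i - 1], padded[i + 1]
        # Add-one smoothing: unseen trigrams count as 1, avoiding division by zero.
        written = counts.get((left, word, right), 0) + 1
        alternative = counts.get((left, other, right), 0) + 1
        if alternative / written >= factor:
            hits.append((i - 1, word, other))
    return hits


# Toy counts, invented for illustration only.
COUNTS: NgramCounts = {
    ("over", "there", "is"): 500,
    ("over", "their", "is"): 2,
    ("lost", "their", "keys"): 800,
    ("lost", "there", "keys"): 1,
}

print(check_confusion("The book over their is mine".split(), ("their", "there"), COUNTS))
print(check_confusion("They lost their keys".split(), ("their", "there"), COUNTS))
```

With these toy counts, the first sentence flags “their” at token 3 with the suggestion “there”, and the second sentence produces no hits. Without a large table of real counts, a check like this has nothing to compare, which is why the rules cannot work in the downloads that lack the data.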

We should make it more transparent to users that the downloadable versions of LT don’t support these rules. I’m not sure exactly what can be done, but one idea is to rename the downloadable versions to something like “LanguageTool Light” and explain the difference on the homepage. Any ideas for what to call LanguageTool without the ngram rules? Any ideas for better approaches?

Jan_Schreiber · November 3, 2016, 2:56pm

For me, “LanguageTool Light” sounds as if there was a (paid) full version I can upgrade to. “Special offer: Get your LanguageTool Premium membership today and save 20 %!”
Also, I’d rather choose the opposite path and keep the name “LanguageTool” for the normal version and call the version with the ngram data something like “LanguageTool+”, or perhaps “LanguageTool n-hanced” (trying to be funny here, which I am not good at). That suggests that the ngram data is an extra feature on top of the standard version, and this is pretty much how I perceive the situation.
And yes, an explanation that and why the download version does not include the ngram data would be nice.

dnaber · November 3, 2016, 4:54pm

Thanks, your suggestions make sense… here’s a first draft of a page that compares the editions: LanguageTool - Online Grammar, Style & Spell Checker. Is there anything that should be added?

Jan_Schreiber · November 3, 2016, 6:36pm

Some first feedback: It took me several minutes to process all the information presented in that table. And I am a user who already has, basically, all the information presented there. At first sight, I was like, “Gahh, that looks way too complicated!”

It might help to avoid complex double and triple negations. This is obviously subjective, but for me “Works offline” seems unnecessarily complicated: “It works if I don’t have an Internet connection.” I would rather use “Requires live Internet connection” or “Internet access required”: no.

Maybe I’m the only person on the planet who feels like that, but for me, if an app needs Internet access, it should say, “Yes, I need Internet access”, rather than “No, I can work offline”. But maybe it’s only me.

tiagosantos · November 3, 2016, 8:32pm

I really like the table, but:

FWIW, “LanguageTool n-hanced” does not sound bad. I like humorous titles.

Nowadays, the default has changed to “Internet required” everywhere. That is fine until you travel and realize your tools no longer work.

dnaber · November 4, 2016, 8:52am

My interpretation is that a green check mark has a connotation of “good”, and a red cancel icon means “bad”. So if we have a green check mark at “Requires internet”, all the green check marks are on the side of the add-ons, and none on the side of the stand-alone version. That looks as if there’s no reason to use the stand-alone version at all.

About the page looking complicated: all I could think of is to move the explanation texts of the check mark/cancel icons to a popup.

Mike_Unwalla · November 4, 2016, 9:29am

“Also, I’d rather choose the opposite path and keep the name “LanguageTool” for the normal version”. I agree with Jan.

“About the page looking complicated: all I could think of is to move to explanation texts of the check mark/cancel icons to a popup.” I disagree. If you put the explanations in popups, the information is not simpler. It is only more difficult to find.

For LanguageTool stand-alone, include a link to Software That Supports LanguageTool As A Plug-In Or Add-On - LanguageTool Wiki.

The Comparison of LanguageTool Editions seems to conflict with Software That Supports LanguageTool As A Plug-In Or Add-On - LanguageTool Wiki. In the comparison, LT is split into 3 groups:

LT.org, LT for FF, LT for Chrome
LT for LO/OO
LT stand-alone
But Software That Supports LanguageTool As A Plug-In Or Add-On is a list of all software that LT integrates with. It includes, for example, LT for Chrome, which on the comparison page is separate from LT stand-alone. Change the organisation on the Software that Supports… page to be the same as on the Comparison of LT Editions.

“My interpretation is that a green check mark has a connotation of “good”, and a red cancel icon means “bad”.” My interpretation is that a green check mark means ‘yes’ and a red cross means ‘no’. Possibly, remove the icons and use the words ‘yes’ and ‘no’. Or, use the words and the icons.

For the line about additional homophone rules, the red cross for LT stand-alone and LT LO/OO is confusing because those versions of LT have the rules, but only after a manual installation of the rules. Possibly, use a third icon to mean ‘yes, but’.

Jan_Schreiber · November 4, 2016, 8:11pm

I looked at the table again today and didn’t find it complicated. I think I was overworked and tired yesterday.

curon · November 4, 2016, 11:07pm

As an alternative option, isn’t it possible to prune the Lucene index so that it only contains ngram data for tokens within the confusion sets? I’m not sure how much space this would save, but it could make the stand-alone version viable with the ngram data.
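A minimal Python sketch of the pruning idea follows. It assumes, purely for illustration, that the ngram counts fit in a dict. The real data lives in a Lucene index, and `prune_ngrams` and `confusion_vocabulary` are hypothetical helpers, not part of LanguageTool. The sketch keeps an ngram only if at least one of its tokens belongs to some confusion set. Every set contributes all of its members, so the context with the written word and the context with the alternative both survive pruning.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory ngram table: word tuple -> corpus frequency.
NgramCounts = Dict[Tuple[str, ...], int]


def confusion_vocabulary(confusion_sets: Iterable[Iterable[str]]) -> Set[str]:
    """Collect every word that appears in any confusion set."""
    vocab: Set[str] = set()
    for words in confusion_sets:
        vocab.update(words)
    return vocab


def prune_ngrams(counts: NgramCounts, confusion_sets: Iterable[Iterable[str]]) -> NgramCounts:
    """Keep only ngrams that contain at least one confusion-set word."""
    vocab = confusion_vocabulary(confusion_sets)
    return {ngram: c for ngram, c in counts.items() if any(tok in vocab for tok in ngram)}


# Toy "full" data, invented for illustration only.
full: NgramCounts = {
    ("over", "there", "is"): 500,
    ("lost", "their", "keys"): 800,
    ("a", "cat", "sat"): 40,
    ("the", "effect", "of"): 900,
}

pruned = prune_ngrams(full, [("their", "there")])
print(sorted(pruned))  # only the two ngrams containing "their" or "there" remain

# Adding a new pair later requires re-pruning from the full data:
# ("the", "effect", "of") was already discarded from `pruned`.
extended = prune_ngrams(full, [("their", "there"), ("affect", "effect")])
print(("the", "effect", "of") in pruned, ("the", "effect", "of") in extended)
```

The last two lines show the trade-off. An ngram that is irrelevant to today’s confusion sets is dropped, so supporting a new pair such as “affect”/“effect” means regenerating the pruned data from the complete set.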

dnaber · November 5, 2016, 10:10am

Yes, but I guess that the pruned data would still be larger than the complete LT download is today. Also, we’d lose the flexibility of adding confusion pairs - the data would need to be re-generated every time.