First, thank you for the great tool! Second, I apologize for my lack of knowledge of LT’s inner workings; this suggestion may already be in use.
I have an idea that would let LT offer a smaller n-gram dataset, at the expense of some user flexibility: a “cooked” n-gram database that contains only the n-grams in which at least one confusion word appears. This reduced dataset should be significantly smaller, and if the savings are substantial enough, it might even make 4-grams practical. The trade-off, though, is that users can no longer effectively add new confusion words, since the n-grams for those words were never included.
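To make the idea concrete, here is a rough sketch of what I mean by “cooking”. This is just an illustration, not a claim about LT’s internals: the file format, file names, and word list are all assumptions on my part (LT actually stores its n-gram data in Lucene indexes, but I treat it here as plain text, one n-gram per line in the Google Books style of “token token token<TAB>count”):

```python
# Hypothetical confusion set; in practice these words would be
# collected from LT's confusion_sets.txt resource files.
CONFUSION_WORDS = {"their", "there", "they're", "affect", "effect"}

def cook_ngrams(in_path: str, out_path: str) -> None:
    """Keep only the n-grams that contain at least one confusion word."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Assumed line layout: space-separated tokens, tab, count.
            tokens = line.split("\t", 1)[0].lower().split()
            if any(tok in CONFUSION_WORDS for tok in tokens):
                dst.write(line)

# Placeholder file names, just for illustration.
cook_ngrams("en-3grams.tsv", "en-3grams.cooked.tsv")
```

Since this filter only needs a single pass over the raw data, it could be run once upstream and the result shipped as a ready-made download.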
So, in addition to the default n-gram database that can be downloaded, I think LT ought to offer this “cooked” database as well. People developing new rules could continue to use the large uncooked dataset; everyone else who simply wants a secure way to use the n-gram capabilities would only need the smaller “cooked” database; and those who do not need a secure means could simply use the form on the LT website.