
LT Evaluation

Hi LT dev community,

I was wondering if LT has ever been evaluated by the academic community on resources such as the NUCLE or FCE datasets, so that we could know for sure which errors caught by LT are true positives and which are false positives. Additionally, I’d like to know if a similar effort has ever been made to evaluate the usefulness of the feedback given with each error correction.

Consider this paper on the evaluation of Grammar Error Correction methods.


Hi,

I’m also interested in evaluation results. Actually, is there any information on how well LanguageTool performs?

It would not be very trustworthy if anyone from our own team did this, would it?
Quite some time ago I had a look at a test data set, and it was clear to me that it was not a natural set but a constructed one.
Sets like that carry the view of their creator.
The best option would be a collection of several real sets from real individuals and organisations, covering several use cases.
There is no chance of getting that.

It would be great, though.

I don’t think there is any conflict if LanguageTool is evaluated on external resources that were not built by the LT team. In fact, it would be useful if the LT team published LT’s performance on external benchmarks. There are several academically built datasets for grammar error detection and correction; I list them below (a rough sketch of how such an evaluation could be run follows the list):

FCE https://ilexir.co.uk/datasets/index.html
JFLEG https://www.aclweb.org/anthology/E17-2037/
NUCLE https://www.comp.nus.edu.sg/~nlp/corpora.html
AESW http://textmining.lt/aesw/
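For concreteness, here is a minimal sketch of span-level detection scoring against such a corpus, using the public LanguageTool HTTP API (https://api.languagetool.org/v2/check). The gold-span format, the overlap-based matching rule, and the demo sentences are assumptions for illustration only; each corpus above has its own annotation format and would need its own loader.

```python
# Minimal sketch: score LanguageTool's error *detection* against a gold-annotated
# corpus. Assumes each gold item is a sentence plus a set of (start, end)
# character spans marking annotated errors -- the real FCE/NUCLE/JFLEG formats
# would each need their own loader.
import requests

LT_API = "https://api.languagetool.org/v2/check"  # public LT HTTP API
# For a full corpus you would typically run a local LT server instead of the
# rate-limited public endpoint.

def lt_spans(text, language="en-US"):
    """Return the (start, end) character spans LanguageTool flags in `text`."""
    resp = requests.post(LT_API, data={"text": text, "language": language})
    resp.raise_for_status()
    return {(m["offset"], m["offset"] + m["length"]) for m in resp.json()["matches"]}

def evaluate(gold_items):
    """gold_items: iterable of (sentence, set_of_gold_spans). Span-level P/R/F1."""
    tp = fp = fn = 0
    for sentence, gold in gold_items:
        predicted = lt_spans(sentence)
        # A prediction counts as a true positive if it overlaps any gold span.
        hits = {p for p in predicted if any(p[0] < g[1] and g[0] < p[1] for g in gold)}
        tp += len(hits)
        fp += len(predicted - hits)
        fn += sum(1 for g in gold
                  if not any(p[0] < g[1] and g[0] < p[1] for p in predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # Hypothetical two-sentence "corpus" just to show the call shape.
    demo = [
        ("She go to school every day.", {(4, 6)}),
        ("This sentence is fine.", set()),
    ]
    print(evaluate(demo))
```

Note that overlap-based span matching is only one possible criterion; shared tasks such as CoNLL-2014 score correction edits with the M2 scorer instead, so numbers are not directly comparable across setups.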

I found a study which compares different tools.

If you are aware of any similar study, please let me know.