Wouldn’t it be great if users could contribute an incorrect example together with the desired correction, and have it tested immediately as a rule against an enormous corpus? The sentences with a hit could then be shown and marked as ‘positive’ or ‘false positive’, automatically creating ‘antipatterns’.
Most of the functionality is already there (and more) in the rule generator. But that one is still far too complex and technical for a ‘normal’ language user.
The generated rule could be stored separately for review, editing and generalising by one of the more technically skilled contributors.
I like the idea, but how do we create the pattern in the first place? We can’t just use the incorrect word; we need its context as well. But how many words of context, and should we use just the words or their POS tags?
The diff between the wrong and the corrected sentence determines what the pattern is. I would suggest using only words, not POS tags (too much knowledge needed).
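To make the diff idea concrete, here is a minimal sketch in Python. It assumes plain whitespace tokenisation, and `diff_pattern` and its `context` parameter are hypothetical names made up for this illustration, not anything that exists in LanguageTool. The `context` parameter is one possible answer to the "how many words of context" question: it widens the pattern by that many words on each side.

```python
from difflib import SequenceMatcher


def diff_pattern(wrong, corrected, context=0):
    """Return (pattern_words, suggestion_words) from a wrong/corrected pair.

    Hypothetical helper: words only, no POS tags, whitespace tokenisation.
    """
    wrong_tokens = wrong.split()
    right_tokens = corrected.split()
    matcher = SequenceMatcher(a=wrong_tokens, b=right_tokens, autojunk=False)
    changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not changed:
        return [], []  # the two sentences are identical
    # Span from the first to the last changed word on each side,
    # widened by `context` words to the left and to the right.
    i1 = max(0, changed[0][1] - context)
    i2 = min(len(wrong_tokens), changed[-1][2] + context)
    j1 = max(0, changed[0][3] - context)
    j2 = min(len(right_tokens), changed[-1][4] + context)
    return wrong_tokens[i1:i2], right_tokens[j1:j2]


# A replaced word gives a usable pattern without any context.
print(diff_pattern("I could of done it", "I could have done it"))
# A missing word gives an empty pattern unless some context is included.
print(diff_pattern("I want go home", "I want to go home", context=1))
```

A pure insertion leaves nothing to match on the wrong side, so at least one word of context is needed in that case.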
Context could maybe be distilled from the sentences marked as true or false positives, or by letting the user mark it.
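A rough sketch of how that could work, again in Python and under stated assumptions: the corpus is a plain list of sentences, tokenisation is a whitespace split (so punctuation stays attached to words), and `find_hits` and `antipattern_from` are hypothetical helpers invented for this example. A hit the user marks as a false positive is turned into an antipattern by taking the matched words plus a neighbouring word.

```python
def find_hits(pattern, sentences):
    """Yield (tokens, start_index) for every place the word pattern occurs."""
    n = len(pattern)
    if n == 0:
        return  # an empty pattern would match everywhere
    lowered = [word.lower() for word in pattern]
    for sentence in sentences:
        tokens = sentence.split()
        for start in range(len(tokens) - n + 1):
            if [t.lower() for t in tokens[start:start + n]] == lowered:
                yield tokens, start


def antipattern_from(tokens, start, length, left_context=0, right_context=1):
    """Build an antipattern from a hit marked as a false positive:
    the matched words plus some neighbouring words."""
    lo = max(0, start - left_context)
    hi = min(len(tokens), start + length + right_context)
    return tokens[lo:hi]


corpus = ["I could of done it", "That could of course happen"]
hits = list(find_hits(["could", "of"], corpus))
# Suppose the user marks the second hit as a false positive.
tokens, start = hits[1]
print(antipattern_from(tokens, start, length=2))
```

How much context to take on each side is still a guess here; letting the user mark the relevant words would replace the fixed `left_context` and `right_context` values.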
The simple rules generated could be edited and enhanced afterwards by one of the more technical people. That is what I meant by ‘generalize’: use POS tags instead of words, add skip, etc.
@Ruud_Baars I am not sure, but maybe you haven’t checked this LT feature: https://community.languagetool.org/ruleEditor2/
Type in a good and a bad example, then click “Create initial error pattern”, then, at the end of the page, click “Evaluate error pattern” without touching anything else. The XML rule will appear at the end of the page.
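For readers who have not seen such a rule, here is a small Python sketch that assembles one from a word pattern and a suggestion. It follows the general shape of a rule in LanguageTool's grammar.xml (`rule`, `antipattern`, `pattern`, `token`, `message`, `suggestion`, `example`, `marker`), but the exact output of the rule editor may differ. `build_rule`, the rule id and the message wording are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def build_rule(rule_id, name, pattern, suggestion,
               before="", after="", antipatterns=()):
    """Assemble a simple words-only rule as an XML string (hypothetical helper)."""
    rule = ET.Element("rule", id=rule_id, name=name)
    # Antipatterns: word sequences for which the rule should stay silent.
    for anti in antipatterns:
        anti_el = ET.SubElement(rule, "antipattern")
        for word in anti:
            ET.SubElement(anti_el, "token").text = word
    pattern_el = ET.SubElement(rule, "pattern")
    for word in pattern:
        ET.SubElement(pattern_el, "token").text = word
    message = ET.SubElement(rule, "message")
    message.text = "Did you mean "
    sugg = ET.SubElement(message, "suggestion")
    sugg.text = " ".join(suggestion)
    sugg.tail = "?"
    # The bad example, with the error marked and the correction attached.
    example = ET.SubElement(rule, "example", correction=" ".join(suggestion))
    example.text = before
    marker = ET.SubElement(example, "marker")
    marker.text = " ".join(pattern)
    marker.tail = after
    return ET.tostring(rule, encoding="unicode")


print(build_rule("COULD_OF", "could of (could have)",
                 ["could", "of"], ["could", "have"],
                 before="I ", after=" done it.",
                 antipatterns=[["could", "of", "course"]]))
```

A rule like this is the "simple" starting point discussed above; replacing words with POS tags or adding skips would be a manual step afterwards.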
I guess the “user contributions” on the main site do the same thing and just automate the process, probably by keeping only the “good contributions” that do not trigger detections in the corpus. The API would only have to take the correction as the “good example” and the “untouched text” as the “bad example”.
Unfortunately, we, people working on the public code, don’t have access to those user contributions.
@Ruud_Baars Is this something different from what you suggest?
The idea is great and I like it too, but you still need to work out how to create the pattern. This could help a lot of people and become useful in the future. Just keep developing it. Thanks for sharing.