'easy rules' feature


(Ruud Baars) #1

Wouldn’t it be great if users could contribute an incorrect example together with the desired correction, and have it tested immediately as a rule against an enormous corpus? The sentences with a hit could then be shown and marked as ‘positive’ or ‘false positive’, automatically creating ‘antipatterns’.

Most of the functionality is already there (and more) in the rule generator. But that one is still far too complex and technical for a ‘normal’ language user.

The generated rule could be stored separately for review, editing, and generalising by one of the more technically minded contributors.

What about it?


(Daniel Naber) #2

I like the idea, but how would we create the pattern in the first place? We can’t just use the incorrect word; we need its context as well. But how many words of context, and should we use just the words or their POS tags?


(Ruud Baars) #3

The diff between the corrected and wrong sentences determines what the pattern is. I would suggest using only words, not POS tags (too much knowledge needed).
Context could perhaps be distilled from the sentences marked as true or false positives, or the user could mark it.

The simple rules generated could be edited and enhanced afterwards by one of the more technical people. That is what I meant by ‘generalize’: use POS tags instead of words, add skip, etc.
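
To make the diff idea concrete, here is a minimal sketch in Python. It is not how the rule editor works internally; the function name `derive_pattern` and the `context` parameter are made up for illustration, and whitespace splitting stands in for LanguageTool's real tokenizer. It takes the changed words from the diff and widens them by a fixed number of unchanged neighbouring words.

```python
import difflib


def derive_pattern(wrong, corrected, context=1):
    """Return (pattern_words, suggestion_words) for a wrong/corrected pair.

    Hypothetical helper: words only, no POS tags, whitespace tokenization.
    """
    w = wrong.split()
    c = corrected.split()
    matcher = difflib.SequenceMatcher(a=w, b=c, autojunk=False)
    changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not changed:
        # The two sentences are identical, so there is nothing to learn.
        return [], []

    # One span covering every changed region, in each sentence.
    w_start, w_end = changed[0][1], changed[-1][2]
    c_start, c_end = changed[0][3], changed[-1][4]

    # Widen the span by up to `context` unchanged words on each side.
    # The unchanged prefix and suffix have the same length in both
    # sentences, so the same offsets apply to both.
    left = min(context, w_start)
    right = min(context, len(w) - w_end)

    pattern = w[w_start - left:w_end + right]
    suggestion = c[c_start - left:c_end + right]
    return pattern, suggestion


pattern, suggestion = derive_pattern("I could of done it", "I could have done it")
print(pattern)     # ['could', 'of', 'done']
print(suggestion)  # ['could', 'have', 'done']
```

How many words of context to keep is exactly the open question from #2; here it is just a parameter that the user, or the true/false positive markings, would have to settle.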


(Tiago F. Santos) #4

@Ruud_Baars I am not sure, but maybe you haven’t checked this LT feature:
https://community.languagetool.org/ruleEditor2/
Type in a good and a bad example, click “Create initial error pattern”, then, at the end of the page, click “Evaluate error pattern” without touching anything else. The XML rule will appear at the end of the page.
I guess the “user contributions” on the main site do the same thing, just with the process automated, probably by keeping only “good contributions” that do not trigger detections in the corpus. The API would only have to retrieve the correction as the “good example”, and the “untouched text” would be the “bad example”.
Unfortunately, we, the people working on the public code, don’t have access to those user contributions.

@Ruud_Baars Is this something different from what you suggest?


(Ruud Baars) #5

What is missing is:

  • the possibility to just mark a hit as a false positive, resulting in an antipattern
  • automatically adding examples from the correct positives
  • storing the result somewhere for the user to use immediately
  • storing the draft rule somewhere for a more trained person to edit
  • notifying this person (or these persons)

and of course

  • a prominent place for all of this…
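
As a rough illustration of the first point, the sketch below turns a sentence marked as a false positive into an `<antipattern>` element made of plain `<token>` elements. The helper name `build_antipattern` and its `context` parameter are hypothetical, tokenization is plain whitespace splitting rather than LanguageTool's tokenizer, and the result is deliberately naive: it just freezes the hit plus its neighbouring words, which a more trained person would still need to generalise.

```python
import xml.etree.ElementTree as ET


def build_antipattern(sentence, pattern, context=1):
    """Return antipattern XML for the first hit of `pattern` in `sentence`.

    Hypothetical helper: returns None if the pattern does not occur.
    """
    words = sentence.split()
    n = len(pattern)
    lowered = [w.lower() for w in words]
    target = [p.lower() for p in pattern]

    for i in range(len(words) - n + 1):
        if lowered[i:i + n] == target:
            # Keep the hit plus up to `context` words on each side.
            start = max(0, i - context)
            end = min(len(words), i + n + context)
            anti = ET.Element("antipattern")
            for word in words[start:end]:
                ET.SubElement(anti, "token").text = word
            return ET.tostring(anti, encoding="unicode")
    return None


xml = build_antipattern("What could of course happen", ["could", "of"])
print(xml)
```

The printed element would then be pasted into the draft rule next to its `<pattern>`, and the sentence itself could be stored as an example that must not trigger the rule.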