Severeness

Ruud_Baars · July 4, 2017, 8:30am

Would it be an idea to add a ‘severeness’ to each rule, e.g. by rating its importance between 0 and 9? This would make it possible to let the user select how strictly the text is checked, without having to resort to categories or Java programming.

Jan_Schreiber · July 4, 2017, 10:54am

The problem I see with this approach is that the severeness of an error is often difficult to assess. For German, we would have to rate more than 2000 error rules. That’s a lot of tedious work.

SkyCharger001 · July 4, 2017, 10:58am

We could use a value of null/zero to indicate unrated errors.

Ruud_Baars · July 4, 2017, 11:44am

Unspecified could just be 4 on the range of 0 to 9

tiagosantos · July 4, 2017, 12:05pm

This would work as with rule priorities. Unspecified is the default value in the middle.

I like this idea. If the scale includes negative and positive grades, “sure” rules would be set to higher positive numbers, while controversial or dodgy rules could be set to negative numbers.
Later, this system could be extended with a rating based on the “percentage of false positives”, which could be supplied by some linguistic research project.
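To make the "unspecified sits in the middle" idea concrete, here is a rough sketch, assuming the 0 to 9 scale with 4 as the default that was proposed above. The class `SeverityFilter`, its methods, and the rule ids are all made up for illustration; nothing like this exists in LanguageTool today.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of LanguageTool: filters rule ids by a
// per-rule severity on the 0..9 scale discussed in this thread.
class SeverityFilter {

  // Assumed default for rules nobody has rated yet (middle of 0..9).
  static final int DEFAULT_SEVERITY = 4;

  private final Map<String, Integer> ratedSeverities;

  SeverityFilter(Map<String, Integer> ratedSeverities) {
    this.ratedSeverities = ratedSeverities;
  }

  // Severity of a rule, falling back to the default when it is unrated.
  int severityOf(String ruleId) {
    return ratedSeverities.getOrDefault(ruleId, DEFAULT_SEVERITY);
  }

  // Keep only the rules whose severity is at least the user's threshold.
  List<String> activeRules(List<String> ruleIds, int minSeverity) {
    return ruleIds.stream()
        .filter(id -> severityOf(id) >= minSeverity)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Integer> rated = new HashMap<>();
    rated.put("SURE_RULE", 8);   // made-up rule ids
    rated.put("DODGY_RULE", 1);
    SeverityFilter filter = new SeverityFilter(rated);
    List<String> all = Arrays.asList("SURE_RULE", "DODGY_RULE", "UNRATED_RULE");
    System.out.println(filter.activeRules(all, 4)); // prints [SURE_RULE, UNRATED_RULE]
  }
}
```

With a threshold equal to the default, unrated rules keep running, so maintainers would only need to rate the rules that clearly deviate from the middle.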

Eirik · July 10, 2017, 12:10pm

10 levels sounds like a recipe for analysis paralysis, but maybe that’s just me? I find it difficult to choose between 3 levels for a different system, so 10 levels seems extreme.

How about using a floating point between 0 and 3 (including null of course)? Thus, we could allow people to use integers like 1, 2, 3, but also allow high granularity like 2.3, 2.4, etc.

This way, we cater to different audiences and different needs.

That said, I tend to prefer “KISS” (keep it simple, stupid), so we could also just use strings like “low”, “medium”, “high” - this would avoid any confusion as to whether 0 or 3 is the ‘higher’ value, but with the disadvantage of forcing English into every rule, regardless of language.

Ruud_Baars · July 10, 2017, 12:34pm

Any scale is better than none. My intention is to let users vote some way or other.

SkyCharger001 · July 10, 2017, 1:05pm

perhaps you could add it as a config-option (every rule has a number-box where one can enter how important one judges a rule, empty for default, as opposed to simply turning them on/off as it is now)

Ruud_Baars · July 10, 2017, 3:07pm

Since I am not a Java programmer, I will not be able to do so. Just hoping someone else might.

tiagosantos · July 11, 2017, 7:57am

I will have a look at it and see if it is something I can address. Something rough but functional, for actual contributors.

[quote=“SkyCharger001, post:8, topic:1875, full:true”]
perhaps you could add it as a config-option (every rule has a number-box where one can enter how important one judges a rule, empty for default, as opposed to simply turning them on/off as it is now)
[/quote]

A system can be made to assess the rules automatically.
Create a database with the total number of detections per rule and the number of times each rule is ignored by the user. Once that database has gathered enough data (passes a threshold), the rules would be graded by accuracy (i.e. ignored, rarely ignored, never ignored, etc.).
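As a rough illustration of that bookkeeping, here is an in-memory sketch. Everything in it is an assumption: the class name `RuleIgnoreStats`, the threshold of 100 detections, and the 10% / 50% grade boundaries are placeholders, and a real version would need persistent storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not existing LanguageTool code.
class RuleIgnoreStats {

  // Assumed threshold: a rule is graded only after this many detections.
  static final int MIN_DETECTIONS = 100;

  // Per rule id: index 0 = detections, index 1 = ignores.
  private final Map<String, int[]> counts = new HashMap<>();

  void recordDetection(String ruleId, boolean ignoredByUser) {
    int[] c = counts.computeIfAbsent(ruleId, id -> new int[2]);
    c[0]++;
    if (ignoredByUser) {
      c[1]++;
    }
  }

  // Grade boundaries (10% and 50%) are arbitrary placeholders.
  String grade(String ruleId) {
    int[] c = counts.get(ruleId);
    if (c == null || c[0] < MIN_DETECTIONS) {
      return "unrated";
    }
    if (c[1] == 0) {
      return "never ignored";
    }
    double ignoreRatio = (double) c[1] / c[0];
    if (ignoreRatio < 0.10) {
      return "rarely ignored";
    }
    if (ignoreRatio < 0.50) {
      return "sometimes ignored";
    }
    return "often ignored";
  }

  public static void main(String[] args) {
    RuleIgnoreStats stats = new RuleIgnoreStats();
    for (int i = 0; i < 100; i++) {
      stats.recordDetection("EXAMPLE_RULE", i < 5); // 5 of 100 matches ignored
    }
    System.out.println(stats.grade("EXAMPLE_RULE")); // prints rarely ignored
  }
}
```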

Feel free to send the merge request with it.

SkyCharger001 · July 11, 2017, 8:07am

How do you differentiate between every ignore and every unique ignore?

dnaber · July 11, 2017, 8:14am

I think the tricky part is the UI: where is this setting shown in the UI for the user, considering we have many add-ons, the command-line client, and the stand-alone client. How to show to the user that they’re in some kind of mode? How to show them they could find more/fewer errors if they changed the mode?

tiagosantos · July 11, 2017, 8:26am

A points system can be set: Ignore = 1; Ignore_all = 10. Or you can make ignore_all count all ignored matches in that text.
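Both variants are easy to sketch. The class and method names below are made up for illustration, and the weights are just the ones mentioned above.

```java
// Hypothetical sketch of the two scoring variants; not LanguageTool code.
class IgnorePoints {

  static final int IGNORE_POINTS = 1;
  static final int IGNORE_ALL_POINTS = 10;

  // Variant 1: fixed weights per click (Ignore = 1, Ignore_all = 10).
  static int fixedPoints(int ignoreClicks, int ignoreAllClicks) {
    return ignoreClicks * IGNORE_POINTS + ignoreAllClicks * IGNORE_ALL_POINTS;
  }

  // Variant 2: each ignore_all counts once for every match of the rule
  // in the text it was applied to.
  static int perMatchPoints(int ignoreClicks, int[] matchesPerIgnoreAll) {
    int points = ignoreClicks * IGNORE_POINTS;
    for (int matches : matchesPerIgnoreAll) {
      points += matches;
    }
    return points;
  }

  public static void main(String[] args) {
    System.out.println(fixedPoints(3, 2));                     // prints 23
    System.out.println(perMatchPoints(3, new int[] {4, 7}));   // prints 14
  }
}
```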

What I meant by that is that we can think of fancy things, but they are probably too much work for anyone to do.
Any user that wants fine-grained control over rule accuracy will likely make that evaluation and/or fine-tune rules.

I reaffirm the “fancy” part.
Start simple and grow from there if it becomes relevant. I was thinking of a config file string and two extra levels added to default= in the <rule> tag. Something like:
ACCURACY_LEVEL=off/normal/high/perfect
default='off' is level 0;
default='normal' is level 1 (the default rule setting, which does not need to be written explicitly);
default='high' is level 2;
default='perfect' is level 3.
No UI in the beginning. If maintainers use the system, great, we can think about devoting more time to that endeavour. Otherwise it can remain as a feature for advanced users/developers, just like the custom highlighting.
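A minimal sketch of how those four levels could be compared in code. The names `AccuracyLevels`, `AccuracyLevel`, `parse` and `isActive` are hypothetical, and the comparison rule (a rule runs when its own level is at least the configured ACCURACY_LEVEL) is my assumption about the intended semantics, not something settled.

```java
import java.util.Locale;

// Hypothetical sketch; none of these names exist in LanguageTool.
class AccuracyLevels {

  enum AccuracyLevel {
    OFF(0), NORMAL(1), HIGH(2), PERFECT(3);

    final int level;

    AccuracyLevel(int level) {
      this.level = level;
    }

    // Missing or empty values fall back to NORMAL, the implicit default.
    // Unknown strings make valueOf throw an IllegalArgumentException.
    static AccuracyLevel parse(String value) {
      if (value == null || value.trim().isEmpty()) {
        return NORMAL;
      }
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
  }

  // Assumption: a rule runs when its own level is at least the configured one.
  static boolean isActive(AccuracyLevel ruleLevel, AccuracyLevel configured) {
    return ruleLevel.level >= configured.level;
  }

  public static void main(String[] args) {
    AccuracyLevel configured = AccuracyLevel.parse("high"); // ACCURACY_LEVEL=high
    System.out.println(isActive(AccuracyLevel.parse(null), configured));      // prints false
    System.out.println(isActive(AccuracyLevel.parse("perfect"), configured)); // prints true
  }
}
```

An enum keeps the string-to-number mapping in one place, so the config value and the rule attribute would go through the same parser.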

tiagosantos · July 13, 2017, 10:41pm

@dnaber
I am not sure if I finally get what you mean with this.
In the last couple of days, I have been trying to make a functional proof-of-concept on this idea. I figured out how to add the Configuration line in Configuration.java and how to allow new tags on grammar.xml through rules.xsd and Rule.java.
However, I am stuck on how to load those variables in the existing getDefaultDisabledRulesForVariant() logic in JLanguageTool.java in languagetool-core. From my basic understanding of the structure of this program, languagetool-gui is not a dependency of languagetool-core, so I can’t find a way to load the variables from the configuration into JLanguageTool, even though disabled rules are read from the same config file.
My best guess was to change ignoreRule() resulting in this early concept:

private boolean ignoreRule(Rule rule) {
  Category ruleCategory = rule.getCategory();
  boolean isCategoryDisabled = (disabledRuleCategories.contains(ruleCategory.getId()) || ruleCategory.isDefaultOff())
          && !enabledRuleCategories.contains(ruleCategory.getId());

  // Verify accuracy rating and severity (since 3.9)
  int generalAccuracyLevel = 1;
  int ruleAccuracyLevel = 1;
  /* TODO: read the general accuracy level from the configuration
  String accuracyRating = props.getProperty(accuracyRating);
  switch (accuracyRating) {
    case "all":     generalAccuracyLevel = 0; break;
    case "normal":  generalAccuracyLevel = 1; break;
    case "high":    generalAccuracyLevel = 2; break;
    case "perfect": generalAccuracyLevel = 3; break;
  }
  */
  String ruleAccuracyRating = rule.getRuleAccuracyLevel();
  // Guard: switching on a null String throws a NullPointerException
  if (ruleAccuracyRating != null) {
    switch (ruleAccuracyRating) {
      case "off":     ruleAccuracyLevel = 0; break;
      case "on":      ruleAccuracyLevel = 1; break;
      case "high":    ruleAccuracyLevel = 2; break;
      case "perfect": ruleAccuracyLevel = 3; break;
    }
  }
  /* boolean enoughAccuracy = generalAccuracyLevel >= ruleAccuracyLevel;
     boolean enoughSeverity = generalSeverity >= ruleSeverity; */

  boolean isRuleDisabled = disabledRules.contains(rule.getId())
          || (rule.isDefaultOff() && !enabledRules.contains(rule.getId()));
          /* || !enoughAccuracy || !enoughSeverity */
  boolean isDisabled;
  if (isCategoryDisabled) {
    // A rule can still be enabled explicitly inside a disabled category
    isDisabled = !enabledRules.contains(rule.getId());
  } else {
    isDisabled = isRuleDisabled;
  }
  return isDisabled;
}

For now I am having difficulties with this line: String accuracyRating = props.getProperty(accuracyRating)

Is there any way to read a configuration variable without using the default configuration data loader (or having circular references) or do I really need to move this logic to each of the clients (languagetool-standalone, languagetool-office, etc.) I wish to work on?