
[de] Potential minor spell-check suggestion tweak


(Jan Schreiber) #1

Current behavior: For the misspelled German word “Verhlaten”, I get the following suggestions (in that order):

  • Verladen
  • Verlagen
  • Verhalten
  • Erraten
  • Verraten

That’s fine: the first three suggestions all have a (Levenshtein) edit distance of two from the original string. Still, I think the order can be tweaked a little.

Desired behavior: IMO, the most intuitive sorting of the first three items would be

  • Verhalten
  • Verladen
  • Verlagen

I think this is because “Verhalten” (a) consists of exactly the same characters as the original string and (b), more specifically, can be obtained from it by swapping two adjacent characters, which is probably one of the most common kinds of typo. Intuitively, “Verhalten” is closer to the original string than the other two, even though plain edit distance rates all three the same. I wonder whether criteria (a) and (b) could be used to tweak the order of the suggestions. [Also, “Verladen” is better than “Verlagen” because t and d sound similar, but I think this is already covered in de_DE.info.]
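To make the idea concrete, here is a rough Python sketch of criteria (a) and (b) used as tie-breakers. It is only an illustration, not how LanguageTool ranks suggestions internally; the function names (`levenshtein`, `same_characters`, `is_adjacent_swap`, `rerank`) are made up for this example.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute (no transposition)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def same_characters(a: str, b: str) -> bool:
    """Criterion (a): both strings use exactly the same characters, equally often."""
    return Counter(a) == Counter(b)


def is_adjacent_swap(a: str, b: str) -> bool:
    """Criterion (b): b results from a by swapping two neighbouring characters."""
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def rerank(typo: str, suggestions: list[str]) -> list[str]:
    """Move adjacent-swap and same-character matches to the front.

    sorted() is stable, so the existing order is kept for all other items.
    """
    def key(s: str) -> tuple[bool, bool]:
        return (not is_adjacent_swap(typo, s), not same_characters(typo, s))

    return sorted(suggestions, key=key)


typo = "Verhlaten"
current = ["Verladen", "Verlagen", "Verhalten", "Erraten", "Verraten"]
print([levenshtein(typo, s) for s in current[:3]])  # all three are 2
print(rerank(typo, current))
```

With the list from above, `rerank` puts “Verhalten” first and leaves the remaining four in their current order. A real implementation would probably fold this into the existing scoring rather than sort after the fact.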


(Daniel Naber) #2

This deserves some documentation in the bug tracker so it won’t be forgotten. I’ve created an issue with analysis at https://github.com/languagetool-org/languagetool/issues/1255


(Jan Schreiber) #3

Apparently the rh → r replacement is part of the problem according to your analysis.

The reasoning behind it is that users often seem confused about the position of the h character in Greek loan words. Exaggerated example, but not very far from what users actually suggest: terhapheuthisch.

On a somewhat related note: in my tests, I got “rhe in” as the first suggestion for “rhein” (lowercase). Should we prohibit the word “rhe”? https://de.wikipedia.org/wiki/Rhe_(Einheit)


(Daniel Naber) #4

It will not help yet: “rhe in” is a generated suggestion, and that generation step does not consider prohibited.txt.
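A toy Python sketch of why a prohibited-word list can miss such a suggestion. Everything here is hypothetical: the word lists, the function names, and the assumption that “rhe in” comes from splitting the input into two known words. It does not reproduce LanguageTool’s actual code.

```python
# Hypothetical stand-ins: a tiny word list and one prohibited entry.
DICTIONARY = {"rhe", "in", "Rhein"}
PROHIBITED = {"rhe"}


def dictionary_suggestions(candidates: list[str]) -> list[str]:
    """Keep known words, dropping anything on the prohibited list."""
    return [w for w in candidates if w in DICTIONARY and w not in PROHIBITED]


def split_suggestions(word: str) -> list[str]:
    """Generate 'left right' suggestions by splitting the input in two.

    This step looks only at DICTIONARY, never at PROHIBITED, so a
    prohibited part can still show up inside a generated suggestion.
    """
    out = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in DICTIONARY and right in DICTIONARY:
            out.append(f"{left} {right}")
    return out


print(dictionary_suggestions(["rhe", "Rhein"]))  # "rhe" is filtered out here
print(split_suggestions("rhein"))                # but "rhe in" is still produced
```

In this sketch, prohibiting “rhe” would only take effect if the split step also checked each part against the prohibited list.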