An interesting problem popped up in the crh module.
Most locales don’t conflict when it comes to case conversion, so our StringTools methods that deal with case don’t take the locale as an argument.
But a few locales have non-standard rules, e.g. tr, tt, kk, az, and crh convert “i” differently:
- Lowercase of “I” → “ı” (dotless i)
- Uppercase of “i” → “İ” (dotted İ)
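To make the difference concrete, here is a minimal sketch using only the standard `String.toLowerCase(Locale)` / `String.toUpperCase(Locale)` calls. The class name `LocaleCaseDemo` and the `TURKIC` constant are made up for illustration; the “tr-UA” tag just mirrors the workaround locale for crh mentioned below.

```java
import java.util.Locale;

public class LocaleCaseDemo {
    // Illustrative constant: the "tr_UA" stand-in locale for crh.
    // Java's Turkic casing rules are triggered by the "tr" language part.
    static final Locale TURKIC = Locale.forLanguageTag("tr-UA");

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // Locale-neutral conversion: the usual I <-> i pairing
        System.out.println(lower("I", Locale.ROOT)); // i
        System.out.println(upper("i", Locale.ROOT)); // I
        // Turkic conversion: I <-> ı and İ <-> i
        System.out.println(lower("I", TURKIC));      // ı (dotless i)
        System.out.println(upper("i", TURKIC));      // İ (dotted İ)
    }
}
```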
I was able to fix a few places (the tagger and spellchecking recognition) by using the “tr_UA” locale for crh (note: there’s no “crh” locale, unfortunately), but spellchecking suggestions still use locale-less case conversion, so the misspelled İngliiz gets an (incorrect) suggestion of Іngliz instead of İngliz.
On the one hand, this is a very odd case that only affects one language in LT, but on the other hand, any character manipulation should use the language locale consistently for all operations. E.g. we pass a locale argument in some calls to toLowerCase() in LT but not in others.
So I am wondering if we should adjust StringTools to use the language locale, and whether the other places that call toLowerCase()/toUpperCase() should do so consistently too.
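As a rough sketch of what a locale-aware variant might look like: the class `LocaleAwareCase` and both method signatures below are hypothetical, not the existing StringTools API, and how the locale would be obtained from the language class is left open.

```java
import java.util.Locale;

public class LocaleAwareCase {
    // Hypothetical helper: uppercase only the first code point, using the given locale.
    static String uppercaseFirstChar(String s, Locale locale) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        int end = s.offsetByCodePoints(0, 1); // end index of the first code point
        return s.substring(0, end).toUpperCase(locale) + s.substring(end);
    }

    // Hypothetical helper: lowercase only the first code point, using the given locale.
    static String lowercaseFirstChar(String s, Locale locale) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        int end = s.offsetByCodePoints(0, 1);
        return s.substring(0, end).toLowerCase(locale) + s.substring(end);
    }

    public static void main(String[] args) {
        Locale crhStandIn = Locale.forLanguageTag("tr-UA"); // assumed stand-in for crh
        System.out.println(uppercaseFirstChar("ingliz", crhStandIn));  // İngliz
        System.out.println(uppercaseFirstChar("ingliz", Locale.ROOT)); // Ingliz
    }
}
```

Callers that have no language at hand could still pass `Locale.ROOT` to keep today’s behaviour explicit.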
Another aspect of this is that a locale can (optionally) be specified in the speller dictionary .info file, where it is used by the morfologik speller, but only 2 languages actually specify that locale, and it is also specified in the language class. I wonder if we need to keep them more in sync.