
Polish combined characters

I have found an extremely rare encoding of Polish language text by chance here:

http://gazetaolsztynska.pl/653158,Robert-Biedron-o-Polsce-Warmii-i-Mazurach-i-Olsztynie-ZDJECIA.html

“Często słyszę, że jesteście niezadowoleni z sytuacji w kraju. Nie podoba Wam się, że obecna władza niszczy nasze państwo, łamie prawa kobiet i mniejszości, ignoruje kwestie klimatu i nie dostrzega prawdziwych problemów obywateli. Porozmawiajmy, jak to zmienić!”

This text may have been copied from Facebook into the custom CMS, or alternatively emailed to the journalist as a press release. It breaks most Polish diacritic characters into two code points, so:
ę is U+0065 LATIN SMALL LETTER E, U+0328 COMBINING OGONEK instead of typical U+0119 LATIN SMALL LETTER E WITH OGONEK
ż is U+007A LATIN SMALL LETTER Z, U+0307 COMBINING DOT ABOVE instead of typical U+017C LATIN SMALL LETTER Z WITH DOT ABOVE
ś is U+0073 LATIN SMALL LETTER S, U+0301 COMBINING ACUTE ACCENT instead of typical U+015B LATIN SMALL LETTER S WITH ACUTE
ń is U+006E LATIN SMALL LETTER N, U+0301 COMBINING ACUTE ACCENT instead of typical U+0144 LATIN SMALL LETTER N WITH ACUTE
ł is standard U+0142 LATIN SMALL LETTER L WITH STROKE (this letter has no decomposed form in Unicode, so it cannot be split)
ó is U+006F LATIN SMALL LETTER O, U+0301 COMBINING ACUTE ACCENT instead of typical U+00F3 LATIN SMALL LETTER O WITH ACUTE
ć, oddly, is standard U+0107 LATIN SMALL LETTER C WITH ACUTE
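This is still valid Unicode: the letters are stored as a base letter plus a combining mark rather than as one precomposed character. The relevant Unicode block is Combining Diacritical Marks: https://en.wikipedia.org/wiki/Combining_Diacritical_Marks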
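
For what it's worth, sequences like these can be folded back into the usual precomposed letters with Unicode normalization (form NFC). A minimal Python sketch, assuming the input could be normalized before it is spell-checked; the helper name to_nfc is made up for illustration and is not anything from LanguageTool:

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose base letter + combining mark pairs into precomposed characters."""
    return unicodedata.normalize("NFC", text)


# Start of the quoted sentence, written the way the article stores it:
# e + U+0328, z + U+0307, s + U+0301, while the l with stroke (U+0142) is already a single code point.
decomposed = "Cze\u0328sto s\u0142ysze\u0328, z\u0307e jestes\u0301cie"
composed = to_nfc(decomposed)

# The composed string has fewer code points but should look identical on screen.
print(len(decomposed), len(composed))
print(composed)
```
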
The full set of Polish diacritical characters to analyze is:
ą U+0105 LATIN SMALL LETTER A WITH OGONEK
ć U+0107 LATIN SMALL LETTER C WITH ACUTE
ę U+0119 LATIN SMALL LETTER E WITH OGONEK
ł U+0142 LATIN SMALL LETTER L WITH STROKE
ń U+0144 LATIN SMALL LETTER N WITH ACUTE
ó U+00F3 LATIN SMALL LETTER O WITH ACUTE
ś U+015B LATIN SMALL LETTER S WITH ACUTE
ź U+017A LATIN SMALL LETTER Z WITH ACUTE
ż U+017C LATIN SMALL LETTER Z WITH DOT ABOVE
Ą U+0104 LATIN CAPITAL LETTER A WITH OGONEK
Ć U+0106 LATIN CAPITAL LETTER C WITH ACUTE
Ę U+0118 LATIN CAPITAL LETTER E WITH OGONEK
Ł U+0141 LATIN CAPITAL LETTER L WITH STROKE
Ń U+0143 LATIN CAPITAL LETTER N WITH ACUTE
Ó U+00D3 LATIN CAPITAL LETTER O WITH ACUTE
Ś U+015A LATIN CAPITAL LETTER S WITH ACUTE
Ź U+0179 LATIN CAPITAL LETTER Z WITH ACUTE
Ż U+017B LATIN CAPITAL LETTER Z WITH DOT ABOVE
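
To see which of these letters can turn up in two-code-point form at all, one could ask Python's unicodedata module for the canonical decomposition (NFD) of each. A rough sketch; POLISH_CODEPOINTS and decomposition_table are names I made up for this example:

```python
import unicodedata

# The 18 precomposed Polish letters listed above, given as code points
# so the source file's own encoding cannot interfere.
POLISH_CODEPOINTS = [
    0x0105, 0x0107, 0x0119, 0x0142, 0x0144, 0x00F3, 0x015B, 0x017A, 0x017C,
    0x0104, 0x0106, 0x0118, 0x0141, 0x0143, 0x00D3, 0x015A, 0x0179, 0x017B,
]


def decomposition_table(codepoints):
    """Map each letter to the list of code points in its NFD form."""
    table = {}
    for cp in codepoints:
        nfd = unicodedata.normalize("NFD", chr(cp))
        table[chr(cp)] = [f"U+{ord(c):04X}" for c in nfd]
    return table


for letter, parts in decomposition_table(POLISH_CODEPOINTS).items():
    print(letter, " ".join(parts))
```

The two L WITH STROKE letters come back as a single code point each because they have no canonical decomposition, which matches what the article shows; every other letter splits into a base letter plus one combining mark.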

It does not even render properly in “Open Sans”, the font this news site uses, though it looks OK in “Source Sans Pro”, which you use. Naturally, your system marks the affected words as erroneous.