Words with typographic apostrophe marked wrong for Belarusian

The Unicode Consortium says that U+2019 is preferred as the apostrophe [1]. Belarusian uses apostrophes inside many words (the apostrophe is considered an extra-alphabetical character). LanguageTool, however, rejects Belarusian words written with U+2019, while the same words written with U+0027 are accepted.

Steps to reproduce:

  1. Copy to the clipboard the following words:
З'яўляцца
З’яўляцца
п'яны
п’яны
  2. Go to https://languagetool.org/ , paste the words into the sample window (first, click Edit to get raw words), and select Belarusian as the language to check. The first and third words (with U+0027) are OK; the second and fourth (with U+2019) are marked wrong, and their counterparts with U+0027 are offered as the first suggestion.
  3. Go to any website that supports text input fields, paste the words, and wait for LanguageTool to recognize the language. The result is the same as in 2 (tested with 5.1.1, last updated on April 21, 2022, in Mozilla Firefox 94.0.2 64-bit, Microsoft Windows 7).
  4. Launch the standalone version of LanguageTool (languagetool.jar, tested with the snapshot of April 25, 2022), paste the words, and select Belarusian as the language to check. Again, the result is the same as in 2.
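
Since the two spellings look nearly identical on screen, it can help to confirm which code point each test word actually contains before pasting. The snippet below is only a minimal sketch using Python's standard unicodedata module; the helper name apostrophe_code_points and the APOSTROPHES set are made up for this illustration and are not part of LanguageTool.

```python
import unicodedata

# Candidate apostrophe characters, written as escapes so that no editor
# or forum software can silently convert one into another.
APOSTROPHES = {"\u0027", "\u2019", "\u02BC"}

# The four test words from step 1.
WORDS = [
    "З\u0027яўляцца",
    "З\u2019яўляцца",
    "п\u0027яны",
    "п\u2019яны",
]


def apostrophe_code_points(word):
    """Return 'U+XXXX NAME' for every apostrophe-like character in word."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in word
        if ch in APOSTROPHES
    ]


if __name__ == "__main__":
    for word in WORDS:
        print(word, apostrophe_code_points(word))
```

Words 1 and 3 should report U+0027 and words 2 and 4 should report U+2019; if not, the text was altered somewhere between copy and paste.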

[1] https://www.unicode.org/versions/Unicode13.0.0/ch06.pdf

Hi,
Thanks for the comment. I added a quick fix for this problem: [be] allow words with character \u2019 · languagetool-org/languagetool@24d3438 · GitHub

Thanks! Waiting for a snapshot build to test (I don’t know how to build from source, meaning that I can’t do anything that requires more skills than opening a project in NetBeans and hitting Shift+F11 :-))

Tested with the snapshot build of April 26. Works fine.

Thank you!

A further question. Are both apostrophes used as quotation marks in Belarusian? Do they work as expected in LanguageTool?

Normally they aren’t (and I did not test them as quotation marks). Mostly «» are used, less frequently „“ or “” (and, of course, straight

""

by lazy typists [the forum converts them to fancy “”, so I had to break the sentence]).

The Unicode Consortium says that U+2019 is preferred as apostrophe [1].

Actually, U+02BC is preferred for Belarusian. From the linked PDF file: “Letter Apostrophe. U+02BC modifier letter apostrophe is preferred where the apostrophe is to represent a modifier letter (for example, in transliterations to indicate a glottal stop). In the latter case, it is also referred to as a letter apostrophe.”

The U+02BC apostrophe is also the only apostrophe type supported in Belarusian domain names on the Internet (the ".бел" zone); the other apostrophe types are forbidden there. Right now, however, LanguageTool can’t spellcheck words with the U+02BC apostrophe correctly.

Submitted a bug report on GitHub: Letter Apostrophe U+02BC is incorrectly rejected by the spellchecker in Belarusian texts · Issue #8366 · languagetool-org/languagetool · GitHub
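
For illustration only, one way a spellchecker could accept all three apostrophe code points mentioned in this thread (U+0027, U+2019, U+02BC) is to map the variants to a single form before the dictionary lookup. The sketch below is an assumption about a possible approach, not what commit 24d3438 or LanguageTool actually does; the two-word dictionary and the function names normalize_apostrophes and is_known are hypothetical.

```python
# Map the typographic and modifier-letter apostrophes to U+0027.
# Assumption: the dictionary stores words with U+0027 only.
APOSTROPHE_TABLE = str.maketrans({
    "\u2019": "\u0027",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC": "\u0027",  # MODIFIER LETTER APOSTROPHE
})

# Toy stand-in for a real dictionary (hypothetical, lowercase entries).
TOY_DICTIONARY = {"з\u0027яўляцца", "п\u0027яны"}


def normalize_apostrophes(word: str) -> str:
    """Return word with every apostrophe variant replaced by U+0027."""
    return word.translate(APOSTROPHE_TABLE)


def is_known(word: str, dictionary=TOY_DICTIONARY) -> bool:
    """Look the word up after apostrophe normalization and lowercasing."""
    return normalize_apostrophes(word).lower() in dictionary


if __name__ == "__main__":
    for candidate in ("п\u0027яны", "п\u2019яны", "п\u02BCяны"):
        print(candidate, is_known(candidate))
```

Normalizing only for the lookup leaves the user's text untouched, so a writer who prefers U+2019 or U+02BC would not be pushed toward U+0027 by the suggestions.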