
Characters to ignore?

I’ve added code to ignore one character, namely \uFEFF (zero-width no-break space, which also serves as the byte order mark). If you think other characters should be ignored when checking text, please let me know.

The commit was this one: https://github.com/languagetool-org/languagetool/commit/046ca9edb64b761ca8e21693b5a9a1d34541d157
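For readers who want a picture of what “ignoring a character” could look like, here is a minimal sketch. It is not the code from the commit above: the class name `IgnoredChars`, the method `strip`, and the choice to include the soft hyphen in the set are all assumptions made for illustration.

```java
import java.util.Set;

class IgnoredChars {

    // Hypothetical set of code points to drop before checking.
    // Only U+FEFF and U+00AD (soft hyphen) come up in this thread.
    private static final Set<Integer> IGNORED = Set.of(0xFEFF, 0x00AD);

    /** Returns a copy of the text with every ignored code point removed. */
    static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !IGNORED.contains(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

Calling `IgnoredChars.strip("exam\uFEFFple")` yields a string equal to `"example"`. One thing a real checker would still have to deal with is that removing characters shifts offsets, so error positions would need to be mapped back to the original text; this sketch does not attempt that.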

The soft hyphen is already ignored, isn’t it?

There are some special cases for the soft hyphen in some places in the code, yes. If you have a case where it should be ignored but it’s not, please let me know.

Is there a reason why “\uFEFF” is handled differently from “\u00AD”?

There shouldn’t be, but when I tried to handle “\uFEFF” the same way the soft hyphen is already handled, some speller tests started to fail.

In general, the character shows up in the middle of words that look fine and are otherwise correct. So it should be ignored when spell-checking, and for POS tagging as well.
Having words that contain it in the dictionary would not be great, so it would be better to print a warning for such entries when building a dictionary, just in case.
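
The dictionary-build warning suggested above could be sketched roughly like this. It is only an illustration under stated assumptions: `DictionaryEntryCheck` and `findSuspiciousEntries` are hypothetical names, not part of LanguageTool’s dictionary tooling, and checking for the soft hyphen in addition to \uFEFF is my own assumption.

```java
import java.util.ArrayList;
import java.util.List;

class DictionaryEntryCheck {

    /**
     * Returns one warning message per entry that contains U+FEFF or
     * U+00AD. The invisible characters are made visible in the message.
     */
    static List<String> findSuspiciousEntries(List<String> entries) {
        List<String> warnings = new ArrayList<>();
        for (String entry : entries) {
            if (entry.indexOf('\uFEFF') >= 0 || entry.indexOf('\u00AD') >= 0) {
                String visible = entry
                    .replace("\uFEFF", "<U+FEFF>")
                    .replace("\u00AD", "<U+00AD>");
                warnings.add("Warning: entry contains an invisible character: " + visible);
            }
        }
        return warnings;
    }
}
```

A build step could print the returned messages and carry on, so that a stray invisible character gets noticed without breaking the build.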