
Characters to ignore?

I’ve added code to ignore a character, namely \uFEFF (zero-width no-break space). If you think other characters should be ignored when checking text, please let me know.
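To illustrate the idea of ignoring such characters, here is a minimal sketch, not the code from the commit. The class name `IgnorableChars`, the method `strip`, and the choice to include the soft hyphen in the set are assumptions made for this example only.

```java
import java.util.Set;

// Hypothetical helper for illustration; not part of the LanguageTool API.
final class IgnorableChars {

    // Assumption: only the two characters discussed in this thread are ignored,
    // U+FEFF (zero-width no-break space) and U+00AD (soft hyphen).
    private static final Set<Character> IGNORED = Set.of('\uFEFF', '\u00AD');

    private IgnorableChars() {
    }

    // Returns a copy of the text with all ignorable characters removed.
    static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!IGNORED.contains(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Note that removing characters shifts every position after them, so a real checker would presumably also need to map error offsets back to the original text.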

The commit was this one:

The soft hyphen is already ignored, isn’t it?

There are some special cases for the soft hyphen in some places in the code, yes. If you have a case where it should be ignored but it’s not, please let me know.

Is there a reason why “\uFEFF” is handled differently from “\u00AD”?

There shouldn’t be, but when I tried to use the existing way to handle “\uFEFF”, some speller tests started to fail.

In general, it shows up in the middle of words that look fine and, apart from that character, are fine. So it should be ignored when spell-checking, and for POS tagging as well.
Having words that contain it in the dictionary would not be great. It would be better to emit a warning for these when building a dictionary, just in case.
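The dictionary-build warning suggested above could look roughly like this. It is only a sketch: the class `DictionaryEntryCheck` and the method `findSuspiciousEntries` are hypothetical names, and the real dictionary builder may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check for illustration; not the actual dictionary builder.
final class DictionaryEntryCheck {

    private DictionaryEntryCheck() {
    }

    // Returns the entries that contain U+FEFF or U+00AD, so that the caller
    // can print a warning for each of them instead of adding them silently.
    static List<String> findSuspiciousEntries(List<String> words) {
        List<String> suspicious = new ArrayList<>();
        for (String word : words) {
            if (word.indexOf('\uFEFF') >= 0 || word.indexOf('\u00AD') >= 0) {
                suspicious.add(word);
            }
        }
        return suspicious;
    }
}
```

A build script could then log each returned entry as a warning and continue, rather than failing the build.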