
[de] Do we have a policy or strategy for hyphenated words?

We’re receiving a lot of users’ suggestions for the German dictionary that include a hyphen, such as ‘SNOWBOARD-WM’.
AFAIK, most spell checkers treat the hyphen as a word separator. Is there a reason why we are not doing it that way? Treating the hyphen as part of the word certainly has some advantages, but the downside is a huge number of false positives. Any ideas?
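To make the trade-off concrete, here is a minimal sketch of the "hyphen as separator" approach, assuming a plain set of known words. `KNOWN_WORDS` and `is_spelled_correctly` are hypothetical names, not LanguageTool's API. The whole token is looked up first, so hyphenated entries added as a whole still match. Only if that lookup fails is the token split at hyphens and each part checked on its own.

```python
# Hypothetical word list; a real checker would use a full German dictionary
# plus the user's own additions.
KNOWN_WORDS = {"Die", "US", "Notenbank", "sagt", "ja", "Snowboard", "WM"}


def is_spelled_correctly(token: str, known: set) -> bool:
    """Accept a token if it is known as a whole, or if every hyphen-separated part is."""
    # Whole-token lookup first, so hyphenated entries added as a whole still match.
    if token in known:
        return True
    # Treat the hyphen as a word separator; skip empty parts produced by
    # leading, trailing or doubled hyphens.
    parts = [part for part in token.split("-") if part]
    return bool(parts) and all(part in known for part in parts)


print(is_spelled_correctly("US-Notenbank", KNOWN_WORDS))  # True
print(is_spelled_correctly("US-Notenbnk", KNOWN_WORDS))   # False
```

The cost of this approach is that any combination of known parts passes, e.g. `Notenbank-Snowboard`, so a typo that happens to produce valid parts goes unnoticed.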

There must be some special case, because Snowboard-WM is accepted. Feel free to open an issue about that.

SNOWBOARD-WM came out of the OCR tool, so I suppose it has something to do with the “-”, maybe a different character encoding than the normal hyphen.

Unfortunately most spellcheckers only do the treat-as-separate-words thing when no user-defined words are involved.

@dnaber That’s true. I tested the following four sentences:

  1. Die US-NOTENBANK sagt ja.
  2. Die US-Notenbank sagt ja.
  3. Eine andere NOTENBANK sagt nein.
  4. Die XCFG-Notenbank reagiert gelassen.

Only the first one yields an unexpected result, namely a spelling error. So the gist seems to be that the spell checker can’t handle hyphenated compounds written in all caps. Do you think this is worth opening an issue?

Generally yes, but that doesn’t mean I’m going to work on it 🙂

Thank you, that is a very good suggestion. There are a lot of characters that are visually indistinguishable from the normal hyphen. But in this case I ruled that out by typing the hyphen manually.
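If the hyphen coming out of an OCR tool is suspect, it can be checked and normalized before spell checking. This is a minimal sketch; the listed code points are an assumed selection of look-alikes, not an exhaustive list. It maps them to the ASCII hyphen-minus and removes soft hyphens. The en dash (U+2013) is deliberately left out because in German it usually serves as a Gedankenstrich rather than a word-internal hyphen.

```python
import unicodedata

# Assumed selection of hyphen look-alikes; extend as needed.
HYPHEN_LOOKALIKES = {
    "\u2010": "-",   # HYPHEN
    "\u2011": "-",   # NON-BREAKING HYPHEN
    "\u2012": "-",   # FIGURE DASH
    "\u2212": "-",   # MINUS SIGN
    "\ufe63": "-",   # SMALL HYPHEN-MINUS
    "\uff0d": "-",   # FULLWIDTH HYPHEN-MINUS
    "\u00ad": None,  # SOFT HYPHEN: invisible, so remove it instead of replacing it
}
_TRANSLATION = str.maketrans(HYPHEN_LOOKALIKES)


def find_hyphen_lookalikes(text: str) -> list:
    """Report (index, code point, Unicode name) for every look-alike in the text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in HYPHEN_LOOKALIKES
    ]


def normalize_hyphens(text: str) -> str:
    """Map look-alikes to ASCII HYPHEN-MINUS (U+002D) and drop soft hyphens."""
    return text.translate(_TRANSLATION)


ocr_text = "SNOWBOARD\u2011WM"
print(find_hyphen_lookalikes(ocr_text))  # [(9, 'U+2011', 'NON-BREAKING HYPHEN')]
print(normalize_hyphens(ocr_text))       # SNOWBOARD-WM
```

Running `find_hyphen_lookalikes` first is useful as a diagnostic: if it returns an empty list, the character encoding can be ruled out, as in the test above.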

OK, let’s bury that. Now that I understand the underlying mechanism it seems far less important than I thought it was.

@Jan_Schreiber

Could you please explain in a nutshell how I could address this problem when preprocessing the texts for spell checking? I think most of the uppercase words come from my tool.

Thank you in advance
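Given the four test sentences, one possible preprocessing step is to recase only hyphenated all-caps compounds, since standalone all-caps words like NOTENBANK are already accepted. This is a hedged sketch: the regex, the hypothetical `MAX_ACRONYM_LEN` threshold and the function names are assumptions. Parts of three characters or fewer are treated as acronyms (US, WM) and kept as they are; longer parts get an initial capital followed by lowercase, which is only correct for nouns.

```python
import re

# Hyphenated tokens made only of uppercase letters/digits, e.g. "US-NOTENBANK".
ALLCAPS_COMPOUND = re.compile(r"\b[A-ZÄÖÜ0-9]+(?:-[A-ZÄÖÜ0-9]+)+\b")
# Hypothetical threshold: parts this short are assumed to be acronyms (US, WM).
MAX_ACRONYM_LEN = 3


def _recase_part(part: str) -> str:
    if len(part) <= MAX_ACRONYM_LEN:
        return part
    # Initial capital, rest lowercase: correct for German nouns only.
    return part[0] + part[1:].lower()


def recase_allcaps_compounds(text: str) -> str:
    """Rewrite all-caps hyphenated compounds; all other words are left untouched."""
    return ALLCAPS_COMPOUND.sub(
        lambda m: "-".join(_recase_part(p) for p in m.group(0).split("-")),
        text,
    )


print(recase_allcaps_compounds("Die US-NOTENBANK sagt ja."))  # Die US-Notenbank sagt ja.
print(recase_allcaps_compounds("SNOWBOARD-WM"))               # Snowboard-WM
```

Compounds containing verbs, adjectives or acronyms longer than three letters will be recased incorrectly, so this is best applied only to OCR output where all caps is known to be a rendering artifact.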

Hi all,

Just picking this conversation back up. I think hyphenated compounds are fine and LanguageTool handles them well. The issue I’m facing is with initial hyphens:

-Kann sein.
-Oder ihr macht es nie.
-Denkst du, das ist hier der Fall?
-Was?

Is there any rule we could set to ignore a hyphen at the start of a word?
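A workaround that doesn’t depend on any LanguageTool setting is to blank out dialogue dashes before the text reaches the checker. This minimal sketch replaces a hyphen at the start of a line, when it is directly followed by a word character, with a space rather than deleting it, so the offsets of any reported errors still match the original text. `DIALOGUE_DASH` and `blank_dialogue_dashes` are hypothetical names.

```python
import re

# A hyphen at the start of a line (after optional spaces/tabs) that is
# immediately followed by a word character, i.e. a dialogue dash like "-Kann".
DIALOGUE_DASH = re.compile(r"^([ \t]*)-(?=\w)", re.MULTILINE)


def blank_dialogue_dashes(text: str) -> str:
    # Replace the dash with a space instead of deleting it, so the text keeps
    # its length and error offsets still point at the right characters.
    return DIALOGUE_DASH.sub(lambda m: m.group(1) + " ", text)


dialogue = "-Kann sein.\n-Oder ihr macht es nie.\n-Was?"
print(blank_dialogue_dashes(dialogue))
```

Hyphens inside a line, as in `US-Notenbank`, are not affected because the pattern is anchored to line starts.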

Thank you!

I was wondering about something similar.
Would it be possible to ignore ‘words’ that start with an ellipsis followed by a hyphen?

E.g.:
Turning on the TV, she heard the 8 O’clock News anchor say “…-ven-Four-Seven has crashed at Charles de Gaulle. While the emergency services have already responded, there is little they can do unt-…” before she tuned in to her intended channel.
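A similar preprocessing idea could work for deliberately truncated fragments: find them first, then drop any spelling match that overlaps one. In this sketch the patterns, the function names and the `(offset, length)` shape of a match are assumptions; that shape is modeled on the `offset` and `length` fields LanguageTool reports for each match. Both the Unicode ellipsis `…` and three periods are recognized, at the start (`…-ven-Four-Seven`) or the end (`unt-…`) of a fragment.

```python
import re

# Hypothetical patterns for deliberately cut-off words:
LEADING_CUT = r"(?:…|\.\.\.)-[\w-]+"       # e.g. "…-ven-Four-Seven"
TRAILING_CUT = r"\b\w[\w-]*-(?:…|\.\.\.)"  # e.g. "unt-…"
TRUNCATED = re.compile(f"{LEADING_CUT}|{TRAILING_CUT}")


def truncated_spans(text: str) -> list:
    """Return (start, end) character spans of truncated fragments."""
    return [(m.start(), m.end()) for m in TRUNCATED.finditer(text)]


def drop_overlapping(matches: list, spans: list) -> list:
    """Drop (offset, length) matches that overlap any truncated fragment."""
    return [
        (offset, length)
        for offset, length in matches
        if not any(offset < end and offset + length > start for start, end in spans)
    ]


text = "say “…-ven-Four-Seven has crashed. There is little they can do unt-…” before"
for start, end in truncated_spans(text):
    print(text[start:end])  # prints …-ven-Four-Seven, then unt-…
```

Filtering matches afterwards, instead of editing the text, leaves the sentence intact for grammar rules that might still need to see it.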