
[de] Do we have a policy or strategy for hyphenated words?


(Jan Schreiber) #1

We're receiving a lot of user suggestions for the German dictionary that include a hyphen, such as 'SNOWBOARD-WM'.
AFAIK, most spell checkers treat the hyphen as a word separator. Is there a reason why we are not doing it that way? Treating the hyphen as part of the word certainly has some advantages, but the downside is a huge number of false positives. Any ideas?


(Daniel Naber) #2

There must be some special case, because Snowboard-WM is accepted. Feel free to open an issue about that.


#3

SNOWBOARD-WM came out of the OCR tool, so I suppose it has something to do with the "-": maybe a different character encoding than the normal hyphen.


(Lodewijk Arie van Brienen) #4

Unfortunately most spellcheckers only do the treat-as-separate-words thing when no user-defined words are involved.


(Jan Schreiber) #5

@dnaber That's true. I tested the following four sentences:
1. Die US-NOTENBANK sagt ja.
2. Die US-Notenbank sagt ja.
3. Eine andere NOTENBANK sagt nein.
4. Die XCFG-Notenbank reagiert gelassen.

Only the first one yields an unexpected result, namely a spelling error. So apparently the gist is that the spell checker can't handle hyphenated compounds written as all-caps. Do you think this is worth opening an issue?


(Daniel Naber) #6

Generally yes, but that doesn't mean I'm going to work on it :)


(Jan Schreiber) #7

Thank you, that is a very good suggestion. There are a lot of characters that are visually indistinguishable from the normal hyphen. But in this case I made sure that that is not the problem at hand by typing it out manually.
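
For anyone who wants to rule out look-alike characters without retyping the text, here is a minimal Python sketch. The set of characters and the function names (`find_lookalikes`, `normalize_hyphens`) are illustrative assumptions, not part of LanguageTool, and the list is not exhaustive.

```python
import unicodedata

# Characters that can look like the ASCII hyphen-minus (U+002D).
# Illustrative selection only; extend as needed.
HYPHEN_LOOKALIKES = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2212",  # MINUS SIGN
}


def find_lookalikes(text):
    """Return (index, code point, Unicode name) for each look-alike in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in HYPHEN_LOOKALIKES
    ]


def normalize_hyphens(text):
    """Replace every look-alike with the ASCII hyphen-minus."""
    return "".join("-" if ch in HYPHEN_LOOKALIKES else ch for ch in text)
```

If `find_lookalikes` returns an empty list for a token that is still flagged, the encoding of the hyphen can be excluded as the cause.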


(Jan Schreiber) #8

OK, let's bury that. Now that I understand the underlying mechanism it seems far less important than I thought it was.


#9

@Jan_Schreiber

Could you please explain in a nutshell how I could address this problem when preprocessing the texts for spell checking? I think most of the uppercase words come out of my tool.

Thank you in advance
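
One possible preprocessing step, as a rough sketch only: rewrite all-caps hyphenated compounds so that longer parts are capitalized normally while short parts are assumed to be abbreviations and left alone. The length threshold of 3 and the names `ALLCAPS_COMPOUND`, `soften_allcaps_compound`, and `preprocess` are assumptions made for illustration; nothing here is LanguageTool behavior.

```python
import re

# A hyphenated token in which every part is uppercase, e.g. "US-NOTENBANK".
ALLCAPS_COMPOUND = re.compile(r"\b[A-ZÄÖÜ]+(?:-[A-ZÄÖÜ]+)+\b")


def soften_allcaps_compound(match, max_abbrev_len=3):
    """Capitalize long parts; keep short parts (assumed abbreviations) as-is."""
    parts = match.group(0).split("-")
    return "-".join(
        part if len(part) <= max_abbrev_len else part.capitalize()
        for part in parts
    )


def preprocess(text):
    """Apply the heuristic to every all-caps hyphenated compound in text."""
    return ALLCAPS_COMPOUND.sub(soften_allcaps_compound, text)
```

With this heuristic, "Die US-NOTENBANK sagt ja." becomes "Die US-Notenbank sagt ja.", and "SNOWBOARD-WM" becomes "Snowboard-WM". Abbreviations longer than three letters would be capitalized as well, so the threshold may need tuning for a given corpus.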


(Sergio Giustozzi) #10

Hi all,

Just resuming this conversation. I think hyphenated compounds are fine and LanguageTool works well on these. The issue I seem to be facing is with line-initial hyphens:

-Kann sein.
-Oder ihr macht es nie.
-Denkst du, das ist hier der Fall?
-Was?


Is there any rule we could set to ignore a hyphen directly before the first letter of a line?

Thank you!
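
Until such a rule exists, a workaround on the input side might look like the sketch below. It blanks out a hyphen at the start of a line when a word character follows directly. The names `LEADING_DASH` and `detach_leading_dashes` are hypothetical, and whether the checker then accepts the lines is an assumption I have not verified.

```python
import re

# A hyphen at the very start of a line, immediately followed by a word character.
LEADING_DASH = re.compile(r"^-(?=\w)", flags=re.MULTILINE)


def detach_leading_dashes(text):
    """Replace each line-initial dialogue hyphen with a single space."""
    return LEADING_DASH.sub(" ", text)
```

The hyphen is replaced by a space instead of being deleted so that character offsets reported by the checker still line up with the original text.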


(Lodewijk Arie van Brienen) #11

I was wondering about something similar.
Would it be possible to ignore ‘words’ that start with ellipsis+hyphen?

E.g.:
Turning on the TV, she heard the 8 o’clock news anchor say “…-ven-Four-Seven has crashed at Charles de Gaulle. While the emergency services have already responded, there is little they can do unt-…” before she tuned in to her intended channel.
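
To make the idea concrete, this is roughly the test I have in mind, sketched in Python. It assumes tokens have already been split on whitespace and stripped of surrounding quotation marks; the names `TRUNCATED` and `is_truncated_fragment` are made up for illustration.

```python
import re

# A token cut off at the start ("…-ven") or at the end ("unt-…").
# Both the single ellipsis character and three plain dots are accepted.
TRUNCATED = re.compile(r"^(?:…|\.\.\.)-|-(?:…|\.\.\.)$")


def is_truncated_fragment(token):
    """Return True if the token starts with ellipsis+hyphen or ends with hyphen+ellipsis."""
    return bool(TRUNCATED.search(token))
```

Tokens for which this returns True could then be skipped by the spell checker, while ordinary hyphenated words are still checked.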