
New dashes to hyphens rule


(Marcin Miłkowski) #1

I have developed a simple rule that reads the compound file (just like our CompoundRule) but checks for en dashes and em dashes used instead of a hyphen, for example:

Papua — Nowa Gwinea

instead of

Papua-Nowa Gwinea

I am not sure whether this behavior suits other languages, but if it does, the DashRule I've written is easy to customize. It reads the compound file and creates PatternRules that check for the dashed variant of each compound. This makes development much easier than writing these rules by hand in the grammar file, and it doesn't noticeably slow down LT because the rules are very simple.
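The idea can be sketched roughly like this. It is a minimal Python illustration, not LanguageTool's Java DashRule or its PatternRule API: it assumes a compound list with one plain hyphenated entry per item (the real compound file may carry extra markers), and the helpers `build_dash_patterns` and `find_dash_errors` are hypothetical. Each compound is split at its hyphens and turned into a regular expression that accepts an en dash or em dash, optionally surrounded by spaces, at each hyphen position.

```python
import re

# En dash (U+2013) and em dash (U+2014); a plain hyphen is deliberately excluded.
DASHES = "\u2013\u2014"


def build_dash_patterns(compounds):
    """Turn hyphenated compounds into (regex, suggestion) pairs.

    Hypothetical helper: assumes each entry is a plain hyphenated
    compound such as "Papua-Nowa Gwinea", with no extra file markers.
    """
    joiner = r"\s*[" + DASHES + r"]\s*"  # a dash, optionally with spaces around it
    patterns = []
    for compound in compounds:
        parts = compound.split("-")
        if len(parts) < 2:
            continue  # no hyphen, so there is nothing a dash could replace
        regex = re.compile(r"\b" + joiner.join(re.escape(p) for p in parts) + r"\b")
        patterns.append((regex, compound))
    return patterns


def find_dash_errors(text, patterns):
    """Return sorted (start, end, suggestion) tuples for every dashed compound."""
    matches = []
    for regex, suggestion in patterns:
        for m in regex.finditer(text):
            matches.append((m.start(), m.end(), suggestion))
    return sorted(matches)


patterns = build_dash_patterns(["Papua-Nowa Gwinea", "T-shirt"])
print(find_dash_errors("Papua \u2014 Nowa Gwinea", patterns))  # [(0, 19, 'Papua-Nowa Gwinea')]
```

With the entries `Papua-Nowa Gwinea` and `T-shirt`, the dashed spelling yields one match suggesting `Papua-Nowa Gwinea`, while the correctly hyphenated spelling yields none. Generating the patterns from the existing list, instead of writing one rule per compound, is what keeps maintenance cheap.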

Best,
Marcin


(Mike Unwalla) #2

@Marcin,

This rule would be useful for English.

Sorry for the late reply.


(Marcin Miłkowski) #3

No problem, I was unable to code for some time anyway. I will make this rule available for English soon, then.


(Marco A.G.Pinto) #4

@tiagosantos

Do we have this one in PT?


(Marcin Miłkowski) #5

No, but it is easy to produce. As far as I can see, there is one complication: you have three variants of the compounds file, and I'm not sure which one should be used. In addition, two strings need to be provided for the rule. The displayed error message:

  • "A dash was used instead of a hyphen. You should probably write: "

And the description of the rule:

  • "Checks if hyphenated words were spelled with dashes (e.g., 'T — shirt' instead of 'T-shirt')."
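For a port such as Portuguese, this comes down to choosing a compound list per variant and translating the two strings above. The sketch below is a hypothetical Python illustration of that setup, not LanguageTool code: the variant keys, the compound lists, and the helpers `build_variant_rules` and `check` are assumptions. Each variant gets patterns built only from its own compound list, and each match is reported as the message string followed by the hyphenated suggestion.

```python
import re

# English wording of the two strings a port needs to translate.
MESSAGE = "A dash was used instead of a hyphen. You should probably write: "
DESCRIPTION = ("Checks if hyphenated words were spelled with dashes "
               "(e.g., 'T \u2014 shirt' instead of 'T-shirt').")

# En or em dash, optionally surrounded by spaces.
DASH_JOINER = r"\s*[\u2013\u2014]\s*"


def build_variant_rules(compounds_by_variant):
    """Build (regex, suggestion) pairs separately for each variant.

    Hypothetical helper: the variant keys and compound lists are
    whatever the caller supplies, one hyphenated compound per entry.
    """
    rules = {}
    for variant, compounds in compounds_by_variant.items():
        rules[variant] = [
            (re.compile(r"\b" + DASH_JOINER.join(map(re.escape, c.split("-"))) + r"\b"), c)
            for c in compounds
            if "-" in c
        ]
    return rules


def check(text, variant, rules):
    """Report (start, end, message) for each dashed compound in the variant's list."""
    return sorted(
        (m.start(), m.end(), MESSAGE + suggestion)
        for regex, suggestion in rules[variant]
        for m in regex.finditer(text)
    )


# Hypothetical data: the same compound is listed for one variant only.
rules = build_variant_rules({"variant-a": ["guarda-chuva"], "variant-b": []})
print(check("um guarda\u2014chuva", "variant-a", rules))
print(check("um guarda\u2014chuva", "variant-b", rules))
```

Because the lists are kept apart, a dashed compound is flagged only in the variants whose compound file contains it, which is why the conversion has to be done per variant.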

(Tiago F. Santos) #6

Hi @MarcinMilkowski,

Many thanks for your prompt reply and contribution. I meant to thank you for sharing it at the time, but there are always many things I want to do at once, and it slipped my mind.
This is a great rule, and I will eventually port it to Portuguese, like I did with some other Polish rules. I just haven't prioritized it yet.
The conversion should be easy. It needs to be done for each variant individually, since they use different compound word databases.

Anyway, I would like to leave my (belated) thanks here for the spirit of sharing you showed in this post.
Even though I haven't handled this rule yet, it is in my plans.

Best regards


(Mike Unwalla) #7

Thanks for the new rule @MarcinMilkowski.


(Marcin Miłkowski) #8

I looked at the diff, and it seems to have worked fine.

Best,
Marcin


(Marcin Miłkowski) #9

Sure, no problem. I think the rule is very easy to adapt; have a look at the English version here:

Basically, it takes just a couple of lines of code, plus the corresponding JUnit test, which is equally easy to write.

Best,
Marcin


(Tiago F. Santos) #10

Hi Marcin,

I have now pushed the hooks for the Portuguese variants.

Again, many thanks.
Best regards,

Tiago