
Consistency of spelling

(Ruud Baars) #1

Some words can be spelled in more than one way (e.g. btw and BTW, or deur-knop and deurknop), all of them correct. Still, it is good practice to use one spelling consistently throughout a document. Is there a function that checks this?

(Daniel Naber) #2

Yes, there’s a file coherency.txt for some languages. I’ll activate it for Dutch and let you know when it’s available.

(Lodewijk Arie van Brienen) #3

I sometimes use alternate spellings to hint at (minor) accents, so would it be possible to disable this (as a rule)?

(Daniel Naber) #4

The file is now here. Could you also provide translations for these strings? Feel free to post them here.

(Daniel Naber) #5

It’s a rule like any other rule, so it can be enabled/disabled, just like any other rule.

(Lodewijk Arie van Brienen) #6

Just wanted to be sure.

(Ruud Baars) #7

We could discuss the variations. Casing might not be an issue, but accents might. Let’s try it first.

(Ruud Baars) #8

Do not mix variants of the same word ('" + word1 + "' and '" + word2 + "') within a single text.
Gebruik liever geen verschillende spellingen ('" + word1 + "' en '" + word2 + "') door elkaar in een tekst.

Consistente spelling van woorden met meerdere correcte vormen.

(P.S. Is it strictly limited to 2? There are probably more in Dutch…)

(Ruud Baars) #9

I added hivtest and hiv-test as variants first, to get an idea of how it will work.

(Daniel Naber) #10

Thanks, I’ve added the translations. Could you also come up with an example in which both variants of such a word are used? This is shown when the user wants to see an example.

To answer the P.S.: currently it is limited to two variants.

(Ruud Baars) #11

Not a big issue; they can always be added as pairs.

(Ruud Baars) #12

An example: We raden af om in één tekst zowel hivtest als hiv-test te schrijven.

(Ruud Baars) #13

The rule is working fine. It apparently assumes that the first variant it encounters is the right one and reports the second as a deviation.
Understandable. In Dutch, however, both variants are correct, but one is considered better than the other (hivvirus is the ‘base’ form, hiv-virus the ‘optional’ one).
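
To make the observed behavior concrete, here is a rough Python sketch of a "first variant seen wins" check. It is only an illustration of what the rule appears to do, not LanguageTool's actual code; the function name `find_deviations` and the in-memory `PAIRS` list are assumptions made for the example.

```python
import re

# Hypothetical pair list, mirroring the idea of one variant pair per entry.
PAIRS = [("hivtest", "hiv-test")]


def find_deviations(text, pairs=PAIRS):
    """Return (word, first_seen_variant) for each word that differs from
    the variant of its pair that appeared first in the text."""
    # Map every variant to the pair it belongs to.
    pair_of = {word: pair for pair in pairs for word in pair}
    first_seen = {}   # pair -> variant that appeared first
    deviations = []
    # Tokens are runs of word characters, optionally joined by hyphens.
    for token in re.findall(r"\w+(?:-\w+)*", text.lower()):
        pair = pair_of.get(token)
        if pair is None:
            continue
        chosen = first_seen.setdefault(pair, token)
        if token != chosen:
            deviations.append((token, chosen))
    return deviations
```

With this sketch, whichever variant happens to come first in the text is treated as the reference, so swapping the order of the two words swaps which one is reported.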

So a wish of mine would be to alter the function so that the first item of each pair in the consistency file is treated as the preferred one. Should I file a request on GitHub?
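
As a rough illustration of this request, the Python sketch below always reports the second item of a pair when both variants occur, regardless of which one appears first in the text. This is a sketch of the proposed behavior under my own assumptions, not existing LanguageTool code; `find_non_preferred` and `PAIRS` are hypothetical names.

```python
import re

# Hypothetical pair list; under the proposed convention the first item is preferred.
PAIRS = [("hivvirus", "hiv-virus")]


def find_non_preferred(text, pairs=PAIRS):
    """Return (found, preferred) for each non-preferred variant, but only
    when both variants of the pair occur in the text."""
    # Tokens are runs of word characters, optionally joined by hyphens.
    tokens = re.findall(r"\w+(?:-\w+)*", text.lower())
    present = set(tokens)
    results = []
    for preferred, optional in pairs:
        # A text that uses only the optional form is consistent, so skip it.
        if preferred in present and optional in present:
            results.extend((t, preferred) for t in tokens if t == optional)
    return results
```

A text that uses only the optional form is left alone here, since it is still consistent; only a mix of both forms triggers a report.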

(Robin van der Vliet) #14

I do not fully agree with that reasoning for this specific word. The dictionary of the Dutch Language Union contains both forms, with the following clarification:

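Bij enkele woorden kan het eerste deel in losse letters worden uitgesproken, maar ook als een geheel. In het eerste geval komt er een verbindingsstreepje tussen de beide delen van het woord, in het tweede geval worden ze zonder meer aan elkaar geschreven.
(In English: For some words, the first part can be pronounced as separate letters, but also as a whole. In the first case a hyphen is placed between the two parts of the word; in the second case they are simply written as one word.)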

So the preferred orthography depends on how you actually pronounce the word. But I still agree that there should be a way to indicate a preferred orthography, as there are words where there is a clear preference. I am creating an issue on GitHub with some of my ideas for the word coherency rule.

(Ruud Baars) #15

You are correct for this example. But even then, it would be better for consistency to have a preferred one. Consider this situation:
I can write hivvirus, but also hiv-virus, and also hiv-patiënt and hivpatiënt.
If they appear in that order, hiv-virus and hivpatiënt will be marked, which is itself an inconsistency.

(Robin van der Vliet) #16

I think that ideally we should only have one line in the file with a regular expression, which matches all compound words with “hiv”. I made an issue about it here.
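
A rough Python sketch of that idea, purely as an illustration: one regular expression matches every compound starting with "hiv", and the hyphenation style of the first match becomes the reference for all later ones. The pattern and the name `find_style_deviations` are my own assumptions, not something that exists in LanguageTool, and the pattern is deliberately naive (it would match any word that merely starts with "hiv").

```python
import re

# Assumed pattern: "hiv", an optional hyphen, then the rest of the compound.
HIV_COMPOUND = re.compile(r"\bhiv(-?)(\w+)", re.IGNORECASE)


def find_style_deviations(text):
    """Flag hiv compounds whose hyphenation differs from the first one found."""
    first_style = None   # True = hyphenated, False = written as one word
    deviations = []
    for match in HIV_COMPOUND.finditer(text):
        hyphenated = match.group(1) == "-"
        if first_style is None:
            first_style = hyphenated
        elif hyphenated != first_style:
            deviations.append(match.group(0))
    return deviations
```

For a text containing hivvirus, hiv-virus, hiv-patiënt and hivpatiënt in that order, this flags both hyphenated forms, so the result is consistent across different compounds.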