Last week, I added Qur’an to spelling.txt. LT gives a spelling error:
As a test, I added these terms to spelling.txt in my local copy of LT:
notcap’italised
Capti’lised
LT gives warnings about spelling for those terms. Should it give warnings?
There are two solutions for this issue:
A multiwords.txt file, where words like “Qur’an” or “Palme d’or” are tagged regardless of their number of tokens.
But there are hyphenated words in spelling.txt, and they give the expected result. (That is why I showed Palme d’or as an example.)
The current English word tokenizer creates one token for “avant-garde”, but four tokens for “Palme d’or”. If there is no error in “Palme d’or”, it is because each token individually is allowed as an independent word.
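To make the token-by-token behaviour concrete, here is a minimal Python sketch. It does not use LanguageTool's real tokenizer or speller: `tokenize`, `misspelled_tokens`, and the word lists are hypothetical, and the regular expression only assumes that hyphenated words stay whole while apostrophes become separate tokens. Under that assumption, a whole-word entry such as “Qur’an” is never compared against the text, because the checker only sees the pieces. Whether this is what actually happens inside LT is not confirmed here.

```python
import re

# Toy tokenizer (an assumption, not LanguageTool's real one):
# hyphenated words stay whole, apostrophes become their own tokens,
# and whitespace is dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*|[’']")

APOSTROPHES = ("’", "'")


def tokenize(text):
    """Split text into word and apostrophe tokens."""
    return TOKEN_RE.findall(text)


def misspelled_tokens(text, known_words):
    """Return every word token that is not in known_words.

    Each token is looked up on its own, so an entry that spans
    several tokens can never match.
    """
    return [
        token
        for token in tokenize(text)
        if token not in APOSTROPHES and token not in known_words
    ]


# Hypothetical word list: contains the whole word, but not the piece "Qur".
known = {"avant-garde", "Qur’an", "an"}

print(tokenize("avant-garde"))               # ['avant-garde']
print(tokenize("Qur’an"))                    # ['Qur', '’', 'an']
print(misspelled_tokens("Qur’an", known))    # ['Qur']
```

In this toy model, “avant-garde” passes because it is a single token that matches its entry, while “Qur’an” fails on the piece “Qur” even though the full word is listed.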
So the current results are the expected results. The fix is one of the two solutions I mentioned.
If you want to tag expressions containing white spaces (like “Palme d’or”), the multiwords.txt file is necessary.
Palme alone gives a spelling error, but not Palme d’or.
I see. “Palme d’or” has its own disambiguation rule.
You can write rules like this (for “Palme d’Or”, “Qur’an”, etc.) or you can use a multiwords.txt file. The result will be equivalent.
@jaumeortola, thanks, but I do not understand.
In some languages, when a token is tagged the spelling is ignored automatically (for example: languagetool/MorfologikCatalanSpellerRule.java at master · languagetool-org/languagetool · GitHub). That’s not the case in English, it seems, and I was not aware of it.
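The difference between the two behaviours can be sketched as follows. This is only an illustration under stated assumptions: `spelling_errors`, the `ignore_tagged` flag, and the `NNP` tag on the example token are hypothetical and are not taken from LanguageTool's code.

```python
def spelling_errors(tokens, known_words, ignore_tagged):
    """Return the texts of tokens that a toy speller would flag.

    tokens: list of (text, tags) pairs, where tags is a set of POS tags
            (empty if the tagger assigned nothing).
    ignore_tagged: if True, any token with at least one tag is skipped,
            which mimics the behaviour described for some languages.
    """
    errors = []
    for text, tags in tokens:
        if ignore_tagged and tags:
            continue  # a tagged token is trusted and not spell-checked
        if text not in known_words:
            errors.append(text)
    return errors


# Hypothetical input: one tagged token and one untagged token.
tokens = [("Qur’an", {"NNP"}), ("xyzzy", set())]

print(spelling_errors(tokens, set(), ignore_tagged=True))   # ['xyzzy']
print(spelling_errors(tokens, set(), ignore_tagged=False))  # ['Qur’an', 'xyzzy']
```

With `ignore_tagged=False`, tagging a word (through a rule or multiwords.txt) does not by itself silence the spelling warning, which matches what is described for English above.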
I see now that spelling.txt also allows multiwords. Perhaps it doesn’t support multiwords without white spaces (like Qur’an)? I don’t know. The implementation of multiwords in spelling.txt is different from what I implemented in multiwords.txt. I’m not able to provide more help. Perhaps @Knorr or @dnaber can help you.
Words with spaces work in spelling.txt because this has been implemented as a special case (here). This could surely be improved…
@jaumeortola, thanks for your comments. I didn’t know about the different behaviours in different languages.
@dnaber, Qur’an does not contain a space.