Slovak spelling in Pages by Apple

Hello,
this is my first post here. I tried to look for my topic, but no luck…

I found a spelling dictionaries package here (LanguageTool » Extensions), and I would like to ask whether it is somehow possible to extract the Slovak .aff and .dic files and use them for the Slovak spell check in Apple Pages.

Thank you

Hi, I think you can get the files at hunspell-sk | sk-spell, probably no need to take them from LanguageTool.

Hello Daniel,
I know that source, but the files are very old. Not that the language has changed, but the word count for Slovak is very low. There is also another problem that I cannot pin down: when I use the language pack from the recommended source in Pages, it underlines a word and the offered replacement is absolutely identical…

Unfortunately, the files used in LT aren’t more up-to-date either. We just took the spell checking dictionaries from that site… (I can’t help with Apple Pages, sorry.)

The dictionaries could be improved using good Slovak text sources, when available; e-books, for example, can help a lot. If you feel like enhancing the Slovak dictionary, I can get you going in September.

Hello Ruud,
thank you for the good news. Should I place a reminder here at the beginning of September?

If you send me the most correct Hunspell set for Slovak, I will run a Slovak word frequency list through it and send you the results. I will combine it with some data I already have. Then I will leave the manual work to you.

You can e-mail me at info (at) taaltik.nl

Thank you for the reply, Ruud!

Now I have one dumb question.
Are the Hunspell sk.aff and sk.dic files different from the identically named files shipped with other free office suites?

I would not know. They might, or might not. I do have a version myself, but I would rather compare that with what you are using.
My first checks of my own files show the following:

  • there are no proper names in it, or they are not capitalized, which is bad (unless Slovak does not require proper names to be capitalized). If proper names are an issue, we might catch the most used ones by checking them against the other languages of LT; it would be an easy extension. Maybe there is a baby naming site for Slovak names?
  • the full supported set of words has 1.5 million words. Quite a lot; lots of inflected forms.
  • Spaces are suggested; is Slovak a compounding language? (hypersenzitivita suggests it might be.) If so, this might be a bad choice!
  • checking the most frequent words in my text database, the top rejected words and their Hunspell suggestions are:
    & SR 15 0: S, R, SER, SIR, SRD, SRŠ, SÍR, SYR, SÚR, SÝR, SÉR, SRĎ, SO, OR, SA
    & sk 15 0: ks, ak, s, k, sek, sok, sak, sák, syk, súk, so, sa, si, ss, sú
    & USA 11 0: SA, USLA, PUSA, KUSA, USAĎ, ESA, OSA, UST, PSA, UJA, U SA
    & Bratislave 2 0: Bratislavský, Bratislavsky
    & EÚ 15 0: E, Ú, EN, TÚ, MÚ, KÚ, ES, SÚ, EJ, EH, HÚ, BÚ, EG, EX, E Ú
    & eur 8 0: eure, euro, eura, euru, eurá, ebur, tur, dur
    & Bratislava 2 0: Bratislavsky, Bratislavský
    & sv 13 0: s, v, sov, so, ov, sa, si, iv, uv, ss, sú, sň, xv
    & Ján 12 0: Rán, Lán, Mán, Pán, Bán, Jám, Já, Jen, Jáj, Jún, Jáw, Já n
    & Európe 1 0: Európske
    & Európy 1 0: Európsky
    & of 14 0: pf, od, o, f, on, oa, or, ov, op, ou, os, oj, oh, oň
    & Peter 8 0: Neter, Meter, Seter, Perte, Poter, Páter, Perlete, Perute
    & TASR 7 0: RAST, TAS, TRAS, TASÁ, TASÍ, TAS R, TARTAS
    & minút 8 0: minúť, minú, minúte, minúta, minútu, minúty, minúc, minú t
    & Martin 1 0: Martinský
    & The 11 0: He, Ehe, Che, Tuhe, Túhe, Het, Tne, Tie, Tme, Tse, T he
    & Slováci 2 0: Slovách, Slováckom
    & ze 15 0: že, z, e, zen, zev, zve, zle, zem, zje, úze, zo, ne, za, ie, re
    & Slovákov 5 0: Slová kov, Slová-kov, Slováckom, Slovko, Kosáková
    & Jána 8 0: Rána, Lána, Pána, Bána, Jena, Júna, Já na, Já-na
    & Rusko 10 0: Ruko, Ruskom, Ruskou, Ruská, Ruskí, Rusky, Ruskú, Ruský, Ruské, Rusínsko
    & Jozef 1 0: Jozefský
    & Košice 2 0: Košine, Košickej
    & Bratislavy 2 0: Bratislavsky, Bratislavský
    & New 5 0: Ne, Nem, Nes, Nej, Než
    & Ježiš 4 0: Ježíš, Ježia, Ježil, Ježiť
    & Česku 10 0: Lesku, Českú, Sečku, Česky, Českou, Česká, Českí, Český, České, Česáčku
    & the 11 0: he, tuhe, túhe, het, ehe, tne, tie, tme, tse, che, t he
    (Obviously, there is some (a lot of) junk in the collected data as well. Sorting that out is the manual labour that takes the most time…)
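
The rejected-word lines above appear to follow the ispell-style pipe format that Hunspell prints with `-a`: `& word count offset: suggestions`. As a minimal sketch, such lines could be parsed and split roughly into acronyms, capitalized words and the rest, which makes missing proper names easier to spot. The names `Miss`, `parse_miss_line` and `classify` are made up for this illustration and are not part of Hunspell or LanguageTool.

```python
from typing import List, NamedTuple, Optional


class Miss(NamedTuple):
    word: str               # the rejected word
    count: int              # number of suggestions reported
    offset: int             # position of the word in the input line
    suggestions: List[str]  # suggested replacements, in Hunspell's order


def parse_miss_line(line: str) -> Optional[Miss]:
    """Parse one '& word count offset: a, b, c' line; return None for other lines."""
    line = line.strip()
    if not line.startswith("& "):
        return None
    head, _, tail = line.partition(": ")
    _, word, count, offset = head.split(" ")
    # Split on ", " so that suggestions containing a space ("U SA") stay intact.
    suggestions = [s.strip() for s in tail.split(", ") if s.strip()]
    return Miss(word, int(count), int(offset), suggestions)


def classify(word: str) -> str:
    """Rough bucket (an assumption of this sketch, not a linguistic rule)."""
    if len(word) > 1 and word.isupper():
        return "acronym"       # e.g. SR, USA, TASR
    if word[:1].isupper():
        return "capitalized"   # e.g. Bratislava, Peter: likely proper names
    return "other"             # e.g. eur, of, the


if __name__ == "__main__":
    sample = [
        "& USA 11 0: SA, USLA, PUSA, KUSA, USAĎ, ESA, OSA, UST, PSA, UJA, U SA",
        "& Bratislava 2 0: Bratislavsky, Bratislavský",
        "& eur 8 0: eure, euro, eura, euru, eurá, ebur, tur, dur",
    ]
    for raw in sample:
        miss = parse_miss_line(raw)
        if miss is not None:
            print(classify(miss.word), miss.word, miss.suggestions[:3])
```

The "capitalized" bucket is where candidates for a proper-name list would come from; the "other" bucket still needs the manual clean-up mentioned above.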

A large set of Slovak e-books, if you can get your hands on them, might improve the word list quite a bit.
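
To illustrate how text sources could feed the word list, here is a minimal sketch that builds a word frequency list from plain text and keeps the frequent words that are not yet known. It rests on loud assumptions: `known_words` is a plain set standing in for a real Hunspell check (which would also accept inflected forms through the .aff rules), and the names `word_frequencies` and `unknown_candidates` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Runs of Unicode letters only (no digits, no underscore), so "Košice" is one word.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(text: str) -> Counter:
    """Count every word form in the text, keeping the original capitalization."""
    return Counter(WORD_RE.findall(text))


def unknown_candidates(
    freq: Counter, known_words: Iterable[str], min_count: int = 2
) -> List[Tuple[str, int]]:
    """Return (word, count) pairs not in known_words, most frequent first.

    known_words is an assumption of this sketch: a flat set of accepted forms,
    not an affix-aware Hunspell dictionary.
    """
    known = set(known_words)
    return [
        (word, n)
        for word, n in freq.most_common()
        if n >= min_count and word not in known
    ]


if __name__ == "__main__":
    text = "Bratislava je mesto. Bratislava a Košice sú mestá. je je"
    freq = word_frequencies(text)
    for word, n in unknown_candidates(freq, {"je", "mesto", "a"}):
        print(n, word)
```

The `min_count` threshold is a crude way to keep one-off typos and other junk out of the candidate list; the remaining candidates still need a manual review.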

OK Ruud, I will check during the weekend.
If the version you have has 1.5 million words, mine is totally crippled. Where can I download your version, please?
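
When comparing sizes, note that the 1.5 million figure was described above as the full supported set of words, that is, including inflected forms, whereas a .dic file lists stems with affix flags, so the two numbers are not directly comparable. A minimal sketch for counting the stems in a .dic file follows. It assumes the usual Hunspell layout, in which the first line holds an approximate entry count and each further line is `stem/FLAGS`; the function name `dic_stats` is illustrative, and escaped slashes inside words are not handled.

```python
from typing import Optional, Tuple


def dic_stats(dic_text: str) -> Tuple[Optional[int], int]:
    """Return (declared count from the first line, number of distinct stems)."""
    lines = [ln.strip() for ln in dic_text.splitlines() if ln.strip()]
    declared = None
    if lines and lines[0].isdigit():
        declared = int(lines[0])   # approximate count written by the dictionary author
        lines = lines[1:]
    # Drop the affix flags after "/" and any tab-separated morphological fields.
    stems = {ln.split("/", 1)[0].split("\t", 1)[0] for ln in lines}
    return declared, len(stems)


if __name__ == "__main__":
    sample = "3\nmesto/AB\nje\nmesto/C\n"
    print(dic_stats(sample))
```

For a real file, the content of sk.dic would be read with the encoding named in the SET line of the matching .aff file and passed to the function.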

The Linguee dictionary app has Slovak in the pipeline. The interesting thing is that for contextual translation (through DeepL) they use European Union laws and other documents in the different languages. The word count of those documents must be very high, and as you can see from the attachment they can be downloaded, but I have no idea how to do it.

Let's proceed by e-mail: info (at) taaltik.nl