
LanguageTool lang code details


#1

Hi,
Below is my attempt to provide a "one of each" choice for "lang" code.
Is this complete and accurate?

I saw several other codes such as,
[{code: 'ES', name: 'General'}, {code: 'ES-Valencia', name: 'Valencian'}]

ast-ES, be-BY, br-FR, ca-ES, ca-ES-valencia, da-DK, de, de-AT, de-CH, de-DE, de-DE-x-simple-language, el-GR, en, en-AU, en-CA, en-GB, en-NZ, en-US, en-ZA,
eo, es, fa, fr, gl-ES, is-IS, it, ja-JP, km-KH, lt-LT, ml-IN, nl, pl-PL, pt, pt-BR, pt-PT, ro-RO, ru-RU, sk-SK, sl-SI, sv, ta-IN, tl-PH, uk-UA, zh-CN.

So for example, what is the difference between:
zh and zh-CN?
ca and ca-ES and ca-ES-Valencia?

What are all the Spanish possibilities?

(used <pre> for this)

<select name="lang" id="lang">

<option value="en-US" selected='selected'>English American</option>
<option value="en-GB" >English British</option>
<option value="en-CA" >English Canada</option>
<option value="en-AU" >English Australia</option>
<option value="en-NZ" >English New Zealand</option>
<option value="en-ZA" >English South Africa</option>

<option value="auto" >Auto-detect</option>
<option value="ast" >Asturian</option>
<option value="be" >Belarusian</option>
<option value="br" >Breton</option>

<option value="ca" >Catalan</option>
<option value="ca-ES-Valencia" >Catalan Valencia</option>

<option value="zh" >Chinese</option>
<option value="da" >Danish</option>
<option value="nl" >Dutch</option>
<option value="eo" >Esperanto</option>
<option value="fr" >French</option>
<option value="gl" >Galician</option>

<option value="de-DE" >German Germany</option>
<option value="de-AT" >German Austria</option>
<option value="de-CH" >German Switzerland</option>
<option value="el" >Greek</option>
<option value="is" >Icelandic</option>
<option value="it" >Italian</option>
<option value="ja" >Japanese</option>
<option value="km" >Khmer</option>
<option value="lt" >Lithuanian</option>
<option value="ml" >Malayalam</option>
<option value="fa" >Persian</option>
<option value="pl" >Polish</option>

<option value="pt-PT" >Portuguese Portugal</option>
<option value="pt-BR" >Portuguese Brazil</option>

<option value="ro" >Romanian</option>
<option value="ru" >Russian</option>
<option value="sk" >Slovak</option>
<option value="sl" >Slovenian</option>

<option value="es" >Spanish</option>

<option value="sv" >Swedish</option>
<option value="ta" >Tamil</option>
<option value="tl" >Tagalog</option>
<option value="uk" >Ukrainian</option>
      
</select>

(Josep Bofarull Gallés) #2

The Spanish keyboard of course includes ñ (n with tilde), used only in Spanish, plus Ç, ç (c-cedilla), the grave accent (à), and the interpunct (l·l), which are not used in Spanish but are used in other languages of Spain.
I think the code 'ES', name: 'General' is for the languages of Spain or for Spanish computers. I remember a discussion in another forum about Occitan under the ES code and the FR code.
Jaume, do you know about that?


(Daniel Naber) #3

There's no difference. Specifying the country code (CN in this case) only makes a difference for languages that have special rules for that country variant. Typically, spell checking differs.
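
To see which languages in the list from post #1 come in more than one variant, the codes can be grouped by their base language. This is only a rough sketch: the list is copied from this thread, the function names are made up for illustration, and having several codes in the list is just a proxy for "has variant-specific rules", not something LanguageTool reports directly.

```python
from collections import defaultdict

# Codes copied from the list in post #1 of this thread.
CODES = """ast-ES be-BY br-FR ca-ES ca-ES-valencia da-DK de de-AT de-CH de-DE
de-DE-x-simple-language el-GR en en-AU en-CA en-GB en-NZ en-US en-ZA eo es fa
fr gl-ES is-IS it ja-JP km-KH lt-LT ml-IN nl pl-PL pt pt-BR pt-PT ro-RO ru-RU
sk-SK sl-SI sv ta-IN tl-PH uk-UA zh-CN""".split()


def group_by_language(codes):
    """Map each base language (the part before the first '-') to its codes."""
    groups = defaultdict(list)
    for code in codes:
        base = code.split("-", 1)[0].lower()
        groups[base].append(code)
    return dict(groups)


def languages_with_variants(codes):
    """Keep only the base languages that appear with more than one code."""
    return {
        base: members
        for base, members in group_by_language(codes).items()
        if len(members) > 1
    }


if __name__ == "__main__":
    for base, members in sorted(languages_with_variants(CODES).items()):
        print(base, "->", ", ".join(members))
```

With that list, only ca, de, en, and pt show up with several codes, while zh appears only as zh-CN.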


(jaumeortola) #4

ca or ca-ES is Catalan. ca-ES-valencia is the variant of Catalan spoken in Valencia, called Valencian. As most speakers of Catalan are in Spain, the country code is ES for both variants.
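
As a small illustration of the language / country / variant structure described above, here is a sketch that splits a code into its parts. The `LangCode` type and `parse_lang_code` function are hypothetical helpers, not part of LanguageTool, and the case normalization (lowercase language and variant, uppercase country) is an assumption based on how the codes are written in this thread.

```python
from typing import NamedTuple, Optional


class LangCode(NamedTuple):
    language: str            # e.g. "ca"
    country: Optional[str]   # e.g. "ES", or None if absent
    variant: Optional[str]   # e.g. "valencia", or None if absent


def parse_lang_code(code: str) -> LangCode:
    """Split 'ca-ES-valencia' into ('ca', 'ES', 'valencia').

    Everything after the second '-' is kept together as the variant,
    so 'de-DE-x-simple-language' yields the variant 'x-simple-language'.
    """
    parts = code.split("-", 2)
    language = parts[0].lower()
    country = parts[1].upper() if len(parts) > 1 else None
    variant = parts[2].lower() if len(parts) > 2 else None
    return LangCode(language, country, variant)


if __name__ == "__main__":
    for code in ["ca", "ca-ES", "ca-ES-Valencia", "de-DE-x-simple-language"]:
        print(code, "->", parse_lang_code(code))
```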


(jaumeortola) #5

Where did you get these? The language code ca (Catalan) is missing in both.
* ca-ES for Catalan (general), but most of the time we drop the adjective "general"
* ca-ES-valencia for Catalan (Valencian)


#6

I saw it in a "variable" bookmarklet on the LT main page, which is how I learned of "Valencia".
So should I add
"ca > Catalan" and "ca-ES-Valencia > Catalan Valencia" to my "one of each" select shown
at the top of this post
(so that ca-ES would not be necessary)?
Are there any other missing or incorrect codes?

Thanks


#7

I am a little surprised that only one Spanish code (es) is needed. There are so many different
Spanish spell checkers for different countries out there.


(Taha) #8

Hello,
I am trying to add an Arabic language module; the Arabic language code is "ar".
There are many country codes for Arabic, for example ar-DZ, ar-TN, ar-EG, ar-SA, and so on. All of those codes share the same tokenizer, dictionary, and spell checker. How can I configure them in the language module?
When I call the spell checker with the 'ar' code, it asks me for a country code. How can I set up an alias for this?
The forked LT project for Arabic is at https://github.com/linuxscout/languagetool
Thanks


(Daniel Naber) #9

I'm not sure what you mean by that, what's the exact message you get? In general, LT doesn't care about the country codes unless there are differences in spelling. For example, for French there's only fr without any country code, as the spelling dictionary is always the same for all the countries in which French is spoken.


(Taha) #10

OK,
when I run tests with regression-test:
1- When I run
./regression-test.sh ar tests/tests 1000 semantic_errors

LT works and I get:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:14 min
[INFO] Finished at: 2017-03-24T22:19:09+01:00
[INFO] Final Memory: 44M/392M
[INFO] ------------------------------------------------------------------------
3.01kB 0:00:00 [40.3MB/s] [========================================================================================>] 100%
Expected text language: Arabic (no spell checking active, specify a language variant like 'en-GB' if available)
Working on STDIN...

2- When I use ar-DZ,
./regression-test.sh ar-DZ tests/tests 1000 semantic_errors
LT runs, and I get

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:46 min
[INFO] Finished at: 2017-03-24T22:33:51+01:00
[INFO] Final Memory: 44M/382M
[INFO] ------------------------------------------------------------------------
3.01kB 0:00:00 [ 70MB/s] [========================================================================================>] 100%
java.lang.IllegalArgumentException: 'ar-DZ' is not a language code known to LanguageTool. Supported language codes are: ar, ast-ES, be-BY, br-FR, ca-ES, ca-ES-valencia, da-DK, de, de-AT, de-CH, de-DE, de-DE-x-simple-language, el-GR, en, en-AU, en-CA, en-GB, en-NZ, en-US, en-ZA, eo, es, fa, fr, gl-ES, it, ja-JP, km-KH, nl, pl-PL, pt, pt-AO, pt-BR, pt-MZ, pt-PT, ro-RO, ru-RU, sk-SK, sl-SI, sv, ta-IN, tl-PH, uk-UA, zh-CN. The list of languages is read from META-INF/org/languagetool/language-module.properties in the Java classpath. See http://wiki.languagetool.org/java-api for details.

My spell checker is Hunspell, and it is configured for ar-DZ.
How can I solve this problem?
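
For anyone calling LT from their own script and hitting the same exception, one possible client-side workaround is to map a requested code onto the closest entry in the "Supported language codes" list before passing it on. The sketch below is an assumption-laden illustration, not how LanguageTool resolves codes internally: `resolve_lang_code` is a hypothetical helper, and the list is copied from the error message above. It only picks a code that LT will accept; it does not by itself turn on spell checking for plain ar.

```python
# Codes copied from the "Supported language codes" error message above.
SUPPORTED = """ar ast-ES be-BY br-FR ca-ES ca-ES-valencia da-DK de de-AT de-CH
de-DE de-DE-x-simple-language el-GR en en-AU en-CA en-GB en-NZ en-US en-ZA eo
es fa fr gl-ES it ja-JP km-KH nl pl-PL pt pt-AO pt-BR pt-MZ pt-PT ro-RO ru-RU
sk-SK sl-SI sv ta-IN tl-PH uk-UA zh-CN""".split()


def resolve_lang_code(requested, supported):
    """Return the supported code to use for `requested`, or None.

    1. Prefer an exact match, ignoring case ('ca-ES-Valencia' matches
       'ca-ES-valencia').
    2. Otherwise fall back to the bare language ('ar-DZ' -> 'ar') if that
       bare code is itself in the supported list.
    """
    by_lower = {code.lower(): code for code in supported}
    exact = by_lower.get(requested.lower())
    if exact is not None:
        return exact
    base = requested.split("-", 1)[0].lower()
    return by_lower.get(base)


if __name__ == "__main__":
    for code in ["ar-DZ", "ca-ES-Valencia", "en-GB", "xx-YY"]:
        print(code, "->", resolve_lang_code(code, SUPPORTED))
```

Note that this only falls back from a more specific code to a less specific one; it never guesses a country for a bare language code.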


(Daniel Naber) #11

That's strange - does spell checking actually work or not? You might need to debug this, the message comes from org.languagetool.commandline.Main, line 441.