
Spellchecker improvement discussion


(Oleg) #41

@dnaber, could you please run the updated features extractor?
Could you also run this query: SELECT DISTINCT rule_id FROM corrections WHERE rule_id LIKE "MORFOLOGIK_%";


(Daniel Naber) #42
MORFOLOGIK_RULE_PL_PL
MORFOLOGIK_RULE_RU_RU
MORFOLOGIK_RULE_CA_ES
MORFOLOGIK_RULE_EN_US
MORFOLOGIK_RULE_UK_UA
MORFOLOGIK_RULE_IT_IT
MORFOLOGIK_RULE_ES
MORFOLOGIK_RULE_EN_GB
MORFOLOGIK_RULE_RO_RO
MORFOLOGIK_RULE_SL_SI
MORFOLOGIK_RULE_EN_AU
MORFOLOGIK_RULE_NL_NL
MORFOLOGIK_RULE_SK_SK
MORFOLOGIK_RULE_AST
MORFOLOGIK_RULE_EL_GR
MORFOLOGIK_RULE_EN_NZ
MORFOLOGIK_RULE_TL
MORFOLOGIK_RULE_BE_BY
MORFOLOGIK_RULE_EN_CA
MORFOLOGIK_RULE_BR_FR
MORFOLOGIK_RULE_EN_ZA
MORFOLOGIK_RULE_SR_EKAVIAN

Will send result of feature extractor soon.


(Oleg) #43

The features extractor had a bug: it processed mostly records for en-* languages. Could you please run the updated features extractor?


(Daniel Naber) #44

Done, but now the result is rather small (3MB).


(Oleg) #45

I’ve improved the error handling, so could you please run the updated features extractor one more time?


(Oleg) #46

So there are Morfologik rules that were never logged (i.e. never invoked)? For example, “MORFOLOGIK_RULE_DE_DE”.


(Daniel Naber) #47

Sorry, I forgot that our rule IDs are inconsistent. The German rules are: AUSTRIAN_GERMAN_SPELLER_RULE, GERMAN_SPELLER_RULE, SWISS_GERMAN_SPELLER_RULE
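
For illustration, the inconsistent naming could be handled with a small ID filter like the one below. This is only a sketch: the class and method names are hypothetical and are not part of LanguageTool or the features extractor. It assumes that the two patterns seen in this thread (the MORFOLOGIK_RULE_ prefix and the _SPELLER_RULE suffix) are the only ones that matter, which may not hold for other languages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of LanguageTool or the features extractor.
class SpellerRuleIds {

    // Assumption: only the two naming patterns mentioned in this thread are covered.
    static boolean isSpellerRuleId(String ruleId) {
        return ruleId.startsWith("MORFOLOGIK_RULE_")
                || ruleId.endsWith("_SPELLER_RULE");
    }

    // Keeps only the IDs that look like speller rules, preserving input order.
    static List<String> filterSpellerRuleIds(List<String> ruleIds) {
        List<String> result = new ArrayList<>();
        for (String id : ruleIds) {
            if (isSpellerRuleId(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "MORFOLOGIK_RULE_EN_US",
                "GERMAN_SPELLER_RULE",
                "AUSTRIAN_GERMAN_SPELLER_RULE",
                "SOME_GRAMMAR_RULE"); // made-up ID, used only as a negative example
        System.out.println(filterSpellerRuleIds(ids));
    }
}
```

A suffix/prefix check like this is easy to extend, but any further patterns would have to come from the actual rule classes rather than from guessing.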


(Oleg) #48

OK, and are there any other languages whose Morfologik rules have IDs that don’t follow the MORFOLOGIK_RULE_ naming pattern?


(Daniel Naber) #49

Not sure. Please check the getId() of all classes extending SpellingCheckRule.


(Oleg) #50

Ok, thanks!


(Oleg) #51

@dnaber, could you please run the updated features extractor? I’ve added extraction of the %GERMAN and FR rules and increased the context window size to 3.
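
To make “context window size 3” concrete, here is a rough sketch of what taking up to three tokens on each side of a flagged word could look like. The thread does not say how the extractor tokenizes text or whether the window is counted per side, so both are assumptions here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the extractor's actual code.
class ContextWindow {

    // Up to `size` tokens immediately before position `index`.
    static List<String> leftContext(List<String> tokens, int index, int size) {
        int from = Math.max(0, index - size);
        return new ArrayList<>(tokens.subList(from, index));
    }

    // Up to `size` tokens immediately after position `index`.
    static List<String> rightContext(List<String> tokens, int index, int size) {
        int to = Math.min(tokens.size(), index + 1 + size);
        return new ArrayList<>(tokens.subList(index + 1, to));
    }

    public static void main(String[] args) {
        // Assumption: the text is already split into tokens.
        List<String> tokens = Arrays.asList(
                "I", "have", "a", "speling", "mistake", "in", "this", "sentence");
        int misspelled = 3; // position of "speling"
        System.out.println(leftContext(tokens, misspelled, 3));
        System.out.println(rightContext(tokens, misspelled, 3));
    }
}
```

Near the start or end of a text the window is simply shorter, which avoids padding tokens but means the number of context features varies per record.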


(Daniel Naber) #52

Running java -jar languagetool-suggestions-logs-features-extractor-1.7.jar I now get:

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.lang.RuntimeException: Could not activate rules
	at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:192)
	at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:167)
	at io.github.oserikov.languagetool.Main$1.<init>(Main.java:68)
	at io.github.oserikov.languagetool.Main.<clinit>(Main.java:65)
Caused by: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/fr/grammar.xml'
	at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:76)
	at org.languagetool.Language.getPatternRules(Language.java:368)
	at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:368)
	at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:189)
	... 3 more
Caused by: java.lang.IllegalArgumentException: 'fr' is not a language code known to LanguageTool. Supported language codes are: be-BY, br-FR, ca-ES, de-AT, el-GR, en-AU, en-CA, en-GB, en-NZ, en-US, en-ZA, es, it, nl, pl-PL, ro-RO, ru-RU, sk-SK, sl-SI, sr, tl-PH, uk-UA. The list of languages is read from META-INF/org/languagetool/language-module.properties in the Java classpath. See http://wiki.languagetool.org/java-api for details.
	at org.languagetool.Languages.getLanguageForShortCode(Languages.java:151)
        (...)
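
One way to catch this kind of failure earlier would be to compare the language codes a tool needs against the ones actually available before initializing anything. The sketch below is only an illustration with hypothetical names. It hard-codes the supported codes copied from the exception message above, whereas LanguageTool itself reads them from the classpath, as the message says.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check; not part of LanguageTool or the extractor.
class LanguageCodeCheck {

    // Codes copied from the exception message; a real tool would not hard-code them.
    static final Set<String> SUPPORTED = new LinkedHashSet<>(Arrays.asList(
            "be-BY", "br-FR", "ca-ES", "de-AT", "el-GR", "en-AU", "en-CA",
            "en-GB", "en-NZ", "en-US", "en-ZA", "es", "it", "nl", "pl-PL",
            "ro-RO", "ru-RU", "sk-SK", "sl-SI", "sr", "tl-PH", "uk-UA"));

    // Returns the requested codes that are not in the supported set, in input order.
    static List<String> unsupported(Collection<String> requested) {
        List<String> missing = new ArrayList<>();
        for (String code : requested) {
            if (!SUPPORTED.contains(code)) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = unsupported(Arrays.asList("en-US", "fr", "de-AT"));
        if (!missing.isEmpty()) {
            System.err.println("Missing language modules for: " + missing);
        }
    }
}
```

Reporting all missing codes at once gives a shorter message than the nested stack trace and points directly at the language module that is absent from the jar.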

(Oleg) #53

Aww, will fix tonight, now afk.


(Oleg) #54

Could you please re-download the tool? I’ve updated the release with a fix.