
How does "disambiguation.xml" work?


(SP) #1

Hello,
I created the dictionary file for the French language (using mvn package and building a new LT snapshot), and it seems to work.
Unfortunately, there is one thing I don't understand: even though my dictionary contains the word "aujourd'hui", the spell checker flags it as a misspelling whenever it appears in the list of words to be checked.

I then created a rule in disambiguation.xml to make the spell checker ignore this word, but nothing changed.

Where am I making a mistake?


(Daniel Naber) #2

Could you post the XML you've added to disambiguation.xml?


(SP) #3

I tried two different ways:

<rule name="aujourdhui" id="aujourdhui">
    <pattern>
        <marker>
            <token>aujourd'hui</token>
        </marker>
    </pattern>
    <disambig action="ignore_spelling"/>
</rule>

<rule name="aujourdhui" id="aujourdhui">
    <pattern>
        <marker>
            <token>aujourd'</token>
            <token>hui</token>
        </marker>
    </pattern>
    <disambig action="ignore_spelling"/>
</rule>

(jaumeortola) #4

As you can see at http://community.languagetool.org/analysis/analyzeText, this word is split into three tokens ("aujourd", "'" and "hui"), so neither of your patterns can match. You need:

<rule name="aujourdhui" id="aujourdhui">
    <pattern>
        <marker>
            <token>aujourd</token>
            <token spacebefore="no">'</token>
            <token spacebefore="no">hui</token>
        </marker>
    </pattern>
    <disambig action="ignore_spelling"/>
</rule>
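If your texts may also use the typographic apostrophe (’, U+2019) instead of the straight one, a variant of the same rule could match both with a regular expression on the middle token. This is only a sketch: it assumes the French tokenizer splits "aujourd’hui" at ’ the same way it splits at ' (check that on the analysis page first), and the id "aujourdhui_apos" is a made-up name.

<rule name="aujourdhui (both apostrophes)" id="aujourdhui_apos">
    <pattern>
        <marker>
            <token>aujourd</token>
            <!-- hypothetical variant: regexp matches either the straight ' or the typographic ’ -->
            <token spacebefore="no" regexp="yes">['’]</token>
            <token spacebefore="no">hui</token>
        </marker>
    </pattern>
    <disambig action="ignore_spelling"/>
</rule>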

Alternatively, you can try adding the word to the file "multiwords.txt" in the French resource folder, although I'm not sure whether that works for French.


(SP) #5

OK, thanks, it works!
I also noticed that when I run the command to start the spell checker, the output file is written in ANSI instead of UTF-8, even though the Windows console is set to code page 65001 (UTF-8) and I specify the encoding when launching the spell checker. Any suggestions?