
Strange bug in English


(John) #1

Hi,

I am testing the LT standalone Java application, but I have come across a simple but strange bug.

Tested texts:

i am john.
i am susan.

The results (labels translated from the Chinese UI):

  1. Line 1, column 1
    Message: This sentence does not start with an uppercase letter (deactivate)
    Correction: I
    Context: i am john. i am susan.

  2. Line 1, column 1
    Message: Did you mean I? (deactivate)
    Correction: I
    Context: i am john. i am susan.

  3. Line 2, column 1
    Message: This sentence does not start with an uppercase letter (deactivate)
    Correction: I
    Context: i am john. i am susan.

  4. Line 2, column 1
    Message: Did you mean I? (deactivate)
    Correction: I
    Context: i am john. i am susan.

  5. Line 2, column 6
    Message: Possible spelling mistake found (deactivate)
    Correction: Susan
    Context: i am john. i am susan.


The question is: why is "john" not flagged, while "susan" is?

Judging from grammar.xml, this does not seem to make sense, because the first rule in this group should catch the error:

<rulegroup default="off" id="EN_CAPITALIZE" name="Capitalize lowercase words ('i am')">

            <rule> 
                <pattern case_sensitive="yes">
                    <and>
                        <token inflected="yes" regexp="yes">[A-Z][a-z]+</token>
                        <token regexp="yes">[a-z]+</token>
                    </and>
                </pattern>
                <message>The word \1 probably should be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
                <example correction="Susan" type="incorrect">My name is <marker>susan</marker>.</example>
                <example type="correct">My name is Susan.</example>
            </rule>


<rule>    
                <pattern case_sensitive="yes">
                    <token>i</token>
                </pattern>
                <message>This should be written in uppercase: <suggestion>I</suggestion>.</message>
                <example correction="I" type="incorrect">Who do you think <marker>i</marker> am?</example>
                <example type="correct">Who do you think I am?</example>
            </rule>


<rule>    
                <pattern case_sensitive="yes">
                    <token regexp="yes">&quot;|“</token>
                    <and>
                        <token inflected="yes" regexp="yes">[A-Z][a-z]+</token>
                        <token regexp="yes">[a-z]+</token>
                    </and>
                    <marker>
                        <token postag="NN" regexp="yes">[a-z]+</token>
                    </marker>
                </pattern>
                <message>The word \3 should probably be written in uppercase: <suggestion><match case_conversion="startupper" no="3"></match></suggestion>.</message>
                <example correction="Monthly" type="incorrect">This is a story from “atlantic <marker>monthly</marker>”.</example>
                <example type="correct">This is a story from “The Atlantic Monthly”.</example>
            </rule>


<rule>    
                <pattern case_sensitive="yes">
                    <token postag="UNKNOWN" regexp="yes">[a-z]+</token>
                </pattern>
                <message>The word \1 should probably be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
                <example correction="Sobol" type="incorrect">Eve <marker>sobol</marker>, Indiana</example>
                <example type="correct">Eve Sobol, Indiana</example>
            </rule>


<rule>    
                <pattern case_sensitive="yes">
                    <token postag="NNP" regexp="yes">[a-z]+</token>
                </pattern>
                <message>The word \1 should probably be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
                <example correction="Susan" type="incorrect">This is <marker>susan</marker>.</example>
                <example type="correct">This is Susan.</example>
            </rule>
</rulegroup>

Thx.


(Daniel Naber) #2

The red underline indicates it's a spelling error, so it has nothing to do with grammar.xml. You can see at http://app.aspell.net/lookup?dict=en_US&words=i%0D%0Aam%0D%0Ajohn%0D%0Asusan%0D%0A that our dictionary (which we use as a component, maintained by someone else) accepts "john" as correct. I don't know why - you might want to submit a bug report at https://github.com/kevina/wordlist/issues


(John) #3

I can understand that "john", "tom", etc. may have meanings other than a name, so they are accepted.

But how do I write a rule that catches an error like "I am john", where "john" is a name?

Is the rule listed below OK? Could you give me a correct rule as a learning example? Thanks so much for your time.

<rule>      
                <pattern case_sensitive="yes">
                    <and>
                        <token inflected="yes" regexp="yes">[A-Z][a-z]+</token>
                        <token regexp="yes">[a-z]+</token>
                    </and>
                </pattern>
                <message>The word \1 probably should be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
                <example correction="Susan" type="incorrect">My name is <marker>susan</marker>.</example>
                <example type="correct">My name is Susan.</example>
            </rule>

(Daniel Naber) #4

LT doesn't know where John is a name and where it's not, so you have to specify the context in the rule. For example, "am/is/was" followed by a lowercase proper noun could be wrong:




<rule>     
        <pattern case_sensitive="yes">
            <token regexp="yes">am|is|was</token>
            <marker>
               <token postag="NNP" regexp="yes">[a-z]+</token>
            </marker>
        </pattern>
        <message>The word \2 probably should be uppercase: <suggestion><match case_conversion="startupper" no="2"></match></suggestion>.</message>
        <example correction="John" type="incorrect">My name is <marker>john</marker>.</example>
        <example type="correct">My name is Susan.</example>
    </rule>
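
For reference, here is a sketch of how the rule above might look as a standalone entry in grammar.xml, outside any rulegroup. A standalone rule needs its own `id` and `name` attributes; the ones used here are hypothetical and chosen only for illustration. The example sentences are adapted to the "i am john" input from the first post. Whether the tagger actually assigns NNP to the lowercase form "john" is an assumption that should be checked, since "john" also exists as a common noun.

```xml
<!-- Sketch only: id and name are hypothetical, not taken from LT's grammar.xml. -->
<rule id="LOWERCASE_NAME_AFTER_BE" name="Lowercase proper noun after 'am/is/was'">
    <pattern case_sensitive="yes">
        <!-- Context: a form of "to be" directly before the word in question. -->
        <token regexp="yes">am|is|was</token>
        <marker>
            <!-- An all-lowercase word that the tagger considers a proper noun (NNP).
                 Assumption: lowercase "john" gets the NNP tag; verify before relying on it. -->
            <token postag="NNP" regexp="yes">[a-z]+</token>
        </marker>
    </pattern>
    <message>The word \2 probably should be uppercase: <suggestion><match case_conversion="startupper" no="2"></match></suggestion>.</message>
    <example correction="John" type="incorrect">I am <marker>john</marker>.</example>
    <example type="correct">I am John.</example>
</rule>
```

Note that the EN_CAPITALIZE rulegroup quoted in the first post is declared with `default="off"`, so its rules only run when the group is switched on. A rule without that attribute, like the sketch above, is active by default.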