Strange bug in English

John · September 26, 2014, 6:43am

Hi,

I am testing the LT standalone Java application, but I have come across some simple but strange bugs.

Tested texts:

i am john.
i am susan.

The results:

Line 1, column 1
Message: This sentence does not start with an uppercase letter (deactivate)
Correction: I
Context: i am john. i am susan.
Line 1, column 1
Message: Did you mean I? (deactivate)
Correction: I
Context: i am john. i am susan.
Line 2, column 1
Message: This sentence does not start with an uppercase letter (deactivate)
Correction: I
Context: i am john. i am susan.
Line 2, column 1
Message: Did you mean I? (deactivate)
Correction: I
Context: i am john. i am susan.
Line 2, column 6
Message: Possible spelling mistake found (deactivate)
Correction: Susan
Context: i am john. i am susan.

The question is: why is “john” not flagged, while “susan” is?

Judging by grammar.xml, this does not seem to make sense, because the first rule below should catch the error:

<rulegroup default="off" id="EN_CAPITALIZE" name="Capitalize lowercase words ('i am')">
    <rule>
        <pattern case_sensitive="yes">
            <and>
                <token inflected="yes" regexp="yes">[A-Z][a-z]+</token>
                <token regexp="yes">[a-z]+</token>
            </and>
        </pattern>
        <message>The word \1 probably should be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
        <example correction="Susan" type="incorrect">My name is <marker>susan</marker>.</example>
        <example type="correct">My name is Susan.</example>
    </rule>

    <rule>
        <pattern case_sensitive="yes">
            <token>i</token>
        </pattern>
        <message>This should be written in uppercase: <suggestion>I</suggestion>.</message>
        <example correction="I" type="incorrect">Who do you think <marker>i</marker> am?</example>
        <example type="correct">Who do you think I am?</example>
    </rule>

    <rule>
        <pattern case_sensitive="yes">
            <token regexp="yes">&quot;|“</token>
            <and>
                <token inflected="yes" regexp="yes">[A-Z][a-z]+</token>
                <token regexp="yes">[a-z]+</token>
            </and>
            <marker>
                <token postag="NN" regexp="yes">[a-z]+</token>
            </marker>
        </pattern>
        <message>The word \3 should probably be written in uppercase: <suggestion><match case_conversion="startupper" no="3"></match></suggestion>.</message>
        <example correction="Monthly" type="incorrect">This is a story from “atlantic <marker>monthly</marker>”.</example>
        <example type="correct">This is a story from “The Atlantic Monthly”.</example>
    </rule>

    <rule>
        <pattern case_sensitive="yes">
            <token postag="UNKNOWN" regexp="yes">[a-z]+</token>
        </pattern>
        <message>The word \1 should probably be uppercase:<suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
        <example correction="Sobol" type="incorrect">Eve <marker>sobol</marker>, Indiana</example>
        <example type="correct">Eve Sobol, Indiana</example>
    </rule>

    <rule>
        <pattern case_sensitive="yes">
            <token postag="NNP" regexp="yes">[a-z]+</token>
        </pattern>
        <message>The word \1 should probably be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
        <example correction="Susan" type="incorrect">This is <marker>susan</marker>.</example>
        <example type="correct">This is Susan.</example>
    </rule>

Thx.

dnaber · September 26, 2014, 7:26am

The red underline indicates it’s a spelling error, so it has nothing to do with grammar.xml. You can see at English Speller Word Lookup that our dictionary (which we use as a component, maintained by someone else) accepts “john” as correct. I don’t know why - you might want to submit a bug report at Issues · en-wl/wordlist · GitHub

John · September 26, 2014, 11:50am

I understand that “john”, “tom”, etc. may have meanings other than a name, so they are accepted.

But how do I write a rule that catches an error like “I am john”, where “john” is a name?

Is the rule below OK? Could you give me a correct rule as a learning example? Thanks so much for your time.

<rule>      
                <pattern case_sensitive="yes">
                    <and>
                        <token inflected="yes" regexp="yes">[A-Z][a-z]+</token>
                        <token regexp="yes">[a-z]+</token>
                    </and>
                </pattern>
                <message>The word \1 probably should be uppercase: <suggestion><match case_conversion="startupper" no="1"></match></suggestion>.</message>
                <example correction="Susan" type="incorrect">My name is <marker>susan</marker>.</example>
                <example type="correct">My name is Susan.</example>
            </rule>

dnaber · September 26, 2014, 1:21pm

LT doesn’t know when “john” is a name and when it’s not, so you have to specify the context in the rule. For example, “am/is/was” followed by a lowercase proper noun could be wrong:

<rule>
    <pattern case_sensitive="yes">
        <token regexp="yes">am|is|was</token>
        <marker>
            <token postag="NNP" regexp="yes">[a-z]+</token>
        </marker>
    </pattern>
    <message>The word \2 probably should be uppercase: <suggestion><match case_conversion="startupper" no="2"></match></suggestion>.</message>
    <example correction="John" type="incorrect">My name is <marker>john</marker>.</example>
    <example type="correct">My name is Susan.</example>
</rule>
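
For anyone who wants to try this outside an existing rulegroup, here is a minimal sketch of the same rule as a standalone entry with its own id and name. The id `LOWERCASE_NAME_AFTER_BE` and the name text are made up for illustration and are not part of LT's grammar.xml. It also assumes the English tagger assigns the `NNP` tag to the lowercase word “john”. If it does not, the rule will not match.

```xml
<!-- Sketch only: the id and name are hypothetical, chosen for this example. -->
<rule id="LOWERCASE_NAME_AFTER_BE" name="Lowercase proper noun after am/is/was">
    <pattern case_sensitive="yes">
        <!-- Context: a form of "to be" directly before the candidate name -->
        <token regexp="yes">am|is|was</token>
        <marker>
            <!-- An all-lowercase word that the tagger marks as a proper noun (assumed) -->
            <token postag="NNP" regexp="yes">[a-z]+</token>
        </marker>
    </pattern>
    <!-- \2 and no="2" both refer to the second token of the pattern, i.e. the marked word -->
    <message>The word \2 probably should be uppercase: <suggestion><match case_conversion="startupper" no="2"/></suggestion>.</message>
    <example correction="John" type="incorrect">I am <marker>john</marker>.</example>
    <example type="correct">I am John.</example>
</rule>
```

Because “john” can also be an ordinary noun, a rule like this may produce false alarms in sentences where the lowercase word is intended. Narrowing the context further (for example, requiring “name” before “is”) would trade coverage for precision.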