What does the XML tag "exception" mean?

John_K_Lee · July 23, 2013, 2:47am

I checked the grammar XML and found the tag “exception”, which I couldn’t understand. For example:

并非|而非
北美|南美|欧|亚|非|大洋|南极|大|沙

州

How should I explain the usage/meaning of “并非|而非”?

Thanks in advance.

Dominique_PELLE · July 23, 2013, 4:14am

I don’t understand this particular exception either. There seems to be no way for this exception to match, so it looks useless. Perhaps a skip="…" is missing, or a scope="previous" is missing in the token?

I don’t speak Chinese, so I can’t tell what the intention of the exception was here.

Glancing at languagetool-language-modules/zh/src/main/resources/org/languagetool/rules/zh/grammar.xml, I see several other exceptions that look useless. For example, in rule DA_TAI, I see:

概概要

When I find time (next weekend?), I will see whether we can automatically detect such useless exceptions when running ‘mvn test’.
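To illustrate why such an exception can never fire, and what an automatic check might look like, here is a minimal Python sketch. The helper `exception_is_useless` is hypothetical and not part of LanguageTool or its `mvn test` suite. It assumes that the exception sits inside the token without a `scope` attribute (so it is tested against the same token), that both patterns are plain `|`-separated alternations, and that a regexp has to match the whole token, as I understand LanguageTool's regexp tokens do (modelled here with `re.fullmatch`).

```python
import re


def alternatives(pattern):
    """Split a plain alternation like 'a|b|c' into its literal options.

    Assumption: the pattern uses no other regex syntax. That holds for the
    patterns in this thread but not for rules in general.
    """
    return pattern.split("|")


def exception_is_useless(token_pattern, exception_pattern):
    """Hypothetical check: True if no word accepted by the token can also
    match a same-token exception, i.e. the exception can never fire."""
    return not any(
        re.fullmatch(exception_pattern, word)
        for word in alternatives(token_pattern)
    )


token = "北美|南美|欧|亚|非|大洋|南极|大|沙"
exception = "并非|而非"
print(exception_is_useless(token, exception))       # True: dead exception
print(exception_is_useless("北美|南美", "北美"))     # False: it can fire
```

With the patterns from the first message, none of the nine token alternatives matches `并非|而非` in full (for instance "非" is not "并非"), so the exception can never veto anything. A real check would also have to handle arbitrary regexes, postags and `scope`, which this sketch ignores.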

John_K_Lee · July 23, 2013, 4:31am

Many thanks!

John_K_Lee · July 30, 2013, 3:41am

By the way

Perhaps a skip="…" is missing, or a scope=“previous” is missing in the token?

What does the “scope” attribute mean?

I see that one rule has the following content:

书籍|岁月|莘莘学子|人群|车辆

For this token, how should I interpret the exception element in this example?

Thanks in advance!

Dominique_PELLE · July 30, 2013, 5:55am

The rule you give as an example will match if a token is:

“书籍” or “岁月” or “莘莘学子” (etc.)
and that token has postag="n"
except if the token just before it (i.e. the previous token) has postag="v" or postag="p".
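To make these three conditions concrete, here is a toy Python model of that rule. It is not LanguageTool's matcher or API: tokens are assumed to be `(word, postag)` pairs, the function name `matches` is made up, and the sample words and tags are illustrative rather than real tagger output. Only the behaviour described above is modelled.

```python
# Toy model of the rule described above; not LanguageTool code.
WORDS = {"书籍", "岁月", "莘莘学子", "人群", "车辆"}
EXCLUDED_PREVIOUS_TAGS = {"v", "p"}  # postags that trigger the exception


def matches(tokens, i):
    """Return True if tokens[i] triggers the rule.

    tokens: list of (word, postag) pairs for one sentence (hypothetical format).
    """
    word, postag = tokens[i]
    # The token itself: one of the listed words, tagged as a noun.
    if word not in WORDS or postag != "n":
        return False
    # scope="previous": the exception is tested against the token before i.
    # If there is no previous token, the exception has nothing to veto.
    if i > 0 and tokens[i - 1][1] in EXCLUDED_PREVIOUS_TAGS:
        return False
    return True


# Hypothetical tagged input; the tags are illustrative only.
print(matches([("这些", "r"), ("书籍", "n")], 1))  # True
print(matches([("阅读", "v"), ("书籍", "n")], 1))  # False: previous is tagged v
print(matches([("对", "p"), ("人群", "n")], 1))    # False: previous is tagged p
```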

By the way, it would be nice if the Chinese XML rule had examples to test that the exception works as expected (it has none at the moment). Not only would they help to understand the rule, they would also help with testing, especially since, as you found in your original message in this thread, some exceptions are useless; additional tests would have caught those errors.

The scope="previous" attribute is explained further here:

http://www.languagetool.org/development/#xmlrules

Have a look at this simple example (rule “I_A”) in the English rules:

        <pattern>
            <token><exception scope="previous">am</exception>I</token>
            <marker>
                <token regexp="yes">a|an</token>
            </marker>
        </pattern>

It finds an error in “I a not sure” as well as in “I an not sure”.
But thanks to the exception, it will not signal an error in the correct sentence “am I an atheist”.
In this example, it’s almost equivalent to:

        <pattern>
            <token negate="yes">am</token>
            <token>I</token>
            <marker>
                <token regexp="yes">a|an</token>
            </marker>
        </pattern>

But it’s not completely equivalent: with the exception, the rule will match even if “I” is the first word of the sentence. With the alternative <token negate="yes">am</token>, LanguageTool expects a token before “I” (which must not be “am”), so the rule won’t match if “I” is the first word of the sentence.
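The difference can be reproduced with a toy Python model of the two patterns. This is a simplified sketch, not how LanguageTool is implemented: sentences are split on whitespace, words are compared in lower case to mimic case-insensitive matching, the function names are made up, and the first-word behaviour follows the explanation above. Both functions return the index of each matching “I”.

```python
def find_with_exception(words):
    """Toy model of: <token><exception scope="previous">am</exception>I</token>
    followed by a|an."""
    hits = []
    for i in range(len(words) - 1):
        if words[i].lower() != "i" or words[i + 1].lower() not in ("a", "an"):
            continue
        # The exception only vetoes if a previous token exists and is "am".
        if i > 0 and words[i - 1].lower() == "am":
            continue
        hits.append(i)
    return hits


def find_with_negated_token(words):
    """Toy model of: <token negate="yes">am</token> <token>I</token> then a|an.
    The negated token must exist, so "I" can never be the first word."""
    hits = []
    for i in range(1, len(words) - 1):
        if (words[i - 1].lower() != "am"
                and words[i].lower() == "i"
                and words[i + 1].lower() in ("a", "an")):
            hits.append(i)
    return hits


for sentence in ["I a not sure", "Well I an not sure", "am I an atheist"]:
    words = sentence.split()
    print(sentence, find_with_exception(words), find_with_negated_token(words))
# I a not sure [0] []
# Well I an not sure [1] [1]
# am I an atheist [] []
```

The two variants only disagree on the sentence that starts with “I”.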

John_K_Lee · July 30, 2013, 6:27am

Very clear!

Thanks for your great help.