[zh] Part of Speech

handwritten · April 12, 2018, 7:57pm

Hi, I speak Chinese and want to write a new rule, but I can’t find a way to base Chinese rules on parts of speech the way rules for other languages do. English, for example, uses tags such as CC (coordinating conjunction) and CD (cardinal number). Can I use tags like these for Chinese?

dnaber · April 12, 2018, 8:14pm

Hi, thanks for your interest in LanguageTool! Unfortunately, support for Chinese is not maintained in LanguageTool, and its part-of-speech tags are not properly documented. However, you can use the Text Analysis tool (Text Analysis - LanguageTool) to analyze a sentence and see the tag assigned to each token.
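A token can be matched by its part-of-speech tag, through the `postag` attribute, instead of by its text. The fragment below is a minimal sketch, not a rule from LanguageTool’s Chinese rule file; it assumes that Chinese verb tags begin with `v` (the tag `v` appears in the rules later in this thread), which is worth confirming with the Text Analysis tool first.

```xml
<pattern>
  <!-- matched by text: the word 不 -->
  <token>不</token>
  <!-- matched by tag: postag_regexp="yes" treats the postag value as a regular expression -->
  <!-- ASSUMPTION: Chinese verb tags start with "v"; verify with Text Analysis -->
  <token postag="v.*" postag_regexp="yes"/>
</pattern>
```

Without `postag_regexp="yes"`, the value is compared literally, so `postag="v"` matches only tokens tagged exactly `v`.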

handwritten · April 12, 2018, 8:30pm

Thanks for the reply. What does it mean when the rule editor says “The rule did not find the expected error”? Does it mean I can’t add the rule?

dnaber · April 12, 2018, 8:41pm

It means the pattern did not match the incorrect example sentence, so something in the pattern is off. Perhaps it is too strict? If that doesn’t help, post the rule here and we’ll try to help.
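As a rough picture of what the rule editor checks: for every example with a `correction` attribute (an incorrect sentence), the pattern must match exactly the text inside `<marker>`; for every example without one (a correct sentence), it must match nothing. A minimal sketch with a hypothetical rule ID, assuming 不, 存在 and 有 are tokenized separately (as the rules later in this thread rely on):

```xml
<!-- HYPOTHETICAL rule id, for illustration only -->
<rule id="EXAMPLE_PAIRING" name="不存在有">
  <pattern>
    <token>不</token>
    <token>存在</token>
    <token>有</token>
  </pattern>
  <message>可以使用<suggestion>没有</suggestion>。</message>
  <!-- incorrect example: the match must cover exactly the marked text -->
  <example correction="没有"><marker>不存在有</marker>隐瞒。</example>
  <!-- correct example: the pattern must not match anywhere in it -->
  <example>没有隐瞒。</example>
</rule>
```

If a token in the pattern is stricter than the example text (say, it requires a word the sentence does not contain), the incorrect example produces no match, and the editor reports that it did not find the expected error.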

handwritten · April 12, 2018, 8:56pm

I got it resolved and finally wrote my first rule:

<!-- Chinese rule, 2018-04-13 -->
<rule id="" name="不存在有/没有">
 <pattern>
  <token>不</token>
  <token>存在</token>
  <token regexp='yes'>有?</token>
  <token postag='v'></token>
 </pattern>
 <message>使用<suggestion>没有</suggestion>比不存在有更简洁。</message>
 <example correction=''><marker>不存在有隐瞒</marker>。</example>
 <example>没有隐瞒。</example>
</rule>

How do I get it submitted?

dnaber · April 13, 2018, 7:17am

Thanks! Before I can add it, we need two more things: could you set an ID for the rule (using only the characters A-Z and _), and tell me which category it fits best? Chinese currently has these categories:

词语错误 (word errors), 成语错误 (idiom errors), 词法-实词 (morphology: content words), 词法-虚词 (morphology: function words), 句法 (syntax)
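In LanguageTool’s XML rule files, rules are nested inside `<category>` elements, so choosing one of these categories means placing the rule inside it. A minimal skeleton, where both the category `id` and the rule `id` are made-up placeholders (the actual category IDs in the Chinese rule file may differ) and the rule body is left out:

```xml
<!-- HYPOTHETICAL: "SYNTAX_PLACEHOLDER" is not a real category id -->
<category id="SYNTAX_PLACEHOLDER" name="句法">
  <!-- HYPOTHETICAL rule id; rule ids use only A-Z and _ -->
  <rule id="MY_RULE_ID" name="不存在有/没有">
    <!-- pattern, message and examples go here -->
  </rule>
</category>
```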

Also, you’ve probably seen the message “We've checked your pattern (...) and found the following matches. Please consider modifying your rule if these matches are false alarms.” Have you checked that each match it finds is a real error and not a false alarm?

handwritten · April 13, 2018, 10:39am

<!-- Chinese rule, 2018-04-13 -->
<rule id="NOT_EXIST_NO" name="不存在有/没有" type="style">
 <pattern>
  <token>不</token>
  <token>存在</token>
  <token regexp='yes'>有?</token>
  <token postag='v'></token>
 </pattern>
 <message>“不存在有”为欧化中文，您可以使用<suggestion>没有</suggestion>。</message>
 <example correction="不存在"><marker>不存在有隐瞒</marker>。</example>
 <example>没有隐瞒。</example>
</rule>

I’ve fixed the rule. It belongs in the “句法” (syntax) category. I have also checked the matches the pattern finds, and they are indeed errors.

dnaber · April 13, 2018, 3:52pm

For me, the test says “Found wrong correction(s) in sentence '不存在有隐瞒。': '[没有]' but expected '[不存在]'”. Could you check that?
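The test compares the `correction` attribute of each incorrect example with the suggestions the rule actually produces for it. In the rule posted above, the only `<suggestion>` is 没有, but the example declares 不存在, hence the mismatch. Keeping the rest of that rule unchanged, a consistent example line would be:

```xml
<!-- correction must repeat the rule's suggestion text exactly: 没有 -->
<example correction="没有"><marker>不存在有隐瞒</marker>。</example>
```

This alone should satisfy the check, but because that pattern’s match includes the verb 隐瞒, the suggestion would replace the whole matched span, verb included. Keeping the verb requires narrowing the match, for example with `<marker>` inside `<pattern>`.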

handwritten · April 13, 2018, 4:20pm

Please check if this works:

<!-- Chinese rule, 2018-04-13 -->
<rule id="NOT_EXIST_NO" name="不存在有/没有" type="style">
 <pattern>
  <token>不</token>
  <token>存在</token>
  <token regexp='yes'>有?</token>
  <token postag='v'></token>
 </pattern>
 <message>“不存在有”为欧化中文，您可以使用“<suggestion>没有</suggestion>”。</message>
 <example correction="没有"><marker>不存在有</marker>隐瞒。</example>
</rule>

dnaber · April 13, 2018, 4:52pm

I get a different error now. You can test the rule yourself at https://community.languagetool.org/ruleEditor/expert.

handwritten · April 13, 2018, 5:15pm

This passed the test (I also added another example):

<rule id="NOT_EXIST_NO" name="不存在有/没有" type="style">
 <pattern>
  <marker>
   <token>不</token>
   <token>存在</token>
   <token regexp="yes">(有|任何)?</token>
  </marker>
  <token postag='v'></token>
 </pattern>
 <message>“<match no="1"/><match no="2"/><match no="3"/>”为欧化中文，您可以使用<suggestion>没有</suggestion>。</message>
 <example correction="没有">医生<marker>不存在有</marker>误解病人的病历。</example>
 <example correction="没有">政府报告<marker>不存在任何</marker>隐瞒。</example>
</rule>
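Two details of this version are worth spelling out. First, the `<marker>` inside `<pattern>` limits the matched span to the first three tokens, so the suggestion 没有 replaces only 不存在有 or 不存在任何 and the following verb stays in the sentence; `<match no="1"/>` to `<match no="3"/>` in the message echo those same three tokens. Second, as far as I understand LanguageTool’s token matching, a token regexp must match the whole token and tokens are never empty, so the trailing `?` in `(有|任何)?` does not make the third token optional. If optional 有/任何 was the intent, LanguageTool’s `min="0"` token attribute is meant for that. The sketch below assumes that reading; it widens the rule (for example to 不存在隐瞒), so its matches would need the same false-alarm check, and how `<match no="3"/>` behaves when the optional token is absent is not covered here.

```xml
<pattern>
  <!-- only the tokens inside <marker> are underlined and replaced by the suggestion -->
  <marker>
    <token>不</token>
    <token>存在</token>
    <!-- ASSUMPTION: optional 有/任何 is the intent; min="0" lets this token be absent -->
    <token min="0" regexp="yes">有|任何</token>
  </marker>
  <!-- the verb must follow, but it stays outside the replaced span -->
  <token postag="v"/>
</pattern>
```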

dnaber · April 13, 2018, 5:29pm

And the matches the checker finds (now 4) are also real errors, not false alarms, correct?

handwritten · April 13, 2018, 5:32pm

Correct, none of them are false alarms.

dnaber · April 13, 2018, 5:41pm

Thanks! I’ve just added the rule; it will go online at https://languagetool.org today at about 22:30 CEST.