[ZH] Chinese Measure Word (中文量词) Problems

zhaoyuan · June 24, 2020, 10:38am

LT currently has several problems with Chinese measure words (量词, also called classifiers): 1. it cannot distinguish measure words that share a pronunciation (for example 颗 and 棵, both kē); 2. it cannot detect wrong measure words used for groups of objects; 3. it cannot detect inappropriate measure words used for natural landscapes.

I wrote several rules in XML, shown below, and hope they help improve LT's error detection for Chinese.

<!-- Chinese rule, 2020-06-24 -->
<!-- 棵 (for plants) vs. 颗 (for small round objects); both are pronounced kē -->
<rule id="" name="颗与棵">
 <pattern>
  <marker>
   <token>棵</token>
   <token>糖</token>
  </marker>
 </pattern>
 <message>量词“棵”不能修饰“糖”，应为<suggestion>颗糖</suggestion></message>
 <example correction='颗糖'>一<marker>棵糖</marker></example>
 <example>一颗糖</example>
</rule>


<!-- Chinese rule, 2020-06-24 -->
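<!-- 只 (for animals, or one of a pair) vs. 支 (for pens and other stick-like objects); both are pronounced zhī -->
<rule id="" name="只与支">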
 <pattern>
  <token>只</token>
  <token regexp='yes'>笔|铅笔|钢笔|水笔|毛笔|蜡笔|水彩笔</token>
 </pattern>
 <message>量词“只”不能修饰“笔”</message>
 <example correction=''>一<marker>只笔</marker></example>
 <example>一支笔</example>
</rule>
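
A possible variant of the 只/支 rule, given only as an untested sketch: instead of just flagging the error, it also offers a replacement. The id `ZHI_ZHI_BI_SKETCH` is a made-up placeholder, and the sketch assumes, like the rule above, that the Chinese tokenizer returns 只 and the pen noun as separate tokens. `\2` refers back to the second matched token, so the suggestion keeps whichever noun was matched.

```xml
<!-- Sketch only: variant of the 只/支 rule that also suggests a replacement -->
<!-- The id below is a hypothetical placeholder, not an existing LT rule id -->
<rule id="ZHI_ZHI_BI_SKETCH" name="只与支">
 <pattern>
  <token>只</token>
  <token regexp='yes'>笔|铅笔|钢笔|水笔|毛笔|蜡笔|水彩笔</token>
 </pattern>
 <!-- \2 is replaced by the second matched token (the pen noun) -->
 <message>量词“只”不能修饰“笔”，应为<suggestion>支\2</suggestion></message>
 <example correction='支笔'>一<marker>只笔</marker></example>
 <example>一支笔</example>
</rule>
```

The same pattern would apply to the 批/群 and 堆/群 rules by swapping in 群 as the suggested measure word.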


<!-- Chinese rule, 2020-06-24 -->
<!-- Flags 批 (a batch) before sheep; 群 (a flock or herd) is the expected measure word -->
<rule id="" name="批与群">
 <pattern>
  <token>批</token>
  <token regexp='yes'>羊|山羊|绵羊|公羊|牧羊|母羊|牡羊</token>
 </pattern>
 <message>量词“批”不能修饰“羊”</message>
 <example correction=''>一<marker>批羊</marker></example>
 <example>一群羊</example>
</rule>


<!-- Chinese rule, 2020-06-24 -->
<!-- Flags 堆 (a pile) before people; 群 (a group or crowd) is the expected measure word -->
<rule id="" name="堆与群">
 <pattern>
  <token>堆</token>
  <token regexp='yes'>人|男人|女人|中国人|外国人|美国人|英国人|法国人</token>
 </pattern>
 <message>量词“堆”不能形容“人”</message>
 <example correction=''>一<marker>堆人</marker></example>
 <example>一群人</example>
</rule>