Errors not detected in Chinese sentences

Hi,
I found several sentences with the same type of error that LT does not detect.

Example 1:

Wrong Sentence:

北京是一个好季节。 (Beijing is a good season.)

“北京” (Beijing), a place, is the subject of the sentence. “季节”, which means “season”, is the object of the sentence.

A place cannot be a season, so “北京” does not match “季节”.

Correct Sentence:

北京是一个好地方。 (Beijing is a good place.)

Example 2:

Wrong Sentence:

夏天真是好天气啊。 (Summer really is good weather!)

“夏天” (summer), a season, is the subject of the sentence. “天气”, which means “weather”, is the object of the sentence.

Correct Sentence:

夏天真是好季节啊。 (Summer really is a good season!)

These examples show a classic type of error in Chinese sentences: word mismatches.

I am a student who wants to participate in GSoC 2018. I think classifying Chinese words or adding some word-matching rules could improve LT's accuracy for Chinese users.
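To make the word-classification idea concrete, here is a minimal, hypothetical Java sketch of a check on "X 是 … Y" sentences. The `SemanticMismatchSketch` class, the `SemClass` labels and the tiny lexicon are illustrative assumptions, not part of LanguageTool's API, and the sketch assumes the subject and object have already been extracted by a word segmenter.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not LanguageTool code: flags a subject/object pair
// whose hand-assigned semantic classes differ.
class SemanticMismatchSketch {

    enum SemClass { PLACE, SEASON, WEATHER }

    // Hypothetical, hand-written lexicon: word -> semantic class.
    private static final Map<String, SemClass> LEXICON = Map.of(
        "北京", SemClass.PLACE,
        "地方", SemClass.PLACE,
        "夏天", SemClass.SEASON,
        "季节", SemClass.SEASON,
        "天气", SemClass.WEATHER
    );

    // Returns a message if both words are known and their classes differ;
    // returns an empty Optional if they match or either word is unknown.
    static Optional<String> check(String subject, String object) {
        SemClass s = LEXICON.get(subject);
        SemClass o = LEXICON.get(object);
        if (s == null || o == null || s == o) {
            return Optional.empty();
        }
        return Optional.of("'" + subject + "' (" + s + ") does not match '"
                + object + "' (" + o + ")");
    }
}
```

For instance, `check("北京", "季节")` returns a message, while `check("北京", "地方")` returns an empty `Optional`. A real rule would need a much larger lexicon and a reliable way to identify the subject and object.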

Could you give me some advice on how to solve this problem?

Thanks.

It’s hard for me to tell what the best approach is, as I don’t speak Chinese. I think you should try to understand the different ways in which LT can detect errors and then see what works best:

Thank you for your reply.

I have read the material above and discussed the problem I raised with a classmate at my university.

First, we think the sentences I gave above are artificial, made up just to be wrong: no book or native Chinese speaker would write something like that. However, mismatches between verbs and their objects do occur in real usage. Fortunately, I have seen some rule implementations written to address this problem :smile:

Second, I find that spelling detection for Chinese is not as useful as it is for English. I think we could improve the lexical analysis of Chinese sentences and then use a synonym-association technique to make spelling detection better.

For example,
Wrong Sentences: Hanzi / Pinyin romanization
我 每天 睡交。/ Wo Meitian Shuijiao.
我 每天 跳无。/ Wo Meitian Tiaowu.
Correct Sentences: Hanzi / Pinyin romanization
我 每天 睡觉。/ Wo Meitian Shuijiao. ( I sleep every day )
我 每天 跳舞。/ Wo Meitian Tiaowu. ( I dance every day )
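In these examples the wrong and correct words share the same toneless pinyin (shuijiao, tiaowu), so one possible approach is to suggest dictionary words whose pinyin matches an unknown word. Note that this is homophone matching, which I am assuming here as an illustration; it is not necessarily the synonym-association technique described above. The `HomophoneSuggester` class, its pinyin table and its lexicon are hypothetical and far too small for real use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not LanguageTool code: suggests lexicon words that
// share the toneless pinyin of an unknown word.
class HomophoneSuggester {

    // Hypothetical toneless-pinyin table covering only the example characters.
    private static final Map<Character, String> PINYIN = Map.of(
        '睡', "shui", '交', "jiao", '觉', "jiao",
        '跳', "tiao", '无', "wu", '舞', "wu"
    );

    // Hypothetical lexicon of valid words.
    private static final Set<String> LEXICON = Set.of("睡觉", "跳舞");

    // Builds a space-separated toneless pinyin key, or null if any
    // character is missing from the table.
    static String pinyinKey(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                return null;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(py);
        }
        return sb.toString();
    }

    // Returns lexicon words with the same pinyin key; empty if the word is
    // already valid or its pinyin cannot be determined.
    static List<String> suggest(String word) {
        List<String> result = new ArrayList<>();
        if (LEXICON.contains(word)) {
            return result;
        }
        String key = pinyinKey(word);
        if (key == null) {
            return result;
        }
        for (String candidate : LEXICON) {
            if (key.equals(pinyinKey(candidate))) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

With this table, `suggest("睡交")` yields `[睡觉]` and `suggest("跳无")` yields `[跳舞]`. Matching on toneless pinyin catches tone-only differences such as 交 (jiāo) vs 觉 (jiào), at the cost of producing more candidates for a larger lexicon.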

It’s a good idea to focus on “real” errors that people actually make. We try to do that for English and German, too. However, I guess quite a number of people also learn Chinese as a second or third language, so we shouldn’t completely ignore their errors.

Sounds like a good plan. For GSoC, a more detailed plan will be needed, but you still have time for that (see the complete timeline).

So, can I create a new thread to talk about the details of my idea?

Or should I find a mentor to discuss it with?

Thank you!

Feel free to create a new thread. We’re here to help you with technical questions. As Chinese is not currently maintained in LT, helping with questions that require knowledge of Chinese will be tricky for most of us; maybe @user.sg can help a bit with that? If you can also find a local mentor (formally not part of GSoC), that’s great.

Hi guys,

Forgive me for the delayed response. My Chinese-speaking contact will be available for linguistic input as required. Right now she’s on maternity leave, but she should be back by the summer.

S

I guess a translation engine with a reasonable level of language ability would solve this problem… :grinning: