[GSoC] Prototype - Chinese spelling checker

t0iiz · March 14, 2018, 1:50pm

Hi,
I have finished a simple prototype of a detection module based on the idea in ProposalForGsoc2018.

I compared the module with LT’s checker on a test set of 835 sentences, each containing at least one error. Our method flagged 895 errors, while LT’s checker found only 80.

Can anyone tell me what I should do next?

dnaber · March 14, 2018, 2:05pm

That sounds great! Can you post some examples? Where did you take the examples from? Does your method cause false alarms on the test set, i.e. find “errors” which are not actually errors?

dnaber · March 14, 2018, 2:30pm

Another comment about the proposal: even though it’s quite detailed already, I think it would be great to have more planning about evaluation:

are there other sources of test data than SIGHAN? e.g. from native speakers?
how do you make sure no false positives are introduced?
can you make sure you don’t blindly optimize for the test set? (e.g. by holding back a part of the test set and only using it at the very end)
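
One way to hold back part of the test set could look like the sketch below: shuffle the sentences once with a fixed seed and set a fraction aside until the very end. The function name `split_holdout`, the 20% fraction, and the seed are illustrative assumptions, not part of the proposal.

```python
import random

def split_holdout(sentences, holdout_fraction=0.2, seed=42):
    """Split sentences into a development part and a held-back part.

    The fixed seed keeps the split reproducible, so the held-back
    sentences stay the same across runs and are never tuned against.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(sentences)              # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)   # private RNG, no global state
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Placeholder sentences standing in for a real test set (hypothetical data)
dev, holdout = split_holdout([f"sentence {i}" for i in range(835)])
print(len(dev), len(holdout))
```

Only `dev` would be used while tuning; `holdout` would be scored once at the end.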

When you mention n-gram data, do you refer to the Google data set or will new data be needed?

t0iiz · March 14, 2018, 2:30pm

They are from here.

For example

[老:lao/true, 板:ban/true, 告:gao/true, 诉:su/true, 王:wang/true, 大:da/true, 华:hua/true, 帮:bang/true, 你:ni/true, 点:dian/true, 蔡:cai/false, 。:。/true]
[恭:gong/true, 喜:xi/true, 妳:ni/true, 们:men/true, 永:yong/true, 远:yuan/true, 幸:xing/true, 福:fu/true, ，:，/true, 早:zao/true, 生:sheng/true, 孩:hai/true, 子:zi/true, ，:，/true, 恭:gong/true, 喜:xi/true, 恭:gong/true, 喜:xi/true, 。:。/true]
[但:dan/true, 是:shi/true, 我:wo/true, 不:bu/true, 能:neng/true, 去:qu/true, 参:can/true, 加:jia/true, ，:，/true, 因:yin/true, 为:wei/true, 我:wo/true, 有:you/true, 一:yi/true, 点:dian/true, 事:shi/true, 情:qing/true, 阿:a/true, ！:！/true]
[敬:jing/false, 祝:zhu/false, 身:shen/true, 体:ti/true, 建:jian/false, 慷:kang/false, 。:。/true]
[你:ni/true, 的:de/true, 家:jia/true, 人:ren/true, 的:de/true, 生:sheng/false, 体:ti/false, 好:hao/true, ？:？/true]
[出:chu/false, 你:ni/true, ：:：/true, 万:wan/true, 事:shi/true, 如:ru/true, 意:yi/true, ，:，/true, 身:shen/true, 体:ti/true, 健:jian/true, 康:kang/true, 。:。/true]
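
To read these lines programmatically, a small sketch like the following could pull out the flagged characters. It assumes each token has the form `character:reading/flag` and that `false` marks a character the detector considers suspicious; the function name `flagged_characters` is hypothetical and not part of the prototype.

```python
def flagged_characters(line):
    """Return the characters marked false in one output line."""
    tokens = line.strip().strip("[]").split(", ")
    flagged = []
    for token in tokens:
        if not token:
            continue                         # skip empty input
        # A token looks like 建:jian/false; the first character is the hanzi.
        char, rest = token[0], token[2:]
        _reading, flag = rest.rsplit("/", 1)
        if flag == "false":
            flagged.append(char)
    return flagged

line = "[敬:jing/false, 祝:zhu/false, 身:shen/true, 体:ti/true, 建:jian/false, 慷:kang/false, 。:。/true]"
print(flagged_characters(line))  # ['敬', '祝', '建', '慷']
```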

The detection module may cause false alarms. To avoid this, the system needs a pre-processing step using a pattern matcher, which we have discussed in another thread.

t0iiz · March 14, 2018, 2:42pm

Of course, I promise. SIGHAN-13 also provides a set of test data from native speakers.

The n-gram data comes from HanLP and was collected from People’s Daily in 2014. We can also extend the data ourselves.

t0iiz · March 15, 2018, 7:03am

I think it is not terrible if our system makes a false alarm. After all, computers always make mistakes; the approach is feasible as long as the frequency of mistakes can be kept very low. There is also a set of metrics widely used for evaluating detection systems, and we can measure our system with these values.

Precision, Recall and F-measure

Precision measures the percentage of the items the system detected (i.e., labeled as positive) that are in fact positive.
Recall measures the percentage of items actually present in the input that were correctly identified by the system.
F-measure is a combination of precision and recall.
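
As a minimal sketch, the three values can be computed from counts of true positives, false positives (false alarms), and false negatives (missed errors). The F-measure is assumed here to be the balanced F1 score, i.e. the harmonic mean of precision and recall; the function name and the example counts are hypothetical and do not come from the prototype.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from raw detection counts."""
    detected = true_positives + false_positives   # everything the system flagged
    actual = true_positives + false_negatives     # everything that was really wrong
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up counts: 80 real errors found, 20 false alarms, 120 errors missed
p, r, f = precision_recall_f1(80, 20, 120)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.3f}")
# precision=0.80 recall=0.40 f1=0.533
```

With these definitions, every false alarm counts as a false positive and lowers precision, while every missed error counts as a false negative and lowers recall.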

We should improve the recall if we want to reduce the possibility of false alarms. I have come up with an idea for solving this problem, so that the system will not rely exclusively on a pattern matcher to deal with false alarms.

I will continue testing my detection module. Once the results come out, I’ll post them here if you don’t mind. After that, I may need to revise my proposal so that everything fits into 3 months.

dnaber · March 15, 2018, 9:06am

Actually, the precision needs to be improved for that. For the n-gram approach, we strive for a precision of 0.99 or better, often accepting a recall of 0.5 or even worse. (While 0.99 sounds like a lot, it’s per pair, so the false alarms add up with the n-gram approach.)

t0iiz · March 17, 2018, 1:46pm

Hi, I have finished writing the detection prototype and evaluating the results. Here is my demo. I used 2 sets to evaluate it.

The system hits 772 out of 1105 errors in 835 sentences in the first test set and 1034 out of 2035 errors in 1501 sentences in the second one. I also took a look at what the system missed. I think that by collecting more words from a larger corpus and adding a set of grammar rules, it will become more precise.
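
Read as recall (hits divided by the errors present), the reported counts work out as in the short sketch below. Precision cannot be derived from these numbers, because the count of false alarms is not given; the dictionary and variable names are only illustrative.

```python
# (hits, total errors) as reported for the two test sets
reported = {
    "first test set": (772, 1105),
    "second test set": (1034, 2035),
}

recalls = {name: hits / total for name, (hits, total) in reported.items()}
for name, recall in recalls.items():
    print(f"{name}: recall = {recall:.3f}")
# first test set: recall = 0.699
# second test set: recall = 0.508
```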

t0iiz · March 19, 2018, 1:28pm

I have already submitted my proposal and prototype, but nobody has given me any feedback…

dnaber · March 19, 2018, 8:30pm

Hardly anybody from the community speaks Chinese, so it’s difficult to comment. I think your proposal is viable.