Chinese part development daily record

Hi, I am Ze Dang, a sophomore from China.
I would like to work on the Chinese part. I am going to work on my proposal outside of GSoC and will post updates here in the forum.
P.S. My English is not very good. I would be happy if you pointed out any mistakes in my text.

4/26/2018

  • Complete all the translation work in WebTranslateIt.
  • Fork the repository and download Apache Maven.
  • Since I have no experience with Maven, I started reading the Maven docs to learn about it.
  • Build resources by following the instructions.

Thanks for the updates so far! Of course, you can also use this thread to ask questions if there are issues (e.g. with Maven).

4/27/2018
I watched a series of videos to get started with Maven today. It took me nearly six hours! :joy: But I think it really makes sense.
Here is what I learned.

  • I understand what Maven is and why we should use it.
  • I understand the concepts of builds and dependencies.
  • I understand how to write a simple pom.xml and how Maven finds the repositories.
  • I know some commonly used commands and what the build lifecycle is.

However, the videos show the examples in Eclipse. I think my next step is to learn how to use Maven in IntelliJ IDEA with our project and to get familiar with the project structure.

That’s probably not even needed. For example, I use IntelliJ but do all Maven-related work on the command line. In the end, it’s just the same small set of commands anyway.

Thanks for the advice!

Minor nitpick: “[de]” stands for “Deutsch”, not “development” as you seem to assume.

Thank you! I have modified it.

5/2/2018

  • Read the code.

I added a new dependency in language-module/zh/pom.xml. Then I ran mvn compile, but it failed. Can anyone help me?

You need to call mvn install -DskipTests once in the top-level folder of the LT source code. After that, your command should work.

Thanks.
The Maven command works now. There is another thing confusing me: I usually like to write tests first, but running the tests in src/test by clicking the green triangle button failed. I googled the problem but did not find a good solution. Can you help me?

What exactly does the test code look like? You can usually also run it by selecting the “Run …test” item in the context menu when you right-click the test code.

Thanks for the reply. It works now.

5/3/2018

  • Add a new sentence tokenizer and write tests for it. You can see my commit here.
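
For readers who have not seen one, a sentence tokenizer for Chinese can be sketched roughly as below. This is only an illustration and not the code from the commit: the class name ChineseSentenceSplitSketch and the choice of terminator characters (。！？) are assumptions, and it does not use any LanguageTool or HanLP API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the tokenizer from the commit mentioned above.
class ChineseSentenceSplitSketch {

  // Assumed set of sentence-ending punctuation marks (full-width).
  private static final String TERMINATORS = "。！？";

  // Splits text after each terminator and keeps the terminator with its sentence.
  static List<String> split(String text) {
    List<String> sentences = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      current.append(c);
      if (TERMINATORS.indexOf(c) >= 0) {
        sentences.add(current.toString());
        current.setLength(0);
      }
    }
    // Keep trailing text that has no closing punctuation.
    if (current.length() > 0) {
      sentences.add(current.toString());
    }
    return sentences;
  }

  public static void main(String[] args) {
    System.out.println(split("今天天气很好。你吃饭了吗？我们走吧！"));
  }
}
```

A test written first, as described earlier in the thread, would simply compare the result of split(...) for a short input with the expected list of sentences.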

5/4/2018

  • Add a new word tokenizer and write tests for it.
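
Chinese text has no spaces between words, so a word tokenizer needs some kind of lexicon. One simple, well-known technique is forward maximum matching, sketched below. It is not necessarily what the new tokenizer does; the class name MaxMatchWordSketch, the tiny dictionary and the maxWordLength parameter are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of forward maximum matching; not the project's tokenizer.
class MaxMatchWordSketch {

  static List<String> tokenize(String text, Set<String> dictionary, int maxWordLength) {
    if (maxWordLength < 1) {
      throw new IllegalArgumentException("maxWordLength must be at least 1");
    }
    List<String> tokens = new ArrayList<>();
    int pos = 0;
    while (pos < text.length()) {
      // Start with the longest candidate and shrink it until it is
      // a dictionary word or only a single character is left.
      int end = Math.min(text.length(), pos + maxWordLength);
      while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
        end--;
      }
      tokens.add(text.substring(pos, end));
      pos = end;
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Hypothetical toy dictionary; a real one would be far larger.
    Set<String> dictionary = new HashSet<>(Arrays.asList("我们", "喜欢", "语言", "工具"));
    System.out.println(tokenize("我们喜欢语言工具", dictionary, 2));
  }
}
```

Characters that are not covered by the dictionary come out as single-character tokens, which keeps the loop moving forward on unknown input.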

8/5/2018

  • Add new tag tokenizer.

Question:
Is it fine if I put 1GB of dictionary and model data provided by HanLP in the resource folder?

GitHub doesn’t like such big files, I think. When I last checked, 1GB was the total limit for the whole repo. So you should find another place for the files.

The language module I am writing will use this data to check for errors, so I need to find another way to upload it. Is that right?

There is one more thing. Since I have finished refactoring the sentence tokenizer, word tokenizer and tagger, I can start working on zh\src\main\java\org\languagetool\language\Chinese.java. I would like to split Chinese.java into SimplifiedChinese.java and TraditionalChinese.java.

For now, yes. Once everything is more stable (i.e. doesn’t often change), we can host the data on languagetool.org.

No problem. I guess Chinese should not be deleted so SimplifiedChinese and TraditionalChinese can extend it?

Yes, I think so. Now I’m trying to make them extend Chinese.java.
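
The idea of keeping Chinese and letting the two variants extend it could look roughly like the sketch below. The classes here are hypothetical stand-ins with a "Sketch" suffix: the real Chinese.java builds on LanguageTool's own language base class, whose methods are not reproduced here, and the variant names and codes ("zh-CN", "zh-TW") are assumptions for illustration.

```java
// Hypothetical stand-in for Chinese.java; shared behaviour lives here.
class ChineseSketch {
  String getName() {
    return "Chinese";
  }

  String getCode() {
    return "zh";
  }

  // Shared logic that both variants inherit without changes.
  String describe() {
    return getName() + " [" + getCode() + "]";
  }
}

// The variants override only what differs from the base class.
class SimplifiedChineseSketch extends ChineseSketch {
  @Override
  String getName() {
    return "Simplified Chinese";
  }

  @Override
  String getCode() {
    return "zh-CN"; // assumed variant code
  }
}

class TraditionalChineseSketch extends ChineseSketch {
  @Override
  String getName() {
    return "Traditional Chinese";
  }

  @Override
  String getCode() {
    return "zh-TW"; // assumed variant code
  }
}

class ChineseVariantsDemo {
  public static void main(String[] args) {
    ChineseSketch[] languages = {
      new ChineseSketch(), new SimplifiedChineseSketch(), new TraditionalChineseSketch()
    };
    for (ChineseSketch language : languages) {
      System.out.println(language.describe());
    }
  }
}
```

With this layout, code that already works with the base class keeps working, and anything specific to one script goes into the matching subclass.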