[GSoC] Integral approach to Spanish

Hi all!

I am a final-year Computer Science student, and I have been in touch with LanguageTool since the list of accepted organizations was published on the GSoC website. I sent some pull requests, but then I had to focus on university work. Now I am designing my proposal, and I would like to know whether I am on the right track.

I have been talking with Juan Martorell, the Spanish maintainer, to learn about the current state of the Spanish module. I think he would appreciate some help improving it. Spanish is my mother tongue (I even completed a degree in Communication and have professional experience with the language), and I know Java.

Over the last few weeks I have been thinking about some ideas to improve Spanish support. I would be grateful for any advice. For example, is an integral approach to the Spanish module enough for a GSoC proposal, or should I add more features?

Regards,

david

Hi David, thanks for your interest in LT and GSoC.

It’s hard to tell: what exactly have you discussed with Juan? My main concern about a task not “being enough” is that GSoC is supposed to be about programming. So spending three months writing rules in XML would probably not be enough from the GSoC point of view, even though it might move LT forward a lot. But judging from the bug reports, or just from using LT, I think there should be enough other tasks that could complement rule writing.

Juan was very kind. He explained to me the roadmap for Spanish that you discussed on the SourceForge mailing list in the past. He also showed me this repository, which he had shared on the mailing list before.

A few weeks ago he committed an improved Spanish dictionary to the main LanguageTool repository, so I have been studying the changes and thinking about how to build on his work. I will focus on this over the weekend and hope to have a draft proposal ready soon. After that, I would be glad to hear any suggestions you have.

Have a nice weekend!

Hi Daniel,

Please have a look at my draft proposal. Comments and suggestions are welcome.

:slight_smile:

Hi David, the proposal looks fine! Some ideas / suggestions, though:

  • “I will add support for different style guides and specific terminology.” - I think this needs some planning for the UI, especially considering that we have more than just the stand-alone version.
  • To make it easier for the reviewers, maybe add links to the forum discussions that you started.
  • A more detailed roadmap broken down by week might be better. I know it’s not easy, but planning is supposed to make the implementation easier, so it’s usually worth spending quite some time on it.
  • “For the detection of new possible rules for grammar.xml and disambiguator.xml, I will use statistical taggers, trying to automate the process to the maximum. This could be extensible to other languages.” -> This could be more detailed. Are you planning to use a new tagger in LT, or only as an external tool during development?

Regards
Daniel

Thanks for your feedback, Daniel. I will work on it.

Regards,

david

Hello!

I have been improving my proposal, taking into account your ideas and suggestions.

You can find the almost-finished proposal at the same link. Any feedback would be greatly appreciated!

Regards,

david

Thanks. I haven’t re-checked it in detail, but you’re aware that you need to submit this at https://summerofcode.withgoogle.com, aren’t you? The deadline is tomorrow, and if you miss it, Google will not accept you. I suggest you submit it now.

Thanks! Yes, I am aware of the deadline. I will submit the proposal today.

:grinning: