Newcomer questions

BPL · May 16, 2018, 10:30am

Hello everyone, this is the first time I’m using LanguageTool as well as this forum, and I’ve got a few small questions \o

I’m not a native English speaker but I’d like to use this tool mainly to learn more about my grammar mistakes. Thing is, I have noticed the tool isn’t detecting basic grammar errors, like the correct usage of prepositions… So I’d like to know what to expect from it and what not.

@dnaber explained to me on GitHub that there is this site where I can check all the rules, so my questions would be:

Is that site showing the rules of the latest GitHub version or the ones in the stable website version? For instance, I’ve downloaded LanguageTool v4.1 from the website, but then I built the latest version from GitHub myself, 4.2-snapshot, and I’ve noticed the latter has a few more rules, so…
What limitations should I expect from the software… for instance, is there any constraint when it comes to grammar error detection? I’ve noticed the software isn’t detecting incorrect use of prepositions (not in 4.1). Is that because of missing rules? Because rules can’t support that? Something else?

Anyway, I’ve got more questions but I don’t want to make a very long thread here… I just want to say to all devs/contributors: thank you very much for creating an open source tool like this one, very promising and exciting ;D

dnaber · May 16, 2018, 12:34pm

It’s showing the latest github version.

Basically yes, there’s no rule for that so the error is not found. There are other ways to detect errors, e.g. by writing Java code or using statistics. We’re always looking for contributions, so let us know if you want to give it a try.
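To make the "using statistics" idea a bit more concrete, here is a minimal sketch in Python. It is not LanguageTool's actual implementation: the counts are invented for illustration, and the names `TRIGRAM_COUNTS`, `suggest_preposition` and `min_ratio` are hypothetical. The idea is simply to compare how often each candidate preposition occurs in the same context and to suggest a replacement only when another one is far more frequent.

```python
# Toy illustration only: these counts are invented, not real corpus data.
TRIGRAM_COUNTS = {
    ("arrive", "in", "june"): 120,
    ("arrive", "on", "june"): 2,
    ("arrive", "at", "june"): 1,
}

# The confusion set discussed in this thread.
PREPOSITIONS = ("in", "on", "at")


def suggest_preposition(left, prep, right, min_ratio=10):
    """Return a clearly more frequent preposition for this context, or None.

    `min_ratio` is a hypothetical threshold: the alternative must be at
    least that many times more frequent than the preposition actually used.
    """
    used = TRIGRAM_COUNTS.get((left, prep, right), 0)
    best = max(PREPOSITIONS, key=lambda p: TRIGRAM_COUNTS.get((left, p, right), 0))
    best_count = TRIGRAM_COUNTS.get((left, best, right), 0)
    if best != prep and best_count >= min_ratio * max(used, 1):
        return best
    return None
```

With these made-up counts, "arrive on june" would get "in" as a suggestion, while an unseen context produces no suggestion at all, which keeps false alarms down when there is no evidence either way.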

BPL · May 16, 2018, 12:53pm

Sure thing. Unfortunately, right now I’ve got enough on my plate and therefore very little spare time to dedicate to yet another hobby coding project, but in the future I’d be willing to contribute as much as possible, although first I need to learn more about what the tool offers and lacks right now.

Anyway… let’s say we want to add detection of incorrect use of the prepositions (in, on, at): what would be the right way to do so? As a Spanish guy, I’d say this is one of my main sources of errors when using English. In Spanish we basically use “en” for everything, whereas English uses {“in”, “on”, “at”} for a lot of different particular cases.

A couple more questions:

How does this option affect the software’s behaviour?
What feature would this subpackage https://github.com/gulp21/languagetool-neural-network add to LanguageTool? If it improves the grammar checking somehow, it’d be interesting to give it a shot on Windows. I’m a C++/Python developer, so I don’t think it’d take me a lot of effort to make it work.
I’ve compiled the software using Cygwin/Maven/JDK, as I’ve seen you guys are using bash scripts. Is it possible to build and run the thing using just the Windows command prompt? (The last time I coded in Java was 10 years ago :))

dnaber · May 16, 2018, 4:05pm

It turns on false friend rules.

It detects errors, similar to the ngram approach, but with a different technical approach.

I guess so, “mvn package” is usually all that’s needed to build, including running the tests.

BPL · May 17, 2018, 5:35pm

@dnaber So, let’s say I’d like to implement one or more rules to detect preposition errors. What are the steps to do so?

Imagine I’ve done some research, such as looking at websites like https://www.englishgrammar.org/mistakes-prepositions/, which provides a very small subset of correct/incorrect pairs of examples:

Incorrect: The ball rolled slowly in the goal.
Correct: The ball rolled slowly into the goal.

Incorrect: She ran in the room crying.
Correct: She ran into the room crying.

Incorrect: The train will arrive within five minutes.
Correct: The train will arrive in five minutes.

Incorrect: If you don’t live by your income, you will incur huge debts.
Correct: If you don’t live within your income, you will incur huge debts.

Incorrect: The ball went to the window and fell on the ground.
Correct: The ball went through the window and fell on the ground.

Incorrect: He wrote the book in a month’s time.
Correct: He wrote the book in a month.

Incorrect: We usually go and see Granny on Sunday.
Correct: We usually go and see Granny on Sundays.

Incorrect: I don’t care for your opinion.
Correct: I don’t care about your opinion.

Now what? The above subset wouldn’t be representative enough to generalize into one or more rules… so, could you please briefly explain how you implement complex cases like this one?

Thanks.

SkyCharger001 · May 17, 2018, 9:14pm

The examples you give contain sentences that can be contextually correct:
example 1: ‘incorrect’ could mean that the ball was already in the goal.
example 2: practically the same issue.
example 3: ‘correct’ is specific time, ‘incorrect’ is maximum time. (the latter gives more leeway when it comes to pacing.)
example 6: ‘correct’ is gross-time, ‘incorrect’ can mean net-time (EG: 1 hour a day, would make it 24 months in total)

BPL · May 18, 2018, 11:11am

I see, that makes sense. I guess my little subset of examples was too poor to build something usable from; I’ll try to bring a better set of examples next time. So let me see if I understand correctly: if a sentence is used incorrectly in its context but there are no strict rules that can detect this, that would be out of scope for LanguageTool’s analysis, right?

I guess the main question here is: what can I expect, or not expect, to be detected by LanguageTool right now?

NS: When I said “strict rule” I meant something like: ‘Number 1: "in June", we've seen already, and this shows us that we can use "in" for months. So, "in June", "in July", "in August" and so on.’ (taken from http://www.englishgrammarexpress.com/grammar/in-on-at-time)
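To show what I mean by a strict rule, here is a minimal Python sketch of the "in + month" case. It is only my own illustration, not how LanguageTool implements rules, and the names `MONTHS`, `PATTERN` and `check_month_preposition` are made up. It flags "on"/"at" directly before a month name, unless a day number follows (since "on June 5" is fine).

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# "on"/"at" directly before a capitalized month name that is NOT followed
# by a day number ("on June 5" is correct and must not be flagged).
PATTERN = re.compile(rf"\b([Oo]n|[Aa]t)\s+({MONTHS})\b(?!\s+\d)")


def check_month_preposition(text):
    """Return (offset, wrong_preposition, suggestion) for each suspicious match."""
    return [
        (m.start(1), m.group(1), f"in {m.group(2)}")
        for m in PATTERN.finditer(text)
    ]
```

For "I was born on June." this reports the "on" and suggests "in June", while "We met on June 5." and "I was born in June." produce no matches. Real text would need more exceptions than this single lookahead covers.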

tiagosantos · May 18, 2018, 11:26am

@BPL
Don’t worry too much about borderline exceptions while creating rules.
Once you make the rules and publish them here, we can verify if they produce too many false positives in the daily regression tests.
If they produce a silly amount of false positives, we can add antipatterns, which correspond to correct usage patterns, or add token exceptions.
“Throwing the baby out with the bathwater” is something only people who do not want to see the project develop would want.
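As a rough picture of how a pattern and its antipatterns interact, here is a small Python sketch. It is only loosely modelled on the idea described above and is not LanguageTool's real matching code; `find_matches` and the token lists are hypothetical. Tokens covered by an antipattern (a known correct usage) are treated as immune, so the error pattern cannot fire on them.

```python
def find_matches(tokens, pattern, antipatterns=()):
    """Return start indexes where `pattern` matches outside any antipattern."""
    tokens = list(tokens)

    def occurrences(seq):
        # All start positions where the token sequence `seq` occurs.
        seq = list(seq)
        n = len(seq)
        return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == seq]

    # Token positions covered by an antipattern are immune from matching.
    immune = set()
    for anti in antipatterns:
        for start in occurrences(anti):
            immune.update(range(start, start + len(anti)))

    # Keep only pattern matches that do not touch an immune token.
    return [
        start
        for start in occurrences(pattern)
        if not any(i in immune for i in range(start, start + len(pattern)))
    ]
```

For example, with the pattern `["ran", "in"]` and the antipattern `["ran", "in", "circles"]`, the tokens of "she ran in the room crying" are flagged at token index 1, while "he ran in circles" is left alone.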

BPL · May 18, 2018, 11:33am

@tiagosantos Awesome, that sounds like a fantastic idea. So I guess the next logical step would be learning more about rules (what they are, how to create them, …). It seems like a fantastic way to improve this amazing software and at the same time improve my current creepy English.

tiagosantos · May 18, 2018, 1:15pm

Great!
Play with the rule editor for a while, then read:
http://wiki.languagetool.org/development-overview
If you want to check if LanguageTool is able to do an advanced feature, try:
http://wiki.languagetool.org/tips-and-tricks