<match no="10"/> and \10 don't work

Bokomaru · October 24, 2016, 4:37pm

Hi all,
I have noticed for a while that you cannot match token number 10, either with <match no="10"/> or with \10.

one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve

This is the tenth token: <match no="10"/>.

The result is this:
This is the tenth token: One0.

Is anyone aware of this? Can this be fixed, or is it caused by a limitation of Java?
Best wishes,
Nick

Knorr · October 24, 2016, 5:42pm

Hi, Nick!
Right now, LanguageTool supports only one-digit matches, so the maximum match number is 9. This is not a limitation of Java. I was not aware of it until I looked at the code.
Is this a real requirement for you? I can hardly think of cases where "match > 9" is needed.

Bye, knorr

Bokomaru · October 24, 2016, 10:09pm

Hello Knorr,
Thanks for the reply. In fact, <match no="11"/> and \11, as well as higher token numbers, all work. It's only \10 that's a problem.

Yes, rules for sentences with more than ten tokens are very useful for catching tense shift errors and run-on sentences where two sentences are joined with a coordinator but no punctuation. For example:

Helen and I woke up early this morning but I don’t have time for breakfast.

This sentence has 17 tokens including SENT_START. To indicate that the comma should come before the coordinator, you might want to include \9, \10 \11… in the suggestion.

To correct the tense shift error, you need to substitute “didn” for \12.
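To make the numbering concrete, here is a minimal Java sketch that expands \N references in a suggestion against the tokens of the example sentence. It assumes, as Nick's numbers imply, that the pattern starts with a SENT_START token (so \1 is SENT_START) and that "don't" is split into "don", "'" and "t". The class `SuggestionDemo`, the helper `expand` and the hard-coded token list are hypothetical, and the regex-based expansion only stands in for whatever LanguageTool does internally.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuggestionDemo {
    // A backslash followed by one or more digits, e.g. \9, \10, \12.
    private static final Pattern REF = Pattern.compile("\\\\(\\d+)");

    // Hypothetical helper: replaces each \N with the N-th matched token (1-based).
    static String expand(String template, List<String> tokens) {
        Matcher m = REF.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(tokens.get(n - 1)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed tokenization of the example sentence; \1 is SENT_START.
        List<String> tokens = List.of(
                "SENT_START", "Helen", "and", "I", "woke", "up", "early", "this",
                "morning", "but", "I", "don", "'", "t", "have", "time", "for",
                "breakfast", ".");
        // Comma before the coordinator: prints "morning, but I"
        System.out.println(expand("\\9, \\10 \\11", tokens));
        // Tense fix: "didn" replaces \12, the rest of the contraction is kept
        System.out.println(expand("didn\\13\\14", tokens));
    }
}
```

Under these assumptions \9, \10, \11 and \12 resolve to "morning", "but", "I" and "don", which matches the suggestions described above.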

If there is a better way to reference these tokens, please let me know.

Best wishes,
Nick

Knorr · October 25, 2016, 7:46pm

Hi Nick,

I have found the problem, and it is clearly a bug. My first evaluation was wrong. Yesterday I checked the parsing of the XML file and the conversion of its content into rules. Everything is fine there (although I first thought that this was the problem, because <match no="1x"/> is converted into \1x).
The problem is the actual interpretation of \1x: the code checks whether x is a positive digit. If it is, everything is fine. But 0 is not positive, so references whose second digit is 0, such as \10, \20 and \100, do not work.
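The failure mode described above can be reproduced with a small, hypothetical reconstruction in Java; it is not LanguageTool's actual code, and the names `BackrefBugDemo` and `expand` are illustrative. With `allowZero` set to false, a digit after the first one is consumed only if it is 1 to 9, so \10 is read as \1 followed by a literal "0", which would explain the "One0" output (the sketch does not model the capitalization). With `allowZero` set to true, any ASCII digit is accepted after the first.

```java
import java.util.List;

public class BackrefBugDemo {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isPositiveDigit(char c) {
        return c >= '1' && c <= '9';
    }

    // Replaces each \N in template with the N-th token (1-based).
    // allowZero == false mimics the bug: after the first digit, a further digit
    // is consumed only if it is 1-9, so "\10" becomes \1 followed by a literal "0".
    // No bounds checking: a reference past the end of tokens throws.
    static String expand(String template, List<String> tokens, boolean allowZero) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == '\\' && i + 1 < template.length()
                    && isPositiveDigit(template.charAt(i + 1))) {
                int j = i + 1;
                int n = 0;
                // The first digit is always 1-9; later digits depend on allowZero.
                do {
                    n = n * 10 + (template.charAt(j) - '0');
                    j++;
                } while (j < template.length()
                        && (allowZero ? isAsciiDigit(template.charAt(j))
                                      : isPositiveDigit(template.charAt(j))));
                out.append(tokens.get(n - 1));
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve");
        String message = "This is the tenth token: \\10.";
        System.out.println(expand(message, tokens, false)); // This is the tenth token: one0.
        System.out.println(expand(message, tokens, true));  // This is the tenth token: ten.
    }
}
```

The buggy variant still resolves \11 and \12 correctly, which is consistent with Nick's observation that only references containing a 0 fail.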

I'll write a fix within the next few days and let you know.

Kind regards,
Knorr

Bokomaru · October 25, 2016, 8:00pm

Thanks Knorr! I appreciate the time and effort you have put into this, and I will watch this thread for news.
–Nick

Knorr · October 30, 2016, 9:29pm

Hi Nick,

I have committed a bug-fix regarding the behavior explained above.

https://github.com/languagetool-org/languagetool/commit/adb5b8ee906a1c297cb39602f26ca0022bffc046

The next version of LT will handle \10 and <match no="10"/> correctly.

Bye!