
<match no="10"/> and \10 don't work


(Nicholas Walker) #1

Hi all,
I have noticed for a while that you cannot match token number 10, either with <match no="10"/> or with \10.

<rule><!-- Try this sentence: One two three four five six seven eight nine ten eleven twelve. -->
  <pattern>
    <token>one</token>
    <token>two</token>
    <token>three</token>
    <token>four</token>
    <token>five</token>
    <token>six</token>
    <token>seven</token>
    <token>eight</token>
    <token>nine</token>
    <marker><token>ten</token></marker>
    <token>eleven</token>
    <token>twelve</token>
  </pattern>
  <message>This is the tenth token: <suggestion><match no="10"/></suggestion>.</message>
</rule>

The result is this:
This is the tenth token: One0.

Is anyone aware of this? Can this be fixed, or is it caused by a limitation of Java?
Best wishes,
Nick


(Knorr) #2

Hi, Nick!
Right now, LanguageTool supports only one-digit matches. Hence, the maximum match number is 9. This is no limitation of Java. I was not aware of this (until I looked in the code).
Is this a real requirement for you? I can hardly think of cases where "match > 9" is required.

Bye, knorr


(Nicholas Walker) #3

Hello Knorr,
Thanks for the reply. In fact, <match no="11"/> and \11, and references to tokens beyond 11, all work. It's only \10 that's a problem.

Yes, rules for sentences with more than ten tokens are very useful for catching tense shift errors and run-on sentences where two sentences are joined with a coordinator but no punctuation. For example:

Helen and I woke up early this morning but I don't have time for breakfast.

This sentence has 17 tokens including SENT_START. To indicate that the comma should come before the coordinator, you might want to include \9, \10, \11... in the suggestion.

To correct the tense shift error, you need to substitute "didn" for \12.

If there is a better way to reference these tokens, please let me know.

Best wishes,
Nick


(Knorr) #4

Hi Nick,

I have found the problem, and it is clearly a bug. My first evaluation was wrong. Yesterday I checked the parsing of the XML file and the conversion of its content into rules. Everything is fine there (although I first thought that this was the problem, because <match no="1x"/> is converted into \1x).
The problem is the actual interpretation of \1x: here we check whether x is a positive number. If it is, everything is fine. But because 0 is not treated as positive, \10, \100, \20 will not work...
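
To make the failure mode concrete, here is a minimal sketch of how a "digit must be positive" check could produce the "One0" result reported above. This is a hypothetical reconstruction for illustration only, not LanguageTool's actual code: the class name BackrefSketch, the methods expandBuggy and expandFixed, and the acceptZero flag are all assumptions.

```java
import java.util.List;

// Hypothetical illustration only; not taken from the LanguageTool source.
class BackrefSketch {

    // Assumed faulty behavior: a '0' is rejected as "not positive",
    // so the reference number stops in front of it.
    static String expandBuggy(String template, List<String> tokens) {
        return expand(template, tokens, false);
    }

    // Assumed corrected behavior: every decimal digit belongs to the number.
    static String expandFixed(String template, List<String> tokens) {
        return expand(template, tokens, true);
    }

    private static String expand(String template, List<String> tokens, boolean acceptZero) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            // A reference starts with a backslash followed by a digit 1-9.
            boolean startsRef = c == '\\' && i + 1 < template.length()
                    && template.charAt(i + 1) >= '1' && template.charAt(i + 1) <= '9';
            if (!startsRef) {
                out.append(c);
                i++;
                continue;
            }
            int j = i + 1;
            int number = 0;
            while (j < template.length()
                    && template.charAt(j) >= '0' && template.charAt(j) <= '9') {
                int digit = template.charAt(j) - '0';
                if (digit == 0 && !acceptZero) {
                    break; // the zero is left behind as literal text
                }
                number = number * 10 + digit;
                j++;
            }
            // Token numbers are 1-based; unknown numbers are kept verbatim.
            out.append(number <= tokens.size() ? tokens.get(number - 1) : template.substring(i, j));
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("One", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve");
        System.out.println(expandBuggy("\\10", tokens)); // prints "One0"
        System.out.println(expandFixed("\\10", tokens)); // prints "ten"
    }
}
```

Under these assumptions, \11 still resolves to "eleven" in the faulty variant because both digits are positive, while \10 collapses to \1 followed by a literal 0, which matches what Nick observed.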

I'll write a fix within the next few days and let you know.

Kind regards,
Knorr


(Nicholas Walker) #5

Thanks Knorr! I appreciate the time and effort you have put into this, and I will watch this thread for news.
--Nick


(Knorr) #6

Hi Nick,

I have committed a bug-fix regarding the behavior explained above.

The next version of LT will handle "\10" / no="10" correctly.

Bye!