
Regular expression '(?-i)' and testrules in LT 4.3


(Mike Unwalla) #1

In LT 4.2, testrules gives no messages for these two rules:

<rule id="CASE_SENSITIVE_TEST1" name="Case-sensitive test1">
  <pattern>
    <token regexp="yes">(?-i)Bill</token>
  </pattern>
  <message>Found '\1'</message>
  <short>Case-sensitive test1</short>
  <example type="correct">Tell <marker>bill</marker> about the error.</example>
  <example type="incorrect">Did you see <marker>Bill</marker> yesterday?</example>
  <example type="correct">DID YOU SEE <marker>BILL</marker> LAST WEEK?</example>
</rule>

<rule id="CASE_SENSITIVE_TEST2" name="Case-sensitive test2">
  <pattern>
    <token regexp="yes">(?-i)(Bill|BILL)</token>
  </pattern>
  <message>Found '\1'</message>
  <short>Case-sensitive test2</short>
  <example type="correct">Tell <marker>bill</marker> about the error.</example>
  <example type="incorrect">Did you see <marker>Bill</marker> yesterday?</example>
  <example type="incorrect">DID YOU SEE <marker>BILL</marker> LAST WEEK?</example>
</rule>

In the LT 4.3 snapshot of 2018-07-23, testrules gives this message for rule CASE_SENSITIVE_TEST2:

Running pattern rule tests for English... The English rule: CASE_SENSITIVE_TEST2[1], token [1], contains duplicated non case sensitive disjunction part (BILL) within "(?-i)(Bill|BILL)". Did you forget case_sensitive="yes"?

Disambiguation rules show a similar change.

I think that the testrules message is not correct. With (?-i), matching is case-sensitive, so Bill and BILL are two different alternatives, not duplicates. Also, testrules gives no message for the following regexp:
<token regexp="yes">(?-i)(Bill|XXBILL)</token>
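A small Java check illustrates why BILL is not redundant once (?-i) is in effect. It assumes, as the wiki wording suggests, that regexp tokens are matched case-insensitively by default; `Pattern.CASE_INSENSITIVE` stands in for that default here, and `InlineFlagDemo` and `tokenMatches` are illustrative names, not LanguageTool code:

```java
import java.util.regex.Pattern;

public class InlineFlagDemo {
    // Assumption: token regexps are compiled case-insensitively by default,
    // mimicked here with Pattern.CASE_INSENSITIVE.
    static boolean tokenMatches(String regex, String token) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(token).matches();
    }

    public static void main(String[] args) {
        // (?-i) switches case-insensitive matching off for the rest of the pattern.
        System.out.println(tokenMatches("(?-i)(Bill|BILL)", "Bill")); // true
        System.out.println(tokenMatches("(?-i)(Bill|BILL)", "BILL")); // true
        System.out.println(tokenMatches("(?-i)(Bill|BILL)", "bill")); // false
        // Drop the BILL alternative and BILL no longer matches, so it is not a duplicate.
        System.out.println(tokenMatches("(?-i)Bill", "BILL"));        // false
        // Without (?-i), a BILL alternative really would be redundant.
        System.out.println(tokenMatches("Bill", "BILL"));             // true
    }
}
```

The inline flag overrides the flag passed to `Pattern.compile`, which is why the two alternatives behave differently under (?-i).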

Is the testrules message correct?


(Daniel Naber) #2

This doesn’t really answer your question, but wouldn’t it be better to use the case_sensitive="yes/no" attribute?


(Mike Unwalla) #3

Hi @dnaber, yes, case_sensitive works fine.

But, as you wrote, my question is about whether the testrules message is correct. The wiki tells me “Alternatively, case-sensitive matching can be turned on for single tokens by using (?-i) in regular expressions” (http://wiki.languagetool.org/development-overview#toc4).
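To make the question concrete, here is a hypothetical sketch of a duplicate-alternative check. It is only a guess that is consistent with the reported behaviour, not LanguageTool's actual testrules code; `DisjunctionCheck`, `findDuplicate` and the `honorInlineFlag` switch are invented for illustration, and the parsing only handles a simple group such as `(a|b)` with an optional leading `(?-i)`. With the switch off, every alternative is lowercased before comparison, which flags `(?-i)(Bill|BILL)` but not `(?-i)(Bill|XXBILL)`. With it on, a leading `(?-i)` makes the comparison exact, as the wiki's description of `(?-i)` would suggest:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DisjunctionCheck {
    // Hypothetical helper: returns the first alternative that duplicates an earlier
    // one, or null if there is none. Only handles "(a|b|...)", optionally prefixed by "(?-i)".
    static String findDuplicate(String regex, boolean honorInlineFlag) {
        boolean caseSensitive = regex.startsWith("(?-i)");
        String body = caseSensitive ? regex.substring("(?-i)".length()) : regex;
        if (body.startsWith("(") && body.endsWith(")")) {
            body = body.substring(1, body.length() - 1);
        }
        Set<String> seen = new HashSet<>();
        for (String part : body.split("\\|")) {
            // A flag-aware check compares alternatives exactly when (?-i) is present;
            // otherwise alternatives that differ only in case count as duplicates.
            String key = (caseSensitive && honorInlineFlag) ? part : part.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                return part;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicate("(?-i)(Bill|BILL)", false));   // BILL
        System.out.println(findDuplicate("(?-i)(Bill|XXBILL)", false)); // null
        System.out.println(findDuplicate("(?-i)(Bill|BILL)", true));    // null
        System.out.println(findDuplicate("(Bill|BILL)", true));         // BILL
    }
}
```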