Writing a rule for language that is correct but not idiomatic

grees · April 26, 2018, 3:06pm

Hello everyone,

Would it be possible to write a general rule that deals with language that is not necessarily incorrect but could be more idiomatic?

For example, the collocation “empirical data say…” is grammatically correct, but “data suggest/support/indicate…” might be more appropriate in certain contexts. I understand that this could be expressed as:

<rule id="subject_of_DATA" name="Collocate of DATA is subject slot">
    <pattern>
      <token>data</token>
      <token>say</token>
    </pattern>
    <message>Perhaps you could try: <suggestion>suggest, support, indicate</suggestion></message>
    <example type="incorrect">Empirical <marker>data say</marker> that</example>
    <example type="correct">Empirical data suggest that</example>
</rule>

However, I’d like to know how I might trigger a rule every time a user writes “data + verb”. The rule below does not work because the VB tag does not match “say”.

<rule id="subject_of_DATA" name="Collocate of DATA is subject slot">
    <pattern>
      <token>data</token>
      <token postag="VB"/>
    </pattern>
    <message>Perhaps you could try: <suggestion>suggest, support, indicate</suggestion></message>
    <example type="incorrect">Empirical <marker>data say</marker> that</example>
    <example type="correct">Empirical data suggest that</example>
</rule>

I’d appreciate any suggestions you might have.

dnaber · April 26, 2018, 3:19pm

That’s because “say” is tagged VBP, not VB. You can check that with the “Check a LanguageTool XML rule” page, which shows a verbose error message if the rule example doesn’t work yet.
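
For illustration, here is a minimal sketch of the second rule with the tag changed from VB to VBP (in the Penn Treebank-style tags used for English, VB is the base form and VBP the non-third-person present). The rule id is hypothetical. The message is reworded to avoid a replacement suggestion, and the correct example is changed to a sentence with no verb directly after “data”, on the assumption that “suggest” would also be tagged VBP and so match this pattern.

```xml
<!-- Sketch only: the id is made up for this example. -->
<rule id="SUBJECT_OF_DATA_VBP_SKETCH" name="Collocate of DATA in subject slot (VBP sketch)">
    <pattern>
      <token>data</token>
      <!-- VBP instead of VB, since "say" is tagged VBP here -->
      <token postag="VBP"/>
    </pattern>
    <message>A more idiomatic verb may fit here, e.g. suggest, support or indicate.</message>
    <example type="incorrect">Empirical <marker>data say</marker> that</example>
    <!-- Assumed not to match: "on" is a preposition, not a verb -->
    <example type="correct">Empirical data on this topic are scarce.</example>
</rule>
```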

grees · April 26, 2018, 4:27pm

Many thanks for your reply, Daniel.

You are, of course, correct. I’ve modified the rule to account for any verbal form:

<rule id="subject_of_DATA" name="Collocate of DATA is subject slot">
    <pattern>
      <token>data</token>
      <token postag="V.*" postag_regexp="yes"/>
    </pattern>
    <message>Perhaps you could try: <suggestion>suggest, support, indicate</suggestion></message>
    <example type="incorrect">Empirical <marker>data say</marker> that</example>
    <example type="correct">Empirical data suggest that</example>
</rule>

Returning to my original question, this rule still does not let me offer suggestions whenever any verb follows “data”. Is this possible?

Regards,
Geraint

dnaber · April 26, 2018, 9:14pm

Sorry, I’m not sure I understand the question. You can refer to matched tokens with \1, \2, etc. if you want to use them in suggestions. If that’s not what you mean, could you give an example?
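
As a rough sketch of what referring to matched tokens might look like: \1 stands for the first matched token (“data”) and \2 for the verb that followed it, so each suggestion keeps the subject and swaps only the verb. The rule id and the exception element are assumptions added for illustration, not taken from the thread. The exception is there because a pattern that matches any verb would otherwise also flag “data suggest” itself. It is worth running this through the rule checker mentioned above before relying on it.

```xml
<!-- Sketch only: the id and the exception list are illustrative assumptions. -->
<rule id="SUBJECT_OF_DATA_BACKREF_SKETCH" name="Collocate of DATA in subject slot (backreference sketch)">
    <pattern>
      <token>data</token>
      <!-- Any verb tag, except the verbs that are offered as suggestions -->
      <token postag="V.*" postag_regexp="yes">
        <exception regexp="yes">suggests?|supports?|indicates?</exception>
      </token>
    </pattern>
    <!-- One suggestion element per alternative; \1 repeats the matched "data" -->
    <message>Instead of '\1 \2', perhaps you could try: <suggestion>\1 suggest</suggestion>, <suggestion>\1 support</suggestion> or <suggestion>\1 indicate</suggestion></message>
    <example type="incorrect">Empirical <marker>data say</marker> that</example>
    <example type="correct">Empirical data suggest that</example>
</rule>
```

Putting each alternative in its own suggestion element means each is offered as a separate replacement for the marked text, rather than one replacement containing the whole comma-separated list.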