English These - singular

Hi, I’m writing a rule which catches an error if “these” is followed by a singular expression.
For example…

<rule id="THESE-NN" name="These singular">
    <pattern>
        <marker>
            <token>these</token>
        </marker>
        <token postag="NN"><exception postag="NNS|CD" postag_regexp="yes"></exception></token>
        <token postag="MD"></token>
    </pattern>
    <message>Did you mean <suggestion>this</suggestion>?</message>
    <example type="incorrect" correction="This"><marker>These</marker> author should rewrite this point.</example>
    <example type="incorrect" correction="This"><marker>These</marker> writer should rewrite this point.</example>
    <example>These two should do ok.</example>
</rule>

However, for some reason the word “author” doesn’t get assigned a part-of-speech tag if it is preceded by the word “these”.
I’ve checked the rulebase and found a few other rules which deal with “This” and “These”, for example ‘this’ vs. ‘these’ and these/those ones (these/those), but nothing which could affect the word “author”. Any idea what’s special about the word “author”?

Thanks

This seems to be caused by rule DT_plural_VBNNN_VBN in disambiguation.xml. You can find that out easily by running the sentence through Text Analysis - LanguageTool.

Do you think the DT_plural_VBNNN_VBN needs to be modified so it doesn’t remove all postags?

Would changing the exception line, in DT_plural_VBNNN_VBN, to include MD like this…

… be a possible solution?

I’m not sure. As changes in disambiguation.xml may have an effect on several rules, it’s necessary to test them carefully. But you might get more helpful replies on the mailing list. For example, Marcin has worked a lot on the English disambiguation.

OK will look into this. I think the solution might be simply to ensure that this disambig pattern never removes all postags from a word. Hopefully this disambig pattern only impacts on rules that utilise “these”.

Hi, I assume you mean the mailing list on sourceforge?
I’ve also found the same issue with the word “man”.
For now I’ve found a solution using “chunks”…

<rule id="THESE-NN-BLOCK" name="These singular block">
    <pattern>
        <marker>
            <token chunk="B-NP-singular">these</token>
        </marker>
        <token chunk="E-NP-singular"><exception postag="NNS|JJ" postag_regexp="yes"></exception><exception regexp="yes">are|yours|alone</exception></token>
        <token postag="MD"></token>
    </pattern>
    <message>Did you mean <suggestion>this</suggestion>?</message>
    <example type="incorrect" correction="This"><marker>These</marker> author should rewrite this point.</example>
</rule>

…but I feel this is not a great solution.
Thanks

Yes, I mean https://languagetool.org/development/mailing-list.php - it’s still hosted at Sourceforge as it’s not so easy to find a good mailing list hoster (github etc. don’t offer mailing lists).

Hi, Just letting you know I’ve had no replies from the mailing list on this issue.
It’s a tricky one to solve since if a disambig rule removes all postags it kind of limits the scope for developing other rules.
I imagine the intention of this disambig rule was not to remove all postags. So is it possible to catch the point when all postags have been removed and then re-assign the word a postag selected from its original list?
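To make the idea concrete, here is a minimal sketch in Python of the fallback I have in mind. It is not LanguageTool code: the function name `remove_tags_safely`, the tag lists, and the removal filter are all hypothetical, and I’m only assuming that the disambiguation step can be viewed as filtering a word’s list of readings.

```python
from typing import Callable, List


def remove_tags_safely(readings: List[str],
                       should_remove: Callable[[str], bool]) -> List[str]:
    """Drop the readings matched by should_remove, but never return an empty list."""
    kept = [tag for tag in readings if not should_remove(tag)]
    # Fallback: if the filter would strip every reading, keep the original ones.
    return kept if kept else list(readings)


# Hypothetical filter and readings, for illustration only.
def drops_nn(tag: str) -> bool:
    return tag == "NN"


print(remove_tags_safely(["NN"], drops_nn))         # all readings would go, so they are restored
print(remove_tags_safely(["NN", "NNS"], drops_nn))  # only the matching reading is removed
```

With a guard like this, a word such as “author” would keep its original postag instead of ending up with none, so rules that match on postags could still fire.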

Thanks

Peter

Hi Peter, do you check your rules after writing them? Have you checked these rules against e.g. Wikipedia or some other large data set?

Hi Mility, yes I do check my rules using the Wikipedia data.
It’s while running the wiki test that I identify most issues.
I also tend to not submit rules until I’ve tested them in my version for a number of months first, which is how I spotted this disambiguation issue.

This is about the rules in http://languagetool-user-forum.2306527.n4.nabble.com/add-rules-td4643105.html.
For various reasons, I don’t have the large data sets or the hardware environment to test them. Could you help me rewrite those rules so that they could become real rules? Thanks in advance.

Well, if it was easy to write the ngram rules as XML rules, we’d do that. But it’s not, the ngram rule covers all kinds of cases that cannot simply be listed in an XML rule.

Sorry, what is an ngram rule?

Sorry, I misunderstood your question. I thought your question referred to Finding errors using n-gram data - LanguageTool Wiki. Anyway, then I don’t understand your question.

Oh, to clarify:
For various reasons, I don’t have the large data sets (e.g. Wikipedia) or the hardware environment to test the rules in http://languagetool-user-forum.2306527.n4.nabble.com/add-rules-td4643105.html. What I mean is that I’d like to ask Peter to help me rewrite those rules.