Here’s a rule for spotting a missing verb in a PRP, MD, CD, IN tag sequence.
I’ve tested this rule on the rather large enwiki-latest-pages-articles.xml file.
So far it has been tested on over 46 million sentences over the past couple of days, before I had to reboot the machine.
How long does it normally take to run this test? Also, is there a more efficient method? This does seem rather excessive.
<rule id="PRP_MD_CD_IN" name="PRP_MD_CD_IN missing verb">
  <pattern>
    <token postag='PRP'><exception scope="previous" postag="IN|VBN" postag_regexp="yes"></exception></token>
    <marker>
      <token postag='MD'></token>
      <token postag='CD'></token>
    </marker>
    <token postag='IN'></token>
  </pattern>
  <message>Did you mean <suggestion><match no="2"/> use <match no="3"/></suggestion> or <suggestion><match no="2"/> be <match no="3"/></suggestion>?</message>
  <example correction='could use one|could be one'>We <marker>could one</marker> of them.</example>
  <example correction='can use two|can be two'>We <marker>can two</marker> of these.</example>
  <example correction='may use one|may be one'> That she <marker>may one</marker> of the Tyr-Ridan</example>
  <example type="correct">I'm not convinced we need one for that.</example>
  <example type="correct">Any more than I would one of NAFTA medals.</example>
</rule>
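For anyone who wants to sanity-check the logic outside LanguageTool, here is a rough Python sketch of what I understand the pattern to do. It is not LanguageTool code: the function name find_missing_verb is made up, the POS tags in the usage example are assigned by hand, and it assumes one tag per token, whereas LanguageTool tokens can carry several readings.

```python
import re

# Mirrors <exception scope="previous" postag="IN|VBN" postag_regexp="yes">:
# skip the match when the token before the pronoun is tagged IN or VBN.
EXCLUDED_PREVIOUS = re.compile(r"IN|VBN")

def find_missing_verb(tagged):
    """Scan (word, tag) pairs for PRP MD CD IN and build both suggestions.

    Hypothetical helper for illustration only; assumes a single POS tag
    per token.
    """
    hits = []
    for i in range(len(tagged) - 3):
        tags = [tag for _, tag in tagged[i:i + 4]]
        if tags != ["PRP", "MD", "CD", "IN"]:
            continue
        if i > 0 and EXCLUDED_PREVIOUS.fullmatch(tagged[i - 1][1]):
            continue  # the exception on the previous token applies
        modal, number = tagged[i + 1][0], tagged[i + 2][0]
        # The marker covers the modal and the number, as in the rule.
        hits.append((f"{modal} {number}",
                     [f"{modal} use {number}", f"{modal} be {number}"]))
    return hits

# Hand-tagged versions of two example sentences from the rule.
incorrect = [("We", "PRP"), ("could", "MD"), ("one", "CD"),
             ("of", "IN"), ("them", "PRP"), (".", ".")]
correct = [("Any", "DT"), ("more", "JJR"), ("than", "IN"), ("I", "PRP"),
           ("would", "MD"), ("one", "CD"), ("of", "IN"),
           ("NAFTA", "NNP"), ("medals", "NNS"), (".", ".")]

print(find_missing_verb(incorrect))
print(find_missing_verb(correct))
```

With these hand-assigned tags, the first sentence is flagged and the second is skipped because "than" (IN) comes right before the pronoun. The real tagger may assign different tags, so treat this only as a way to reason about the pattern.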