
What is the difference between these two antipatterns?


(Irvine) #1

Can somebody please explain why, in practice, these two antipatterns behave differently? In both cases, they are looking for a number followed by a comma.

R1.3C No comma when and|or|nor connects cardinal numbers.

Oxford comma 1: This caveat eliminated most of the dross from the Wikipedia samples plus the example lists 1 and 2

<antipattern>
	<token postag='CD' />
	<token postag=',' skip='-1'/>
	<token postag=','>
		<exception scope='previous' negate_pos='yes' postag='CD'></exception>
	</token>
	<token regexp='yes'>and|or|nor</token>
	<token postag='CD' />
</antipattern>

Oxford comma 2: This caveat eliminated the example lists 3 and 4, plus the lists from the articles on the Shekel and the Voting rights act.

<antipattern>
	<token postag=','>
		<exception scope='previous' negate_pos='yes' postag='CD'></exception>
	</token>
	<token skip='-1' />
	<token postag=','>
		<exception scope='previous' negate_pos='yes' postag='CD'></exception>
	</token>
	<token regexp='yes'>and|or|nor</token>
	<token postag='CD' />
</antipattern>

Note: In the following "correct examples", the difference between each successive sentence is very subtle, usually just one word.

example list 1:

I like this 1st list: 1, 2, 3, 4, and 5, it does not cause me grief.

example list 2:

I like this 2nd list: 3 oranges or 4, 4 or 5, or 6 plums, it does not cause me grief.

example list 3:

I like this 3rd list: 3 oranges or 4, 4 apples or 5, or 6 plums, it is no longer causing me grief.

example list 4:

I like this 4th list: 3 oranges or 4, 4 apples or 5, 5 oranges or 7, and 6 plums, it is no longer causing me grief.

The question is more than academic and is not limited to lists dealing with numbers. If I can figure out what is going on, I hope to be able to write antipatterns that exclude several other list constructions I have found on Wikipedia. For example, the following defies every antipattern I have thrown at it:

example list 5:

I hate this 5th list: 3 oranges or 4, 4 apples or 5, neither 5 oranges nor 7, and 8 pears with sugar, it is causing me nothing but grief!

The problem with this 5th list is not tied to "neither": if I place any word before "5 oranges nor 7", the antipatterns fail; but, as with list 4, they work fine with the word removed.

As far as I can see, the two antipatterns are "logically" identical, and either one should have caught ALL of the examples, but they don't. I only found the second antipattern by blindly trying variations on a theme. For the fifth example, I have tried a third antipattern looking for exotic variations of a number followed by a comma; all attempts have been unsuccessful.
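
For reference, here is the first antipattern again with comments on what each attribute is documented to do. This is only a reading aid, assuming the meanings given in the LanguageTool rule documentation (skip='-1' allows any number of intervening tokens; an exception with scope='previous' is tested against the token before the matched one; negate_pos='yes' inverts the POS test). It says nothing about how the matcher handles these internally.

```xml
<antipattern>
	<!-- a cardinal number: an earlier item in the list -->
	<token postag='CD' />
	<!-- the comma after it; skip='-1' lets any number of tokens follow
	     before the next token below has to match -->
	<token postag=',' skip='-1'/>
	<!-- a later comma; the exception fires when the token before this comma
	     is NOT tagged CD, so the comma is accepted only directly after a number -->
	<token postag=','>
		<exception scope='previous' negate_pos='yes' postag='CD'></exception>
	</token>
	<!-- the coordinating conjunction after the Oxford comma -->
	<token regexp='yes'>and|or|nor</token>
	<!-- the final cardinal number in the list -->
	<token postag='CD' />
</antipattern>
```

The second antipattern differs only in its opening: instead of starting on a number and its comma, it starts on a comma that must itself follow a number, then an arbitrary token carrying the skip.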

Thank you for your consideration

Irvine


(Daniel Naber) #2

Could you post the complete rule that this antipattern belongs to?


(Irvine) #3

Thanks for your response, here is the complete rule:

<category name="Test bed" type="grammar">

    <rulegroup id="R1.3--Comma-CC" name="R1.3: Incorrect comma before coordinating conjunction">    
    			<!-- R1.3C No comma when and|or|nor connects cardinal numbers. -->
    			<rule>
    				<!-- Oxford comma 1: 
    				This caveat eliminated most of the dross from the Wikipedia samples plus the example lists 1 and 2
    				-->
    				<antipattern>
    					<token postag='CD' />
    					<token postag=',' skip='-1'/>
    					<token postag=','>
    						<exception scope='previous' negate_pos='yes' postag='CD'></exception>
    					</token>
    					<token regexp='yes'>and|or|nor</token>
    					<token postag='CD' />
    				</antipattern>
    				<!-- Oxford comma 2: 
    				This caveat eliminated the example lists 3 and 4, plus the lists from the articles on the Shekel and the Voting rights act. 
    				-->
    				<antipattern>
    					<token postag=','>
    						<exception scope='previous' negate_pos='yes' postag='CD'></exception>
    					</token>
    					<token skip='-1' />
    					<token postag=','>
    						<exception scope='previous' negate_pos='yes' postag='CD'></exception>
    					</token>
    					<token regexp='yes'>and|or|nor</token>
    					<token postag='CD' />
    				</antipattern>
    				<!-- Main pattern: numbers-->
    				<pattern>
    					<marker>
    						<token postag='CD' ></token>
    						<token postag=','></token>
    						<token regexp='yes'>and|or|nor</token>
    						<token postag='CD' ></token> 
    					</marker>
    				</pattern> 
    				<message>
    					1.3C: We should only use the comma before a coordinating conjunction, (\3,) when it joins two clauses. In this case, it appears you are joining two numbers, which is fine, but not clausal. As a result, the comma is confusing and should be removed. Be warned: Conjunction rules can be very sensitive to poor phrasing.
    				</message>
    				<suggestion><match no="1"></match> <match no="3"></match> <match no="4"></match></suggestion>
    				<url>http://writing.wisc.edu/Handbook/CoordConj.html</url>
    				<short>
    					1.3C: No comma when CC, (\3,) joins two numbers
    				</short>
    				<example type='incorrect'>
    					He had between <marker>15, and 20</marker> minutes to walk to school.
    				</example>
    				<example type='incorrect'>
    					He had between <marker>fifteen, and twenty</marker> minutes to walk to school.
    				</example>
    				<example type='correct'>
    					He had between 15 and 20 minutes to walk to school.
    				</example>
    				<example type='correct'>
    					He had between fifteen and twenty minutes to walk to school.
    				</example>
    				<example type='correct'>
    					I like this 1st list: 1, 2, 3, 4, and 5, it does not cause  me grief.
    				</example>
    				<example type='correct'>
    					I like this 2nd list:  3 oranges or 4, 4 or 5, or 6 plums, it does not cause  me grief.
    				</example>
    				<example type='correct'>
    					I hated this 3rd list: 3 oranges or 4, 4 apples or 5, or 6 plums, it is caused me a lot ofgrief!
    				</example>
    				<example type='correct'>
    					I like this 4th list: 3 oranges or 4, 4 apples or 5, 5 oranges or 7, and 6 plums, it is no longer causing me grief!
    				</example>
    				<example type='correct'>
    					Congress extended the coverage formula and special provisions tied to it, such as the Section 5 preclearance requirement, for five years in 1970, seven years in 1975, and 25 years in both 1982 and 2006. 
    				</example>
    				<example type='correct'>
    					As with many ancient units, the shekel had a variety of values depending on era, government and region; weights between 9 and 17 grams, and values of 11,[3] 14, and 17 grams are common.
    				</example>
    				<example type='correct'>
    					I hate this 5th list: 1, 2 and 3, 4 apples or 5, neither 5 oranges nor 7, and 8 pears with sugar, it is causing me nothing but grief!
    				</example>
    				<example type='correct'>
    					Possible dates include 9 November 1799, when Bonaparte seized power on 18 Brumaire in France; or 18 May 1803, when Britain and France ended the one short period of peace between 1792 and 1814, or 2 December 1804, when Bonaparte crowned himself Emperor.
    				</example>
    				<example type='correct'>
    					Since the end of the Second World War the original VC has been awarded 14 times: four in the Korean War, one in the Indonesia-Malaysia confrontation in 1965, four to Australians in the Vietnam War, two during the Falklands War in 1982, one in the Iraq War in 2004, and two in the War in Afghanistan in 2006 and 2012.
    				</example>
    				<example type='correct'>
    					In the 2009 U.S. News and World Report "Graduate School Rankings", all fourteen of Princeton's doctoral programs evaluated were ranked in their respective top 20, 7 of them in the top 5, and 4 of them in the top spot (Mathematics, Economics, History, Political Science).
    				</example>
    			</rule>


		</rulegroup>

	</category>

(Daniel Naber) #4

Thanks for the detailed report, this seems to be caused by a subtle bug in the antipattern implementation when used with skip. I wrote a bug report (https://github.com/languagetool-org/languagetool/issues/189) and will also report it to the mailing list.


(Irvine) #5

I have been following the progress of the bug report and read the previous mailing list discussion, link. Just to keep you informed, I tried the suggestion of enclosing the antipattern in markers and it worked!

Edit
At the moment, it is more academic than anything else, but I should add: if I put a second 'skip' into the antipattern, using markers no longer seems to work. I tried several variations of this, for example placing markers around each 'skip' section. It did not seem to make any difference; two 'skips' didn't work.
End-Edit

The single antipattern, below, passes all the correct examples.

[Antipattern XML not preserved; only the token text and|or|nor survives.]


To be certain I have tied down the pattern, I updated the examples to include the simplest possible correct list, along with an incorrect fake list. For completeness, in case you want to use this example in testing, I have included them below.

I will leave the debate as to whether it is a bug or expected behavior alone, (though I am curious why and tokens do not work with antipatterns?). I am about to give it a full run on my Wikipedia test bed, and if any other problems crop up I will keep you informed.

Updated list of incorrect/correct examples:

Incorrect:

He had between 15, and 20 minutes to walk to school.

He had between fifteen, and twenty minutes to walk to school.

This is not a list: 2, or 3 apples.

Correct:

He had between 15 and 20 minutes to walk to school.


He had between fifteen and twenty minutes to walk to school.


This is not a list: 2 or 3 apples.


This is the simplest list 1, 2, or 3 apples.


I like this 1st list: 1, 2, 3, 4, and 5, it does not cause me grief.


I like this 2nd list: 3 oranges or 4, 4 or 5, or 6 plums, it does not cause me grief.


I hated this 3rd list: 3 oranges or 4, 4 apples or 5, or 6 plums, it is caused me a lot ofgrief!


I like this 4th list: 3 oranges or 4, 4 apples or 5, 5 oranges or 7, and 6 plums, it is no longer causing me grief!


Congress extended the coverage formula and special provisions tied to it, such as the Section 5 preclearance requirement, for five years in 1970, seven years in 1975, and 25 years in both 1982 and 2006.


As with many ancient units, the shekel had a variety of values depending on era, government and region; weights between 9 and 17 grams, and values of 11,[3] 14, and 17 grams are common.


I hate this 5th list: 1, 2 and 3, 4 apples or 5, neither 5 oranges nor 7, and 8 pears with sugar, it is causing me nothing but grief!


Since the end of the Second World War the original VC has been awarded 14 times: four in the Korean War, one in the Indonesia-Malaysia confrontation in 1965, four to Australians in the Vietnam War, two during the Falklands War in 1982, one in the Iraq War in 2004, and two in the War in Afghanistan in 2006 and 2012.


In the 2009 U.S. News and World Report "Graduate School Rankings", all fourteen of Princeton's doctoral programs evaluated were ranked in their respective top 20, 7 of them in the top 5, and 4 of them in the top spot (Mathematics, Economics, History, Political Science).


Possible dates include 9 November 1799, when Bonaparte seized power on 18 Brumaire in France; or 18 May 1803, when Britain and France ended the one short period of peace between 1792 and 1814, or 2 December 1804, when Bonaparte crowned himself Emperor.


(Irvine) #6

Okay, I just ran a full Wikipedia test (321 articles, 11.5MB excluding references), and another one came up that should have been caught by the antipattern:


Yet another problem list: five million, six million, and seven million, bottles of beer.

The problem sentence that brought this to my attention was:


The countries' actual naval tonnages were 36,896 long tons (37,488 t) for Chile, 34,425 long tons (34,977 t) for Argentina, and 27,661 long tons (28,105 t) for Brazil, while the populations were estimated by Livermore at three million, five million, and fourteen million, respectively.

This sentence contains two lists, and I was going to write the following antipattern to deal with the first list:

[Antipattern XML not preserved; only the token text and|or|nor survives.]


This 2nd antipattern catches my simple example as written, but it fails when I rewrite it as:


Yet another problem list: 5 million, 6 million, and 7 million, bottles of beer.

Having said that, both these simple lists should have been caught by the first antipattern!

Irvine


(Daniel Naber) #7

Irvine, we have fixed issue #189, but I could only test with a very simple case so far. Could you maybe download the latest build (https://languagetool.org/download/snapshots/LanguageTool-20140918-snapshot.zip) and see if it fixes your cases (also when not using <marker>...</marker>)?


(Irvine) #8

Sorry it took so long; I have a very slow internet connection and downloads take forever.

I have run a moderately extensive set of tests (results below), and the 'fix' corrects the basic problem, though there are still some issues. I have reproduced the console output directly, with a highlighted edit to explain what was tested:

Microsoft Windows XP Version 5.1.2600 Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\anonymous>CD C:\LTD

First test: Did it break what already worked?
C:\LTD>test-rules.bat EN
Running XML pattern tests...
Known languages: [English, English (US), English (GB), English (Australian), English (Canadian), English (New Zealand), English (South African), Persian, French
, German, German (Germany), German (Austria), German (Swiss), Simple German, Polish, Catalan, Catalan, Catalan (Valencian), Italian, Breton, Dutch, Portuguese,
Portuguese (Portugal), Portuguese (Brazil), Russian, Asturian, Belarusian, Chinese, Danish, Esperanto, Galician, Greek, Icelandic, Japanese, Khmer, Lithuanian,
Malayalam, Romanian, Slovak, Slovenian, Spanish, Swedish, Tamil, Tagalog, Ukrainian, Testlanguage]
Running XML validation for en/grammar.xml...
Running pattern rule tests for English... 1 rules tested.
Tests finished!
Running disambiguator rule tests...
Running disambiguation tests for English...
407 rules tested.
Tests successful.
Running XML bitext pattern tests...
Tests successful.
Validating false-friends.xml...
Validation successfully finished.

Second test: Did it 'fix' the problem with 'skip' when it's used with adjectival numbers like million, billion, etc.?
C:\LTD>test-rules.bat EN
Running XML pattern tests...
Known languages: [English, English (US), English (GB), English (Australian), English (Canadian), English (New Zealand), English (South African), Persian, French
, German, German (Germany), German (Austria), German (Swiss), Simple German, Polish, Catalan, Catalan, Catalan (Valencian), Italian, Breton, Dutch, Portuguese,
Portuguese (Portugal), Portuguese (Brazil), Russian, Asturian, Belarusian, Chinese, Danish, Esperanto, Galician, Greek, Icelandic, Japanese, Khmer, Lithuanian,
Malayalam, Romanian, Slovak, Slovenian, Spanish, Swedish, Tamil, Tagalog, Ukrainian, Testlanguage]
Running XML validation for en/grammar.xml...
Running pattern rule tests for English... Exception in thread "main" junit.framework.AssertionFailedError: English: Did not expect error in:
Yet another problem list: 5 million, 6 million, and 7 million, bottles of beer.
Matching Rule: R1.3--Comma-CC /,, and|or|nor, /CD]:R1.3: Incorrect comma before coordinating conjunction
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertFalse(Assert.java:39)
at junit.framework.TestCase.assertFalse(TestCase.java:210)
at org.languagetool.rules.patterns.PatternRuleTest.testCorrectSentences(PatternRuleTest.java:415)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:236)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:173)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:142)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:501)
Running disambiguator rule tests...
Running disambiguation tests for English...
407 rules tested.
Tests successful.
Running XML bitext pattern tests...
Tests successful.
Validating false-friends.xml...
Validation successfully finished.

Third test: Remove markers from basic antipattern
C:\LTD>test-rules.bat EN
Running XML pattern tests...
Known languages: [English, English (US), English (GB), English (Australian), English (Canadian), English (New Zealand), English (South African), Persian, French
, German, German (Germany), German (Austria), German (Swiss), Simple German, Polish, Catalan, Catalan, Catalan (Valencian), Italian, Breton, Dutch, Portuguese,
Portuguese (Portugal), Portuguese (Brazil), Russian, Asturian, Belarusian, Chinese, Danish, Esperanto, Galician, Greek, Icelandic, Japanese, Khmer, Lithuanian,
Malayalam, Romanian, Slovak, Slovenian, Spanish, Swedish, Tamil, Tagalog, Ukrainian, Testlanguage]
Running XML validation for en/grammar.xml...
Running pattern rule tests for English... 1 rules tested.
Tests finished!
Running disambiguator rule tests...
Running disambiguation tests for English...
407 rules tested.
Tests successful.
Running XML bitext pattern tests...
Tests successful.
Validating false-friends.xml...
Validation successfully finished.

Fourth test: Remove markers from special case antipattern for millions, billions, etc.
C:\LTD>test-rules.bat EN
Running XML pattern tests...
Known languages: [English, English (US), English (GB), English (Australian), English (Canadian), English (New Zealand), English (South African), Persian, French
, German, German (Germany), German (Austria), German (Swiss), Simple German, Polish, Catalan, Catalan, Catalan (Valencian), Italian, Breton, Dutch, Portuguese,
Portuguese (Portugal), Portuguese (Brazil), Russian, Asturian, Belarusian, Chinese, Danish, Esperanto, Galician, Greek, Icelandic, Japanese, Khmer, Lithuanian,
Malayalam, Romanian, Slovak, Slovenian, Spanish, Swedish, Tamil, Tagalog, Ukrainian, Testlanguage]
Running XML validation for en/grammar.xml...
Running pattern rule tests for English... 1 rules tested.
Tests finished!
Running disambiguator rule tests...
Running disambiguation tests for English...
407 rules tested.
Tests successful.
Running XML bitext pattern tests...
Tests successful.
Validating false-friends.xml...
Validation successfully finished.

5th test: Try adding a second skip. For this I used:

[Antipattern XML not preserved; only the token text and|or|nor survives.]

Since the following is no longer a list, if the two skips are working, I expected it to fail. Which it did.


This is the simplest list 1, 2, or 3 apples.

And, if both skips are working as expected, only the following is a list:

I like this 1st list: 1, 2, 3, 4, and 5, it does not cause me grief.

This test was also successful.

The special case for million|billion|... antipattern referred to is:

[Antipattern XML not preserved; only the following token text survives.]

hundred|thousand|million|milliard|billion|trillion|quadrillion|quintillion|sextillion|septillion|octillion|nonillion|decillion|undecillion|duodecillion|tredecillion|quattuordecillion|quindecillion|sexdecillion|sedecillion|septendecillion|octodecillion|novemdecillion|novendecillion|vigintillion|centillion|googol|googolplex


hundred|thousand|million|milliard|billion|trillion|quadrillion|quintillion|sextillion|septillion|octillion|nonillion|decillion|undecillion|duodecillion|tredecillion|quattuordecillion|quindecillion|sexdecillion|sedecillion|septendecillion|octodecillion|novemdecillion|novendecillion|vigintillion|centillion|googol|googolplex

and|or|nor

I hope this helps. If you need anything more, just ask.
Irvine


(Daniel Naber) #9

Thanks for testing. Unfortunately, I had to revert my "fix" as it caused other issues. So for now, the workaround is to always use "<marker>...</marker>" in the antipattern if needed (i.e. when skip is used, or when it doesn't seem to work for some reason).
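
As a rough illustration of that workaround, the first antipattern from this thread might be wrapped as below. This is a sketch only: the exact markup tested earlier in this thread is not shown, and placing a single <marker> around all tokens is an assumption based on the description of "enclosing the antipattern in markers".

```xml
<antipattern>
	<!-- assumed placement: one marker spanning every token of the antipattern -->
	<marker>
		<token postag='CD' />
		<token postag=',' skip='-1'/>
		<token postag=','>
			<exception scope='previous' negate_pos='yes' postag='CD'></exception>
		</token>
		<token regexp='yes'>and|or|nor</token>
		<token postag='CD' />
	</marker>
</antipattern>
```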


(Daniel Naber) #10

I have committed another fix that should make the antipatterns work even when no <marker> is used. Again, I have only tested this with simple cases, so if you find cases where it doesn't work, please let me know (the latest snapshot needs to be used to make sure my fix is active).


(Irvine) #11

I will download the fix as soon as possible and give it the same tests as before. As I said, this may take me twenty-four hours, since I only have a very slow and ancient modem.

Irvine


(Daniel Naber) #12

You could also update the source code directly from github. That way you only get the small change set and not the whole 70MB file. However, that requires a developer setup with three tools that need to be installed, so this is totally optional.


(Irvine) #13

Below are the results of my testing. I would have got back to you earlier this morning, but I was slightly distracted!

As you can see, the revision fixes the basic problem with skip and markers, but there is still a problem when skip is used in an antipattern with adjectival numbers. I assume the two problems are related?

Before I get into the report, a note on getting the fragmentation rules ready for the 2.7 release: because of my slow internet, I currently have a basic Subversion directory to update 'only' the grammar.xml file; I believe TortoiseSVN lets me run 'Create patch' against the version on GitHub. I am about to start downloading Maven; if it installs easily, my internet will be tied up downloading the full repo. This will take a while, but I am working on the rules and should have a patch ready by the time the downloads finish.

First test: Did it break what already worked?

No, works fine.

Second test: remove markers from antipattern

No problems

Third test: a second skip in the antipattern

No problems

Fourth test: adjectival numbers with skip.

Output of test-rules.bat with workaround for 1 million, 2 million etc commented out

Running pattern rule tests for English... Exception in thread "main" junit.framework.AssertionFailedError: English: Did not expect error in:
Yet another problem list: 5 million, 6 million, and 7 million, bottles of beer.
Matching Rule: R1.3--Comma-CC /,, and|or|nor, /CD]:R1.3: Incorrect comma before coordinating conjunction
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertFalse(Assert.java:39)
at junit.framework.TestCase.assertFalse(TestCase.java:210)
at org.languagetool.rules.patterns.PatternRuleTest.testCorrectSentences(PatternRuleTest.java:415)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:236)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:173)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:142)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:501)
Running disambiguator rule tests...

Output of test-rules.bat with workaround for one, two etc commented out

Running pattern rule tests for English... Exception in thread "main" junit.framework.AssertionFailedError: English: Did not expect error in:
A shipment is dispatched every one, two, or six months.
Matching Rule: R1.3--Comma-CC /,, and|or|nor, /CD]:R1.3: Incorrect comma before coordinating conjunction
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertFalse(Assert.java:39)
at junit.framework.TestCase.assertFalse(TestCase.java:210)
at org.languagetool.rules.patterns.PatternRuleTest.testCorrectSentences(PatternRuleTest.java:415)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:236)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:173)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:142)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:501)
Running disambiguator rule tests...


(Daniel Naber) #14

Sorry, I'm getting confused, as the thread is getting longer. Could you post the complete rule that's affected with the sentence that you want to or don't want to match?


(Irvine) #15

You are lucky, I am in between downloads. Below is the complete "incorrect comma" rule; this is the rule where 'skip', when used in an antipattern, was not working properly.

<rulegroup id="R1.3--Comma-CC" name="R1.3: Incorrect comma before coordinating conjunction">    
			<!-- R1.3C No comma when and|or|nor connects cardinal numbers. -->
			<!-- Note:
			While it has been complicated by the 'skip' bug, this is a highly targeted rule; see the list of correct examples. For details of the bug, see: http://languagetool-user-forum.2306527.n4.nabble.com/What-is-the-difference-between-these-two-antipatterns-tp4641883.html
			-->
			<rule>
				<!-- Oxford comma 1: 
				-->
				<antipattern>
					<token postag='CD' />
					<token postag=',' skip='-1' />
					<token postag=',' >
						<exception scope='previous' negate_pos='yes' postag='CD'></exception>
					</token>
					<token regexp='yes'>and|or|nor</token>
					<token postag='CD' />
				</antipattern>
				<!-- Oxford comma 2: 
				At the time of writing, a small bug in LT causes problems when 'skip' is used in an antipattern with adjectival numbers like 6 million; see http://languagetool-user-forum.2306527.n4.nabble.com/What-is-the-difference-between-these-two-antipatterns-tp4641883.html
				-->
				<antipattern>
					<token regexp='yes'>hundred|thousand|million|milliard|billion|trillion|quadrillion|quintillion|sextillion|septillion|octillion|nonillion|decillion|undecillion|duodecillion|tredecillion|quattuordecillion|quindecillion|sexdecillion|sedecillion|septendecillion|octodecillion|novemdecillion|novendecillion|vigintillion|centillion|googol|googolplex</token>
					<token postag=',' skip='-1' />
					<token postag=',' >
						<exception scope='previous' regexp='yes' negate='yes'>hundred|thousand|million|milliard|billion|trillion|quadrillion|quintillion|sextillion|septillion|octillion|nonillion|decillion|undecillion|duodecillion|tredecillion|quattuordecillion|quindecillion|sexdecillion|sedecillion|septendecillion|octodecillion|novemdecillion|novendecillion|vigintillion|centillion|googol|googolplex</exception>
					</token>
					<token regexp='yes'>and|or|nor</token>
					<token postag='CD' />
				</antipattern>
				<!-- Oxford comma 3: 
				The 'skip' problem again! Words as adjectival numbers, similar to the problem with millions: e.g. The list three, six, and two is a list of numbers. I am keeping the solutions separate partly because this works and partly because there are subtle differences between the two rules.
				-->
				<!--<antipattern>
					<token postag='CD' regexp='yes'>[A-Za-z]+</token>
					<token postag=',' skip='-1' />
					<token postag=',' >
						<exception scope='previous' negate_pos='yes' postag='CD' regexp='yes'>\d+</exception>
					</token>
					<token regexp='yes'>and|or|nor</token>
					<token postag='CD' regexp='yes'>[A-Za-z]+</token>
				</antipattern>-->
				<!-- Years and numbers: 
				Years e.g. 1956: The difference between 1956 and 1,956 is the comma.
				-->
				<antipattern>
						<token regexp='yes'>\d\d\d\d</token>
						<token postag=','/>
						<token regexp='yes'>and|or|nor</token>
						<token postag='CD' >
							<exception regexp='yes'>\d\d\d\d</exception>
						</token>
				</antipattern>
				<!-- Numbers and years: 
				Years e.g. 1956: The difference between 1956 and 1,956 is the comma.
				-->
				<antipattern>
						<token postag='CD' >
							<exception regexp='yes'>\d\d\d\d</exception>
						</token>
						<token postag=','/>
						<token regexp='yes'>and|or|nor</token>
						<token regexp='yes'>\d\d\d\d</token>
				</antipattern>
				<!-- Numbers and words: 
				e.g. The number is 56, and two is the word. In a consistent writing style, they cannot be talking about the same thing!
				-->
				<antipattern>
						<token regexp='yes'>\d+</token>
						<token postag=','/>
						<token regexp='yes'>and|or|nor</token>
						<token regexp='yes'>[A-Za-z]+<exception regexp='yes'>hundred|thousand|million|milliard|billion|trillion|quadrillion|quintillion|sextillion|septillion|octillion|nonillion|decillion|undecillion|duodecillion|tredecillion|quattuordecillion|quindecillion|sexdecillion|sedecillion|septendecillion|octodecillion|novemdecillion|novendecillion|vigintillion|centillion|googol|googolplex</exception></token>
				</antipattern>
				<!-- Words and numbers: 
				e.g. The word is fifty six, and 2 is the number. In a consistent writing style, they cannot be talking about the same thing!
				-->
				<antipattern>
						<token regexp='yes'>[A-Za-z]+<exception regexp='yes'>hundred|thousand|million|milliard|billion|trillion|quadrillion|quintillion|sextillion|septillion|octillion|nonillion|decillion|undecillion|duodecillion|tredecillion|quattuordecillion|quindecillion|sexdecillion|sedecillion|septendecillion|octodecillion|novemdecillion|novendecillion|vigintillion|centillion|googol|googolplex</exception></token>
						<token postag=','/>
						<token regexp='yes'>and|or|nor</token>
						<token regexp='yes'>\d+</token>
				</antipattern>
				<!-- Main pattern: numbers-->
				<pattern>
					<marker>
						<token postag='CD' ></token>
						<token postag=','></token>
						<token regexp='yes'>and|or|nor</token>
						<token postag='CD' ></token> 
					</marker>
				</pattern> 
				<message>
					1.3C: We should only use the comma before a coordinating conjunction, (\3,) when it joins two clauses. In this case, it appears you are joining two numbers, which is fine, but not clausal. As a result, the comma is confusing and should be removed. Be warned: Conjunction rules can be very sensitive to poor phrasing.
				</message>
				<suggestion><match no="1"></match> <match no="3"></match> <match no="4"></match></suggestion>
				<url>http://writing.wisc.edu/Handbook/CoordConj.html</url>
				<short>
					1.3C: No comma when CC, (\3,) joins two numbers
				</short>
				<example type='incorrect'>
					He had between <marker>15, and 20</marker> minutes to walk to school.
				</example>
				<example type='incorrect'>
					He had between <marker>fifteen, and twenty</marker> minutes to walk to school.
				</example>
				<example type='incorrect'>
					This is not a list: <marker>2, or 3</marker> apples.
				</example>
				<example type='incorrect'>
					Not a list: six <marker>million, and seven</marker> million bottles of beer.
				</example>
				<example type='incorrect'>
					Not a list: 6 <marker>million, and 7</marker> million bottles of beer.
				</example>
				<example type='correct'>
					I like this 1st list: 1, 2, 3, 4, and 5, it does not cause  me grief.
				</example>
				<example type='correct'>
					He had between 15 and 20 minutes to walk to school.
				</example>
				<example type='correct'>
					He had between fifteen and twenty minutes to walk to school.
				</example>
				<example type='correct'>
					This is not a list: 2 or 3 apples.
				</example>
				<example type='correct'>
					This is the simplest list 1, 2, or 3 apples.
				</example>
				<example type='correct'>
					I like this 1st list: 1, 2, 3, 4, and 5, it does not cause  me grief.
				</example>
				<example type='correct'>
					I like this 2nd list:  3 oranges or 4, 4 or 5, or 6 plums, it does not cause  me grief.
				</example>
				<example type='correct'>
					I hated this 3rd list: 3 oranges or 4, 4 apples or 5, or 6 plums, it is caused me a lot ofgrief!
				</example>
				<example type='correct'>
					I like this 4th list: 3 oranges or 4, 4 apples or 5, 5 oranges or 7, and 6 plums, it is no longer causing me grief!
				</example>
				<example type='correct'>
					I hate this 5th list: 1, 2 and 3, 4 apples or 5, neither 5 oranges nor 7, and 8 pears with sugar, it is causing me nothing but grief!
				</example>
				<example type='correct'>
					Not a list: six million and seven million bottles of beer.
				</example>
				<example type='correct'>
					Not a list: 6 million and 7 million bottles of beer.
				</example>
				<example type='correct'>
					Yet another problem list: 5 million, 6 million, and 7 million, bottles of beer.
				</example>
				<example type='correct'>
					Yet another problem list: five million, six million, and seven million, bottles of beer.
				</example>
				<example type='correct'>
					Congress extended the coverage formula and special provisions tied to it, such as the Section 5 preclearance requirement, for five years in 1970, seven years in 1975, and 25 years in both 1982 and 2006. 
				</example>
				<example type='correct'>
					As with many ancient units, the shekel had a variety of values depending on era, government and region; weights between 9 and 17 grams, and values of 11,[3] 14, and 17 grams are common.
				</example>
				<example type='correct'>
					Since the end of the Second World War the original VC has been awarded 14 times: four in the Korean War, one in the Indonesia-Malaysia confrontation in 1965, four to Australians in the Vietnam War, two during the Falklands War in 1982, one in the Iraq War in 2004, and two in the War in Afghanistan in 2006 and 2012.
				</example>
				<example type='correct'>
					In the 2009 U.S. News and World Report "Graduate School Rankings", all fourteen of Princeton's doctoral programs evaluated were ranked in their respective top 20, 7 of them in the top 5, and 4 of them in the top spot (Mathematics, Economics, History, Political Science).
				</example>
				<example type='correct'>
					Possible dates include 9 November 1799, when Bonaparte seized power on 18 Brumaire in France; or 18 May 1803, when Britain and France ended the one short period of peace between 1792 and 1814, or 2 December 1804, when Bonaparte crowned himself Emperor.
				</example>
				<example type='correct'>
					The countries' actual naval tonnages were 36,896 long tons (37,488 t) for Chile, 34,425 long tons (34,977 t) for Argentina, and 27,661 long tons (28,105 t) for Brazil, while the populations were estimated by Livermore at three million, five million, and fourteen million, respectively.
				</example>
				<example type='correct'>
					I was born in the summer of 1956, and 12 years later there was the summer of love.
				</example>
				<example type='correct'>
					I was a dedicated follower of fashion, even at age 12, and 1968 saw summer of love.
				</example>
				<example type='correct'>
					Sixteen squadrons of Phantoms were permanently deployed between 1965 and 1973, and 17 others deployed on temporary combat assignments.
				</example>
				<example type='correct'>
					The Soviet Union's Luna programme was the first to reach the Moon with unmanned spacecraft in 1959; the United States' NASA Apollo program achieved the only manned missions to date, beginning with the first manned lunar orbiting mission by Apollo 8 in 1968, and six manned lunar landings between 1969 and 1972, with the first being Apollo 11.
				</example>
				<example type='correct'>
					The policy deflates grades only relative to their previous levels; indeed, as of 2009, or five years after the policy was instituted, the average graduating GPA saw a marginal decrease, from 3.46 to 3.39.
				</example>
				<example type='correct'>
					His stay on Mir, considered the smoothest of the entire Phase One program, featured weekly "Letters from the Outpost" from Thomas and passed two milestones for length of spaceflight—815 consecutive days in space by American astronauts since the launch of Shannon Lucid on the STS-76 mission in March 1996, and 907 days of Mir occupancy by American astronauts dating back to Norman Thagard's trip to Mir in March 1995.
				</example>
				<example type='correct'>
					To oppose them, Greece would have only had Salamis, which was being built in Germany and scheduled for completion in March 1915, and two entirely obsolete pre-dreadnoughts, Kilkis and Lemnos, purchased from the United States in May 1914 to avert what seemed to be an imminent war.
				</example>
				<example type='correct'>
					The word is fifty six, and 2 is the number.
				</example>
				<example type='correct'>
					The number is 56, and two is the word.
				</example>
				<example type='correct'>
					A shipment is dispatched every one, two, or six months.
				</example>
				<example type='correct'>
					A shipment is dispatched every twenty one, thirty two, or forty six months.
				</example>
				<example type='correct'>
					A shipment is dispatched every one, two, three, or six months.
				</example>
				<example type='correct'>
					Amazon's Subscribe and Save program offers a discounted price on an item (usually sold in bulk), free shipping on every Subscribe and Save shipment, and automatic shipment of the item every one, two, three, or six months.
				</example>
			</rule>

	</rulegroup>

(Daniel Naber) #16

I think the combination of skip="-1", scope='previous', and negate_pos in an exception has never been used. Would it mean "all of the skipped tokens need to have the given POS tag"? Maybe Marcin has an idea about this, but I'd simply guess this is not supported.
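
For reference, the combination in question reduces to the first fragment below, copied from the "Oxford comma 1" antipattern earlier in this thread. The comments spell out the two possible readings; which one LanguageTool applies is the open question here, so this is an illustration, not documented behavior. The second fragment is only a sketch of the scope='next' form, which (as far as the rule-writing documentation describes it) is the usual way to constrain skipped tokens. The semicolon used as the blocking token is a hypothetical choice for illustration and has not been tested against the examples above.

```xml
<!-- Fragment 1: the combination under discussion (from "Oxford comma 1"). -->
<token postag=',' skip='-1'/>
<token postag=','>
	<!-- Reading A: only the single token right before this comma must be tagged CD. -->
	<!-- Reading B: every token skipped by the preceding skip='-1' must be tagged CD. -->
	<exception scope='previous' negate_pos='yes' postag='CD'></exception>
</token>

<!-- Fragment 2: a sketch using scope='next' on the skipping token itself. -->
<!-- Assumption: the exception is tested against the skipped tokens, -->
<!-- so the skip no longer matches once a semicolon lies in between. -->
<token postag=',' skip='-1'>
	<exception scope='next'>;</exception>
</token>
<token postag=','/>
```

If Reading B turns out to be unsupported, Fragment 2 at least keeps the constraint on the token that carries the skip attribute, which avoids mixing skip with scope='previous'.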