I am looking at my problem of wordiness in a slightly different way and trying to build a set of punctuation rules. At the moment, I am working on conjunctive adverbs (see Conjunctive adverb - Wikipedia).
It’s still a work in progress, but I currently have two rules that test well: the first scans for a comma followed by any adverb, while the second scans for a comma followed by one of a list of adverbs commonly used as conjunctions. The problem is that many phrases are also used as adverbs to join clauses. (They are called ‘conjunctive adverbial phrases’, ugh!)
For example: on the other hand|in fact|as a result|in comparison|in contrast|just as|in addition|that is|…
Is there a way to turn such a list of phrases into a regular expression that can be used in a single rule?
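To illustrate what I am after, here is a rough, untested sketch. It assumes that each word of a phrase has to be its own token, so phrases that share a first word (in fact, in comparison, in contrast, in addition) are folded into one pattern, while a longer phrase gets its own rule inside the same rulegroup. The rulegroup id and name, the messages, and the example sentences are placeholders of my own.

```xml
<rulegroup id="conjunctive_adverbial_phrases" name="punctuation_of_conjunctive_adverbial_phrases">
<!-- Two-word phrases that share the first word "in" -->
<rule>
<pattern>
<marker>
<token>,</token>
<token>in</token>
<token regexp='yes'>fact|comparison|contrast|addition</token>
</marker>
</pattern>
<message>Conjunctive adverbial phrases should be preceded by a semicolon.</message>
<short>Conjunctive adverbial phrases should be preceded by a semicolon.</short>
<example type='incorrect'>The plan looked sound<marker>, in fact</marker>, it failed.</example>
<example type='correct'>The plan looked sound; in fact, it failed.</example>
</rule>
<!-- A longer phrase, spelled out one token at a time -->
<rule>
<pattern>
<marker>
<token>,</token>
<token>on</token>
<token>the</token>
<token>other</token>
<token>hand</token>
</marker>
</pattern>
<message>Conjunctive adverbial phrases should be preceded by a semicolon.</message>
<short>Conjunctive adverbial phrases should be preceded by a semicolon.</short>
<example type='incorrect'>She likes tea<marker>, on the other hand</marker>, he prefers coffee.</example>
<example type='correct'>She likes tea; on the other hand, he prefers coffee.</example>
</rule>
</rulegroup>
```

This keeps everything under one id, but it still needs a separate pattern for each phrase shape, which is what I would like to avoid if a single regular expression can do the job.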
Also, is there a limit to how many tokens can be used in a regular expression? For testing purposes I am just working with the examples from Wikipedia, but I have a spreadsheet with a fuller list of over 100 entries (both adverbs and phrases) that combines the Wikipedia and Cambridge dictionary examples.
Irvine
For information only, the two rules I have so far are:
<rule id="conjunctive_adverbs_1" name="punctuation_of_conjunctive_adverbs_1">
<pattern>
<marker>
<token>,</token>
<token postag='RB|RBR|RBS' postag_regexp='yes'></token>
</marker>
</pattern>
<message>Adverbs, when used as a conjunction between two clauses of a sentence, are normally preceded by a semicolon.</message>
<url>http://en.wikipedia.org/wiki/Conjunctive_adverb</url>
<short>Adverbs, when used as a conjunction between two clauses, are normally preceded by a semicolon.</short>
<example type='incorrect'>He can leap tall buildings in a single bound<marker>, furthermore</marker>, Dwight Schrute is a hog.</example>
<example type='correct'>He can leap tall buildings in a single bound; furthermore, Dwight Schrute is a hog.</example>
<example type='incorrect'>Oh, there's a butterfly.</example>
<example type='correct'>Oh. There's a butterfly.</example>
</rule>
<rule id="conjunctive_adverbs_2" name="punctuation_of_conjunctive_adverbs_2">
<pattern>
<marker>
<token>,</token>
<token regexp='yes'>accordingly|additionally|again|almost|although|anyway|besides|certainly|comparatively|consequently|contrarily|conversely|elsewhere|equally|eventually|finally|further|furthermore|hence|henceforth|however|incidentally|indeed|instead|likewise|meanwhile|moreover|namely|nevertheless|next|nonetheless|notably|now|otherwise|rather|similarly|still|subsequently|then|thereafter|therefore|thus|undoubtedly|uniquely</token>
</marker>
</pattern>
<message>Conjunctive adverbs should be preceded by a semicolon.</message>
<url>http://en.wikipedia.org/wiki/Conjunctive_adverb</url>
<short>Conjunctive adverbs should be preceded by a semicolon.</short>
<example type='incorrect'>He can leap tall buildings in a single bound<marker>, furthermore</marker>, Dwight Schrute is a hog.</example>
<example type='correct'>He can leap tall buildings in a single bound; furthermore, Dwight Schrute is a hog.</example>
</rule>
They are not finished yet, nor tested in my OpenOffice installation, and they still require supplementary rules to catch missing terminating commas.