Is this a testrules.bat error or not?


(Irvine) #1

I am running testrules.bat against a set of punctuation rules and cannot really see what its problem is.

Here is the rule:

<rule id="Conjunctions-12--SENT_START--Because" name="Conjunctions R12 PoS analysis: Start sentence with Because: need two clauses">    
			<!-- Okay, does it start with Because?-->
			<pattern>
				<token postag='SENT_START'></token>
				<token >Because</token>
				<marker>
					<token postag='SENT_END' skip="-1"></token>
				</marker>
					<!-- Go to the end and skip back to see if it has two clauses? -->
				<and>
					<token postag=',|:' postag_regexp="yes"></token>
					<token postag='CC|IN' postag_regexp="yes">
						<exception>Because</exception>
					</token>
				</and>
			</pattern>
			<message>
				Conjunctions R12: The sentence is a fragment, it starts with Because. Has only one punctuated clause, and no coordinating or subordinating conjunction. Basically: "Because something. What?"
			</message>
			<url>http://grammar.ccc.commnet.edu/grammar/conjunctions.htm</url>
			<short>
				CC R12: Sentence is a fragment?
			</short>
			<example type='incorrect'>
				Because the dog likes to bark<marker>.</marker>
			</example>
			<example type='correct'>
					Because the dog likes to bark, it annoys the neighbours.
				</example>
		</rule>

And here is the output from testrules.bat:

Running XML validation for en/grammar.xml...
Running pattern rule tests for English... Exception in thread "main" junit.framework.AssertionFailedError: English rule Conjunctions-12--SENT_START--Because:
"Because the dog likes to bark."
Errors expected: 1
Errors found : 0
Message:
Conjunctions R12: The sentence is a fragment, it starts with Because. Has only has one punctuated clause, and no coordinating or subordinating conjunction. Basically: "Because something, what?"

Analyzed token readings: [/SENT_START*] Because[because/CC*,B-SBAR] [ /null*] the[the/DT,B-NP-plural] [ /null*] dog[dog/NN,E-NP-plural] [ /null*] likes[like/NNS,like/VBZ,B-VP] [ /null*] to[to/IN,to/TO,I-VP] [ /null*] bark[bark/NN:UN,bark/VB,bark/VBP,I-VP] .[./.,./SENT_END,O]
Matches: []
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.TestCase.fail(TestCase.java:227)
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(PatternRuleTest.java:290)
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(PatternRuleTest.java:237)
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(PatternRuleTest.java:173)
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(PatternRuleTest.java:142)
at org.languagetool.rules.patterns.PatternRuleTest.main(PatternRuleTest.java:500)
Running disambiguator rule tests...
Running disambiguation tests for English...
407 rules tested.
Tests successful.
Running XML bitext pattern tests...
Tests successful.
Validating false-friends.xml...
Validation successfully finished.

I was going to mark the entire sentence, but given how particular the pattern matching has proven to be, I am now concentrating on just marking the bad termination. Without, in this case, any success, I hasten to add.

Irvine


(Daniel Naber) #2

You have SENT_END as the third token, but then other tokens follow it. This cannot match, because there is never anything after SENT_END.


(Irvine) #3

Am I misunderstanding something? I thought that with skip="-1" it went backwards.


(Daniel Naber) #4

Skip skips forward, i.e. it "ignores" up to x of the following tokens. For example, skip="2" will ignore up to two following tokens.
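
To make the forward direction concrete, here is a minimal sketch of two patterns. They are illustrative only and not taken from grammar.xml; the choice of "Because" and a comma as tokens is an assumption made to fit this thread. The second pattern relies on skip="-1" meaning "ignore any number of following tokens up to the end of the sentence".

```xml
<!-- Illustrative sketch only: these patterns are not part of grammar.xml. -->

<!-- "Because", then at most two ignored tokens, then a comma.
     Should match "Because it rains, we stay in." (two tokens skipped),
     but not "Because the dog barks, it annoys us." (three tokens before the comma). -->
<pattern>
	<token skip="2">Because</token>
	<token>,</token>
</pattern>

<!-- "Because", then any number of ignored tokens, then a comma.
     skip="-1" removes the limit; matching still only moves forward. -->
<pattern>
	<token skip="-1">Because</token>
	<token>,</token>
</pattern>
```

In both cases the skip attribute sits on the token before the gap, and the token after it is searched for to the right, never to the left.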


(Irvine) #5

My apologies, I think I understand now. A negative value does not go backwards; it just means skipping forward without a limit, and the correct rule is:

		<!-- Okay, does it start with Because and have a CC or SC?-->
		<antipattern>
			<token postag='SENT_START'></token>
			<token skip="-1">Because</token>
			<and>
				<token postag=',|:' postag_regexp="yes"></token>
				<token postag='CC|IN' postag_regexp="yes"></token>
			</and>
		</antipattern>
		<!-- No? Then does it still start with Because? -->
		<pattern>
			<marker>
				<token postag='SENT_START'></token>
				<token skip="-1">Because</token>
				<token postag='SENT_END'></token>
			</marker>
		</pattern>