
Is this a testrules.bat error or not?

(Irvine) #1

I am running testrules.bat against a set of punctuation rules and cannot really see what its problem is.

Here is the rule:

<rule id="Conjunctions-12--SENT_START--Because" name="Conjunctions R12 PoS analysis: Start sentence with Because: need two clauses">
	<pattern>
		<!-- Okay, does it start with Because?-->
		<token postag='SENT_START'></token>
		<token>Because</token>
		<token postag='SENT_END' skip="-1"></token>
		<!-- Goto end and skip back to see if it is has two clauses?-->
		<token postag=',|:' postag_regexp="yes"></token>
		<token postag='CC|IN' postag_regexp="yes"></token>
	</pattern>
	<message>Conjunctions R12: The sentence is a fragment, it starts with Because. Has only one punctuated clause, and no coordinating or subordinating conjunction. Basically: "Because something. What?"</message>
	<short>CC R12: Sentence is a fragment?</short>
	<example type='incorrect'>Because the dog likes to bark<marker>.</marker></example>
	<example type='correct'>Because the dog likes to bark, it annoys the neighbours.</example>
</rule>

And here is the output from testrules.bat:

Running XML validation for en/grammar.xml…
Running pattern rule tests for English… Exception in thread "main" junit.framework.AssertionFailedError: English rule Conjunctions-12--SENT_START--Because:
“Because the dog likes to bark.”
Errors expected: 1
Errors found : 0
Conjunctions R12: The sentence is a fragment, it starts with Because. Has only has one punctuated clause, and no coordinating or subordinating conjunction. Basically: “Because something, what?”

Analyzed token readings: [/SENT_START*] Because[because/CC*,B-SBAR] [ /null*] the[the/DT,B-NP-plural] [ /null*] dog[dog/NN,E-NP-plural] [ /null*] likes[like/NNS,like/VBZ,B-VP] [ /null*] to[to/IN,to/TO,I-VP] [ /null*] bark[bark/NN:UN,bark/VB,bark/VBP,I-VP] .[./.,./SENT_END,O]
Matches: []
at org.languagetool.rules.patterns.PatternRuleTest.testBadSentences(
at org.languagetool.rules.patterns.PatternRuleTest.testGrammarRulesFromXML(
at org.languagetool.rules.patterns.PatternRuleTest.runTestForLanguage(
at org.languagetool.rules.patterns.PatternRuleTest.runGrammarRulesFromXmlTestIgnoringLanguages(
at org.languagetool.rules.patterns.PatternRuleTest.main(
Running disambiguator rule tests…
Running disambiguation tests for English…
407 rules tested.
Tests successful.
Running XML bitext pattern tests…
Tests successful.
Validating false-friends.xml…
Validation successfully finished.

I was going to mark the entire sentence, but given how particular the pattern matching has proven to be, I am now concentrating on just marking the bad termination. Without, in this case, any success, I hasten to add.


(Daniel Naber) #2

You have SENT_END as the third token, but then other tokens follow. This cannot match: there’s never anything after SENT_END.

(Irvine) #3

Am I misunderstanding something? I thought that with skip="-1" it went backwards.

(Daniel Naber) #4

Skip skips, i.e. it "ignores" at most x of the following tokens. For example, skip="2" will ignore up to two following tokens.
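
As a rough illustration of that forward-only behaviour, here are two small patterns. They are a sketch and not taken from grammar.xml; the token texts are made up for the example. The skip attribute sits on the token before the gap. The first pattern tolerates up to two tokens between its two words. The second uses skip="-1", which in LanguageTool's pattern syntax places no limit on the number of skipped tokens, so matching can run forward as far as the end of the sentence.

```xml
<!-- Hypothetical pattern: "because", then at most two ignored tokens, then "so" -->
<pattern>
    <token skip="2">because</token>
    <token>so</token>
</pattern>

<!-- Hypothetical pattern: "Because" at the sentence start, then any number of
     ignored tokens, then the sentence-final token. SENT_END comes last,
     because nothing can follow it. -->
<pattern>
    <token postag='SENT_START'></token>
    <token skip="-1">Because</token>
    <token postag='SENT_END'></token>
</pattern>
```

In both cases the tokens are still matched left to right; skip only widens the gap allowed before the next token.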

(Irvine) #5

My apologies, I think I understand now. A negative value just means skip forward, and the correct rule is:

	<!-- Okay, does it start with Because and have a CC or SC?-->
	<token postag='SENT_START'></token>
	<token skip="-1">Because</token>
	<token postag=',|:' postag_regexp="yes"></token>
	<token postag='CC|IN' postag_regexp="yes"></token>
	<!-- No then does it still start with Because -->
	<token postag='SENT_START'></token>
	<token skip="-1">Because</token>
	<token postag='SENT_END'></token>