Noun/verb disambiguation problem

Hello, I’m trying to write a rule that will flag when the word “licence” appears as a verb, and only a verb. I have written this rule:

<token postag="VB|VBD|VBG|VBN|VBP|VBZ" postag_regexp="yes">
    	<exception negate_pos="yes" postag="VB|VBD|VBG|VBN|VBP|VBZ" postag_regexp="yes"/>licence</token>

But the word is being flagged in this phrase:

A permanent US licence.

If I run this phrase through the text analysis feature, it says that ‘licence’ could be NN:UN or VB or VBP, so I would expect the rule not to flag it, because it could be a noun. Could anyone explain what is going wrong and how to fix it?

Hi @Maximum,

I don’t know what the rest of the rule looks like, but in that token, I suspect the problem is that “licence” sits in the exception string when it should be in the token string (i.e. before the exception).

Even then, a POS-tag rule like this will only work in cases where “licence” is correctly disambiguated. Try using the chunker instead. (more info here)

If you look for such contexts where “licence” is part of a VP (verb phrase), it should work better.

Also, VB|VBD|VBG|VBN|VBP|VBZ can be simplified to V.*.

Here is a chunker approach. You could also add an exception for N.*, but that might cost you a lot of true positives:

<token postag="V.*" postag_regexp="yes" chunk_re="[BI]-VP.*">licence</token>
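For illustration, the N.* exception mentioned above could look roughly like the sketch below. I haven't tested it; it assumes the exception is checked against every reading that is still attached to the token, so any remaining noun reading would stop the match.

```xml
<!-- Sketch only: verb reading inside a verb-phrase chunk, -->
<!-- skipped whenever the token also has a noun reading (N.*) -->
<token postag="V.*" postag_regexp="yes" chunk_re="[BI]-VP.*">licence<exception postag="N.*" postag_regexp="yes"/></token>
```

With the exception in place, a token tagged NN:UN, VB and VBP would no longer match, which is why this variant is the more conservative of the two.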

Thanks for this. I’ll try the chunking approach. I’ve moved ‘licence’ out of the exception string, but I’m still having the same issue. Here is the entire revised rule for reference:

<rule> <!-- licence as verb -->
<pattern>
<token postag="VB|VBD|VBG|VBN|VBP|VBZ" postag_regexp="yes">licence
    	<exception negate_pos="yes" postag="VB|VBD|VBG|VBN|VBP|VBZ" postag_regexp="yes"/></token>
</pattern>

<message><match no="1"/>: In British English, licence is the noun and <suggestion>license</suggestion> the verb. So you need a licence to run a licensed bar, or you may need to visit the off-licence.</message>
</rule>

I was using this rule as a starting point, from the “tips and tricks” page:

<token postag="tag1|tag2" postag_regexp="yes">
    	<exception negate_pos="yes" postag="tag1|tag2" postag_regexp="yes"/>
    </token>

The page says this should only flag a word if it has one of the two POS tags and no others. I’m still interested in why it won’t work on my example above, though (which has more POS tags, but surely the same principle applies).
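To spell out how I understand that pattern, here it is again with comments. The comments are my own reading, not text from the wiki, and I'm assuming the tags are checked against whatever readings the token still has when the rule runs.

```xml
<!-- Step 1: the token matches if at least one reading is tag1 or tag2 -->
<token postag="tag1|tag2" postag_regexp="yes">
    <!-- Step 2: negate_pos="yes" inverts the tag test, so this exception -->
    <!-- fires if any reading is something other than tag1 or tag2 -->
    <exception negate_pos="yes" postag="tag1|tag2" postag_regexp="yes"/>
</token>
<!-- My expectation for 'licence' tagged NN:UN, VB, VBP: -->
<!-- step 1 matches on VB, step 2 fires on NN:UN, so nothing is flagged -->
```

If that reading is right, the rule should only fire when every reading left on the token is a verb tag.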

The rule works fine in the rule editor:

<rule> <!-- licence as verb -->
    <pattern>
    <token postag="VB|VBD|VBG|VBN|VBP|VBZ" postag_regexp="yes">licence
    	<exception negate_pos="yes" postag="VB|VBD|VBG|VBN|VBP|VBZ" postag_regexp="yes"/>
	    </token>
    </pattern>
    <message><match no="1"/>: In British English, 'licence' is a noun and <suggestion>license</suggestion> a verb. So you need a licence to run a licensed bar, or you may need to visit the off-licence.</message>
    <example correction="license">She will not <marker>licence</marker> my business.</example>
    <example>A permanent US licence.</example><!-- incomplete sentence, thus not a good example -->
</rule>

That’s interesting… thanks. I’ll take another look. It’s misfiring in LT 5.1 in standalone Java, but I’ll try it in another environment.

Having tried it in a different environment, it seems the issue is specific to having the term ‘US’ directly in front of ‘licence’. Other words in front of ‘licence’ don’t trigger it. My guess is that ‘US’ is somehow changing how ‘licence’ gets tagged, perhaps because US and British usage differ, but I haven’t confirmed that.
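As a stopgap, one thing I might try is an antipattern that excludes this exact phrase. This is an untested sketch: it works around the symptom rather than explaining it, and it uses the V.* shorthand suggested earlier in place of the full tag list.

```xml
<rule> <!-- licence as verb, skipping "US licence" -->
    <!-- Antipattern: anything matched here is excluded from the rule -->
    <antipattern>
        <token case_sensitive="yes">US</token>
        <token>licence</token>
    </antipattern>
    <pattern>
        <token postag="V.*" postag_regexp="yes">licence<exception negate_pos="yes" postag="V.*" postag_regexp="yes"/></token>
    </pattern>
    <message><match no="1"/>: In British English, 'licence' is a noun and <suggestion>license</suggestion> a verb.</message>
    <example correction="license">She will not <marker>licence</marker> my business.</example>
    <example>A permanent US licence.</example>
</rule>
```

The case_sensitive attribute is there so that only the upper-case country abbreviation is excluded, not the pronoun ‘us’.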