How do you make a hyphen into a token?

Hello

I want to write a rule for finding compound nouns with hyphens, like ‘flower-pot’.

So far I have this rule, which finds ‘flower pot’, and it works:

<category name="Hyphen check">

    <rule id="HYPEN_OR_NOT" name="Hyphen or not">
        <pattern>
            <token postag="NN|NNS|NN:U|NN:UN" postag_regexp="yes"></token>
            <token postag="NN|NNS|NN:U|NN:UN" postag_regexp="yes"></token>
        </pattern>
        <message>Found</message>
    </rule>

</category>

But I don’t know how to add a segment for the hyphen. I’ve tried all sorts of things.

Can anyone help me expand this rule to find ‘flower-pot’?

Thanks

On Sat 17.11.2012, 08:08:36 you wrote:

> I want to write a rule for finding compound nouns with hyphens like ‘flower-pot’

You can use this to find words with hyphens:

.+-.+

It’s not possible to check the POS tags of the first and second parts, though
(at least not without programming), because LT treats the whole hyphenated
compound as a single word.
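Put into a complete rule, the expression above might look like the sketch below. It assumes the expression goes inside a `<token>` with `regexp="yes"`; the rule id and name are hypothetical, made up for illustration. Because LT sees ‘flower-pot’ as one word, there is no separate hyphen token to match here.

```xml
<category name="Hyphen check">
    <!-- Hypothetical id/name: matches any single token that contains a hyphen -->
    <rule id="HYPHENATED_WORD" name="Hyphenated word">
        <pattern>
            <!-- .+-.+ : one or more characters, a hyphen, one or more characters -->
            <token regexp="yes">.+-.+</token>
        </pattern>
        <message>Found</message>
    </rule>
</category>
```

Since the parts are not checked, this will also match hyphenated words that are not nouns, such as ‘well-known’.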

Regards
Daniel


http://www.danielnaber.de

Hello

Can I ask one more question tonight?

Going back to the first rule (the one with the message ‘Found’):

I noticed it found things like ‘take shelter’ and ‘make money’.

Normally, ‘take’ and ‘make’ are verbs, although they can also be nouns, e.g. ‘to be on the take’.

But it also finds ‘in house’, and there is practically no way ‘in’ can be a noun.

So why is it finding all these combinations which are not really noun + noun?

Thanks

On Sat 17.11.2012, 12:55:55 you wrote:

> So why is it finding all these combinations which are not really noun + noun?

Because guessing a word’s part of speech is error-prone, we return all possible
readings. A disambiguator can sometimes be used to remove the invalid readings
(see Developing a Disambiguator - LanguageTool Wiki).
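A disambiguator is the proper fix. As a quick stopgap inside the rule itself (a different technique from the disambiguator mentioned above), one could add an exception to the first token so that words that also have a preposition reading, like ‘in’, are skipped. This sketch assumes that the English tagger uses IN for prepositions and that an exception fires when any reading of the word carries the excepted tag. It does not help with ‘take’ or ‘make’: excluding verb readings the same way would also drop real nouns that can be verbs, such as ‘flower’.

```xml
<category name="Hyphen check">
    <rule id="HYPEN_OR_NOT" name="Hyphen or not">
        <pattern>
            <!-- First noun; assumption: skipped if any reading is tagged IN (preposition), e.g. 'in' -->
            <token postag="NN|NNS|NN:U|NN:UN" postag_regexp="yes">
                <exception postag="IN"/>
            </token>
            <!-- Second noun, unchanged -->
            <token postag="NN|NNS|NN:U|NN:UN" postag_regexp="yes"></token>
        </pattern>
        <message>Found</message>
    </rule>
</category>
```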

Regards
Daniel