
How do you make a hyphen into a token?


(SafeTex) #1

Hello

I want to write a rule for finding compound nouns with hyphens, like 'flower-pot'.

So far I have this rule, which finds 'flower pot', and it works:

<category name="Hyphen check">


    <rule id="HYPEN_OR_NOT" name="Hyphen or not">    
     <pattern>
     <token postag="NN|NNS|NN:U|NN:UN|" postag_regexp="yes"></token>
     <token postag="NN|NNS|NN:U|NN:UN|" postag_regexp="yes"></token>
     </pattern>
    <message>Found</message>
    </rule>

</category>

But I don't know how to add a segment for the hyphen. I've tried all sorts of things.

Can anyone help me expand this rule to find 'flower-pot'?

Thanks


(Daniel Naber) #2

On Sat 17.11.2012, 08:08:36 you wrote:

I want to write a rule for finding compound nouns with hyphens like
'flower-pot'

You can use this to find words with hyphens:


<token regexp="yes">.+-.+</token>

It's not possible to check the POS tags of the first and second part though
(without programming at least), as this is considered a single word by LT.
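
For context, here is a minimal sketch of how such a token pattern might sit inside a complete rule. The rule id, name and message are placeholders made up for illustration; they are not taken from LT's own rule files.

```xml
<!-- Hypothetical rule: id, name and message are placeholders -->
<rule id="HYPHENATED_WORD" name="Hyphenated word">
  <pattern>
    <!-- A single token whose text has a hyphen between two non-empty parts -->
    <token regexp="yes">.+-.+</token>
  </pattern>
  <message>Found a hyphenated word</message>
</rule>
```

Since the regular expression is matched against the whole token, this would fire on any word containing a hyphen between two non-empty parts (for example 'well-known'), not only on noun compounds.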

Regards
Daniel

--
http://www.danielnaber.de


(SafeTex) #3

Hello

Can I ask one more question tonight?

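Going back to the first rule:

    <rule id="HYPEN_OR_NOT" name="Hyphen or not">
     <pattern>
     <token postag="NN|NNS|NN:U|NN:UN|" postag_regexp="yes"></token>
     <token postag="NN|NNS|NN:U|NN:UN|" postag_regexp="yes"></token>
     </pattern>
    <message>Found</message>
    </rule>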

I noticed it found things like

'take shelter' and 'make money'.

Normally, 'take' and 'make' are verbs, although they can be nouns, e.g. 'to be on the take'.

But it also finds

'in house', and there is practically no way 'in' can be a noun.

So why is it finding all these combinations which are not really noun + noun?

Thanks


(Daniel Naber) #4

On Sat 17.11.2012, 12:55:55 you wrote:

So why is it finding all these combinations which are not really noun +
noun?

Because guessing a word's part of speech is error-prone, we return all
readings, and a token matches if any of its readings has one of the listed
tags. A disambiguator can sometimes be used to remove the invalid
readings (http://languagetool.wikidot.com/developing-a-disambiguator).
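
Short of writing a disambiguator, one possible workaround is to add exceptions to the rule itself, so that a word is skipped when it also has an unwanted reading. This is a different technique from disambiguation, shown here only as a sketch. It assumes the English tagger uses Penn-style tags such as VB, VBP and IN for verbs and prepositions, and that an exception applies as soon as any reading of the word matches it. The choice of excluded tags is illustrative.

```xml
<!-- Sketch only: the rule from the first post, with illustrative exceptions added -->
<rule id="HYPEN_OR_NOT" name="Hyphen or not">
  <pattern>
    <token postag="NN|NNS|NN:U|NN:UN|" postag_regexp="yes">
      <!-- Assumption: skip the word if one of its readings is a verb or a preposition -->
      <exception postag="VB.*|IN" postag_regexp="yes"/>
    </token>
    <token postag="NN|NNS|NN:U|NN:UN|" postag_regexp="yes"></token>
  </pattern>
  <message>Found</message>
</rule>
```

Under those assumptions this would no longer match 'take shelter' or 'in house'. It would also skip real noun compounds whose first word happens to have a verb reading, such as 'work permit', so it trades missed matches for fewer false alarms.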

Regards
Daniel
