Thank you so much for the example. I was not able to get it to work. Below is what I tried; I was hoping you could look it over.
<rule default="off" id="NOMINALIZATION" name="Nominalization">
args="no:1 regexp:([a-zA-Z]+)(?:ability|abilities|able|ably|ation|ations|ible|ibly|ment|ments)\b postag_regexp:VB."
This is a nominalization.
- First I took out the message and the filter. The idea was to understand what you are saying.
- This works as expected as the word ‘evaluation’ is
- Now, if I understand the PartialPosTagFilter class correctly, it needs:
- no: token position
- the regexp in question
- and the postag_regexp
- So the idea is to filter the matches from the original pattern so that only those where the relevant part of the token has the required tag are shown. Note I changed the postag_regexp you had from VB to VB. (with a trailing dot), but I don’t think that makes a huge difference. This did not work.
- How do I tell why it did not work? That is, how do I see the partial POS tag which the filter is evaluating? I think this did not work because nothing tells it to look at the inflected version. I tried adding inflected="yes" to the token tag, but no luck. Looking at dictionary.dump, these are the relevant entries:
evaluable evaluable JJ
evaluate evaluate VB
evaluate evaluate VBP
evaluated evaluate VBD
evaluated evaluate VBN
evaluates evaluate VBZ
evaluating evaluate VBG
evaluation evaluation NN:UN
evaluationally evaluationally RB
evaluations evaluation NNS
evaluative evaluative JJ
evaluator evaluator NN
evaluators evaluator NNS
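To make the dump easier to reason about, here is a minimal Python sketch (not LanguageTool code) that loads the lines above and checks which forms carry a tag matching a given postag_regexp. The helper names `tags_by_form` and `has_tag` are made up for illustration, and it assumes the tag has to match the regexp in full, which may not be exactly what the filter does.

```python
import re

# Entries copied from the dictionary.dump excerpt above: form, lemma, POS tag.
DUMP = """\
evaluable evaluable JJ
evaluate evaluate VB
evaluate evaluate VBP
evaluated evaluate VBD
evaluated evaluate VBN
evaluates evaluate VBZ
evaluating evaluate VBG
evaluation evaluation NN:UN
evaluationally evaluationally RB
evaluations evaluation NNS
evaluative evaluative JJ
evaluator evaluator NN
evaluators evaluator NNS
"""


def tags_by_form(dump):
    """Map each surface form to the list of POS tags listed for it."""
    table = {}
    for line in dump.splitlines():
        form, _lemma, tag = line.split()
        table.setdefault(form, []).append(tag)
    return table


def has_tag(table, form, postag_regexp):
    """True if any tag of `form` matches postag_regexp in full (an assumption)."""
    return any(re.fullmatch(postag_regexp, tag) for tag in table.get(form, []))


TABLE = tags_by_form(DUMP)
print(has_tag(TABLE, "evaluation", "VB."))  # False: the only tag is NN:UN
print(has_tag(TABLE, "evaluate", "VB."))    # True: VBP matches, plain VB does not
print(has_tag(TABLE, "evaluate", "VB"))     # True: matches the base-form tag VB
```

Under that full-match assumption, VB. needs one more character after VB, so it covers VBP, VBD, VBN, VBZ and VBG but not the bare VB tag.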
- The other thing I don’t quite understand is the argument to the EnglishPartialPosTagFilter. How does no:1 relate to the relevant portion of the regexp? Given that the documentation suggests the partial POS tagger looks at the first group, e.g. (.*), why tell it the position of the token at all, unless that is for patterns with multiple tokens?
regexp: the regular expression to specify the part of the token to be considered. For example, (?:in|un)(.*) will consider the part of the token that comes after 'in' or 'un'. Note that always the first group is considered, so if you need more parenthesis you need to use non-capturing groups (?:...), as in the example.
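As a quick check of what "the first group" actually captures, here is a small Python sketch. It uses Python's `re` module rather than Java's regex engine, and it assumes the regexp is applied to the whole token; `first_group` is a hypothetical helper, not part of LanguageTool.

```python
import re

# The regexp from the rule's filter args, and the example from the documentation.
NOMINAL = (
    r"([a-zA-Z]+)"
    r"(?:ability|abilities|able|ably|ation|ations|ible|ibly|ment|ments)\b"
)
DOC_EXAMPLE = r"(?:in|un)(.*)"


def first_group(regexp, token):
    """Return what capture group 1 holds when regexp matches the whole token."""
    match = re.fullmatch(regexp, token)
    return match.group(1) if match else None


print(first_group(DOC_EXAMPLE, "unhappy"))     # happy
print(first_group(NOMINAL, "management"))      # manage
print(first_group(NOMINAL, "evaluation"))      # evalu
```

For "evaluation" the group holds "evalu", which does not appear among the dictionary entries listed earlier, whereas "management" leaves "manage". If the filter tags only the captured part, that difference may be worth checking.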
- So I tried the version below, but this did not work either. I’m pretty sure it is just a matter of setting up the regexp correctly, but I’m not having a lot of luck.
Thanks so much for your help. It is very much appreciated. I don’t know where you live, but if you’re in Sydney anytime, I will definitely buy you a beer.