(Nina) #1

Will there be a chunk_regexp feature (similar to the existing postag_regexp) available for writing tokens in an XML rule pattern?

(Daniel Naber) #2

Well, if it’s needed we’ll need to introduce it… On the other hand, a
chunk has only three possible values so far and using regular
expressions looks a bit like overkill. We’ll also introduce <or> with
the upcoming version, so you should be able to express what you need
with that. Could you try that?

(Nina) #3

It would be great to have the logical <or>. I can try it if it is available soon.
Also, I came across another issue with postag_regexp: if there are multiple tags (POS or chunk) associated with a token, then writing a regex that applies to the multiple strings separated by spaces is a challenge.

(Daniel Naber) #4

<or> is already available in the current snapshots. So if you have
the chunk attribute, there should also be <or>.

(Nina) #5

An example of using <or> with chunks … would it be something like this?

         <token chunk="E-NP-singular" or chunk="E-NP-plural"><exception postag="NNPS"/></token>

(Daniel Naber) #6

It should be like this:
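
As a rough sketch only, assuming <or> wraps alternative <token> elements the same way <and> does, and that the exception from your example is simply repeated on each alternative, the two chunk values could be written as:

```xml
<!-- Sketch under assumptions: <or> groups alternative <token> elements
     that are matched against the same position in the sentence. -->
<or>
    <token chunk="E-NP-singular"><exception postag="NNPS"/></token>
    <token chunk="E-NP-plural"><exception postag="NNPS"/></token>
</or>
```

Whether <or> is accepted at a given place in a rule depends on the rule schema of the version in use, so it is worth validating the file after adding it.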

(Nina) #7


(Nina) #8

<or> did not work within …

I am getting the following error:

cvc-complex-type.2.4.a: Invalid content was found starting with
element ‘or’. One of ‘{unify, and, token, includephrases}’ is

I guess <or> is supported only in the