I looked into improving MultiwordChunker, but I don’t know how all the languages use this chunker, so it was safer to create a separate class, MultiwordChunker2, where I implemented the requested features:
- tagging all tokens, not just first and last
- allow customizing how the tag is formatted (by default all tokens get )
- allow removing all other readings
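
To make the three options concrete, here is a rough standalone sketch of the behaviour. It is not the actual chunker code and does not use any LanguageTool classes: `Token`, `tagPhrase`, and the `MW:%s` tag format are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from LanguageTool.
final class MultiwordTaggingSketch {

    /** Stand-in for an analyzed token: surface text plus its readings (tags). */
    static final class Token {
        final String text;
        final List<String> readings = new ArrayList<>();

        Token(String text, String... readings) {
            this.text = text;
            this.readings.addAll(Arrays.asList(readings));
        }
    }

    /**
     * Tags every token of each occurrence of the phrase, not just the first and last.
     * tagFormat is a String.format pattern that receives the base tag.
     * If removeOtherReadings is true, existing readings of matched tokens are dropped.
     */
    static void tagPhrase(List<Token> tokens, List<String> phrase, String baseTag,
                          String tagFormat, boolean removeOtherReadings) {
        String tag = String.format(tagFormat, baseTag);
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.size(); j++) {
                if (!tokens.get(i + j).text.equals(phrase.get(j))) {
                    match = false;
                    break;
                }
            }
            if (!match) {
                continue;
            }
            for (int j = 0; j < phrase.size(); j++) {
                Token t = tokens.get(i + j);
                if (removeOtherReadings) {
                    t.readings.clear();
                }
                t.readings.add(tag);
            }
        }
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
            new Token("in", "IN"), new Token("spite", "NN"),
            new Token("of", "IN"), new Token("rain", "NN"));
        tagPhrase(sentence, Arrays.asList("in", "spite", "of"), "PREP", "MW:%s", true);
        for (Token t : sentence) {
            System.out.println(t.text + " " + t.readings);
        }
    }
}
```

With `removeOtherReadings` set to `false`, the matched tokens keep their original readings and the multiword tag is appended after them.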
I’ve also added unit tests for both chunker classes. We could merge the two classes at some point if there’s agreement on this.
Jaume, please try it out and let me know how it works for you.