I am looking for the opposite of ‘filterall’: instead of reducing all POS tags to the ones specified, keep all of them except the ones specified.
The goal is to remove all unlikely and impossible POS tag combinations from the tokens.
However, when a word has only one possible tag, that tag should be kept.
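To make the requested behaviour concrete, here is a minimal Python sketch of a per-token "remove these tags" operation. It is not LanguageTool's rule syntax. The function name `remove_tags` and the tag names are hypothetical. The guard assumes that "keep the only possible tag" means a token is never left with no readings.

```python
from typing import Iterable, Set


def remove_tags(readings: Set[str], unwanted: Iterable[str]) -> Set[str]:
    """Drop the unwanted tags from a token's readings (the inverse of keeping only them).

    Hypothetical helper: if removing would leave the token with no readings,
    the original readings are returned unchanged.
    """
    remaining = readings - set(unwanted)
    return remaining if remaining else set(readings)


print(remove_tags({"DET_PL", "DET_SG"}, {"DET_PL"}))  # → {'DET_SG'}
print(remove_tags({"V_3"}, {"V_3"}))  # → {'V_3'} (only reading, so it is kept)
```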
Example: ‘de man loopt’ has these possible POS tags:
de: determiner plural, determiner singular
man: singular noun, 1st person of verb
loopt: 3rd person of verb
Impossible combinations:
determiner singular, 1st person of verb
determiner plural, 1st person of verb
1st person of verb, 3rd person of verb
This should leave as the only valid reading:
determiner singular, singular noun, 3rd person of verb
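The pruning in this example can be sketched as constraint propagation over adjacent tokens. A tag survives only if it can pair with at least one tag of each neighbour without forming a forbidden pair, and the process repeats until nothing changes. This is a generic Python illustration, not LanguageTool's disambiguator, and the tag names are hypothetical. The three listed pairs alone do not rule out "determiner plural" before "man" (as a singular noun). So this sketch additionally assumes a number-agreement constraint (`DET_PL`, `N_SG`) that is not in the original list.

```python
from typing import List, Set, Tuple

Sentence = List[Tuple[str, Set[str]]]


def prune(sentence: Sentence, forbidden: Set[Tuple[str, str]]) -> Sentence:
    """Remove tags that cannot combine with any tag of a neighbouring token.

    A token is never emptied: if every tag would be removed, it keeps its tags.
    """
    tags = [set(readings) for _, readings in sentence]
    changed = True
    while changed:  # each change strictly shrinks a set, so this terminates
        changed = False
        for i in range(len(tags)):
            keep = set()
            for t in tags[i]:
                ok_left = i == 0 or any((u, t) not in forbidden for u in tags[i - 1])
                ok_right = i == len(tags) - 1 or any((t, u) not in forbidden for u in tags[i + 1])
                if ok_left and ok_right:
                    keep.add(t)
            if keep and keep != tags[i]:
                tags[i] = keep
                changed = True
    return [(word, t) for (word, _), t in zip(sentence, tags)]


forbidden = {
    ("DET_SG", "V_1"),
    ("DET_PL", "V_1"),
    ("V_1", "V_3"),
    ("DET_PL", "N_SG"),  # assumption: number agreement, not in the original list
}
sentence = [("de", {"DET_PL", "DET_SG"}), ("man", {"N_SG", "V_1"}), ("loopt", {"V_3"})]
print(prune(sentence, forbidden))
# → [('de', {'DET_SG'}), ('man', {'N_SG'}), ('loopt', {'V_3'})]
```

Without the assumed agreement pair, the same code still removes the verb reading of "man" but leaves both readings of "de".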
Can this be done, and if so: how?