Disambiguation testing

I produced a list of common words that are ambiguous as far as POS tags go. For now I added the list as a comment in disambiguation.xml, planning to tackle the words one by one (or by type, if possible) now that antipatterns are available.

I need tests, however, to be sure I don’t introduce more confusion. How can I add test examples to disambiguation rules?

Please see en/disambiguation.xml for examples. You can use type="ambiguous" to check changes and type="untouched" for cases where the disambiguation pattern doesn’t match or doesn’t change anything.
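For illustration, here is a rough sketch of how the two example types might look inside a rule. The rule id, name, pattern and sentences are made up for this post, and the reading lists in inputform/outputform are assumptions: they have to match what the tagger really produces for the word, so please compare with the real rules in en/disambiguation.xml before relying on it.

```xml
<!-- Hypothetical rule: id, name, pattern and tags are illustrative only -->
<rule id="EXAMPLE_TO_VERB" name="to + verb/noun -> verb (illustration)">
  <pattern>
    <token>to</token>
    <marker>
      <and>
        <token postag="VB"/>
        <token postag="NN"/>
      </and>
    </marker>
  </pattern>
  <disambig postag="VB"/>
  <!-- untouched: the pattern should not match or change anything here -->
  <example type="untouched">The walk was long.</example>
  <!-- ambiguous: inputform lists the readings before the rule runs,
       outputform the readings expected afterwards (assumed readings) -->
  <example type="ambiguous" inputform="walk[walk/NN,walk/VB,walk/VBP]" outputform="walk[walk/VB]">I want to <marker>walk</marker>.</example>
</rule>
```

The marker in the ambiguous example wraps the token whose readings are compared, so each such example checks one word.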

Thanks. Does ‘untouched’ mean ‘by all disambiguation’ or ‘by this rule’? The latter would be most helpful.

I think “by this rule”, but please give it a try, I’m not 100% sure.

It is a lot of work to take examples from the corpus and turn them into valid test examples for a disambiguation rule. The format is very different from that of a tagged sentence.

Secondly, since the rules cascade, the ‘incoming’ pattern produces an error whenever an earlier rule has already changed the tags. That makes testing rules even harder.

Is there an easier way to test a disambiguation rule on live data?