I have developed a simple rule that reads the compound file (just like our CompoundRule) but checks for en dashes and em dashes used in place of a hyphen, for example:
Papua — Nowa Gwinea
I am not sure whether this behavior is appropriate for other languages. If it is, the DashRule I’ve written can be customized very easily: it just takes the compound file and creates PatternRules that check for the pattern in question. This makes development much easier than writing these rules by hand in the grammar file, and it doesn’t really slow down LT, as the rules are very simple.
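
To make the idea concrete, here is a rough sketch in Python. It is not the actual DashRule code (LT is written in Java) and it does not use LT's PatternRule API. The file format (one hyphenated compound per line, `#` starting a comment) and the function names `load_compounds`, `build_dash_pattern` and `find_dash_errors` are assumptions made only for illustration.

```python
import re

# En dash (U+2013) and em dash (U+2014): the characters wrongly used for "-".
DASHES = "\u2013\u2014"


def load_compounds(lines):
    """Return the hyphenated entries from compound-file lines.

    Assumed format (hypothetical): one compound per line, '#' starts a comment.
    """
    compounds = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if "-" in entry:
            compounds.append(entry)
    return compounds


def build_dash_pattern(compound):
    """Build a regex matching the compound written with dashes, not hyphens."""
    parts = [re.escape(p) for p in compound.split("-")]
    # Optional whitespace is allowed around the dash, as in "Papua — Nowa Gwinea".
    separator = r"\s*[" + DASHES + r"]\s*"
    # Lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)" + separator.join(parts) + r"(?!\w)")


def find_dash_errors(text, compounds):
    """Return (start, end, suggestion) for each dash-for-hyphen match in text."""
    hits = []
    for compound in compounds:
        for match in build_dash_pattern(compound).finditer(text):
            hits.append((match.start(), match.end(), compound))
    return hits


if __name__ == "__main__":
    compounds = load_compounds(["# sample entry", "Papua-Nowa Gwinea"])
    for hit in find_dash_errors("Papua \u2014 Nowa Gwinea", compounds):
        print(hit)
```

Each compound yields one small pattern, which is why the cost stays low: the patterns are generated once from the file and are all of the same simple shape. In this sketch a compound with several hyphens is only flagged when every hyphen has been replaced by a dash; a real rule might want to be more lenient.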