I would like to write some advanced typographical rules that check which whitespace character is used in certain contexts.
To make the implementation usable by different languages, each token should store the whitespace character that precedes it, alongside the existing “isWhitespaceBefore” flag. A shared rule filter could then be written that checks this preceding whitespace character, and each language could write its own XML rules that take advantage of the filter.
There are a few other contexts in which this kind of rule would be used. It would be especially useful in French. See: Espace insécable — Wikipédia
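
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the project's actual code: the `Token` class, the `tokenize` helper, and the `hasWhitespaceBefore` filter are all hypothetical names invented for illustration. The French convention assumed here (narrow no-break space U+202F before `?`, `!` and `;`, no-break space U+00A0 before `:`) is one common typographic convention, and a real rule would make it configurable per language.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceBeforeSketch {

    /** Hypothetical token: its text plus the exact whitespace that precedes it. */
    static final class Token {
        final String text;
        final String whitespaceBefore; // empty string when no whitespace precedes the token

        Token(String text, String whitespaceBefore) {
            this.text = text;
            this.whitespaceBefore = whitespaceBefore;
        }

        /** The existing boolean flag can be derived from the stored characters. */
        boolean isWhitespaceBefore() {
            return !whitespaceBefore.isEmpty();
        }
    }

    static boolean isSpace(char c) {
        // Character.isWhitespace() excludes no-break spaces, so test them explicitly.
        return Character.isWhitespace(c) || c == '\u00A0' || c == '\u202F';
    }

    /** Toy tokenizer (assumption): words, single punctuation marks, and whitespace runs. */
    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder ws = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isSpace(c)) {
                ws.append(c);
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) {
                    i++;
                }
            } else {
                i++; // each punctuation mark is its own token
            }
            tokens.add(new Token(s.substring(start, i), ws.toString()));
            ws.setLength(0);
        }
        return tokens;
    }

    /** Hypothetical shared filter: is the token preceded by exactly the expected whitespace? */
    static boolean hasWhitespaceBefore(Token token, String expected) {
        return token.whitespaceBefore.equals(expected);
    }

    /** Example language-specific check: returns the punctuation marks with the wrong spacing. */
    static List<String> checkFrench(String sentence) {
        List<String> problems = new ArrayList<>();
        for (Token t : tokenize(sentence)) {
            String expected;
            if (t.text.equals("?") || t.text.equals("!") || t.text.equals(";")) {
                expected = "\u202F"; // narrow no-break space (assumed convention)
            } else if (t.text.equals(":")) {
                expected = "\u00A0"; // no-break space (assumed convention)
            } else {
                continue;
            }
            if (!hasWhitespaceBefore(t, expected)) {
                problems.add(t.text);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(checkFrench("Comment ça va ?"));      // regular space before "?"
        System.out.println(checkFrench("Comment ça va\u202F?")); // narrow no-break space
    }
}
```

In this sketch the boolean flag is derived from the stored string, so keeping the preceding whitespace would not require changing how existing rules read “isWhitespaceBefore”. In the real implementation the per-language part (`checkFrench` here) would live in XML rules that call the shared filter.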