At least for Dutch, sentences are split using SRX rules, so a sentence is not broken after an abbreviation that ends in a full stop.
But that is not the case for tokenization: "etc." is split into two tokens.
Of course, this may differ per language, but I assume it is commonly done this way.
But why? The period is of no use as a token on its own. And the POS tag of the abbreviation (where applicable) has to be based on the word plus the full stop:
max: wrong
max.: correct; maximal, adjective
Max: correct; proper name
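A minimal sketch of the alternative, assuming a hypothetical hand-maintained abbreviation list and made-up tag labels (ADJ, PROPN). This is not LanguageTool's tokenizer or its Dutch tagset. Known abbreviations keep their full stop, every other word-final period becomes its own token, and the tag lookup then sees "max." rather than "max":

```python
import re

# Hypothetical abbreviation list; a real rule set would load this
# from a language-specific resource file.
ABBREVIATIONS = {"etc.", "max.", "bijv.", "o.a."}

# Hypothetical lexicon with made-up tag labels, keyed on the full token
# (including the full stop), so "max." and "Max" get different tags.
LEXICON = {"max.": "ADJ", "Max": "PROPN"}

# A word (allowing internal periods, as in "o.a.") with an optional
# trailing period, or any single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:\.\w+)*\.?|[^\w\s]")


def tokenize(text):
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if tok.endswith(".") and tok.lower() not in ABBREVIATIONS:
            # Not a known abbreviation: split off the full stop.
            tokens.append(tok[:-1])
            tokens.append(".")
        else:
            # Plain word, punctuation, or known abbreviation kept whole.
            tokens.append(tok)
    return tokens


def tag(tokens):
    # Look up each token as-is; anything unlisted is "UNKNOWN".
    return [(t, LEXICON.get(t, "UNKNOWN")) for t in tokens]


print(tokenize("Dit is max. twee euro, etc."))
print(tag(tokenize("Max betaalt max. tien euro")))
```

Because the lookup lowercases the token, a sentence-final "Max." (the name) is also kept whole; the tokenizer alone cannot resolve that case, so the sentence splitter or a disambiguation step would still have to.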
How do other rule developers deal with this?