Until now, English contractions and possessives such as “doesn’t” or “Harper’s” were split into three tokens:
<token>doesn</token> <token>'</token> <token>t</token>
<token>Harper</token> <token>'</token> <token>s</token>
Linguistically, this tokenization doesn’t make sense, and it makes sentence analysis and rule creation more difficult.
Starting today, such words are split into two tokens instead. In the examples below, each token is followed by its lemma and part-of-speech tag in square brackets:
does[do/VBZ]n’t[not/RB]
Harper[Harper/NNP,harper/NN]'s['s/POS]
It[it/PRP]'s[be/VBZ] good[good/JJ,good/NN:U]
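“Can’t” and “won’t” are special cases, because the first token (“ca”, “wo”) is not a word on its own but is still analyzed as “can” or “will”: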
ca[can/MD]n’t[not/RB]
wo[will/MD]n’t[not/RB]
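The splitting scheme shown above can be sketched in a few lines of Python. This is only an illustration of the token boundaries, not the actual tokenizer code; the names `split_contraction` and `SPECIAL_STEMS` are hypothetical, and the sketch covers only the “n't” and “'s” endings mentioned in this post.

```python
import re

# Straight (') and curly (\u2019) apostrophes are treated alike.
APOSTROPHES = "'\u2019"

# "n't" stays together as one token; so does "'s".
_NT = re.compile(rf"^(.+?)(n[{APOSTROPHES}]t)$", re.IGNORECASE)
_S = re.compile(rf"^(.+?)([{APOSTROPHES}]s)$", re.IGNORECASE)

# Hypothetical lookup for the two special stems from the examples above.
SPECIAL_STEMS = {"ca": "can", "wo": "will"}


def split_contraction(word):
    """Return the word as one or two tokens, following the new scheme."""
    m = _NT.match(word) or _S.match(word)
    if m:
        return [m.group(1), m.group(2)]
    return [word]


def special_lemma(stem):
    """Return the lemma for a truncated stem such as 'ca', or None."""
    return SPECIAL_STEMS.get(stem.lower())
```

For instance, `split_contraction("can't")` yields the tokens `ca` and `n't`, and `special_lemma("ca")` maps the first of them back to `can`.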
You can now write a single pattern that matches both “doesn’t” and “does not”:
<pattern> <token>does</token> <token regexp="yes">not|n't</token> </pattern>
Or:
<pattern> <token regexp="yes">it|he|she</token> <token regexp="yes">is|'s</token> </pattern>
Write these rules with straight apostrophes only; they will match both straight (typewriter) and curly (typographic) apostrophes in the text.
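One way to picture this behavior is that apostrophes in the text are normalized before a token is compared with the pattern. The Python sketch below illustrates that idea under stated assumptions: `normalize_apostrophes` and `token_matches` are hypothetical helper names, case-insensitive matching is assumed, and the real matching logic may work differently.

```python
import re


def normalize_apostrophes(text):
    """Replace the curly apostrophe (\u2019) with the straight one (')."""
    return text.replace("\u2019", "'")


def token_matches(pattern, token, regexp=False):
    """Compare a rule token, written with straight apostrophes, to a text token.

    Assumption: matching ignores case, and with regexp=True the pattern
    must match the whole token.
    """
    token = normalize_apostrophes(token)
    if regexp:
        return re.fullmatch(pattern, token, re.IGNORECASE) is not None
    return token.lower() == pattern.lower()
```

With this sketch, the rule token `not|n't` matches the text tokens `not`, `n't`, and `n’t` alike, so a rule author never has to list the curly variant.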