<token postag="SENT_END" regexp="yes">[.!?]</token>
results in "Wake up !" being matched in "... waar slechts één taal gesproken wordt. Wake up!"
So the SENT_START token is a space. Why?
I'm not sure I understand your issue. What would you want to match and what should not match? You can use https://community.languagetool.org/analysis/index?lang=nl to see how a sentence is analyzed internally.
I specified 4 tokens. The matched sentence has just 4. That is the issue.
<token postag="SENT_START"/> does not correspond to anything visible; in particular, it is not the first word. It's a little confusing.
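To illustrate, here is a minimal sketch of what a four-token pattern of this kind might look like. Only the SENT_END token is taken from the post above; the two word tokens are placeholders assumed for illustration, since the original rule was not shown in full. If the rule was roughly like this, the first pattern token lines up with the invisible sentence-start marker, which would explain why only three visible tokens appear in the match.

```xml
<pattern>
  <!-- Invisible sentence-start marker: counts as a pattern token but has no visible text -->
  <token postag="SENT_START"/>
  <!-- Placeholder word tokens, assumed for illustration only -->
  <token>Wake</token>
  <token>up</token>
  <!-- Final token as quoted in the original post -->
  <token postag="SENT_END" regexp="yes">[.!?]</token>
</pattern>
```

Pasting the sentence into the analysis page linked above should show the SENT_START tag on an entry of its own, before the first word.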
I can live with that.