Token count mismatch?


(Ruud Baars) #1
    <token postag="SENT_START"/>
    <token/>
    <token/>
    <token postag="SENT_END" regexp="yes">[.!?]</token>

This results in

    Wake up !

being matched in the text

    ... waar slechts één taal gesproken wordt. Wake up!

So the SENT_START token seems to be a space. Why?


(Daniel Naber) #2

I'm not sure I understand your issue. What would you want to match and what should not match? You can use https://community.languagetool.org/analysis/index?lang=nl to see how a sentence is analyzed internally.


(Ruud Baars) #3

I specified 4 tokens. The matched text has just 3. That is the issue.


(Jan Schreiber) #4

<token postag="SENT_START"/> does not correspond to anything visible; in particular, it is not the first word. It's a little confusing.
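
To make the counting concrete, here is the pattern from #1 again with each token annotated. This is only a sketch: it assumes that SENT_START is an empty marker token at the beginning of every sentence and that SENT_END is a tag carried by the sentence's last token rather than a token of its own. The enclosing `<pattern>` element is added for illustration and was not part of the original snippet.

```xml
<pattern>
    <!-- 1: sentence-start marker; matches no visible text (assumed) -->
    <token postag="SENT_START"/>
    <!-- 2: any token, here "Wake" -->
    <token/>
    <!-- 3: any token, here "up" -->
    <token/>
    <!-- 4: last token of the sentence, here "!" -->
    <token postag="SENT_END" regexp="yes">[.!?]</token>
</pattern>
```

Read this way, four pattern tokens produce a match with three visible tokens, and nothing in the pattern has to match a space.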


(Ruud Baars) #5

I can live with that.