Example without SENT_START impossible?

Ruud_Baars · September 27, 2018, 6:08am

When I write a rule that excludes SENT_START as a possible POS tag, I can't get the rule to work: the required example always has a sentence start, which causes error-count errors.

Am I missing something?

By the way, why is SENT_START not a property of the first real token, the way SENT_END is a property of the last one? That would make things a lot easier…

dnaber · September 27, 2018, 6:48am

I’m not sure I understand - doesn’t every sentence or even sentence fragment always have SENT_START?

It’s just that it has always been this way and now it’s difficult to change because rules rely on it.

Ruud_Baars · September 27, 2018, 7:07am

I know every sentence has a SENT_START, but I want to write a pattern that is guaranteed not to match at the start of the sentence.
Let's say I want to match every sequence of two (real) tokens and nothing more, but I don't want the sentence start to be matched… How can I write a rule for that? When you try this, it is impossible to write a valid example…
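To make the problem concrete, here is a simplified Python model. It is not LanguageTool's code or API, and all names in it are hypothetical. It assumes the convention discussed in this thread: the analyzed sentence begins with an empty pseudo-token tagged SENT_START, and SENT_END is an extra tag on the last real token. In this model, a bare two-token pattern also matches the pseudo-token plus the first word, and only an explicit check keeps that window out.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of an analyzed sentence.
# This is NOT LanguageTool's real data structure, only an illustration.
@dataclass
class Token:
    text: str
    postags: set = field(default_factory=set)

def analyze(words):
    """Prepend an empty SENT_START pseudo-token and tag the last word SENT_END."""
    tokens = [Token("", {"SENT_START"})] + [Token(w, {"WORD"}) for w in words]
    if words:
        tokens[-1].postags.add("SENT_END")
    return tokens

def match_pairs(tokens, skip_sentence_start):
    """Return (start, end) index ranges of every window of two consecutive tokens."""
    matches = []
    for i in range(len(tokens) - 1):
        window = tokens[i:i + 2]
        # Optionally reject any window that contains the SENT_START pseudo-token.
        if skip_sentence_start and any("SENT_START" in t.postags for t in window):
            continue
        matches.append((i, i + 2))
    return matches

sentence = analyze(["This", "is", "fine"])
print(match_pairs(sentence, skip_sentence_start=False))  # [(0, 2), (1, 3), (2, 4)]
print(match_pairs(sentence, skip_sentence_start=True))   # [(1, 3), (2, 4)]
```

Because every example sentence in this model starts with the pseudo-token, an unfiltered pattern always produces one extra match at position 0. That is one way the expected error count of a rule's example can come out wrong.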

What I am actually trying to do is match every sequence of words and then use antipatterns to filter out the 'normal' ones…
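The "match everything, then filter" approach can be sketched as follows. This is a hypothetical Python model, not LanguageTool's rule engine. Real antipatterns are token patterns in the rule XML, while here an antipattern is simplified to a tuple of lowercase words. As a further simplifying assumption, a match is blocked when it touches any token covered by an antipattern occurrence, not only when it is identical to that occurrence.

```python
# Hypothetical sketch: overgenerate word pairs, then suppress "normal" ones.

def immunized_positions(words, antipatterns):
    """Indices of words covered by any antipattern occurrence (simplified)."""
    lowered = [w.lower() for w in words]
    covered = set()
    for ap in antipatterns:
        n = len(ap)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == ap:
                covered.update(range(i, i + n))
    return covered

def unusual_pairs(words, antipatterns):
    """Match every pair of consecutive words, then drop pairs touching a covered word."""
    covered = immunized_positions(words, antipatterns)
    return [
        (words[i], words[i + 1])
        for i in range(len(words) - 1)
        if i not in covered and i + 1 not in covered
    ]

words = ["The", "cat", "sat", "green", "ideas"]
normal = [("the", "cat")]
print(unusual_pairs(words, normal))  # [('sat', 'green'), ('green', 'ideas')]
```

Note that ("cat", "sat") is dropped as well, because it shares a token with the antipattern occurrence. Under these assumptions, the antipatterns have to be written with that overlap in mind.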

Another strange phenomenon is that a sentence produces different results when it is entered on the command line in line-by-line mode than when it is sent to the server.

And antipatterns behave differently at the sentence start! An antipattern is never matched at the sentence start, whether or not it includes the SENT_START token! This might be a bug…

Ruud_Baars · September 27, 2018, 7:13am

Instead of (or in addition to) SENT_START and SENT_END, there could be SPECIAL=“FIRST_TOKEN” and SPECIAL=“LAST_TOKEN”. And maybe even SPELL_UNKNOWN for words not recognized by the speller.
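The FIRST_TOKEN / LAST_TOKEN suggestion can be modeled the same way. This is again a hypothetical Python sketch rather than LanguageTool code. The marker names come from the suggestion above and are not an existing API. There is no pseudo-token, so a plain two-token pattern only ever sees real words, and excluding the sentence start becomes an explicit, optional check.

```python
from dataclasses import dataclass, field

# Hypothetical model of the proposal: no SENT_START pseudo-token; instead the
# first and last real tokens carry FIRST_TOKEN / LAST_TOKEN markers.
@dataclass
class Token:
    text: str
    special: set = field(default_factory=set)

def analyze_with_flags(words):
    """Mark the first and last real tokens instead of adding a pseudo-token."""
    tokens = [Token(w) for w in words]
    if tokens:
        tokens[0].special.add("FIRST_TOKEN")
        tokens[-1].special.add("LAST_TOKEN")
    return tokens

def pairs(tokens, exclude_first=False):
    """Every two-token window; optionally skip the one starting at FIRST_TOKEN."""
    return [
        (tokens[i].text, tokens[i + 1].text)
        for i in range(len(tokens) - 1)
        if not (exclude_first and "FIRST_TOKEN" in tokens[i].special)
    ]

sentence = analyze_with_flags(["This", "is", "fine"])
print(pairs(sentence))                      # [('This', 'is'), ('is', 'fine')]
print(pairs(sentence, exclude_first=True))  # [('is', 'fine')]
```

As noted earlier in the thread, existing rules rely on SENT_START being a separate token, so a representation like this would have to coexist with the current one or come with a migration of those rules.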

But to be honest, I think the sentence start should never have been treated as a token. It is not a token. And SENT_START is not a real POS tag…