Negate SENT_START

starkadur · November 25, 2015, 11:08am

I am trying to create a rule (for Icelandic) that finds the word “Breskur” in a sentence but excludes it when it’s at the start of the sentence. However, my rule matches Breskur both in “Breskur maður…” and in “Maður er Breskur…”.

<rule...>
  <token postag="SENT_START" negate_pos="yes">Breskur</token>
</rule>

Icelandic has no part-of-speech tagging in LanguageTool yet, but shouldn’t postag="SENT_START" still work?

dnaber · November 25, 2015, 11:51am

I know it’s not necessarily logical, but the sentence start tag is its own token (unlike the sentence end tag). So

<token postag="SENT_START" negate_pos="yes"/>
<token>Breskur</token>

should work (not tested). You can use http://community.languagetool.org/analysis/index?lang=is to see LT’s internal analysis.
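To see why the two versions behave differently, here is a toy Python model of the matching. It is a sketch, not LT's actual code. It assumes only what is said here: the analysis starts with a separate, empty token carrying SENT_START. Everything else is a simplification for illustration. It uses whitespace tokenization, leaves words untagged (pos None, as for a language without a tagger), and compares words case-insensitively. The helper names (analyze, find_matches, not_sent_start, word, both) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SENT_START = "SENT_START"

@dataclass
class Token:
    text: str
    pos: Optional[str] = None  # None = untagged (assumed: no Icelandic tagger)

TokenTest = Callable[[Token], bool]

def analyze(sentence: str) -> List[Token]:
    # Toy tokenizer: split on whitespace, strip surrounding punctuation.
    words = [w.strip(".,…") for w in sentence.split()]
    # The sentence start is a separate, empty token tagged SENT_START.
    return [Token("", SENT_START)] + [Token(w) for w in words if w]

def not_sent_start(tok: Token) -> bool:
    # Models postag="SENT_START" negate_pos="yes": POS must NOT be SENT_START.
    return tok.pos != SENT_START

def word(text: str) -> TokenTest:
    # Word comparison, case-insensitive in this model.
    return lambda tok: tok.text.lower() == text.lower()

def both(*tests: TokenTest) -> TokenTest:
    # One token that must pass all tests (text and POS checked on the same token).
    return lambda tok: all(t(tok) for t in tests)

def find_matches(tokens: List[Token], pattern: List[TokenTest]) -> List[int]:
    # Start index of every run of consecutive tokens that matches the pattern.
    hits = []
    for i in range(len(tokens) - len(pattern) + 1):
        if all(test(tokens[i + j]) for j, test in enumerate(pattern)):
            hits.append(i)
    return hits

# Original rule: a single token that is "Breskur" AND is not SENT_START.
original = [both(not_sent_start, word("Breskur"))]
# Suggested rule: any non-SENT_START token, followed by "Breskur".
fixed = [not_sent_start, word("Breskur")]

for s in ["Breskur maður…", "Maður er Breskur…"]:
    toks = analyze(s)
    print(s, find_matches(toks, original), find_matches(toks, fixed))
# Breskur maður… [1] []
# Maður er Breskur… [3] [2]
```

In this model, the single-token rule matches Breskur in both sentences. The word token itself never carries SENT_START, so the negated POS check always passes. The two-token pattern skips the sentence-initial case, because the only token in front of that Breskur is the SENT_START token.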

starkadur · December 1, 2015, 5:17pm

Thanks a lot. This works!