braincell
(Ibslla )
September 15, 2021, 8:29pm
1
I’m working on adding a new language to LT.
The POS tags I devised:
FL.* - Verb
EM.* - Noun
ND.* - Determiner
I’m trying to create some disambiguation rules, almost the same as in the docs:
Determiner + VERB/NOUN → NOUN
<rule name="determiner + verb/EM -> EM" id="ND_FL_EM">
  <pattern>
    <token postag="ND"></token>
    <marker>
      <and>
        <token postag="FL.*" postag_regexp="yes"/>
        <token postag="EM.*" postag_regexp="yes">
          <exception negate_pos="yes" postag_regexp="yes" postag="(FL|EM):.*"/>
        </token>
      </and>
    </marker>
  </pattern>
  <disambig postag="EM" />
</rule>
As I understand it, the rule will match a token with both an FL and an EM tag (excluding any EM token with readings other than EM.* or FL.*).
<exception negate_pos="yes" postag_regexp="yes" postag="(FL|EM):.*"/>
does not seem to work.
If I remove <exception>, the rule works. Am I missing something?
Thanks
Ruud_Baars
(Ruud Baars)
September 16, 2021, 5:24am
2
Isn’t negate_pos in the exception the issue?
braincell
(Ibslla )
September 16, 2021, 8:04am
3
Shouldn’t that work by excluding all EM tokens with readings other than FL/EM tags?
Ruud_Baars
(Ruud Baars)
September 16, 2021, 9:11am
4
I am having a hard time understanding exactly what you are trying to achieve. As a test, you could try removing the negation from the exception and listing all the other tags explicitly. Double negations are always tricky and hard to understand.
The SENT_END postag could be a problem. Perhaps you need to add it:
<exception negate_pos="yes" postag_regexp="yes" postag="(FL|EM):.*|SENT_END"/>
braincell
(Ibslla )
September 16, 2021, 9:40am
6
jaumeortola:
(FL|EM):.*|SENT_END
That seems to be the case. Without SENT_END, the disambiguator does not match the word if it’s the last one in the sentence. Thanks a lot.
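For anyone landing here later, this is roughly what the rule from the first post looks like with the SENT_END alternative added to the exception. It is only a sketch assembled from the snippets in this thread: the tag names (ND, FL, EM) are the ones from my own tagset, and I have not tested it beyond the behaviour described above.

```xml
<rule name="determiner + verb/EM -> EM" id="ND_FL_EM">
  <pattern>
    <token postag="ND"></token>
    <marker>
      <and>
        <token postag="FL.*" postag_regexp="yes"/>
        <token postag="EM.*" postag_regexp="yes">
          <!-- Negated exception: skip the token if it has any reading that is
               neither FL/EM nor SENT_END. Without the SENT_END alternative,
               the extra SENT_END reading on the last token of a sentence
               counts as "another reading" and blocks the match. -->
          <exception negate_pos="yes" postag_regexp="yes" postag="(FL|EM):.*|SENT_END"/>
        </token>
      </and>
    </marker>
  </pattern>
  <disambig postag="EM"/>
</rule>
```

The only change from the original rule is the `|SENT_END` part of the exception's postag regexp.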
braincell
(Ibslla )
September 16, 2021, 9:43am
7
I agree about the complexity; it took me a while to wrap my head around the negated exception as explained in the docs.