Reverse postag lookup (please help)

(Ruud Baars) #1

I converted all Dutch postags to the : format. The postag dictionary seems to work fine.
But the reverse lookup for tags is a problem: none of the lookups seem to work anymore.

One of the errors is:

Dutch: Incorrect suggestions: [ergert zich aan] != [(ergeren) zich aan] for rule IRRITEERT_ZICH[1] on input: Hij irriteert zich aan deze fout. expected:< [ergert zich aan] > but was:< [(ergeren) zich aan] >

This is the complete rule:

<rulegroup id="IRRITEERT_ZICH" name="irriteert zich etc">
  <rule>
    <pattern>
      <token inflected="yes">irriteren</token>
      <token regexp="yes">zich|me</token>
      <token>aan</token>
    </pattern>
    <message>U bedoelt vast: <suggestion><match no="1" postag="WKW.*" postag_regexp="yes">ergeren</match> <match no="2"/> <match no="3"/></suggestion>?</message>
    <url></url>
    <example correction="ergert zich aan">Hij <marker>irriteert zich aan</marker> deze fout.</example>
  </rule>
</rulegroup>

This means the reverse lookup of the postag of ‘irriteert’ does not work.
The dictionary input file contains both entries:
irriteert<tab>irriteren<tab>WKW:TGW:3EP
ergert<tab>ergeren<tab>WKW:TGW:3EP
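
To make the expected behaviour concrete: the reverse lookup should map a lemma plus a postag back to an inflected form. The sketch below is plain Python, not LanguageTool code. The names `build_reverse_index` and `synthesize` are hypothetical, and it assumes the tab-separated form/lemma/tag layout of the two entries above.

```python
from collections import defaultdict

# Two entries in the three-column layout from the post: form<TAB>lemma<TAB>tag.
DICTIONARY_LINES = [
    "irriteert\tirriteren\tWKW:TGW:3EP",
    "ergert\tergeren\tWKW:TGW:3EP",
]


def build_reverse_index(lines):
    """Map (lemma, tag) to the list of inflected forms carrying that tag."""
    index = defaultdict(list)
    for line in lines:
        form, lemma, tag = line.rstrip("\n").split("\t")
        index[(lemma, tag)].append(form)
    return index


def synthesize(index, lemma, tag):
    """Return the forms of `lemma` for `tag`, or an empty list if none exist."""
    return index.get((lemma, tag), [])


index = build_reverse_index(DICTIONARY_LINES)

# Tag that the forward lookup gives for the matched token 'irriteert'.
matched_tag = "WKW:TGW:3EP"
print(synthesize(index, "ergeren", matched_tag))  # ['ergert']
```

The `(ergeren)` in the failing test looks like what is left over when this kind of lookup comes back empty, so the lemma is passed through instead of an inflected form.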

Can anyone tell me what is wrong here?

(Ruud Baars) #2

I already tried this:

  • last night’s version
  • using a ; as separator instead of the +
  • dumping the generated dictionary (it looks fine).

Has anyone who uses : in tags rebuilt the reverse dictionary recently?

(Daniel Naber) #3

I understand you have made all changes only locally so far? Could you upload the new dictionaries somewhere (i.e. not to git, but to some dropbox or whatever)?

(Ruud Baars) #4

The entire set with all files is here:

It is the entire set, including the shell script that generates the dictionaries (

(Ruud Baars) #5

Somehow, the reverse dictionary is significantly larger than the normal one. That seems strange to me…

(Ruud Baars) #6

For your info: changing the separator chars in the tags does not make any difference.

(Daniel Naber) #7

dutch_tags.txt also needs to be updated. It’s a list with all tags. I’m attaching an updated file, created with awk '{print $3}' dictionary.dump | sort | uniq.

(Daniel Naber) #8

The first attachment was buggy, please use this one: (511 Bytes)

(Ruud Baars) #9

Okay, I will put it in place. But that raises the question of why it is not written there when -o is in the command… I never knew it was anything more than documentation.
But thanks a lot, it helps. Now I can start editing rules so that they pass the tests again.

(Daniel Naber) #10

It is written, but to a temp file (in /tmp on Linux). I’ll open an issue about this. => #746