Reverse postag lookup (please help)

I converted all Dutch postags to the : format. The postag dictionary seems to work fine.
But reverse lookup by tag is a problem: none of those lookups work anymore.

One of the errors is:

Dutch: Incorrect suggestions: [ergert zich aan] != [(ergeren) zich aan] for rule IRRITEERT_ZICH[1] on input: Hij irriteert zich aan deze fout. expected:< [ergert zich aan] > but was:< [(ergeren) zich aan] >

This is the complete rule:

<rulegroup id="IRRITEERT_ZICH" name="irriteert zich etc">
  <rule>
    <pattern>
      <token inflected="yes">irriteren</token>
      <token regexp="yes">zich|me</token>
      <token>aan</token>
    </pattern>
    <message>U bedoelt vast: <suggestion><match no="1" postag="WKW.*" postag_regexp="yes">ergeren</match> <match no="2"/> <match no="3"/></suggestion>?</message>
    <url>https://onzetaal.nl/taaladvies/advies/ik-irriteer-erger-me-aan-haar</url>
    <example correction="ergert zich aan">Hij <marker>irriteert zich aan</marker> deze fout.</example>
  </rule>
</rulegroup>

This means the reverse lookup of the postag of ‘irriteert’ does not work.
The dictionary input file contains both entries:
irriteert<tab>irriteren<tab>WKW:TGW:3EP
ergert<tab>ergeren<tab>WKW:TGW:3EP
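
For readers unfamiliar with what the rule asks for, the lookup can be sketched roughly as below. This is only an illustration in Python, not LanguageTool's actual implementation: the helper names `build_reverse_index` and `synthesize` are hypothetical, and the sketch assumes that a tag is only usable for reverse lookup when it also appears in a separate list of known tags.

```python
import re

# The two dictionary lines quoted above, tab-separated: form, lemma, postag.
ENTRIES = "irriteert\tirriteren\tWKW:TGW:3EP\nergert\tergeren\tWKW:TGW:3EP\n"


def build_reverse_index(text):
    """Map (lemma, postag) to its inflected forms (hypothetical helper)."""
    index = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        form, lemma, tag = line.split("\t")
        index.setdefault((lemma, tag), []).append(form)
    return index


def synthesize(index, known_tags, lemma, token_tags, tag_regex):
    """Inflect `lemma` with every tag of the matched token that fits `tag_regex`.

    Assumption: a tag missing from `known_tags` is never tried, so a stale
    tag list yields no forms even when the dictionary itself is correct.
    """
    pattern = re.compile(tag_regex)
    forms = []
    for tag in token_tags:
        if pattern.fullmatch(tag) and tag in known_tags:
            forms.extend(index.get((lemma, tag), []))
    return forms


index = build_reverse_index(ENTRIES)
token_tags = ["WKW:TGW:3EP"]  # the tag of the matched token 'irriteert'

# Tag list is up to date: the expected suggestion can be built.
print(synthesize(index, {"WKW:TGW:3EP"}, "ergeren", token_tags, "WKW.*"))  # ['ergert']

# Tag list does not contain the new tag: nothing is found.
print(synthesize(index, set(), "ergeren", token_tags, "WKW.*"))  # []
```

In the second call the dictionary data is unchanged, yet no form comes back, which is one way a correct-looking dictionary dump could still fail the rule test.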

Can anyone tell me what is wrong here?

I already tried this:

  • using last night's version
  • using ; as the separator instead of +
  • dumping the generated dictionary (it looks fine).

Has anyone who uses : in tags rebuilt the reverse dictionary recently?

I understand you have only made these changes locally so far? Could you upload the new dictionaries somewhere (not to git, but to Dropbox or something similar)?

The entire set with all files is here:
taaltik.xs4all.nl/LT/LanguageTool-current.zip

It includes the shell script that generates the dictionaries (redo_dictionaries.sh).

Somehow, the reverse dictionary is significantly larger than the normal one. To me, that appears strange…

For your info: changing the separator chars in the tags does not make any difference.

dutch_tags.txt also needs to be updated. It’s a list with all tags. I’m attaching an updated file, created with awk '{print $3}' dictionary.dump | sort | uniq.
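
For anyone without awk at hand, the same tag list can be produced with a short Python sketch. The function name `unique_tags` is hypothetical, and the sketch assumes the dump has the form, lemma and tag as the first three whitespace-separated fields, as in the entries quoted earlier in this thread. Unlike the awk pipeline, it skips lines with fewer than three fields instead of emitting an empty line, and it sorts by code point rather than by locale.

```python
def unique_tags(lines):
    """Return the sorted, de-duplicated values of the third column."""
    tags = set()
    for line in lines:
        fields = line.split()  # like awk's default: split on runs of whitespace
        if len(fields) >= 3:
            tags.add(fields[2])
    return sorted(tags)


# The two dictionary entries quoted earlier in the thread.
sample = [
    "irriteert\tirriteren\tWKW:TGW:3EP",
    "ergert\tergeren\tWKW:TGW:3EP",
]
print("\n".join(unique_tags(sample)))  # WKW:TGW:3EP
```

To run it on a real dump, pass an open file instead of the sample list, for example `unique_tags(open("dictionary.dump", encoding="utf-8"))`.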

(attachment removed)

The first attachment was buggy; please use this one:

dutch_tags.txt.zip (511 Bytes)

Okay, I will put it in place. But that raises the question of why it is not written there when -o is given in the command… I never knew it was anything more than documentation.
But thanks a lot. It helps. Now I can start editing rules to pass the tests again.

It is written, but to a temp file (in /tmp on Linux). I’ll open an issue about this. => #746