
Finding different inflection for same word


(Ruud Baars) #1

There are 2 forms of an adjective: AJn: dik, and AJe: dikke
does not result in 'dik' as a suggestion. What am I doing wrong?


(Lodewijk Arie van Brienen) #2

It's most likely that 'dikke' already makes it a valid sentence (and 'dik' can only be used as part of a compound modifier, not on its own).
E.g.:
"De dik man" invalid
"De dikke man" he's fat
"De dik behaarde man." he has a thick bush of hair.
"De dikke behaarde man." he's fat and has a bush of hair.
"De dikke dik behaarde man." he's fat and he has a thick bush of hair

LT can't read our minds about what we intend, and thus has to rely on a best guess. If the sentence already makes sense, it will assume that it's good.

In speech, the noun form is sometimes put before the adjective for emphasis, but this should be avoided otherwise.
"De dik, dikke man." he's oh-so fat
"De geel, gele man." he's oh-so yellow (skinned)


(Ruud Baars) #3

That is not what I was looking for. The issue is 'de dikke kind' is wrong. So I just want to get 'dikke' as the alternative, by reverse lookup using the postag AJe. But I don't seem to get that working...


(Lodewijk Arie van Brienen) #4

This is a 'het/de' issue. You should first check the base 'object' (in this case 'de kind', which should be 'het kind') before checking the adjectives.


(Ruud Baars) #5

That is beside the point.


(Tiago F. Santos) #6

@Ruud_Baars,
I believe that is also a problem I faced previously while suggesting inflected verbs.

Suggestions do not present all the valid alternatives for the given rule while running testrules.sh, so you have to write examples against that [limited] output, but they do appear on wikicheck, in regression tests and in the standalone tool.
Not sure why this happens, but I suspect it is related to some bug in the POS synthesizer encoding/decoding.


(Lodewijk Arie van Brienen) #7

Replacing "de dikke kind" with "het dikke kind" seems to be the most logical way to correct it, so how is it beside the point?


(Jan Schreiber) #8

I'm not sure I fully understand the problem, but assuming that AJe is the intended form and AJn is wrong, I would try the following inside the <suggestion>:
<match no="1" postag="(.*)AJn(.*)" postag_regexp="yes" postag_replace="$1AJe$2"/>
That keeps all the POS tags from the first match, but replaces AJn with AJe.
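
To show where that `<match>` element would sit, here is a minimal sketch of a surrounding rule. Only the `<match>` line comes from the post above. The rule id, name, message text and the pattern token are hypothetical placeholders, and it is an assumption that the Dutch tag of the matched adjective contains the substring `AJn`.

```xml
<!-- Hypothetical sketch: id, name, message and pattern are placeholders. -->
<rule id="AJN_TO_AJE_SKETCH" name="sketch: suggest the AJe form of an AJn adjective">
  <pattern>
    <!-- Assumption: the adjective's POS tag contains "AJn". -->
    <token postag=".*AJn.*" postag_regexp="yes"/>
  </pattern>
  <message>....</message>
    <!-- From the post: keep the rest of the tag, swap AJn for AJe. -->
    <suggestion><match no="1" postag="(.*)AJn(.*)" postag_regexp="yes" postag_replace="$1AJe$2"/></suggestion>
</rule>
```

A real rule would also carry `<example>` sentences. They are left out here because the synthesized output has not been checked.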


(Tiago F. Santos) #9
  <message>Frase-feita. Procure alternativas.</message>
    <suggestion><match no='1' postag='V.+'>ser</match> descartado</suggestion>
  <example correction='for descartado'><marker>tomar um chega para lá</marker>.</example>

This is one example. 'tomar' and 'ser' are both infinitives (VMN0000). The suggestion should be 'ser descartado', but testrules.sh suggests 'for descartado'. 'for' has these POS readings: for[for/NCMS000, ir/VMSF1S0, ir/VMSF3S0, ser/VMSF1S0, ser/VMSF3S0]

This sentence is not corrected on stand-alone, though.


(Ruud Baars) #10

That is quite a nuisance if it is a bug.


(Tiago F. Santos) #11

It is. Fortunately, in my experience, it occurs only in rare situations, so there are still benefits to injecting POS on other lemmas.
Again, I have not researched this, but I had a similar problem while building a thesaurus.
My guess is that it is related to the sort order of the base file. The sorting algorithm of the base list has to match the decoder, or some words will be skipped. Maybe the tagger and the synthesizer use different sorting methods.
Note: these are just guesses, so I may be entirely wrong on this.


(jaumeortola) #12

The proper way to do this is, as Jan says:

<suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">ser</match> descartado</suggestion>

postag="(V.*)" selects the lemma you want to use (among the different lemmas/postags of the token) and postag_replace="$1" sets the new postag.

The problem is that tomar (the original wrong word) has several postags (VMN0000, VMSF1S0, VMSF3S0). So you need to disambiguate among these postags (in disambiguation.xml). Or handle it somehow in the rule:

<suggestion><match no="1" postag="(V.[^S].*)" postag_regexp="yes" postag_replace="$1">ser</match> descartado</suggestion>

If you don't disambiguate, you'll need to write different rules in order to give the proper suggestion every time.
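
For the disambiguation.xml route, a rule along the following lines might be a starting point. This is an untested sketch. The rule id, name and the context token `para` are hypothetical, and it assumes that 'tomar' directly after 'para' should be read as an infinitive. Whether the synthesizer then picks up the filtered reading is not confirmed in this thread.

```xml
<!-- Hypothetical disambiguation.xml entry: id, name and context are placeholders. -->
<rule id="TOMAR_INFINITIVE_SKETCH" name="sketch: keep the infinitive reading of tomar">
  <pattern>
    <!-- Assumed context: an infinitive is expected after 'para'. -->
    <token>para</token>
    <marker>
      <token>tomar</token>
    </marker>
  </pattern>
  <!-- Keep only the VMN0000 reading on the marked token. -->
  <disambig action="filter" postag="VMN0000"/>
</rule>
```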


(Ruud Baars) #13

This helped. I don't really understand it, but it works.


(Tiago F. Santos) #14

That seemed redundant to me, but I will try it. This is a really useful tip. Many thanks.

However, a problem still exists. I believe disambiguation.xml does not affect synthesizer results, only the tagger. I tried that before to no avail. That was the reason I rebuilt the POS dictionaries on 3.6.
And this tomar problem does not occur on rules like:

<rule>
  <pattern>
      <token inflected='yes'>tomar</token>
      <token>uma</token>
      <token>decisão</token>
  </pattern>
  <message>Expressão desnecessariamente complexa. Procure alternativas.</message>
    <suggestion><match no='1' postag='V.+'>decidir</match></suggestion>
  <example correction='decidir'><marker>tomar uma decisão</marker>.</example>
</rule>

PS -

Tested a moment ago. It is indeed redundant. Similar results to the simplified form I used to use:
<suggestion><match no='1' postag='V.+'>ser</match> descartado</suggestion>


(jaumeortola) #15

As for "dikke/dik", this works for me:

<rule>
  <pattern>
      <token>dikke</token>
  </pattern>
  <message>....</message>
    <suggestion><match no="1" postag="(A.*)" postag_regexp="yes" postag_replace="AJn"></match></suggestion>
  <example correction='dik'>de <marker>dikke</marker> kind.</example>
</rule>

For synthesis, you always need to select a lemma. In this case with postag="(A.*)".
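
The same shape should work in the other direction, from 'dik' to the AJe form. This is an untested sketch that mirrors the rule above. It assumes the tag names AJn and AJe from post #1 and uses "De dik man" from post #2 as the invalid sentence.

```xml
<!-- Untested sketch: mirror of the rule above, synthesizing the AJe form. -->
<rule>
  <pattern>
    <token>dik</token>
  </pattern>
  <message>....</message>
    <!-- postag="(A.*)" selects the adjective lemma; postag_replace sets the target tag. -->
    <suggestion><match no="1" postag="(A.*)" postag_regexp="yes" postag_replace="AJe"></match></suggestion>
  <!-- Add correction='...' once the synthesized form has been confirmed. -->
  <example>de <marker>dik</marker> man.</example>
</rule>
```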


(Ruud Baars) #16

Thanks. Have it working now.