Regexp replace in match

Ruud_Baars · September 30, 2019, 12:46pm

I would like to get the corresponding past tense verb for a matched verb form.

To be specific, the postags matched could be WKW:TGW:1EP, WKW:TGW:INF, WKW:TGW:3EP, and I want to fetch the matched form where TGW has been replaced by VLT.

regexp_match and regexp_replace seem not to be allowed in this construction. What to do?

dnaber · September 30, 2019, 12:53pm

You mean in <match ..>? regexp_match and regexp_replace should work there. Can you post a complete rule where they don’t work?

Ruud_Baars · September 30, 2019, 2:35pm

They do normally, on tokens. I want to change the postag: I need to look up a different form of the same verb, to create a rule that suggests ‘went’ when the sentence is ‘yesterday he goes’.
I think I would need postag_regexp_match and postag_regexp_replace …

But I split the rule in several subrules, and now at least it works.

Mike_Unwalla · September 30, 2019, 2:51pm

Can you use something like:
<suggestion><match no="2" postag="VBD"/></suggestion>

Refer to Development Overview - LanguageTool Wiki

arysin · September 30, 2019, 2:51pm

<match no="..." postag_regexp="yes" postag="..." postag_replace="..."/>

Ruud_Baars · October 1, 2019, 6:02am

This is what I am trying to do, but it does not work like I expected:
<match no="3" postag_regexp="yes" postag="(.*:)(TGW)(:.*)" postag_replace="$1VLT$3"/>

results in:
Dutch: Incorrect suggestions: [deed] != [(doet)] for rule VERLEDEN_TGWT_PREMIUM[1] on input: Verleden week doet hij het nog! expected:<[deed]> but was:<[(doet)]>

So it does not look up the root form and then fetch the right inflected form from it.

jaumeortola · October 1, 2019, 6:38am

It doesn’t work because doet is tagged in the dictionary as WKW:TGW:3EP and deed is tagged as WKW:VLT:1EP. I don’t know if the dictionary tags are correct or complete. But the synthesizer is working as expected.
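A minimal Python sketch of that mismatch. The `FORMS` table and the `synthesize` helper are hypothetical stand-ins, not LanguageTool's API, and the lemma `doen` is assumed; only the two tags come from this thread.

```python
import re

# Hypothetical toy dictionary: (lemma, postag) -> inflected form.
FORMS = {
    ("doen", "WKW:TGW:3EP"): "doet",
    ("doen", "WKW:VLT:1EP"): "deed",
}

def replace_postag(postag):
    # Same pattern and replacement as in the <match> element above,
    # written with Python's \1 in place of $1.
    return re.sub(r"(.*:)(TGW)(:.*)", r"\1VLT\3", postag)

def synthesize(lemma, postag):
    # Returns None when the dictionary has no entry for this exact tag.
    return FORMS.get((lemma, postag))

new_tag = replace_postag("WKW:TGW:3EP")
print(new_tag)                      # WKW:VLT:3EP
print(synthesize("doen", new_tag))  # None: there is no 3EP past-tense entry
```

The replacement itself does what it was asked to do; the lookup fails only because the past tense is stored under 1EP.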

Ruud_Baars · October 1, 2019, 9:15am

Blimey. I overlooked that completely. I’ll try to adjust.
In practice, the 3rd person and 1st person past tense forms are identical; to save space I don’t have both. But I have been in doubt about that lately; maybe I should also add the 2nd person (2 forms), because some verbs (only about 5) are irregular in this.

Ruud_Baars · October 1, 2019, 9:37am

Maybe you can help me find the right solution; I am not that advanced with regexps.
Possible values and transformations are:
WKW:TGW:INF -> WKW:VLT:INF
WKW:TGW:3EP -> WKW:VLT:1EP
WKW:TGW:1EP -> WKW:VLT:1EP
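Taking the table as present-to-past lookups (the direction the rule needs), the logic can be sketched in Python. This only illustrates the mapping; `to_past_tag` is a hypothetical helper, and the conditional step (3EP collapsing to 1EP) is exactly what a plain `$1…$3` backreference replacement cannot express.

```python
import re

# Matches present-tense tags for the infinitive and the 1st/3rd person singular.
PATTERN = re.compile(r"(.*:)TGW:(INF|[13]EP)")

def to_past_tag(postag):
    m = PATTERN.fullmatch(postag)
    if m is None:
        return None  # not one of the three tags listed above
    # INF stays INF; both 1EP and 3EP map to 1EP in the past tense.
    suffix = "INF" if m.group(2) == "INF" else "1EP"
    return f"{m.group(1)}VLT:{suffix}"

for tag in ("WKW:TGW:INF", "WKW:TGW:3EP", "WKW:TGW:1EP"):
    print(tag, "->", to_past_tag(tag))
```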

jaumeortola · October 1, 2019, 10:31am

I would change the dictionary or I would just write different rules.

If you have too many rules, you could write something like:
<match no="3" postag_regexp="yes" postag="(.*:)(TGW):.(.*)" postag_replace="$1VLT:.$3"/>
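A rough Python sketch of what this pattern produces. It assumes the replaced tag is afterwards matched against dictionary tags as a regular expression; that behaviour is an assumption here, not something confirmed in this thread. `DICT_TAGS` and both helpers are hypothetical.

```python
import re

# Hypothetical set of tags present in the dictionary for one verb.
DICT_TAGS = ["WKW:TGW:3EP", "WKW:VLT:1EP", "WKW:VLT:INF"]

def wildcard_tag(postag):
    # The character after "TGW:" is consumed by "." in the pattern and
    # re-emitted as a literal "." in the replacement, i.e. a wildcard.
    return re.sub(r"(.*:)(TGW):.(.*)", r"\1VLT:.\3", postag)

def matching_tags(tag_regex):
    # Treat the replaced tag as a regex over the dictionary tags (assumed).
    return [t for t in DICT_TAGS if re.fullmatch(tag_regex, t)]

tag_regex = wildcard_tag("WKW:TGW:3EP")
print(tag_regex)                 # WKW:VLT:.EP
print(matching_tags(tag_regex))  # ['WKW:VLT:1EP']
```

The replacement does not rewrite 3 into 1; it drops the person digit in favour of a wildcard, so whether this helps depends on how the resulting tag is interpreted.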

Ruud_Baars · October 1, 2019, 10:34am

Yes, but that does not change 3 into 1. I guess I will have to stick to subrules for now. It is not that bad.