
Suggesting a different form of a word


(Jan Schreiber) #1


I'm trying to write a rule that matches a verb in 3rd person singular (in whatever tense) and suggests the plural, but all the other grammatical properties of the matched verb form should carry over to the replacement. Here is what I have so far:




<rulegroup id="USA_PLURAL" name="Grammatik: 'Die USA ist (sind)' etc. (falscher Singular)">
    <!-- This should be covered by DE_SUBJECT_VERB_AGREEMENT but apparently isn't. -->
    <!-- Perhaps the tagger dict isn't 100 % correct. -->
    <rule>
        <pattern>
            <token postag="SENT_START"/>
            <token>Die</token>
            <token regexp="yes" case_sensitive="yes">USA|UN|AGB|SBB</token>
            <marker>
                <token postag="VER:([A-Z]{3}:)?3:SIN:.+" postag_regexp="yes"><exception postag="VER:([A-Z]{3}:)?3:PLU:.+" postag_regexp="yes"/></token>
            </marker>
        </pattern>
        <message>Müsste dieses Verb im Plural stehen? Stehen Kurzwörter wie 'USA', 'UN', 'AGB' als Subjekt im Satz, muss das dazugehörige Verb im Plural stehen.</message>
        <suggestion><match no="4" postag="VER:([A-Z]{3}:)?3:PLU:.+" postag_regexp="yes"/></suggestion>
        <url>http://www.duden.de/sprachwissen/sprachratgeber/usa-un-und-sbb</url>
        <short>Müsste dieses Verb im Plural stehen?</short>
        <example correction="haben|hatten|hätten">Die USA <marker>hat</marker> eine besondere Verantwortung.</example>
        <example correction="tragen|trugen|trügen">Die USA <marker>trägt</marker> eine besondere Verantwortung.</example>
    </rule>
</rulegroup>

It's very close to what I want, but I would like to remove the 'hatten' and 'hätten' from the suggestions. Is there a way to reference the POS tags of the word that actually matched, and just replace SIN with PLU?

Also, why isn't the example sentence covered by DE_SUBJECT_VERB_AGREEMENT?


(Jan Schreiber) #2

As it turns out, this is what I get when I tag 'USA':

USA/SUB:NOM:PLU:FEM
USA/SUB:AKK:PLU:FEM
USA/SUB:AKK:SIN:FEM
USA/SUB:NOM:SIN:FEM
USA/SUB:AKK:PLU:NOG
USA/SUB:NOM:PLU:NOG

I've always taken it for granted that 'USA' is a plural masculine, because 'state' translates to 'der Staat'. IMO, the correct and complete tags would be:

USA/SUB:NOM:PLU:MAS
USA/SUB:GEN:PLU:MAS
USA/SUB:DAT:PLU:MAS
USA/SUB:AKK:PLU:MAS

or perhaps

USA/SUB:NOM:PLU:NOG
USA/SUB:GEN:PLU:NOG
USA/SUB:DAT:PLU:NOG
USA/SUB:AKK:PLU:NOG

which would leave the question of gender open.

At least one person over at Duden seems to think along the same lines:
http://www.duden.de/sprachwissen/sprachratgeber/usa-un-und-sbb


(Daniel Naber) #3

"postag_regexp" and "postag_replace" should do that, maybe like this (not tested):

postag_regexp="SIN" postag_replace="PLU"

(Daniel Naber) #4

"UN" also has the ":NOG" tags, and I think that's the correct way. Actually, "USA" is already in added.txt and could be changed to "...:NOG", I guess.


(Jan Schreiber) #5

Thanks, Daniel, I will try "postag_replace".

Do you agree that the entries that flag USA as SIN:FEM should be removed from german.dict? According to Duden 'USA' is plural-only ("Pluraletantum"). The problem is I don't know how to extract the dictionary.


(Daniel Naber) #6

Yes, but we don't want to touch the binary file for all the small errors we find. Instead, feel free to create a file "remove.txt" that is the opposite of added.txt. It needs some code to become active, I can put that on my TODO list.


(Jan Schreiber) #7

When I use 'postag_regexp="SIN"' I get:

cvc-enumeration-valid: Value "SIN" is not facet-valid with respect to enumeration "[yes, no]". It must be a value from the enumeration. Problem found at line 24772, column 85.

That makes sense, so I tried

<match no="4" postag="SIN" postag_replace="PLU"/>

but this gives the following error:

Exception in thread "main" junit.framework.AssertionFailedError: German: Incorrect suggestions: [haben] != [] for rule USA_PLURAL[1]:[/SENT_START, Die, USA|UN|AGB|SBB, /VER:([A-Z]{3}:)?:SIN:.+/exceptions=[/VER:([A-Z]{3}:)?3:PLU:.+]]:Grammatik: 'Die USA ist (sind)' etc. (falscher Singular) on input: Die USA hat eine besondere Verantwortung. expected:<[haben]> but was:<[]>

I'm stuck.


(Jan Schreiber) #8

I think I found something on the LT wiki.
For the record, this is what eventually worked:

<match no="4" postag="(.*)SIN(.*)" postag_regexp="yes" postag_replace="$1PLU$2"/>

It is actually the most straightforward solution when you think about it: the two (.*) groups capture everything before and after the SIN, and $1 and $2 in postag_replace put those parts back, so the whole POS tag of the matched word is preserved except that SIN becomes PLU. A very powerful tool.
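
For anyone who wants to see the mechanics outside of LanguageTool, here is a minimal sketch in Python. It is not how LT is implemented. The mini-lexicon and its tag strings are made up for illustration and may not match the real tagger dictionary, and the helper names (forms_matching, swap_number, forms_with_tag) are hypothetical. It only shows why a broad plural regex picks up every tense, while rewriting the tag of the matched word picks up just one form.

```python
import re

# Hypothetical mini-lexicon: (form, POS tag) pairs for the lemma "haben".
# The tag strings are illustrative; the real tagger dictionary may differ.
HABEN_FORMS = [
    ("hat", "VER:AUX:3:SIN:PRÄ"),
    ("haben", "VER:AUX:3:PLU:PRÄ"),
    ("hatte", "VER:AUX:3:SIN:PRT"),
    ("hatten", "VER:AUX:3:PLU:PRT"),
    ("hätte", "VER:AUX:3:SIN:KJ2"),
    ("hätten", "VER:AUX:3:PLU:KJ2"),
]


def forms_matching(tag_regex):
    """Return every form whose whole tag matches the regex (first attempt)."""
    return [form for form, tag in HABEN_FORMS if re.fullmatch(tag_regex, tag)]


def swap_number(tag):
    """Replace SIN with PLU and keep the rest of the tag via two groups."""
    # Python writes the backreferences as \1 and \2 where the rule uses $1 and $2.
    return re.sub(r"(.*)SIN(.*)", r"\1PLU\2", tag, count=1)


def forms_with_tag(tag):
    """Return every form that carries exactly this tag."""
    return [form for form, t in HABEN_FORMS if t == tag]


matched_tag = "VER:AUX:3:SIN:PRÄ"  # assumed tag of the matched word "hat"

# Broad regex: any 3rd person plural form, regardless of tense.
broad = forms_matching(r"VER:([A-Z]{3}:)?3:PLU:.+")

# Rewritten tag: same tense and mood as the matched word, only the number changes.
narrow = forms_with_tag(swap_number(matched_tag))

print(broad)
print(narrow)
```

Note that the first (.*) is greedy, so if a tag ever contained SIN more than once, only the last occurrence would be replaced.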