Detecting multiple occurrences of a word


(indiajoe) #1

I have just started exploring LanguageTool's wonderful rule editor.
I was wondering: is it possible to write a rule that detects multiple occurrences of a word in a sentence?
And can a rule also detect two consecutive sentences that start with the same word?


(Daniel Naber) #2

You can do it like this if you switch to "Expert mode". This rule detects whether "word" appears twice:

<rule id="ID" name="name">

    <pattern>
        <token>word</token>
        <token skip="-1"/>
        <marker>
            <token>word</token>
        </marker>
    </pattern>
    <message>...</message>
    <example correction="">This word is a <marker>word</marker>.</example>
</rule>

Rules like this only work within a single sentence; you cannot write one that spans two consecutive sentences.


(indiajoe) #3

Thank you very much for the help.

So I guess if I want to detect a repeated noun or verb in a single sentence, this is the rule I have to write:

<rule id="ID" name="Catch repetition of nouns and verb">
    <pattern>
        <token postag='NN|VB' postag_regexp='yes'></token>
        <token skip="-1"/>
        <marker>
            <token postag='NN|VB' postag_regexp='yes'></token>
        </marker>
    </pattern>
    <message>Try to re-frame sentence to avoid repeated noun/verb </message>
    <short>Avoid repetition</short>
    <example correction=''>Tom woke up early and Tom went for a walk.</example>
    <example>Tom woke up early and he went for a walk.</example>
</rule>

UPDATE: The above example doesn't work, since it raises an error whenever a sentence contains any two nouns or verbs. Is there a way to match only when the same noun or the same verb appears twice?


(Daniel Naber) #4

You could try this (not tested):

(...)

<marker>
    <token postag='NN|VB' postag_regexp='yes'><match no="0"/></token>
</marker>

This should match only if this token contains the same word as the first token.


(indiajoe) #5

Hi,
Thanks a lot. I was able to get my first rule to work.

The following version passes without any XML errors:

<rule id="ID" name="Catch repetition of nouns and verb">
    <pattern>
        <token postag='N.*|V.*' postag_regexp='yes'></token>
        <token skip="-1"/>
        <marker>
            <token postag='N.*|V.*' postag_regexp='yes'><match no="0"/></token>
        </marker>
    </pattern>
    <message>Try to re-frame sentence to avoid repeated <match no="1"/> </message>
    <short>Avoid repetition</short>
    <example correction=''>Tom woke up early and <marker>Tom</marker> went for a walk.</example>
    <example>Tom woke up early and he went for a walk.</example>
</rule>

To detect repetition across consecutive sentences, is TextLevelRule what I should try? (I found a mention of it at the end of the development-overview page.) Is there an example gallery showing how to use TextLevelRule?


(Daniel Naber) #6

You could look at GenericUnpairedBracketsRule, but it has a lot of logic that may be confusing (https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/java/org/languagetool/rules/GenericUnpairedBracketsRule.java). TextLevelRule is a very simple interface; you just need to implement it (so this is not related to XML rules, it requires Java programming). The most relevant method is match(List<AnalyzedSentence> sentences), which receives the sentences of the text and returns the errors it finds.
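
To give a rough idea of the logic such a rule would need, here is a minimal standalone sketch in plain Java. It does not use the LanguageTool API at all: the class name RepeatedSentenceStart and both of its methods are made up for illustration, and it works on plain strings, whereas a real TextLevelRule would work on LanguageTool's analyzed sentences and return rule matches. It only shows the core idea of comparing the first word of each sentence with the first word of the sentence before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Standalone sketch: flags sentences that start with the same word as the
 * sentence before them. Not a LanguageTool rule; all names are hypothetical.
 */
final class RepeatedSentenceStart {

    /** Returns the first word of a sentence in lower case, or "" if there is none. */
    static String firstWord(String sentence) {
        // Split on anything that is not a letter or digit, then take the first non-empty piece.
        for (String piece : sentence.trim().split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                return piece.toLowerCase(Locale.ROOT);
            }
        }
        return "";
    }

    /** Returns the indices of sentences whose first word repeats that of the previous sentence. */
    static List<Integer> findRepeatedStarts(List<String> sentences) {
        List<Integer> hits = new ArrayList<>();
        String previous = "";
        for (int i = 0; i < sentences.size(); i++) {
            String current = firstWord(sentences.get(i));
            // Only flag a sentence when it has a first word and that word matches the previous one.
            if (!current.isEmpty() && current.equals(previous)) {
                hits.add(i);
            }
            previous = current;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> text = List.of(
                "Tom woke up early.",
                "Tom went for a walk.",
                "He came back at noon.");
        System.out.println(findRepeatedStarts(text)); // prints [1]
    }
}
```

The comparison is case-insensitive and ignores leading punctuation such as quotation marks. Inside an actual rule, the same loop would run over the sentence list passed to match, and each hit would be reported as an error at the position of the repeated word.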