
[Solved] How to indicate JJ minus NN?

(Kumara) #1

I want to create a rule that detects the misspelling of "quite" as "quiet". To do that, I want to restrict the following token to adjectives that aren't also nouns or prepositions. This doesn't seem to work:

 <token postag='JJ'><exception postag_regexp="yes" postag='NN|RP'/></token>

"Winter" and "animal" still get included. Am I missing something?

Here's the whole rule (in case you want to know):

<!-- English rule, 2016-11-23 -->
    <rule id="MISSPELLING_QUIET_QUITE" name="Misspelling: quiet (quite)">
      <pattern>
        <!-- The "quiet" token is reconstructed from the examples below; it was missing from the post. -->
        <marker><token>quiet</token></marker>
        <token postag="JJ"><exception postag_regexp="yes" postag="NN|RP"/></token>
      </pattern>
      <message>Do you mean <suggestion>quite</suggestion>?</message>
      <short>Possible misspelling</short>
      <example correction="quite">It has become <marker>quiet</marker> troublesome.</example>
      <example>It has become quite troublesome.</example>
      <example>It has become quiet.</example>
    </rule>

(Kumara) #2

For now, I'm settling for
<token postag='JJ'><exception postag="RP"/><exception regexp="yes">today|all|animal|winter|now</exception></token>

It's not very elegant, and it will trigger false positives that aren't among the test sentences.

(Daniel Naber) #3

You can use, it will show that "winter" and "animal" are tagged as NN:UN.

(Daniel Naber) #4

Feel free to write those rules, of course, but please be aware that we already have an ngram-based rule for that:

r=0.808 in that line means that we can detect about 80% of wrongly used quite/quiet pairs, i.e. the ngram "rule" works in both directions. Technical details are documented at

(Kumara) #5

Oh, now I get it. I should specify NN.* instead. I now have

 <token postag="JJ"><exception postag_regexp="yes" postag="NN.*|RP|DT|PDT|VB.*"/></token>

That took away a lot of false positives, but "winter" remains!

I understand this is probably due to the disambiguation rules (which I don't understand). Still I want to make this work. So what do I do? Use chunks instead?
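To make the NN versus NN.* point concrete, here is a minimal sketch in Python rather than LanguageTool itself. It assumes that a postag regex has to match the whole tag and that an exception blocks a token as soon as any of its readings matches. Both are assumptions about how LT behaves, and `tag_matches`, `token_matches`, and the reading lists are hypothetical names and data for illustration (only the NN:UN tag for "winter" comes from this thread).

```python
import re


def tag_matches(pattern: str, tag: str) -> bool:
    """Assumption: the regex must match the entire POS tag, not just a prefix."""
    return re.fullmatch(pattern, tag) is not None


def token_matches(readings, postag, exception_regex):
    """Simplified model, not LanguageTool's code: accept a token if some
    reading carries `postag` and no reading matches `exception_regex`."""
    has_tag = any(tag == postag for tag in readings)
    excluded = any(tag_matches(exception_regex, tag) for tag in readings)
    return has_tag and not excluded


# Hypothetical reading lists for illustration.
winter = ["JJ", "NN:UN"]        # adjective and uncountable-noun readings
winter_disambiguated = ["JJ"]   # as if a disambiguation rule removed NN:UN
troublesome = ["JJ"]

broad = "NN.*|RP|DT|PDT|VB.*"

print(tag_matches("NN|RP", "NN:UN"))               # NN alone does not cover NN:UN
print(tag_matches("NN.*|RP", "NN:UN"))             # NN.* does
print(token_matches(winter, "JJ", "NN|RP"))        # still accepted as an adjective
print(token_matches(winter, "JJ", broad))          # now excluded
print(token_matches(winter_disambiguated, "JJ", broad))
print(token_matches(troublesome, "JJ", broad))
```

Under these assumptions, a token whose noun reading has been removed before the rule runs gives the exception nothing to match, which would be consistent with "winter" still slipping through in some contexts.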

(Kumara) #6

I believe that "ngram-based rule" thing means using statistics. That's not good enough for me. It doesn't detect "quiet problematic". (Somehow LT regards "problematic" as a noun.)

(Kumara) #7

Nope. "Attribute 'chunk' is not allowed to appear in element 'exception'."

(Daniel Naber) #8

"This attitude is quiet problematic." is detected on Whether something is considered a noun doesn't matter for the ngram-based approach. will show you a "Disambiguator log" so you can get the ID of the disambiguation rule that causes this issue. You'll then need to see if you can improve that rule in disambiguation.xml. The way it works is documented in the wiki.

(Kumara) #9

Ah ha! There goes another one that gets flagged there (and in languagetool.jar too), but not on my LO. Bug in the OXT?

(Daniel Naber) #10

Do these errors get detected in your LibreOffice?

    I can't remember how to go their.
    I didn't now where it came from.
    Alabama has for of the world's largest stadiums.

If not, you might not have configured the ngram data directory in the LT settings inside LibreOffice.

(Kumara) #11

Right. And to do that I need to follow this?

(Daniel Naber) #12

Yes, exactly.

(Kumara) #13

Not an option for me. Don't have the luxury of an SSD.
Thanks, anyway.

(Mike Unwalla) #14

@Kumara, an SSD is not necessary. I use the n-gram data, and I don't have an SSD.

(Kumara) #15

I'm sure it's possible. I just don't want a slower computer.