Excluding URLs (with or without http)

Hi,

I am currently developing my first rules. One of them should turn off the spell check for URLs that do not start with http. So I tried to turn my working regex into a pattern:

<rule name="Ignorieren von Internetadressen" id="IGNORE_URLS">            
  <pattern case_sensitive="yes">
    <!-- @see http://regexr.com/39a8d -->
    <token regexp="yes">https?\:\/\/|\w*</token>
    <token>.</token>
    <token regexp="yes">[^\/\s]+</token>
    <token regexp="yes">\/.*?</token>
  </pattern>
  <disambig action="ignore_spelling"/>
</rule>

But it does not match. What am I doing wrong? Is the “.” token the problem?
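As far as I know, a `<pattern>` is matched token by token: each `<token>` element is compared against one whole token from LanguageTool's tokenizer, not against the raw text the way regexr works. The Python sketch below simulates that idea to show why the regex does not carry over. `simple_tokenize` and `find_matches` are hypothetical stand-ins, assuming that words and punctuation such as `:`, `/` and `.` become separate tokens; LanguageTool's real tokenizer has its own rules and may differ.

```python
import re


def simple_tokenize(text):
    # Hypothetical tokenizer: runs of word characters, or any single other non-space symbol.
    return re.findall(r"\w+|[^\w\s]", text)


def find_matches(tokens, regexes):
    # Return start positions where every regex fully matches one token, in sequence
    # (a simplified model of <token regexp="yes"> elements inside a <pattern>).
    hits = []
    for start in range(len(tokens) - len(regexes) + 1):
        window = tokens[start:start + len(regexes)]
        if all(re.fullmatch(rx, tok) for rx, tok in zip(regexes, window)):
            hits.append(start)
    return hits


# The four <token> elements of the rule above; the plain "." token is a literal dot.
FIRST_RULE = [r"https?\:\/\/|\w*", r"\.", r"[^\/\s]+", r"\/.*?"]

tokens = simple_tokenize("Siehe https://sub.example.de")
print(tokens)  # ['Siehe', 'https', ':', '/', '/', 'sub', '.', 'example', '.', 'de']

# "https://" is split into several tokens, so this alternative never matches any token.
print([t for t in tokens if re.fullmatch(r"https?\:\/\/", t)])  # []

# Without a path, no token starts with "/", so the last element never matches.
print(find_matches(simple_tokenize("sub.example.de"), FIRST_RULE))  # []
print(find_matches(simple_tokenize("example.de/pfad"), FIRST_RULE))  # [0]
```

Under these assumptions the “.” token is not the problem: the scheme alternative can never match a single token, and the last element `\/.*?` needs a token starting with `/`, so a plain `sub.example.de` is never matched, while something like `example.de/pfad` would be.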

Is there an existing rule for URLs with an http prefix? They do not seem to be spell-checked, but I could not find such a rule…

Kind regards

URLs are detected by SpellingCheckRule.isUrl() and skipped by the spell checker, so your disambiguator rule should not be needed for them. This only applies to “proper” URLs that start with http, https, or ftp, though.
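For illustration, a scheme-based check in the spirit of the one described above might look like this Python sketch. It is an assumption, not the actual code of `SpellingCheckRule.isUrl()`: the hypothetical helper `looks_like_url` only accepts tokens that start with `http://`, `https://` or `ftp://`, which is why a scheme-less address such as `www.languagetool.org` falls through.

```python
import re

# Assumption: a URL is a token that starts with one of these schemes.
# This is NOT the actual implementation of SpellingCheckRule.isUrl().
URL_WITH_SCHEME = re.compile(r"(?:https?|ftp)://\S+")


def looks_like_url(token: str) -> bool:
    # Hypothetical helper: True only if the whole token has the scheme-based form.
    return URL_WITH_SCHEME.fullmatch(token) is not None


for word in ["https://languagetool.org", "ftp://example.com/file", "www.languagetool.org"]:
    print(word, looks_like_url(word))
# https://languagetool.org True
# ftp://example.com/file True
# www.languagetool.org False
```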

Ah, OK, but we need a solution to exclude URLs without “http://”. Nothing special, just a simple way to ignore notations like subdomain.domain.tld. How can we achieve that?

Maybe something like this (not tested):

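<rule name="Ignorieren von Internetadressen" id="IGNORE_URLS">
  <pattern>
    <!-- any token, e.g. the subdomain -->
    <token />
    <!-- a dot attached directly to the previous token -->
    <token spacebefore="no">.</token>
    <token />
    <token spacebefore="no">.</token>
    <!-- a fixed list of common top-level domains -->
    <token regexp="yes" spacebefore="no">(org|com|net|de)</token>
  </pattern>
  <disambig action="ignore_spelling"/>
</rule>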
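The suggested pattern relies on `spacebefore="no"` so that the dots and the top-level domain must be attached to the preceding token. The Python sketch below imitates that on (token, whitespace-before) pairs; the tokenizer `tokenize_with_space`, the `SUGGESTED_RULE` encoding and `find_matches` are illustrative assumptions, not LanguageTool APIs.

```python
import re


def tokenize_with_space(text):
    # Hypothetical tokenizer returning (token, whitespace_before) pairs.
    return [(m.group(), m.start() > 0 and text[m.start() - 1].isspace())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


# One entry per <token>: (regex or None for an empty <token />, requires spacebefore="no").
SUGGESTED_RULE = [
    (None, False),
    (r"\.", True),
    (None, False),
    (r"\.", True),
    (r"(org|com|net|de)", True),
]


def element_matches(element, token):
    regex, needs_no_space = element
    text, space_before = token
    if regex is not None and re.fullmatch(regex, text) is None:
        return False
    return not (needs_no_space and space_before)


def find_matches(tokens, rule):
    # Return the text of every token window that satisfies all elements of the rule.
    hits = []
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all(element_matches(e, t) for e, t in zip(rule, window)):
            hits.append("".join(text for text, _ in window))
    return hits


print(find_matches(tokenize_with_space("Mehr auf www.example.de lesen."), SUGGESTED_RULE))
# ['www.example.de']
print(find_matches(tokenize_with_space("www . example . de"), SUGGESTED_RULE))
# []  (spaces before the dots)
print(find_matches(tokenize_with_space("www.example.info"), SUGGESTED_RULE))
# []  ("info" is not in the list)
```

In this simulation, the fixed list `(org|com|net|de)` means an address like `www.example.info` is not matched.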

Thank you, it works as expected, even though I modified the last token:

<rule name="Ignorieren von Internetadressen" id="IGNORE_URLS">
  <!-- @see http://regexr.com/39a8d -->
  <pattern>
    <token />
    <token spacebefore="no">.</token>
    <token />
    <token spacebefore="no">.</token>
    <!-- any top-level domain of two or more letters instead of a fixed list -->
    <token regexp="yes" spacebefore="no">[a-zA-Z]{2,}</token>
  </pattern>
  <disambig action="ignore_spelling"/>
</rule>

Now it should match the most obvious cases (we do not need a complete, 100% solution here).
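To see which “obvious cases” the modified rule covers, here is a hypothetical Python simulation of it. The tokenizer (which splits words from punctuation and records whether whitespace precedes a token) and the helper `matched_spans` are assumptions, not LanguageTool code.

```python
import re


def tokenize_with_space(text):
    # Hypothetical tokenizer returning (token, whitespace_before) pairs.
    return [(m.group(), m.start() > 0 and text[m.start() - 1].isspace())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


# Modified rule: (regex or None for an empty <token />, requires spacebefore="no").
MODIFIED_RULE = [
    (None, False),
    (r"\.", True),
    (None, False),
    (r"\.", True),
    (r"[a-zA-Z]{2,}", True),
]


def matched_spans(text, rule=MODIFIED_RULE):
    # Collect the text of every token window that satisfies all elements of the rule.
    tokens = tokenize_with_space(text)
    spans = []
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all((rx is None or re.fullmatch(rx, tok)) and not (no_space and space)
               for (rx, no_space), (tok, space) in zip(rule, window)):
            spans.append("".join(tok for tok, _ in window))
    return spans


print(matched_spans("Mehr unter www.example.info"))  # ['www.example.info']
print(matched_spans("Mehr unter example.de"))        # []
```

In this simulation any alphabetic top-level domain such as `info` is now accepted, while a two-part address like `example.de` is still not matched, since the pattern always expects three parts separated by dots.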