
Identifying tokens ending with 'th'

How can we identify tokens ending with ‘th’, like fourth, tenth, etc.?

<token spacebefore="no">th</token>

The above rule does not seem to be working.

<token regexp="yes">\w+th</token>

Should work for most use cases.

Thanks for the help. What is the ‘\w’ used for?

It matches any word character: a letter, a digit, or an underscore. The plus sign means “one or more of the preceding expression”. So \w+th means “a string of letters and digits of any length (at least one character), followed by th.”
If you need accented characters, a slightly safer way to express \w would be [a-z0-9äöüßéèáà], because afaik they are not included in \w.
You might want to check out this interactive regex tutorial. Quite useful.
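To see what \w+th accepts, here is a minimal Python sketch. It rests on two assumptions that are not stated in this thread: that a token regexp has to match the whole token (modelled here with fullmatch), and that \w covers only ASCII letters, digits, and the underscore (modelled with re.ASCII, since Python's \w is Unicode-aware by default). The helper name token_matches and the token "zwölfth" are made up for illustration.

```python
import re

# re.ASCII restricts \w to [A-Za-z0-9_]; this is an assumption made to
# mimic a regex engine whose \w does not cover accented letters.
ENDS_WITH_TH = re.compile(r"\w+th", re.ASCII)


def token_matches(token: str) -> bool:
    """Return True if the entire token matches \\w+th (hypothetical helper)."""
    return ENDS_WITH_TH.fullmatch(token) is not None


# "zwölfth" is an invented token that only serves to show the accent issue.
for token in ["fourth", "6th", "th", "with", "zwölfth"]:
    print(token, token_matches(token))
```

Under these assumptions, a bare "th" is rejected because \w+ needs at least one character in front of it, and "zwölfth" is rejected because of the ö. Ordinary words such as "with" are accepted, since the pattern only looks at the ending.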

This worked fine but raised false alarms on words like ‘with’. Is there any way around this?

<token regexp="yes">\w+th<exception regexp="yes">with|smith|width</exception></token>
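The effect of such an exception list can be approximated outside LanguageTool with a rough Python sketch. It assumes that both the token pattern and the exception have to match the whole token, and it uses re.ASCII to keep \w to plain letters, digits, and the underscore. The function name is_flagged is hypothetical.

```python
import re

# Candidate pattern and exception list copied from the token above.
CANDIDATE = re.compile(r"\w+th", re.ASCII)
EXCEPTION = re.compile(r"with|smith|width", re.ASCII)


def is_flagged(token: str) -> bool:
    """Return True if the token matches the pattern and no exception applies."""
    if CANDIDATE.fullmatch(token) is None:
        return False
    return EXCEPTION.fullmatch(token) is None


for token in ["tenth", "with", "smith", "width", "north", "smooth"]:
    print(token, is_flagged(token))
```

In this sketch the three listed words are no longer flagged, but "north" and "smooth" still are. Every other non-ordinal word ending in th would need its own entry in the exception.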

It is not that simple. This is the best I have been able to do:

<rule>
    <pattern>
        <token regexp="yes" postag="JJ">\w+th<exception postag="NN.+|V.*" postag_regexp="yes"/><exception regexp="yes">north|south</exception></token>
    </pattern>
    <message>Did you mean <suggestion>aaa</suggestion>?</message>
    <example correction="aaa"><marker>sixth</marker>.</example>
    <example correction="aaa"><marker>6th</marker>.</example>
    <example>north, south, width, with, smooth</example>
    <example>The North Slope is mostly tundra peppered</example>
</rule>

Probably it’s better to do this:

<token regexp="yes">\d+th|eighteenth|eighth|eightieth|eleventh|fifteenth|fifth|fiftieth|fortieth|fourteenth|fourth|nineteenth|ninetieth|ninth|hundredth|millionth|thousandth|seventeenth|seventh|seventieth|sixteenth|sixth|sixtieth|tenth|thirteenth|thirtieth|twelfth|twentieth|.*-(eighth|fifth|fourth|ninth|seventh|sixth)</token>
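A quick way to sanity-check this explicit list is to run the same pattern over a few tokens in Python. This is only a sketch: it assumes the regexp must match the whole token, it ignores case handling, and the helper name is_ordinal is invented for the example.

```python
import re

# The alternation from the token above, split across lines for readability.
ORDINAL_TH = re.compile(
    r"\d+th|eighteenth|eighth|eightieth|eleventh|fifteenth|fifth|fiftieth"
    r"|fortieth|fourteenth|fourth|nineteenth|ninetieth|ninth|hundredth"
    r"|millionth|thousandth|seventeenth|seventh|seventieth|sixteenth|sixth"
    r"|sixtieth|tenth|thirteenth|thirtieth|twelfth|twentieth"
    r"|.*-(eighth|fifth|fourth|ninth|seventh|sixth)"
)


def is_ordinal(token: str) -> bool:
    """Return True if the entire token matches the explicit ordinal list."""
    return ORDINAL_TH.fullmatch(token) is not None


for token in ["sixth", "6th", "twenty-fifth", "with", "north", "smooth", "width"]:
    print(token, is_ordinal(token))
```

Because the list names the ordinals explicitly, words such as "with", "north", "smooth", and "width" are not matched in this sketch, so no exceptions or part-of-speech filters are needed for them.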

Thank you.