Is it possible to extend the LT model to include new NLP constructs, which we can reference in XML?

As an example, we have a rule where it would be useful to match on the animacy of a noun, to determine whether we should abbreviate large quantities – for example, 1 million people and 1m albums sold are both correct.

Is there a way for us to extend LanguageTool to insert this data as something we could pattern match on – adding a new property to tokens, for example, so we can use it in rules? In this case, instead of having to maintain a list of animate things, as in

    <pattern>
        <marker><token regexp="yes" skip="1">(\d[\d.]*)m</token></marker>
        <token regexp="yes">(adults|Americans|animals|cats ... long list of animate things here)</token>
        <message>million: in copy use m for sums of money, units or inanimate objects, otherwise million<suggestion><match no="1" regexp_match="(\d[\d.]*)m" regexp_replace="$1"/> million</suggestion></message>
    </pattern>

we could add our custom annotations and write something like

    <pattern>
        <marker><token regexp="yes" skip="1">(\d[\d.]*)m</token></marker>
        <token custom-namespace_is-animate="true"></token>
        <message>million: in copy use m for sums of money, units or inanimate objects, otherwise million<suggestion><match no="1" regexp_match="(\d[\d.]*)m" regexp_replace="$1"/> million</suggestion></message>
    </pattern>

Alternatively, we could work towards contributing something to the core of LanguageTool, if the maintainers felt that was useful – but it would be extra processing effort that many users would likely not benefit from, which makes it feel like a natural candidate for an interface and a user-supplied extension.

Hi Jonathon, the POS tags in LT are basically just strings. You can modify these strings or add new tags by using the disambiguator, as described at https://dev.languagetool.org/developing-a-disambiguator#xml-syntax
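
As a rough sketch of how that could look for the animacy case: a disambiguation rule adds an extra tag to the matching nouns, and the grammar rule then matches on that tag via postag. The rule id, the rule name, the tag name ANIMATE and the shortened word list are all made up for illustration, and the exact attributes should be checked against the page linked above. In disambiguation.xml, something like:

    <!-- hypothetical rule: adds an ANIMATE reading to the listed nouns -->
    <rule id="ADD_ANIMATE_TAG" name="Tag animate nouns">
        <pattern>
            <token regexp="yes">adults|Americans|animals|cats</token>
        </pattern>
        <disambig action="add"><wd pos="ANIMATE"/></disambig>
    </rule>

and then, in the grammar rule, the long token list could be replaced by a match on the new tag:

    <pattern>
        <marker><token regexp="yes" skip="1">(\d[\d.]*)m</token></marker>
        <!-- ANIMATE is the hypothetical tag added by the disambiguator above -->
        <token postag="ANIMATE"/>
    </pattern>

This does not remove the need for a list of animate nouns, but it would keep that list in one place, so that any number of rules can refer to the tag instead of repeating the words.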