Is it possible to extend LanguageTool's (LT) token model with new NLP constructs that we can reference from XML rules?
As an example, we have a rule where it would be useful to match on the animacy of a noun to decide whether a large quantity should be abbreviated: "1 million people" (animate, spelled out) and "1m albums sold" (inanimate, abbreviated) are both correct.
Is there a way to extend LanguageTool so that this data is available for pattern matching, for example by adding a new property to tokens that rules can then test? In this case, instead of having to maintain a list of animate things, as in
<pattern>
<marker><token regexp="yes" skip="1">(\d[\d.]*)m</token></marker>
<token regexp="yes">(adults|Americans|animals|cats ... long list of animate things here)</token>
</pattern>
<message>million: in copy use m for sums of money, units or inanimate objects, otherwise million<suggestion><match no="1" regexp_match="(\d[\d.]*)m" regexp_replace="$1"/> million</suggestion></message>
we could add our custom annotations and write something like
<pattern>
<marker><token regexp="yes" skip="1">(\d[\d.]*)m</token></marker>
<token custom-namespace_is-animate="true"></token>
</pattern>
<message>million: in copy use m for sums of money, units or inanimate objects, otherwise million<suggestion><match no="1" regexp_match="(\d[\d.]*)m" regexp_replace="$1"/> million</suggestion></message>
Alternatively, we could work towards contributing something to the core of LanguageTool, if the maintainers felt that was useful. However, it would add processing effort that many users would likely not benefit from, which makes it feel like a natural candidate for an interface with a user-supplied extension.
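To make the idea more concrete, here is a rough, standalone sketch of the kind of extension point we have in mind. Everything in it is hypothetical: `AnnotatedToken`, `TokenAnnotator`, `AnimacyAnnotator` and `MillionRule` are names made up for illustration and are not part of LanguageTool, and the tiny word list only stands in for whatever NLP model would really supply the animacy data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these types are part of LanguageTool.

/** A token plus free-form, user-supplied properties such as "is-animate". */
final class AnnotatedToken {
    final String text;
    final Map<String, String> properties;

    AnnotatedToken(String text, Map<String, String> properties) {
        this.text = text;
        this.properties = properties;
    }
}

/** The imagined extension point: user code that attaches properties to a token. */
interface TokenAnnotator {
    Map<String, String> annotate(String tokenText);
}

/** Toy annotator backed by a tiny word list; a real one might call an NLP model. */
final class AnimacyAnnotator implements TokenAnnotator {
    private static final Set<String> ANIMATE =
        new HashSet<>(Arrays.asList("adults", "americans", "animals", "cats", "people"));

    @Override
    public Map<String, String> annotate(String tokenText) {
        boolean animate = ANIMATE.contains(tokenText.toLowerCase(Locale.ROOT));
        return Collections.singletonMap("is-animate", Boolean.toString(animate));
    }
}

final class MillionRule {
    // Same expression as the first token of the XML pattern.
    private static final Pattern ABBREVIATED = Pattern.compile("(\\d[\\d.]*)m");

    /** Splits on whitespace and lets the annotator decorate each token. */
    static List<AnnotatedToken> annotate(String sentence, TokenAnnotator annotator) {
        List<AnnotatedToken> tokens = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            tokens.add(new AnnotatedToken(word, annotator.annotate(word)));
        }
        return tokens;
    }

    /** Suggests "N million" when "Nm" is followed, within two tokens, by an animate token. */
    static List<String> suggest(List<AnnotatedToken> tokens) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Matcher m = ABBREVIATED.matcher(tokens.get(i).text);
            if (!m.matches()) {
                continue;
            }
            // Mirrors skip="1": the animate token may be the next one or the one after.
            int last = Math.min(i + 2, tokens.size() - 1);
            for (int j = i + 1; j <= last; j++) {
                if ("true".equals(tokens.get(j).properties.get("is-animate"))) {
                    suggestions.add(m.group(1) + " million");
                    break;
                }
            }
        }
        return suggestions;
    }
}
```

With this toy annotator, `MillionRule.suggest(MillionRule.annotate("1m people attended", new AnimacyAnnotator()))` returns the single suggestion `1 million`, while `"1m albums sold"` returns none. The rule itself only tests the `is-animate` property, so the word list could be swapped for a proper model without touching the rule.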