Advice on reducing these false positives?

MicahBly · January 25, 2022, 12:03am

We’re evaluating using LanguageTool as an embedded tool in our CMS for technical writers. In doing some preliminary testing, I came across a few hits that seem like they are probably fixable with some kind of rule, but would like to ask what kind of fix would be appropriate in LT.

(I’m currently doing this testing with my personal LT account, so it’s running against whatever is in the cloud.)

(1) Software strings (button names, menu names, etc.) get flagged for capitalization:
    “To use Boost mode, press the 47°C (Boost Mode) key.”
    “Press the On/Standby key to enter Ready mode.”
    Hit: Only proper nouns start with an uppercase character (there are exceptions for headlines).

(2) Software strings that look like they are part of the surrounding sentence:
    “If a patient requires cooling, press the Heat Off key.”
    “To return to Ready mode, press the Fan Off key.”
    Hit: The adjective or adverb “Off-key” (= out of tune) is spelled with a hyphen

(3) An abbreviated word causes the next word to be treated as the start of a sentence:
    “Max. current at 100 V = 8 A”
    Hit: This sentence does not start with an uppercase letter.

I applied bold formatting here to the word that LT highlighted.

Would these be easy hits to fix, or something we’d have to get an expert’s help with? Just trying to gauge how much time we’ll be asked to spend tweaking rules to reduce noise hits.

dnaber · January 25, 2022, 8:19am

Hi Micah,

(3) is a bug that I’ll fix soon (by adding Max to segment.srx). The other cases would need changes to the Java code, by adding exceptions there. You would need to look at the JSON results to find the rule id that causes the match, then add an exception there. That would be tricky with updates and with the premium version. It would be great if we had a more generic solution for that, but we don’t have that yet.
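
For the "look at the JSON results" step, here is a minimal Python sketch. It assumes the response has the shape returned by the public `/v2/check` HTTP endpoint: a `matches` array in which each entry carries `offset`, `length`, and a `rule` object with an `id`. The `sample_response` is hand-written for illustration rather than captured LT output, the rule id in it is assumed, and the helper name `rule_ids_by_span` is made up.

```python
import json

def rule_ids_by_span(text, response):
    """Return (flagged_text, rule_id, message) for each match in a parsed LT response."""
    hits = []
    for match in response.get("matches", []):
        start = match["offset"]
        end = start + match["length"]
        hits.append((text[start:end], match["rule"]["id"], match.get("message", "")))
    return hits

text = "Max. current at 100 V = 8 A"

# Hand-written stand-in for a real response; the rule id is an assumption.
sample_response = json.loads("""
{
  "matches": [
    {
      "message": "This sentence does not start with an uppercase letter.",
      "offset": 5,
      "length": 7,
      "rule": {"id": "UPPERCASE_SENTENCE_START"}
    }
  ]
}
""")

for flagged, rule_id, message in rule_ids_by_span(text, sample_response):
    print(f"{rule_id}: {flagged!r} ({message})")
```

Running something like this over a batch of test sentences should show which rule ids account for most of the noise before anyone decides what to change.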

MicahBly · January 25, 2022, 3:16pm

Hi Daniel.

SRX rules… ha, ha (the joke’s on me): I thought I had escaped those when I left the translation side of things 5 years ago.

The software strings are tough to detect without some kind of warning to the user. (Other tools get hits on these as well.) We do have tags around them in our CMS, but we strip all the tags out to make a more readable version before passing the text to LT (or at least, that’s our plan), so they wouldn’t be any good to us anyway. But is there anything we could add before/after a software ref that could be used to tell LT not to complain about rule #419 (or whatever)? E.g., “If a patient requires cooling, press the {{Heat Off}} key”? Obviously, we wouldn’t want it to create MORE noise…

dnaber · January 25, 2022, 3:32pm

If you don’t want to turn off the rule completely, I don’t see a good way to avoid these matches yet. You can send text and markup (using the data parameter instead of the text parameter), but that can have side effects, as the “markup” (which is actually just text you’d like to ignore) will then be totally ignored.
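
A rough sketch of what sending text plus markup could look like, assuming the `data` parameter takes a JSON object with an `annotation` list whose items are either `{"text": ...}` or `{"markup": ...}`. The `{{...}}` marker convention and the helper name `build_annotation` are hypothetical, taken from the example earlier in the thread; nothing here is sent to a server.

```python
import json
import re

# Hypothetical marker convention: {{...}} wraps a software string.
SOFTWARE_REF = re.compile(r"\{\{.*?\}\}")

def build_annotation(marked_text):
    """Split text into 'text' parts (checked) and 'markup' parts (ignored by LT)."""
    parts = []
    pos = 0
    for m in SOFTWARE_REF.finditer(marked_text):
        if m.start() > pos:
            parts.append({"text": marked_text[pos:m.start()]})
        # The whole reference, braces included, is treated as markup.
        parts.append({"markup": m.group(0)})
        pos = m.end()
    if pos < len(marked_text):
        parts.append({"text": marked_text[pos:]})
    return {"annotation": parts}

data = build_annotation("If a patient requires cooling, press the {{Heat Off}} key.")

# Form fields for a check request; 'data' replaces the usual 'text' field.
payload = {"language": "en-US", "data": json.dumps(data)}
print(payload["data"])
```

With the reference marked as markup, the checker effectively sees "press the  key." with the name gone, which is the kind of side effect described above.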

dnaber · January 25, 2022, 3:39pm

Of course, there’s always the way to post-process the LT results and just ignore these matches if they cover words that you have added to your own “ignore list”.
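
The post-processing idea could be sketched like this in Python, assuming each match in the JSON results has `offset` and `length` fields. The function names, the sample matches, and the rule ids are all made up for illustration.

```python
def find_ignored_spans(text, ignore_list):
    """Return (start, end) character spans of every ignore-list phrase in text."""
    spans = []
    for phrase in ignore_list:
        start = text.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = text.find(phrase, start + 1)
    return spans

def drop_ignored_matches(text, matches, ignore_list):
    """Keep only the matches that do not overlap an ignore-list phrase."""
    spans = find_ignored_spans(text, ignore_list)
    kept = []
    for match in matches:
        m_start = match["offset"]
        m_end = m_start + match["length"]
        overlaps = any(m_start < s_end and s_start < m_end for s_start, s_end in spans)
        if not overlaps:
            kept.append(match)
    return kept

text = "To return to Ready mode, press the Fan Off key."

# Hand-written matches with placeholder rule ids, not real LT output.
matches = [
    {"offset": text.index("Off key"), "length": len("Off key"),
     "rule": {"id": "HYPOTHETICAL_HYPHEN_RULE"}},
    {"offset": 0, "length": 2, "rule": {"id": "HYPOTHETICAL_OTHER_RULE"}},
]

kept = drop_ignored_matches(text, matches, ["Fan Off", "Heat Off"])
print([m["rule"]["id"] for m in kept])
```

The phrase lookup here is exact and case-sensitive. If the software strings are already tagged upstream, the spans could presumably come from those tag positions instead of a phrase list.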

MicahBly · January 25, 2022, 3:58pm

Sounds like we’ll just have to turn the rule off (which is probably fine). We get thousands of software strings, so if we put every word into an ignore list, pretty soon the ignore list would be the size of the spell-checking dictionary.

If I’m going to tilt at windmills, I’d rather try to get some of the software teams to use LT proactively, rather than the tech writers pushing back after strings are already written and software baked.

Anyway, thanks for the info. Good to know.