English: many acronyms are not known to the spell checker


(Daniel Naber) #1

About 4 weeks ago, we made a change that lets LT check all-uppercase words in English. As a side effect, this causes new false alarms. Words like these used to be ignored; now they are considered errors:

PHP
CSS
JSON
CET
CEST

I’ve added those to spelling.txt so they are accepted now, but adding words one by one doesn’t seem to be a clever solution here. What I could do is download lists of acronyms from the net, check them, and add them to spelling.txt. Does anybody have a better idea? Or does anybody know a list of acronyms that would be appropriate for this?


(Tiago F. Santos) #2

This has a compatible license. It just has to be referenced in the header.


(Tiago F. Santos) #3

I did not remember this earlier gift.

The list is good and I have used it before in the a/an rule.

Added.


(Mike Unwalla) #4

For a given number of letters, the number of possible uppercase acronyms is more than the number of dictionary words. (Some combinations of letters are not standard English, but an acronym does not have that restriction.)

If you add all known acronyms, the spell checker will be useless.

You added 5 acronyms, and I easily found these 3 false negatives:
THE CHILDREN WANT A PET. WE DECIDED TO GET A PHP.
TELL JSON ABOUT THE PROBLEM.
“CET REAL!” screamed Alice.

I do not have a solution to the problem. Can we put acronyms in a different file (acronyms.txt) and give users the option to ignore them?
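A minimal sketch of the separate-file idea, not LanguageTool's actual implementation. The acronyms.txt list exists only as a proposal in this thread, the names `load_words`, `is_accepted`, and `accept_acronyms` are hypothetical, and the handling of `#` comments is an assumption about the list format:

```python
def load_words(lines):
    """Parse word-list lines, skipping blank lines and '#' comments."""
    words = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            words.add(entry)
    return words


def is_accepted(token, spelling, acronyms, accept_acronyms=True):
    """Accept a token from the main list, or from the acronym list if enabled."""
    if token in spelling:
        return True
    return accept_acronyms and token in acronyms


# Hypothetical contents of spelling.txt and the proposed acronyms.txt.
SPELLING = load_words(["# regular words", "pet", "problem"])
ACRONYMS = load_words(["PHP", "JSON", "CET", "", "CEST"])
```

With `accept_acronyms=False`, "JSON" is reported again, so a user who writes all-uppercase text could switch the acronym list off without losing the regular word list.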


(Ruud Baars) #5

I solved this in the disambiguator. There is no use in spelling words with only capitals of 2, 3, 4, maybe even 5 characters.


(Mike Unwalla) #6

Hi @Ruud_Baars , please clarify your comment:

  • What did you solve in the disambiguator? (The latest LT snapshot gives no warning for the examples.)
  • What does “There is no use in spelling words with only capitals of 2, 3, 4, maybe even 5 characters” mean?

(Ruud Baars) #7

I disabled spell checking for [A-Z]{1,4} in the disambiguator. You could have a look at that file (the other computer is too busy now).

Since any combination of capitals can be a valid acronym (ABC, CBA, BAC, etc.), you don’t add them; you just don’t report them, just like numbers…
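A rough sketch of this approach, assuming a plain dictionary lookup rather than LanguageTool's real disambiguator file. The function `misspelled` and the tiny `DICTIONARY` set are hypothetical; only the pattern [A-Z]{1,4} comes from the post above:

```python
import re

# Tokens made of 1 to 4 capital letters are treated as possible acronyms.
ACRONYM_LIKE = re.compile(r"[A-Z]{1,4}")

# Hypothetical stand-in for a real dictionary.
DICTIONARY = {"tell", "about", "the", "problem", "we", "use", "and"}


def misspelled(text):
    """Return tokens not in DICTIONARY, skipping short all-caps tokens."""
    errors = []
    for token in re.findall(r"[A-Za-z]+", text):
        if ACRONYM_LIKE.fullmatch(token):
            continue  # not reported, just like numbers
        if token.lower() not in DICTIONARY:
            errors.append(token)
    return errors
```

Under these assumptions, `misspelled("TELL JSON ABOUT THE PROBLEM")` reports nothing, while a longer all-caps typo such as "PYTHN" is still reported. The trade-off is that short all-uppercase words such as "CET" in “CET REAL!” are skipped as well.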


(Tiago F. Santos) #8

@Mike_Unwalla Until 4 weeks ago, all-uppercase text was completely ignored and nobody batted an eye about that.

I believe there is obvious progress being made.
Removing features is never a solution.


(Daniel Naber) #9

Well, there have been reports from time to time from people who were wondering why their all-uppercase text wasn’t checked at all. That was the reason to make this change in the first place.


(Tiago F. Santos) #10

And I also find it a positive change. That is why I assisted with the acronym list, which further perfects this function.

The dispute is about false positives in all-uppercase text, not about the earlier behavior. If acronyms can cause false negatives, so can rare words, and they are not treated differently.


(Aafreen) #11

This is exactly the problem I am facing. LT flags all names as spelling errors. We can’t add all words to spelling.txt.

Japon is a good person. (Suppose I add the proper name “Japon” to spelling.txt.)

Japon is a country. (The spelling error in this sentence will then not be caught.)


(Michael) #12

Isn’t it just wrong to write text in all-uppercase? Why should LT suggest “GET” if the right spelling is “get”?


(Lodewijk Arie van Brienen) #13

A. All-caps is commonly used to represent shouting (e.g. “I CAN’T HEAR YOU! CAN YOU SPEAK LOUDER?”).

B. All-caps can also be used to make the text more noticeable and/or easier to read.
(e.g.:
HEARTMAN-STR. CLOSED
FOR SEWER-WORK

FROM
2018-04-11 08:00
TILL
2019-04-01 17:00

THROUGH-TRAFFIC: |A>
GREENDRIVE 22-128: |B>
HEARTMAN-STR. 48-76: |C>

requires less focus to be diverted from driving than:

Heartman-str. closed for
sewer-work

from
2018-04-11 08:00
till
2019-04-01 17:00

through-traffic: |A>
Greendrive 22-128: |B>
Heartman-str. 48-76: |C>
)


(Ruud Baars) #14

Yes, it is. But it is considered bad practice by all language consultants.