English: many acronyms are not known to the spell checker

dnaber · February 28, 2018, 4:42pm

About 4 weeks ago, we made a change that lets LT check all-uppercase words in English. This, however, has a side effect of new false alarms. Words like these used to be ignored, now they are considered errors:

PHP
CSS
JSON
CET
CEST

I’ve added those to spelling.txt so they are accepted now, but adding words one by one doesn’t seem to be a clever solution here. What I could do is download lists of acronyms from the net, check them, and add them to spelling.txt. Does anybody have a better idea? Or does anybody know a list of acronyms that would be appropriate for this?

tiagosantos · February 28, 2018, 8:11pm

This has a compatible license. Just has to be referenced on the header.

tiagosantos · February 28, 2018, 9:48pm

I had forgotten about this earlier gift.

The list is good and I have used it before in the a/an rule.

Added.

Mike_Unwalla · March 1, 2018, 8:16am

For a given number of letters, the number of possible uppercase acronyms is more than the number of dictionary words. (Some combinations of letters are not standard English, but an acronym does not have that restriction.)

If you add all known acronyms, the spell checker will be useless.
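
To put rough numbers on the size of that space, here is a small sketch. It is plain arithmetic over the letters A-Z and is not taken from LT's dictionary or word lists; the function name is made up for illustration.

```python
# Count the possible all-uppercase strings of a given length (letters A-Z only).
# Hypothetical helper for illustration; not part of LanguageTool.
def possible_acronyms(length: int) -> int:
    return 26 ** length

for n in range(2, 6):
    # Prints the length and the number of distinct uppercase strings of that length.
    print(n, possible_acronyms(n))
```

Three letters already allow 17,576 combinations and four letters allow 456,976, so a list of "all known acronyms" can cover a large share of the short uppercase strings a typo might produce.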

You added 5 acronyms, and I easily found these 3 false negatives:
THE CHILDREN WANT A PET. WE DECIDED TO GET A PHP.
TELL JSON ABOUT THE PROBLEM.
“CET REAL!” screamed Alice.

I do not have a solution to the problem. Can we put acronyms in a different file (acronyms.txt) and give users the option to ignore them?
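
A minimal sketch of what I mean, assuming a separate acronym list and a user option. The function name, the `accept_acronyms` flag, and the toy word lists are all hypothetical; this is not how LT is implemented.

```python
# Hypothetical sketch: acronyms are kept apart from the normal dictionary,
# so the user can decide whether they count as correctly spelled.
def is_misspelled(word, dictionary, acronyms, accept_acronyms=True):
    """Return True if `word` should be flagged as a spelling error."""
    # Ordinary words are looked up case-insensitively, so "TELL" matches "tell".
    if word.lower() in dictionary:
        return False
    # Acronyms are matched exactly and only when the option is switched on.
    if accept_acronyms and word in acronyms:
        return False
    return True

dictionary = {"tell", "about", "the", "problem"}   # toy stand-in for the dictionary
acronyms = {"PHP", "CSS", "JSON", "CET", "CEST"}   # would be read from acronyms.txt

print(is_misspelled("JSON", dictionary, acronyms))                         # option on
print(is_misspelled("JSON", dictionary, acronyms, accept_acronyms=False))  # option off
```

With the option off, "TELL JSON ABOUT THE PROBLEM." gets a warning on "JSON" again, while the ordinary uppercase words are still accepted.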

Ruud_Baars · March 1, 2018, 8:48am

I solved this in the disambiguator. There is no use in spelling words with only capitals of 2, 3, 4, maybe even 5 characters.

Mike_Unwalla · March 1, 2018, 12:57pm

Hi @Ruud_Baars , please clarify your comment:

What did you solve in the disambiguator? (The latest LT snapshot gives no warning for the examples.)
What does “There is no use in spelling words with only capitals of 2, 3, 4, maybe even 5 characters” mean?

Ruud_Baars · March 1, 2018, 1:07pm

I disabled spell checking for [A-Z]{1,4} in the disambiguator. You could have a look at that file (the other computer is too busy right now).

Since any combination of capitals can be a valid acronym (ABC, CBA, BAC, etc.), you don’t add them; you just don’t report them, just like numbers…
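
As a rough illustration of the idea only: the real rule lives in the disambiguator file, not in Python. The sketch below assumes the pattern [A-Z]{1,4} must match the whole token, and the function name is hypothetical.

```python
import re

# Assumption: tokens of 1 to 4 capital letters are skipped, as in the pattern above.
SHORT_CAPS = re.compile(r"[A-Z]{1,4}")

def should_spellcheck(token: str) -> bool:
    """Return False for tokens that consist only of 1-4 capital letters."""
    # fullmatch requires the entire token to fit the pattern, so "CHILDREN"
    # (8 capitals) and "Php" (mixed case) are still sent to the spell checker.
    return SHORT_CAPS.fullmatch(token) is None

for token in ["PHP", "JSON", "CEST", "CHILDREN", "Php"]:
    print(token, should_spellcheck(token))
```

Longer all-uppercase words are still checked, so uppercase text is not ignored as a whole; only the short tokens that could be any acronym are passed over.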

tiagosantos · March 1, 2018, 2:08pm

@Mike_Unwalla Until 4 weeks ago, all-uppercase text was completely ignored and nobody batted an eyelash about that.

I believe there is obvious progress being made.
Removing features is never a solution.

dnaber · March 1, 2018, 2:23pm

Well, there have been reports from time to time from people who were wondering why their all-uppercase text wasn’t checked at all. That was the reason to make this change in the first place.

tiagosantos · March 1, 2018, 2:25pm

And I also find it a positive change. That is why I assisted with the acronym list, which further perfects this function.

The dispute is about false positives in uppercase text, not about how things were before the change. If acronyms can cause false negatives, so can rare words, and those are not treated differently.

aafreen · March 9, 2018, 1:15pm

This is exactly the problem I am facing. LT flags all names as spelling errors. We can’t add every word to spelling.txt.

Japon is a good person. (Accepted if I add the proper name “Japon” to spelling.txt.)

Japon is a country. (But then the spelling error in this sentence will not be caught…)

Discostu · March 10, 2018, 7:27am

Isn’t it just wrong to write text in all-uppercase? Why should LT suggest “GET” if the right spelling is “get”?

SkyCharger001 · March 10, 2018, 9:49am

A. All-caps is commonly used to represent shouting. (EG “I CAN’T HEAR YOU! CAN YOU SPEAK LOUDER?”)

B. All-caps can also be used to make the text more noticeable and/or easier to read.
(EG:
HEARTMAN-STR. CLOSED
FOR SEWER-WORK

FROM
2018-04-11 08:00
TILL
2019-04-01 17:00

THROUGH-TRAFFIC: |A>
GREENDRIVE 22-128: |B>
HEARTMAN-STR. 48-76: |C>

requires less focus to be diverted from driving than:

Heartman-str. closed for
sewer-work

from
2018-04-11 08:00
till
2019-04-01 17:00

through-traffic: |A>
Greendrive 22-128: |B>
Heartman-str. 48-76: |C>
)

Ruud_Baars · March 11, 2018, 5:18pm

Yes, it is. But it is considered bad practice by all language consultants.