Wrong spell checker (Polish, Russian, English and all other languages): problem still not solved

nnina084 · September 25, 2020, 5:21pm

LT does not detect a missing capital letter after the dot, etc.
It does not detect a missing space after the dot.
It does not detect duplicate spaces.

The problem has been around for a long, long time. I reported this issue and it was supposed to be solved, but it is still unresolved.

https://i.postimg.cc/PqzZpRdT/Screen-Shot-09-25-20-at-07-16-PM.jpg

nnina084 · September 25, 2020, 5:32pm

https://i.postimg.cc/6Txwszg1/Screen-Shot-09-25-20-at-07-31-PM.jpg

Ruud_Baars · September 27, 2020, 10:06am

This is not really an easy problem, since all kinds of correct word groups contain a full stop, like URLs, abbreviations etc.

Could you post the text as text, not as a screenshot, or even better: both?

nnina084 · September 27, 2020, 1:34pm

Mini example:

Text:

Hey how.are you.

It does not detect an error.

I have 10,000,000 such errors in the text and I don’t know how to correct them.

Jan_Schreiber · September 27, 2020, 8:56pm

Maybe you can use a regular expression, e. g. \w\.\w
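A minimal sketch of that search in Python, using the standard `re` module. The function name `find_missing_spaces` is only illustrative. Note that this pattern also matches file names, URLs and decimal numbers, so the hits need manual review.

```python
import re

# Pattern suggested above: a word character, a dot, a word character.
MISSING_SPACE = re.compile(r"\w\.\w")

def find_missing_spaces(text):
    """Return (offset, matched text) for each dot squeezed between word characters."""
    return [(m.start(), m.group()) for m in MISSING_SPACE.finditer(text)]

sample = "Hey how.are you."
for offset, fragment in find_missing_spaces(sample):
    print(offset, fragment)
```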

Ruud_Baars · September 28, 2020, 6:08am

Unfortunately, how.are could easily be a file name. If we reported this, every file name would trigger a report.

Ruud_Baars · September 28, 2020, 6:11am

I think it would be possible to report this kind of error between very common words. Common file extensions could be excluded, as well as strings that contain a dot multiple times (URLs).
I think it is almost language independent as well.

I just made a test rule for Dutch, and will see what it does in the nightly test. If successful, I will post that rule for other languages.
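The idea described in this post could be sketched roughly as follows. This is not the actual LT rule (LT rules are written differently); the word list, the extension list and the function name `suspicious_dots` are small hypothetical placeholders for illustration.

```python
import re

# Illustrative lists only; a real rule would need much larger, language-specific data.
COMMON_WORDS = {"how", "are", "you", "the", "and", "is", "it", "we", "this"}
FILE_EXTENSIONS = {"txt", "doc", "pdf", "jpg", "png", "htm", "html", "com", "org"}

TOKEN = re.compile(r"\S+")                               # whitespace-delimited token
WORD_DOT_WORD = re.compile(r"([^\W\d_]+)\.([^\W\d_]+)")  # letters, one dot, letters

def suspicious_dots(text):
    """Return (offset, token) where a dot joins two very common words."""
    hits = []
    for tok in TOKEN.finditer(text):
        core = tok.group().rstrip(".,;:!?")  # ignore trailing punctuation
        if core.count(".") != 1:
            continue  # no dot, or several dots (likely a URL or abbreviation)
        m = WORD_DOT_WORD.fullmatch(core)
        if not m:
            continue
        left, right = m.group(1).lower(), m.group(2).lower()
        if right in FILE_EXTENSIONS:
            continue  # looks like a file name
        if left in COMMON_WORDS and right in COMMON_WORDS:
            hits.append((tok.start(), core))
    return hits
```

With these toy lists, "how.are" is reported, while "readme.txt" and "www.example.com" are skipped.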

Ruud_Baars · September 28, 2020, 6:54am

You could try search and replace. It looks like you are editing text from OCR, am I right?
Doing some changes in a text editor is sometimes very helpful then.
There are some good command line tools as well, especially for Linux. And some Windows editors can use macros for special search and replace.
But still… I built a test rule for Dutch, which will be tested in tonight’s run.

nnina084 · September 28, 2020, 7:10am

There are many more problems in the text, but LT does not detect them! For example, runs of several spaces should be reduced to a single standard space, an uppercase letter should be proposed after the dot, and so on.
Besides, I use EmEditor, the best and fastest editor in the world for huge text files, and I don’t know how to integrate language checking with Emurasoft EmEditor. These are (human) writing errors, not OCR errors.

Ruud_Baars · September 28, 2020, 10:15am

Multiple spaces are very easy to replace with a single one. Just globally replace 2 spaces by one, and repeat that until no double space is found.
LT has, like all tools, some limitations.
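Outside an editor, the same repeat-until-done replacement can be sketched in a few lines of Python. The function name is only illustrative.

```python
def collapse_double_spaces(text):
    """Replace two spaces by one, repeating until no double space is left."""
    while "  " in text:
        text = text.replace("  ", " ")
    return text

print(collapse_double_spaces("Too    many   spaces  here."))
```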

nnina084 · September 28, 2020, 10:31am

The problem is that you have to distinguish the spaces from the required tabs (regex: \t), for example in book formatting.
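One way to keep tabs untouched, sketched in Python: match only the literal space character instead of the general whitespace class `\s`, which would also swallow tabs and line breaks. The function name `squeeze_spaces` is hypothetical.

```python
import re

# Two or more literal spaces; tabs and newlines are not part of this pattern.
RUN_OF_SPACES = re.compile(r" {2,}")

def squeeze_spaces(text):
    """Reduce each run of spaces to one space, leaving tabs as they are."""
    return RUN_OF_SPACES.sub(" ", text)

line = "\tChapter  one.   It begins\there."
print(repr(squeeze_spaces(line)))
```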

Ruud_Baars · September 28, 2020, 11:00am

A good editor does that, e.g. Kate (KDE) on Linux and Notepad++ on Windows.

Yakov · September 28, 2020, 12:52pm

The general rule for all languages can detect a missing space after a dot if the next character is a capital letter.
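The check described here can be approximated in plain Python as below. This is only a sketch of the idea, not the LT rule itself, and the function name is an assumption. It uses `str.isupper()` so that it also works for Cyrillic and Polish letters.

```python
def missing_space_before_capital(text):
    """Return offsets of dots that follow a letter and are directly followed by a capital."""
    hits = []
    for i in range(1, len(text) - 1):
        if text[i] == "." and text[i - 1].isalpha() and text[i + 1].isupper():
            hits.append(i)
    return hits

print(missing_space_before_capital("Hello.World and how.are"))
```

By design this does not catch "how.are", and it still flags abbreviations such as "U.S.A", so exceptions would be needed.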

nnina084 · September 28, 2020, 4:31pm

This is not good correction by LT; other checkers can detect it, e.g.:

[Automatically fixed errors (1)]

This list shows the automatically corrected errors.

This sentence does not start with a capital letter (Polish original: To zdanie nie zaczyna się wielką literą)

Hej. ja będę.
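For bulk checking outside LT, a lowercase letter after sentence-ending punctuation can be located with a short Python sketch like the one below. The function name is hypothetical, and abbreviations such as "e. g." will produce false positives.

```python
import re

# Sentence-ending punctuation, whitespace, then the first letter of the next sentence.
SENTENCE_START = re.compile(r"[.!?]\s+([^\W\d_])")

def lowercase_sentence_starts(text):
    """Return offsets of lowercase letters that begin a new sentence."""
    return [m.start(1) for m in SENTENCE_START.finditer(text) if m.group(1).islower()]

print(lowercase_sentence_starts("Hej. ja będę."))
```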

nnina084 · October 2, 2020, 8:45pm

Девятая аудитория располагалась на третьем этаже главного корпуса, но была маленькой и неуютной.

Another example:

Все способы имеют свои преимущества и недостатки. Например, световое зонирование не даст полного ощущения деления комнаты на зоны, так что данный способ используют, как правило, в сочетании с другими. А слишком много перегородок, наоборот, приведет к чрезмерному дроблению пространства, в итоге - комната будет выглядеть слишком маленькой и неуютной. Так что при выборе того или иного способа необходимо учитывать эти нюансы.

https://www.8marta.ru/articles/zonirovanie-komnatnogo-prostranstva-s-pomoschyu-mebeli.htm
How can this problem be fixed?

gabix · October 3, 2020, 6:20am

And what is the problem? The phrases with the word in bold are correct.

Ruud_Baars · October 3, 2020, 7:01am

Please check this GitHub issue: #3655

Yakov · November 28, 2020, 9:37am

New rule added for the Russian language.

Yakov · November 28, 2020, 9:40am

The rule for this case has been switched to picky mode and is disabled by default in the online checking form.