I think it would be possible to report this kind of error between very common words. Common file extensions could be excluded, as well as strings where this occurs multiple times (URLs).
I think it is almost language independent as well.
I just made a test rule for Dutch, and will see what it does in the nightly test. If successful, I will post that rule for other languages.
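Here is a minimal Python sketch of the heuristic described above. It assumes that the error in question is a missing space after a period between two words (as in `it.The`). `COMMON_WORDS` and `EXCLUDED_EXTENSIONS` are hypothetical placeholder lists, and this is not a LanguageTool rule, only an illustration of the logic: report a period joining two very common words, skip file names, and skip tokens with several dots (URLs, domain names).

```python
import re

# Hypothetical, tiny word list; a real check would use a frequency list per language.
COMMON_WORDS = {"the", "and", "is", "it", "in", "this", "that", "was", "de", "het", "een"}

# Hypothetical file extensions to ignore, so "notes.txt" is not reported.
EXCLUDED_EXTENSIONS = {"pdf", "txt", "doc", "docx", "html", "jpg", "png", "zip"}

# Letters joined by one or more periods, e.g. "it.The" or "www.example.com".
TOKEN = re.compile(r"\b[^\W\d_]+(?:\.[^\W\d_]+)+\b")


def find_missing_spaces(text):
    """Return tokens where a period joins two common words without a space."""
    hits = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        parts = token.split(".")
        # More than one period (URL, domain name): skip.
        if len(parts) != 2:
            continue
        left, right = parts
        # Looks like a file name: skip.
        if right.lower() in EXCLUDED_EXTENSIONS:
            continue
        # Report only when both sides are very common words.
        if left.lower() in COMMON_WORDS and right.lower() in COMMON_WORDS:
            hits.append(token)
    return hits


print(find_missing_spaces("I saw it.The file is.txt and www.example.com was fine."))
```

Restricting the check to very common words on both sides keeps false positives low, at the cost of missing errors between rarer words.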
You could try search and replace. It looks like you are editing text from OCR, am I right?
In that case, making some changes in a text editor first is often very helpful.
There are some good command-line tools as well, especially for Linux. And some Windows editors can use macros for special search-and-replace operations.
But still… I built a test rule for Dutch, which will be tested in tonight’s run.
There are many more problems in the text, but LT does not detect them! For example, multiple spaces should be reduced to a single space, an uppercase letter should be suggested after a period, and so on.
Besides, I use EmEditor, the best and fastest editor in the world for huge text files, and I don’t know how to integrate language checking with Emurasoft EmEditor. These are human writing errors, not OCR errors.
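A rough Python sketch of the capitalization check mentioned above, which also reduces the spaces after the period to one. It is a simplification under stated assumptions: `ABBREVIATIONS` is a hypothetical, tiny list, and multi-part abbreviations such as "e.g." or sentences ending in a number will still produce false positives. Because `\w` and `str.islower()` are Unicode-aware in Python 3, the same code also works on Cyrillic text.

```python
import re

# A word, a period, one or more spaces, then the next word.
WORD_AFTER_PERIOD = re.compile(r"(\w+)\. +(\w+)")

# Hypothetical abbreviations after which a lowercase word is normal.
ABBREVIATIONS = {"etc", "approx", "vs"}


def suggest_capitals(text):
    """Return (found, suggestion) pairs for a lowercase word after a period."""
    suggestions = []
    for match in WORD_AFTER_PERIOD.finditer(text):
        before, word = match.group(1), match.group(2)
        # Already capitalized, or the period ends an abbreviation: nothing to report.
        if not word[0].islower() or before.lower() in ABBREVIATIONS:
            continue
        # Suggest exactly one space and an uppercase first letter.
        suggestion = f"{before}. {word[0].upper()}{word[1:]}"
        suggestions.append((match.group(0), suggestion))
    return suggestions


print(suggest_capitals("It was small.  the room was cold. Prices rose, etc. and so on."))
```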
Multiple spaces are very easy to replace with a single one. Just globally replace two spaces with one, and repeat until no double space is found.
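The repeated replacement described above can be sketched in Python, next to a one-pass regular-expression version that gives the same result. Both functions are illustrative and handle only ordinary spaces, not tabs or non-breaking spaces; editors that support regular expressions in search and replace can use the same ` {2,}` pattern.

```python
import re


def collapse_spaces_by_repeating(text):
    """Replace two spaces with one, repeating until no double space is left."""
    while "  " in text:
        text = text.replace("  ", " ")
    return text


def collapse_spaces_with_regex(text):
    """Same result in one pass: every run of two or more spaces becomes one space."""
    return re.sub(r" {2,}", " ", text)


sample = "Too     many  spaces here."
print(collapse_spaces_by_repeating(sample))  # Too many spaces here.
print(collapse_spaces_with_regex(sample))    # Too many spaces here.
```

The loop is needed in the first version because a single `replace` pass turns five spaces into three, not one.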
LT has, like all tools, some limitations.
Lecture room nine was on the third floor of the main building, but it was small and uninviting.
Another example:
Every method has its advantages and drawbacks. For example, zoning with lighting will not give a full sense of the room being divided into zones, so this method is usually used in combination with others. Too many partitions, on the other hand, will fragment the space excessively, and as a result the room will look too small and uninviting. So when choosing a particular method, you need to take these nuances into account.