Detect zeros instead of "o" for English


#1

Hi all,
I have the following sentence:

I don't like th0se books dfdfdfdf

As you can see, the word "th0se" contains a zero instead of an "o". However, the check reports an error only in the last word, "dfdfdfdf". I have a lot of such zeros after OCR and want LanguageTool to detect them. Is that possible?


(jaumeortola) #2

The current configuration for English ignores every word containing a digit. This behavior has pros and cons. To change it, you need to add this line:

fsa.dict.speller.ignore-numbers=false

in this file (for American English):

I think you can solve your problem easily using a text editor. Replace any '0' (zero) joined to a letter with an 'o'. Use regular expressions like these: replace "(\w)0" with "$1o", and replace "0(\w)" with "o$1".
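
For anyone who prefers a script to a text editor, here is a minimal Python sketch of the same two-step replacement. The function name `replace_ocr_zeros` is made up for illustration. Note two deliberate differences from the patterns above: Python writes the backreference as `\1` rather than `$1`, and the sketch uses the letter-only class `[A-Za-z]` because `\w` also matches digits and would turn "100" into "1o0". It assumes plain ASCII English text.

```python
import re

# Letter-only class (an assumption of this sketch): unlike \w, it does not
# match digits or "_", so plain numbers such as "100" are left untouched.
ZERO_AFTER_LETTER = re.compile(r"([A-Za-z])0")
ZERO_BEFORE_LETTER = re.compile(r"0([A-Za-z])")


def replace_ocr_zeros(text: str) -> str:
    """Replace a zero that touches a letter with a lowercase 'o'."""
    # Step 1: letter followed by zero, e.g. "th0se" -> "those".
    text = ZERO_AFTER_LETTER.sub(r"\1o", text)
    # Step 2: zero followed by letter, e.g. "0ften" -> "often".
    return ZERO_BEFORE_LETTER.sub(r"o\1", text)


if __name__ == "__main__":
    print(replace_ocr_zeros("I don't like th0se books"))
```

The replacement is blind: it will also rewrite tokens where the zero is intentional, so the output still needs proofreading.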


(Josep Bofarull Gallés) #3

Good idea, Jaume. After replacing the words with a text editor, we can check the text again with LT. I don't know whether OCR software can use an advanced corrector, only a dictionary, or nothing at all.


(Lodewijk Arie van Brienen) #4

Nitpicks:
A. What about zeros in chemical formulas written without subscripts? I once saw a formula that had one element ten times per molecule.

B. Try the following sentence: "The system has only f0e bytes of RAM in total." Treat it as a typo and you get a nonsense phrase, but treat it as hex (3854 in decimal) and it makes a lot of sense.


(jaumeortola) #5

You'll need to supervise the corrections anyway. You cannot do it automatically. Either with the regexp strategy or with the spell-checker (properly configured), you have to oversee the changes. There is no magic bullet.
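
One way to keep the human in the loop is to list the suspicious tokens with a suggested fix instead of changing the text. The following Python sketch does only that; the name `list_zero_candidates` and the rule "a token containing both a letter and a zero" are assumptions made for illustration, not anything LanguageTool provides.

```python
import re

# A candidate is a whole token that contains at least one ASCII letter
# and at least one zero. Pure numbers such as "100" are not candidates.
CANDIDATE = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*0)\w+\b")


def list_zero_candidates(text: str) -> list[tuple[str, str]]:
    """Return (token, suggestion) pairs for review; the text is not modified."""
    return [
        (match.group(), match.group().replace("0", "o"))
        for match in CANDIDATE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "I don't like th0se books, only f0e bytes and 100 pages"
    for token, suggestion in list_zero_candidates(sample):
        print(f"{token} -> {suggestion} ?")
```

Tokens such as "f0e" show up in the list as well, and that is the point: the reviewer decides whether the suggestion is a fix or a hex value.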


(Lodewijk Arie van Brienen) #6

That's one of the reasons why I call them nitpicks.