Dealing with punctuation


(SafeTex) #1

Hello

I have a few problems with strange characters appearing where my text has quotation marks such as ' or ".

I also have a problem because my documents sometimes contain both straight and curly quotation marks, and both round brackets and double angle brackets (<< >>).

I have a general idea of the problem and I've read a post on the forum entitled

"Suggestions for 16-bit (wide, utf8, unicode) files?"

But the solution there doesn't seem to be a 'search' macro; it involves running commands on the command line.

And there, I'm lost.

Any chance of an explanation of how to make sure that all my quotation marks are straight (not curly) and my brackets are round (not double angle brackets)?

Thanks


(Daniel Naber) #2

On Tue 27.11.2012, 00:43:22 you wrote:

I have a few problems with strange characters appearing where I have
quotation marks like ' or "

If that's caused by wrong encoding, please use the -c option to specify the
encoding of the input file.

Any other cleanup will need to be done by means outside of LT, like sed or
similar (http://en.wikipedia.org/wiki/Sed).

Regards
Daniel

--
http://www.danielnaber.de
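Daniel's pointer to sed covers the cleanup asked about in the first post. As a rough illustration of the same idea, here is a minimal Python sketch (not part of LanguageTool) that replaces curly quotes and guillemets with straight quotes. It assumes the file is UTF-8 and that guillemets should become straight double quotes, as in the rule SafeTex writes later in this thread; the names `QUOTE_MAP`, `normalize_quotes` and `normalize_file` are made up for this example.

```python
# Sketch of sed-style punctuation cleanup, done in Python instead of sed.
# Assumption: input is UTF-8 and every curly quote or guillemet should
# become a straight ASCII quote (hypothetical mapping; adjust as needed).

QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u00ab": '"',  # left-pointing double angle quotation mark (guillemet)
    "\u00bb": '"',  # right-pointing double angle quotation mark (guillemet)
    "\u2039": "'",  # single left-pointing angle quotation mark
    "\u203a": "'",  # single right-pointing angle quotation mark
})


def normalize_quotes(text: str) -> str:
    """Replace curly quotes and guillemets with straight ASCII quotes."""
    return text.translate(QUOTE_MAP)


def normalize_file(in_path: str, out_path: str) -> None:
    """Read a UTF-8 text file, normalize its quotes, write the result as UTF-8."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(normalize_quotes(text))


if __name__ == "__main__":
    print(normalize_quotes("\u00abBonjour\u00bb, he said, \u201cit\u2019s fine\u201d."))
    # prints: "Bonjour", he said, "it's fine".
```

For example, `normalize_file("input.txt", "output.txt")` (hypothetical file names) writes a cleaned copy that can then be checked with LT.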


(SafeTex) #3

Thanks, Daniel, but I have my hands full just learning LT and a load of other things, as well as working as a translator.

Running commands on the command line means nothing to me. I'll have to wait for someone to bring out easier software.

By the way, I normally save my .docx files as UTF-8 text before checking them with LT.

You say I have an encoding problem, so my question is: what is the best encoding for LT in that case?

Thanks

SafeTex


(Daniel Naber) #4

On Tue 27.11.2012, 13:52:37 you wrote:

By the way, I normally save my .docx to UTF 8 before checking it with LT

In that case, just add "-c utf-8" to the other options (if any) when
checking the plain text file.

--
http://www.danielnaber.de


(SafeTex) #5

Hello Daniel

You said:

In that case, just add "-c utf-8" to the other options (if any) when
checking the plain text file.

I'm not a programmer. The only options I have when I run LT are the ones under menu > Options.

So how do I add "-c utf-8" there? I doubt that's what you mean anyway.

Pretty please, can you make it a bit clearer for me?

Thanks


(Daniel Naber) #6

On Sat 01.12.2012, 12:04:35 you wrote:

I'm not a programmer. The only options when I run LT for me are the ones
in the menu>option.

When using the stand-alone version, LT assumes the files are in your system's
default encoding. On Windows with a Western European locale, that's cp1252
(Windows-1252), so you should save your text files in that encoding.

--
http://www.danielnaber.de
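If the text has already been saved as UTF-8, one way to follow this advice is to re-encode it. The sketch below is a minimal Python example, assuming a Western European Windows system whose default encoding is cp1252. Any character that cp1252 cannot represent is replaced with "?", so check the output if your text contains such characters. The function names are hypothetical.

```python
# Sketch: re-save UTF-8 text as cp1252 (Windows-1252), the encoding the
# stand-alone LT is assumed to expect on a Western European Windows system.
# Assumption: characters with no cp1252 equivalent become "?".


def utf8_to_cp1252(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as cp1252."""
    # "utf-8-sig" also strips the byte order mark that Notepad may add.
    text = data.decode("utf-8-sig")
    return text.encode("cp1252", errors="replace")


def convert_file(in_path: str, out_path: str) -> None:
    """Write a cp1252 copy of a UTF-8 text file."""
    with open(in_path, "rb") as src:
        data = src.read()
    with open(out_path, "wb") as dst:
        dst.write(utf8_to_cp1252(data))
```

Guillemets and curly quotes all exist in cp1252, so they survive the conversion as single bytes.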


(SafeTex) #7

Hello Daniel

I just can't get this code to work, even though you gave me the basics.

This is what I have:

<rule id="EXAMPLE_RULE" name="Find French Bracket">
    <pattern>
        <token>«</token>
    </pattern>
    <message>Did you mean <suggestion>"</suggestion>?</message>
    <example type="incorrect">French<marker>«</marker> is wrong.</example>
    <example type="correct">English " is right.</example>
</rule>

What might be causing the problem is that when I load the text, a strange character appears just before «. It looks like Â.

I removed it, but the rule still does not find «.

Any ideas please?

Thanks
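The Â is a typical sign of the encoding mismatch Daniel describes: in UTF-8, « is stored as the two bytes 0xC2 0xAB, and a program reading those bytes as cp1252 shows 0xC2 as Â. Curly quotes get even more mangled, which may explain the strange characters mentioned in the first post. This short Python sketch reproduces the effect; it only illustrates the mismatch and does not show how LT reads files internally. The function name is hypothetical.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    # « is U+00AB, stored in UTF-8 as 0xC2 0xAB; 0xC2 in cp1252 is "Â".
    print(misread_as_cp1252("\u00ab"))  # Â«
    # A curly apostrophe (U+2019) is three UTF-8 bytes: 0xE2 0x80 0x99.
    print(misread_as_cp1252("\u2019"))  # â€™
    # Decoding with the right encoding gives the original character back.
    print("\u00ab".encode("utf-8").decode("utf-8"))  # «
```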


(Daniel Naber) #8

On Mon 03.12.2012, 12:20:14 you wrote:

I take it out but the code still does not find « ?

Sorry, I tested this with German, and text tokenization doesn't work the same
way there. This should now be fixed, so please try again with tomorrow's
snapshot from http://www.languagetool.org/download/snapshots/?C=M;O=D

Regards
Daniel

--
http://www.danielnaber.de


(SafeTex) #9

Hi

I've finally got the software up and running again, and yes, the new 'snapshot', as you call it, does work:

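<rule id="EXAMPLE_RULE" name="Find French Bracket">
    <pattern>
        <token regexp="yes">«|»|“|”|‹|›</token>
    </pattern>
    <message>Did you mean <suggestion>"</suggestion>?</message>
    <example type="incorrect">French<marker>«</marker> is wrong.</example>
    <example type="correct">English " is right.</example>
</rule>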

Once you can type the punctuation marks directly into the rules, it's so much easier.

Thanks
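The regexp in this rule is a plain alternation, so a token matches if it is exactly one of the six characters. The Python sketch below mimics that check under two assumptions: that the token regexp has to match the whole token (hence `fullmatch`), which is presumably how LT applies token patterns, and a deliberately naive, hypothetical tokenizer that splits words from single punctuation marks. LT's real tokenizer is language-specific, as the German/French difference earlier in the thread showed, so treat this only as an illustration.

```python
import re

# Same alternation as in <token regexp="yes">«|»|“|”|‹|›</token>.
QUOTE_TOKEN = re.compile(r"«|»|“|”|‹|›")

# Hypothetical naive tokenizer: runs of word characters, or any single
# character that is neither a word character nor whitespace.
NAIVE_TOKENS = re.compile(r"\w+|[^\w\s]")


def flagged_tokens(text: str) -> list:
    """Return the tokens that the rule's regexp matches as a whole."""
    return [tok for tok in NAIVE_TOKENS.findall(text) if QUOTE_TOKEN.fullmatch(tok)]


if __name__ == "__main__":
    print(flagged_tokens("«Bonjour», “merci” and \"ok\""))
    # ['«', '»', '“', '”']
```

The straight quotes around "ok" are not flagged, which matches the rule's intent of leaving straight quotation marks alone.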