
Editing an existing rule?

(Irvine) #1

I am an amateur writer of fiction and would like to use the 40-word sentence rule, but I find it unwieldy because it does not recognize a colon as breaking a sentence into two separate sentences.

An example:

I found what the character was saying to be eminently believable, it was entirely possible that he held the secret and nodding in agreement, told him: “This passage is interpreted by language tool as being one continuous sentence, when in fact the use of the colon separates the passage into two distinct sentences, each less than forty words.”

End example

Is there an easy way I can modify the forty word rule to take account of the colon, and, if possible, semi-colons?

(Daniel Naber) #2

It’s probably not very easy for someone not familiar with it, but sentence detection is controlled by the segment.srx file. It’s documented at

(Irvine) #3

Thanks for your reply. As you apparently guessed, editing the SRX file is above my pay-grade. To give an example of my level, it took me four hours to find the correct syntax for rules to correctly detect ‘And’ and ‘But’ at the beginning of a sentence (see other post here: ).

As a thought, while researching how to write rules I noticed that LanguageTool error checking is hierarchical and that it often takes several layers to correctly flag an error. How easy would it be to add an exception to the forty-word rule that checks for the presence of colons and semicolons?


(Daniel Naber) #4

Does a colon always end a sentence? Or is it possible to specify when a colon ends a sentence? If we can specify patterns for that, that would fix your original problem. In other words, we probably shouldn’t add an exception to the rule, but rather improve the text analysis that the rule relies on.

(Irvine) #5

If the following is a more detailed response than what you had in mind, I apologise in advance.

“Does a colon always end a sentence?”

This is actually a technically fraught question about what is meant by ending a sentence. Arguably, colons and semicolons never end a sentence. To explain: going back to my school days in the early sixties, before their use started to diminish in favour of simplicity, I was taught that they were to be used for breaking up sentences into manageable chunks; Victorian writers like Dickens were lauded both for their love of long descriptive sentences with multiple clauses and for their skilled use of colons and semicolons to make these sentences readable.

These days, although verbose prose is unfashionable, colons and semicolons are still useful, even though their use comes at the expense of increasing the average length of a sentence.

Some hard rules about the use of colons and semicolons from which we can derive possible patterns.

After a quick bit of research I found this page from the University of Leicester student guide

Which, with some additions, I summarise as follows:

The semi-colon:
To separate items in a list
To link sentences which are closely related, [linking two independent clauses.]
For use with otherwise, however, therefore…

The colon:
To introduce a list
To introduce an explanation, conclusion or amplification
In addition:
From the 1st link below: the colon is used to link an independent clause with a quotation
From the 3rd link below: to separate two independent clauses that are directly related.

Note: The choice between using a colon or a semicolon to link two independent clauses is decided on the basis of whether the clauses are either directly or indirectly related.

A deeper explanation of the use of the colon and semicolon can be found at the following:


Some practicalities.
(Note: Because of a slow internet connection, for the moment, I’m having problems downloading Ratel, but that’s my problem.)

I have downloaded the segment.srx file and am currently studying it in Notepad++. This has allowed me to identify the appropriate section and get a rough feel for the basic rules, along with how they are organised.

The first (or biggest) problem that I can see with simply redefining “SENT_END” is what it would do to higher-level rules: in particular capitalisation, though there are probably other rules that would be affected.

Whether to capitalise after a colon is by and large a matter of personal taste, and a quick internet search shows little formal agreement among authoritative sources. Some people argue that in British English only proper nouns are capitalised; however, I am a UK-educated Scot who (possibly as a result of American text-books) has always tended to capitalise the first letter of an independent clause following a colon.

On the other hand, except in the case of proper nouns, one never capitalises after the semicolon.
(See here )

As I said, I apologise if you feel I have gone into too much depth, but, if you are not offended, I look forward to hearing your thoughts on how to proceed.

By the way, where exactly would I find the “forty word rule” in the OpenOffice add-on? I have been looking through the directory and can’t see it. Similarly, it does not appear to be in the grammar.xml file.


(Daniel Naber) #6

I agree re-defining a “sentence” will probably have side-effects. The sentence length rule isn’t in grammar.xml as it’s one of the few rules implemented directly in Java:

In line 87, a counter gets increased for (almost) any token. It would be easy to change that, e.g. to reset the counter if a colon is reached. I wonder what the error message should say in that case: can it still claim that the sentence is over 40 words long? Does resetting the counter make sense? It would mean you can have sentences of any length, as long as you use a colon or semicolon every 30 words…

Actually, I just see that the sentence length rule is disabled by default, probably a reason why it doesn’t get much attention.
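
To make the "reset the counter" idea concrete, here is a minimal, self-contained Java sketch. It is not LanguageTool's actual rule code: the class name SegmentLengthSketch, the methods longestSegment and isTooLong, and the plain whitespace tokenization are all assumptions made for illustration. It counts words per segment, where a segment ends at any token finishing with a colon or semicolon, and reports the longest segment rather than the full sentence length.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of LanguageTool.
class SegmentLengthSketch {

    // Assumed tokenization: split on whitespace (simpler than LanguageTool's tokenizer).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the word count of the longest colon/semicolon-delimited segment. */
    static int longestSegment(String sentence) {
        int longest = 0;
        int count = 0;
        for (String token : WHITESPACE.split(sentence.trim())) {
            if (token.isEmpty()) {
                continue; // happens only for an empty input string
            }
            // Count a token as a word only if it contains a letter or digit,
            // so free-standing punctuation is ignored.
            if (token.chars().anyMatch(Character::isLetterOrDigit)) {
                count++;
            }
            longest = Math.max(longest, count);
            // Reset the counter once a colon or semicolon closes the segment.
            if (token.endsWith(":") || token.endsWith(";")) {
                count = 0;
            }
        }
        return longest;
    }

    /** True if any single segment is longer than maxWords. */
    static boolean isTooLong(String sentence, int maxWords) {
        return longestSegment(sentence) > maxWords;
    }
}
```

With this approach a call such as SegmentLengthSketch.isTooLong(text, 40) flags a sentence only when one stretch between colons or semicolons exceeds 40 words, which is exactly the behaviour questioned above: a sentence of any total length passes as long as it is punctuated often enough. A colon followed directly by a closing quotation mark is not detected by this simple endsWith check.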

(Irvine) #7

“It would mean you can have sentences of any length, as long as you use a colon or semicolon every 30 words…”

I believe some of the more ‘wordy’ Victorian writers were known to do just that. A descriptive paragraph might consist of a short introductory sentence, followed by a single, extremely long, descriptive sentence. It’s probably part of the reason why intermediate-level punctuation fell out of fashion.

“I wonder what the error message should say in that case, can it still claim that the sentence is over 40 words long?”

It’s not exactly long sentences that are bad, but wordiness. By rewriting the sentence length rule to take account of colons and semicolons, would it be possible to introduce a higher-level group of rules to deal with their proper use?

I believe that functionally, this is what Microsoft Office does, i.e. when a sentence is marked as too long, one of the possible solutions is to use intermediate-level punctuation. I am not sure about this, though, since it is a while since I used MS Office.

I am fairly familiar with languages like Pascal and Python. Although I have never used Java, I will have a look at the rule and give more thought to the problem. If I have any ideas, I will get back to you. I have other projects on the go, so it may take a few days, but ideas usually taste better when well cooked.