[English] Removal/Editing of rules in grammar.xml

pav-ved · December 27, 2016, 9:45am

Hi all, we have downloaded the LT packages and set up a local environment. As we are primarily copyeditors and want the system to aid us with some of the common tasks of copyediting, we are going through all the grammar rules one by one.

LT is a very comprehensive system, with more than 1,500 rules in grammar.xml, so we want a limited sample space for testing; we would also like to tailor it a bit for copyeditors. Our efforts at understanding the system have raised some questions, and it would be very helpful to get some clarity on these issues. The questions are as follows:

Is it possible to remove almost all the rules from grammar.xml and just retain a select few?
Is there any documentation for the various rules that are not part of grammar.xml but are part of the main .jar files? If so, can we access it?
Is there any documentation for non-technical users (with some basic working knowledge of coding and XML) on editing grammar.xml?
Our observation is that many rules in grammar.xml are very instance-specific, and we are attempting to write more general rules using POS tags and chunker tags.
However, is there any documentation, apart from the wiki pages, that describes the function of the various tags in grammar.xml?

dnaber · December 27, 2016, 10:13am

Yes, the rules are all independent of each other, so you can remove any rule(s).
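
Because the rules are independent, trimming the file amounts to deleting the `rule` and `rulegroup` elements you don't want. Here is a minimal sketch of that idea in Python, run against a small made-up sample rather than the real file. The rule and category IDs and the `keep_only` helper are hypothetical. The real grammar.xml declares a DOCTYPE with entities and may pull in other files, so it may need extra handling before ElementTree can parse it, and ElementTree drops XML comments by default, so work on a copy.

```python
import xml.etree.ElementTree as ET

# Made-up sample that follows the category / rule / rulegroup layout of grammar.xml.
SAMPLE = """<rules lang="en">
  <category id="DEMO_A" name="Demo category A">
    <rule id="DEMO_RULE_1" name="first demo rule">
      <pattern><token>foo</token></pattern>
      <message>Demo message 1</message>
    </rule>
    <rulegroup id="DEMO_GROUP_2" name="demo rule group">
      <rule>
        <pattern><token>bar</token></pattern>
        <message>Demo message 2</message>
      </rule>
    </rulegroup>
  </category>
  <category id="DEMO_B" name="Demo category B">
    <rule id="DEMO_RULE_3" name="third demo rule">
      <pattern><token>baz</token></pattern>
      <message>Demo message 3</message>
    </rule>
  </category>
</rules>"""


def keep_only(xml_text, keep_ids):
    """Remove every top-level rule/rulegroup whose id is not in keep_ids."""
    root = ET.fromstring(xml_text)
    for category in root.findall("category"):
        # Iterate over a copy, because we remove children while looping.
        for child in list(category):
            if child.tag in ("rule", "rulegroup") and child.get("id") not in keep_ids:
                category.remove(child)
        # Drop categories that no longer contain any rules.
        if not category.findall("rule") and not category.findall("rulegroup"):
            root.remove(category)
    return ET.tostring(root, encoding="unicode")


trimmed = keep_only(SAMPLE, {"DEMO_RULE_1"})
kept = [
    el.get("id")
    for el in ET.fromstring(trimmed).iter()
    if el.tag in ("rule", "rulegroup") and el.get("id")
]
print(kept)
```

Rules inside a `rulegroup` are kept or removed together with their group, since only the group carries the ID in this sketch.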

There’s Java-level documentation at LanguageTool 6.4-SNAPSHOT API; every class whose name ends with Rule is an error detection rule.

You already know the Wiki at Development Overview - LanguageTool Wiki. Developing a Disambiguator - LanguageTool Wiki is also important for many languages. The documentation should be complete; let us know if anything is missing.

pav-ved · December 27, 2016, 10:20am

Thanks @dnaber, that is very helpful. I will get back to you in case we have any more questions.

pav-ved · December 28, 2016, 7:33am

Hi @dnaber, just a quick question: if I am not wrong, we cannot remove, edit, or modify any of the rules in the .jar files without impacting the whole system, right? Or is it possible?

We have gone through all the error detection rules in the Java-level documentation, and out of the 155 error detection rules only 54 are for English. Is there any way to keep only those and remove the rest? A separate module (set of jar files) just for English, maybe?

dnaber · December 28, 2016, 8:15am

That’s right, but you can just disable the rules you don’t need; see Command-Line Options - LanguageTool Wiki.
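
For illustration, here is a small Python sketch that assembles such a command line. It relies on the command-line options `-l` (language), `-d` (disable rules), `-e` (enable rules) and `-eo` (enabled only). The jar location, the file name `manuscript.txt`, the `build_lt_command` helper and the choice of rule IDs are assumptions for the example, so check them against your own installation.

```python
def build_lt_command(text_file, language="en-US", disable=(), enable_only=()):
    """Build the argument list for a LanguageTool command-line run.

    disable:     rule IDs to switch off (passed with -d)
    enable_only: if given, run only these rule IDs (passed with -e plus -eo)
    """
    # Assumes the jar sits in the current working directory.
    cmd = ["java", "-jar", "languagetool-commandline.jar", "-l", language]
    if enable_only:
        cmd += ["-e", ",".join(enable_only), "-eo"]
    elif disable:
        cmd += ["-d", ",".join(disable)]
    cmd.append(text_file)
    return cmd


# Rule IDs below are only examples; substitute the IDs you want to switch off.
cmd = build_lt_command(
    "manuscript.txt",
    disable=["WHITESPACE_RULE", "UPPERCASE_SENTENCE_START"],
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd)` once Java and the jar are in place. Using `enable_only` instead of `disable` is the closer match to "retain a select few", since everything not listed is switched off.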

No, an English-only version is not available. But why would you want to remove the non-English rules? They won’t be activated for English anyway.

pav-ved · December 28, 2016, 8:34am

Actually, some of the manuscripts we process contain paragraphs in German/French/Spanish, and as the English copyeditors are supposed to work on English text only, we would like to ensure that LT skips foreign text altogether.

But yes, that will solve the problem: we can use the system by just disabling the other rules. Thanks again.