Error in generating report

the79D37 · May 19, 2016, 12:21pm

Hello,
I am trying to generate a report and, as far as I understand, the command should be something like this:

c:\languageTool>java -jar languagetool-commandline.jar --api -a -l en c:/test/test.properties c:/test/output.txt

but I get the following error:
java.lang.IllegalArgumentException: API format makes no sense for automatic application of suggestions

What’s the problem here? Any suggestions?

dnaber · May 19, 2016, 4:31pm

You can only use --api (XML output) or -a (automatically apply suggestions), not both, as there can only be one output: XML report or text with suggestions applied.
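As a rough sketch of the two separate runs (in Python; the helper name `build_lt_args` and its `mode` parameter are made up for illustration, and the jar is assumed to sit in the current directory), you would pick exactly one output mode per invocation. In both modes the result is printed to STDOUT, so redirect it to a file if you want to keep it:

```python
LT_JAR = "languagetool-commandline.jar"  # assumed location of the jar


def build_lt_args(input_file, language, mode):
    """Build the argument list for one LanguageTool run.

    mode is "report" (XML via --api) or "apply" (-a); the two cannot be combined.
    """
    flags = {"report": "--api", "apply": "-a"}
    if mode not in flags:
        raise ValueError("mode must be 'report' or 'apply'")
    return ["java", "-jar", LT_JAR, flags[mode], "-l", language, input_file]


if __name__ == "__main__":
    # Prints the command for an XML report of the file from the question.
    print(" ".join(build_lt_args("c:/test/test.properties", "en", "report")))
```

The returned list can be passed to `subprocess.run`, with STDOUT captured or redirected into the report file.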

the79D37 · May 20, 2016, 8:21am

Hmm, my grammar.xml rules only take effect when -a is on. I mean, the text is then validated only by the selected rules; when -a is off, I get tons of errors =(

dnaber · May 20, 2016, 8:30am

The point of LanguageTool is to find errors, not to correct them automatically. I suggest you forget about the -a option; it’s only useful for very few, very specific cases: when you only have rules that are 100% correct and never introduce new errors.

the79D37 · May 20, 2016, 8:38am

OK, Daniel, here is my current situation: I opened LT in the GUI and unchecked some rules, so grammar.xml has changed and now has default="off" for those rules. I also added my own rule to disambiguation.xml.
Now I want to run a batch check on a folder with my selected rules and, if possible, get some kind of report file.
Can I do that?

Thank you for your help, I appreciate it

jaumeortola · May 20, 2016, 9:31am

Hi, Alex.
You will need some batch script to do a batch check. I wrote one for Windows some time ago:

https://github.com/jaumeortola/languagetool-commandline-scripts/blob/master/Windows/revisa.bat

@echo off

CLS

:: ***** DIRECTORIS *****
SET dir_principal=C:\Users\jaume\corrector\
SET dir_textoriginal=%dir_principal%text_original\
SET dir_textpla=%dir_principal%text_pla\
SET dir_resultats=%dir_principal%resultats\
SET jar_tika=%dir_principal%programes\tika-app-1.6.jar
SET jar_lt=%dir_principal%programes\languagetool\languagetool-commandline.jar
SET informe_html=%dir_principal%programes\results-to-html.pl

:: ***** CONFIGURACIÓ IEC *****
SET langcode=ca-ES
SET enabledRules=GUIONET_GUIO,PUNTS_SUSPENSIUS,EXIGEIX_PLURALS_S,COMETES_TIPOGRAFIQUES,APOSTROF_TIPOGRAFIC
SET disabledRules=MORFOLOGIK_RULE_CA_ES
SET lt_opt=-u -b -c utf-8 -l %langcode% -d %disabledRules% -e %enabledRules%
::ECHO %lt_opt%

This file has been truncated.

This script reads every file in a folder, converts each one to plain text (using the Tika library), then sorts the results and renders them as HTML with a Perl script (there is also a Python version).

You can take it as a guide and adapt it to your needs.
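If you would rather not start from the batch file, here is a minimal Python sketch of the same idea. It is not a port of revisa.bat: it skips the Tika conversion and the HTML step, assumes the folder already contains plain .txt files, and the names `check_folder`, `LT_JAR` and the `run` parameter are my own for illustration. It writes one XML report per input file:

```python
import subprocess
from pathlib import Path

LT_JAR = "languagetool-commandline.jar"  # assumed path to the jar


def check_folder(input_dir, report_dir, language="en", run=subprocess.run):
    """Check every .txt file in input_dir and write one XML report per file."""
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    reports = []
    for text_file in sorted(Path(input_dir).glob("*.txt")):
        cmd = ["java", "-jar", LT_JAR, "--api", "-c", "utf-8",
               "-l", language, str(text_file)]
        # --api prints the XML to STDOUT, so capture it instead of passing an output path.
        result = run(cmd, capture_output=True, text=True)
        report = report_dir / (text_file.stem + ".xml")
        report.write_text(result.stdout, encoding="utf-8")
        reports.append(report)
    return reports
```

Calling `check_folder("c:/test", "c:/test/reports")` would then leave one .xml file per checked text file in the reports folder. The `run` parameter only exists so the function can be tried out without Java installed.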

the79D37 · May 20, 2016, 10:24am

Thanks for your suggestion, jaumeortola!

I see that you manually select and deselect rules. Is it possible to load them from grammar.xml somehow?

jaumeortola · May 20, 2016, 10:51am

You can see all the command-line options by running "java -jar languagetool-commandline.jar". In my script, the default rules in grammar.xml are used, and then some rules are enabled and others disabled. You can also use "-eo, --enabledonly", which means "disable all rules except those enabled explicitly in -e".

$ java -jar languagetool-commandline.jar 
Usage: java -jar languagetool-commandline.jar [OPTION]... FILE
 FILE                      plain text file to be checked
 Available options:
  -r, --recursive          work recursively on directory, not on a single file
  -c, --encoding ENC       character set of the input text, e.g. utf-8 or latin1
  -b                       assume that a single line break marks the end of a paragraph
  -l, --language LANG      the language code of the text, e.g. en for English, en-GB for British English
  --list                   print all available languages and exit
  -adl, --autoDetect       auto-detect the language of the input text
  -m, --mothertongue LANG  the language code of your first language, used to activate false-friend checking
  -d, --disable RULES      a comma-separated list of rule ids to be disabled (use no spaces between ids)
  -e, --enable RULES       a comma-separated list of rule ids to be enabled (use no spaces between ids)
  -eo, --enabledonly       disable all rules except those enabled explicitly in -e
  --enablecategories CAT   a comma-separated list of category ids to be enabled (use no spaces between ids)
  --disablecategories CAT  a comma-separated list of category ids to be disabled (use no spaces between ids)
  -t, --taggeronly         don't check, but only print text analysis (sentences, part-of-speech tags)
  -u, --list-unknown       also print a summary of words from the input that LanguageTool doesn't know
  -b2, --bitext            check bilingual texts with a tab-separated input file,
                           see http://languagetool.wikidot.com/checking-translations-bilingual-texts
  --api                    print results as XML
  -p, --profile            print performance measurements
  -v, --verbose            print text analysis (sentences, part-of-speech tags) to STDERR
  --version                print LanguageTool version number and exit
  -a, --apply              automatically apply suggestions if available, printing result to STDOUT
  --rulefile FILE          use an additional grammar file; if the filename contains a known language code,
                             it is used in addition of standard rules
  --falsefriends FILE      use external false friend file to be used along with the built-in rules
  --bitextrules  FILE      use external bitext XML rule file (useful only in bitext mode)
  --languagemodel DIR      a directory with e.g. 'en' sub directory (i.e. a language code) that contains
                           '1grams'...'3grams' sub directories with Lucene indexes with
                           ngram occurrence counts; activates the confusion rule if supported
  --xmlfilter              remove XML/HTML elements from input before checking (this is deprecated)
  --line-by-line           work on file line by line (for development, e.g. inside an IDE)
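To illustrate how -e, -d and -eo fit together, here is a small Python sketch that assembles the rule options. The function name `rule_options` and its parameters are hypothetical. The two guards are cautious choices of mine, not documented behaviour: -eo without -e would leave nothing enabled, and -d is redundant once -eo disables everything else.

```python
def rule_options(enabled=(), disabled=(), enabled_only=False):
    """Return the rule-selection options for languagetool-commandline.jar."""
    if enabled_only and not enabled:
        raise ValueError("-eo only makes sense together with -e")
    if enabled_only and disabled:
        raise ValueError("-d is redundant when -eo is used")
    options = []
    if enabled:
        # Rule ids are joined with commas and no spaces, as the usage text asks.
        options += ["-e", ",".join(enabled)]
    if disabled:
        options += ["-d", ",".join(disabled)]
    if enabled_only:
        options.append("-eo")
    return options
```

For example, `rule_options(enabled=["GUIONET_GUIO", "PUNTS_SUSPENSIUS"], enabled_only=True)` yields the options for a run that checks only those two rules.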

the79D37 · May 26, 2016, 12:21pm

Hi,
I've run into another problem. Currently I am using only the following categories:
TYPOS,MISC,PLAIN_ENGLISH,GRAMMAR

But I get results such as
[error fromy="0" fromx="0" toy="0" tox="2" ruleId="UPPERCASE_SENTENCE_START"]
or
[error fromy="0" fromx="25" toy="0" tox="31" ruleId="SENTENCE_WHITESPACE"]

As far as I understand, these rules should not be applied, but they were checked anyway. Also, the rule I added to disambiguation.xml doesn’t work when I run LanguageTool from the command line.

Any suggestions, please?

dnaber · May 26, 2016, 9:32pm

What’s the exact command line you’re running? Does it contain -eo? To get more information about disambiguation rules being applied, you can specify -v.

the79D37 · May 27, 2016, 11:33am

Hi

The exact command is:
-u -b --api -c utf-8 --enablecategories TYPOS,MISC,PLAIN_ENGLISH,GRAMMAR -l en-Gb
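
To see which rules actually fire with this command, a small script along these lines could tally the rule ids in the XML output. This is only a sketch: the function name `count_rule_ids` is made up, and it assumes the report is well-formed XML whose error elements carry a ruleId attribute, as in the results quoted in my previous post. The `matches` wrapper in the sample is a placeholder, not necessarily the real root element.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_rule_ids(xml_text):
    """Count how often each ruleId occurs among the <error> elements."""
    root = ET.fromstring(xml_text)
    return Counter(
        error.get("ruleId") for error in root.iter("error") if error.get("ruleId")
    )


if __name__ == "__main__":
    # Hypothetical sample shaped like the quoted results; real reports carry more attributes.
    sample = (
        '<matches>'
        '<error fromy="0" fromx="0" toy="0" tox="2" ruleId="UPPERCASE_SENTENCE_START"/>'
        '<error fromy="0" fromx="25" toy="0" tox="31" ruleId="SENTENCE_WHITESPACE"/>'
        '</matches>'
    )
    for rule_id, count in count_rule_ids(sample).most_common():
        print(rule_id, count)
```

Comparing that tally between runs with and without --enablecategories would show whether the option changes which rules are checked.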