I ran LT on the full English Wikipedia dump, following the LanguageTool Wiki page "Checking The Complete Wikipedia or a Corpus".
The start of the LT output shows that no limit is set on the number of sentences to check:
These rules are disabled:
All spelling rules are disabled
Working on: …/enwiki-20160920-pages-articles-multistream.xml
Sentence limit: no limit
Error limit: no limit
However, processing stops before LT has checked all of the Wikipedia data:
23,440,000 sentences checked...
23,445,000 sentences checked...
23,450,000 sentences checked...
Exception in thread "main" java.lang.RuntimeException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[66470483,50]
Message: JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
at org.languagetool.dev.dumpcheck.WikipediaSentenceSource.hasNext(WikipediaSentenceSource.java:84)
at org.languagetool.dev.dumpcheck.MixingSentenceSource.hasNext(MixingSentenceSource.java:75)
at org.languagetool.dev.dumpcheck.SentenceSourceChecker.run(SentenceSourceChecker.java:175)
at org.languagetool.dev.dumpcheck.SentenceSourceChecker.main(SentenceSourceChecker.java:80)
at org.languagetool.dev.wikipedia.Main.main(Main.java:45)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[66470483,50]
Message: JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
at org.languagetool.dev.dumpcheck.WikipediaSentenceSource.handleTextElement(WikipediaSentenceSource.java:144)
at org.languagetool.dev.dumpcheck.WikipediaSentenceSource.fillSentences(WikipediaSentenceSource.java:127)
at org.languagetool.dev.dumpcheck.WikipediaSentenceSource.hasNext(WikipediaSentenceSource.java:82)
... 4 more
D:\LanguageTool-wikipedia-3.6-SNAPSHOT>
As best I can tell, FEATURE_SECURE_PROCESSING comes from the XML parser that LT uses rather than from LT itself: the stack trace ends in com.sun.org.apache.xerces.internal, and I did not find FEATURE_SECURE_PROCESSING in the LT GitHub repository.
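For reference, error JAXP00010004 appears to correspond to the JDK's JAXP processing limit jdk.xml.totalEntitySizeLimit, whose default matches the 50,000,000 in the message and for which a value of 0 or less means no limit. Below is a minimal sketch of lifting that limit for a StAX parser, assuming the stock JDK XML parser is the one in use. The class and method names are hypothetical and not part of LT, and I have not verified that LT's WikipediaSentenceSource picks the setting up this way.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of LanguageTool.
final class EntityLimitSketch {

    // JDK system property for the accumulated entity size limit.
    // A value of 0 or less means "no limit".
    public static final String TOTAL_ENTITY_SIZE_LIMIT = "jdk.xml.totalEntitySizeLimit";

    // Walks through all StAX events of the given XML and returns how many were read.
    public static int countEvents(String xml) throws XMLStreamException {
        // Set the property before the factory is created. This is meant to have
        // the same effect as -Djdk.xml.totalEntitySizeLimit=0 on the java command line.
        System.setProperty(TOTAL_ENTITY_SIZE_LIMIT, "0");

        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int events = 0;
        try {
            while (reader.hasNext()) {
                reader.next();
                events++;
            }
        } finally {
            reader.close();
        }
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny stand-in for a Wikipedia dump page; the real input is far larger.
        System.out.println(countEvents("<page><text>a &amp; b</text></page>"));
    }
}
```

If that assumption holds, passing the same property as a -D option when starting the JVM would avoid any code change, at the cost of switching off this particular safeguard for the whole process.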
What, if anything, can I do to check LT rules against all the Wikipedia data?