Java LanguageTool Extension

Mariamriaz · September 6, 2021, 2:25pm

Hello Everyone,

I am extending LanguageTool in Java by writing new rules for our own company texts. For around 0.09% of my texts, a RuntimeException is thrown while the text is being checked.

I want to find a solution for this. Since the newly developed rules work in most cases, I would like to ignore the error in the few cases where it occurs, but I am unable to catch the exception in a try/catch block.

I tried wrapping the code of a rule I created, inside this method:
“public RuleMatch[] match(List<AnalyzedSentence> sentences) throws IOException”
in a try/catch block, but it didn't catch the error.

I want LanguageTool to skip the rule that causes the runtime exception for a particular text. Does anyone here know how I can do that?

dnaber · September 6, 2021, 2:41pm

It sounds like it makes more sense to avoid the error in the first place. What’s the full stack trace of the error?

Mariamriaz · September 6, 2021, 2:48pm

Exception in thread "main" java.lang.RuntimeException: java.lang.StringIndexOutOfBoundsException: begin 0, end 4, length 3
at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1233)
at org.languagetool.JLanguageTool.checkInternal(JLanguageTool.java:973)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:903)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:885)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:872)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:862)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:844)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:801)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:785)
at application.Jlangtool.main(Jlangtool.java:59)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 4, length 3
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3751)
at java.base/java.lang.String.substring(String.java:1907)
at org.languagetool.JLanguageTool$TextCheckCallable.findLineColumn(JLanguageTool.java:1896)
at org.languagetool.JLanguageTool$TextCheckCallable.getTextLevelRuleMatches(JLanguageTool.java:1800)
at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1767)
at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1739)
at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1229)
… 9 more
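
A note on reading this trace: no frame of the custom rule appears in it. The failing call is String.substring inside findLineColumn, reached from getTextLevelRuleMatches, which suggests the exception is raised after the rule's match() method has already returned. That would explain why a try/catch inside match() never fires. "begin 0, end 4, length 3" is consistent with substring(0, 4) being called on a 3-character string, for example when a match offset lies beyond the text it is resolved against. The sketch below uses plain Java only, no LanguageTool classes; SpanGuard, fits and reproduce are hypothetical names chosen for illustration, and the offset explanation is an assumption rather than a confirmed diagnosis.

```java
final class SpanGuard {

    // Hypothetical helper: true if the half-open range [from, to) lies inside text.
    static boolean fits(String text, int from, int to) {
        return from >= 0 && from <= to && to <= text.length();
    }

    // Triggers the same kind of failure as in the trace and returns its message.
    static String reproduce() {
        try {
            "abc".substring(0, 4); // end offset 4 on a 3-character string
            return "no exception";
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());       // message wording depends on the Java version
        System.out.println(fits("abc", 0, 4)); // false: end is past the text
        System.out.println(fits("abc", 0, 3)); // true: covers the whole text
    }
}
```

A check like fits(...) applied to each match before it is returned from the rule would be one way to find out which texts produce out-of-range offsets.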

Mariamriaz · September 6, 2021, 2:49pm

It happens when toRuleMatchArray(ruleMatches) receives the matches added by my new rules.

dnaber · September 6, 2021, 2:58pm

Maybe that’s caused by some rare Unicode characters. Which text exactly causes this?

Mariamriaz · September 7, 2021, 8:32am

It happens with certain texts that contain a tab in the middle or have no start token.
E.g.:
“Hello my name is Mariam”
“Hello my name is Mariam.I am a developer”

Something like this. But I also want this as a general strategy, because I can't adapt my code to every text scenario. I want LanguageTool to skip the rules that cause an error and return the results of all the other rules, so that things run smoothly for the end user.
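
The "skip the failing rule, keep the rest" strategy described here can be sketched independently of LanguageTool. The example below is a minimal illustration in plain Java under stated assumptions: IsolatedRuleRunner, Result and the two sample rules are hypothetical, and a rule is modelled as a simple function from text to a list of findings, which is not how LanguageTool represents rules. Each rule runs inside its own try/catch, so one failure is recorded and the others still contribute results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class IsolatedRuleRunner {

    // Findings from rules that succeeded, plus the ids of rules that were skipped.
    static final class Result {
        final List<String> findings = new ArrayList<>();
        final List<String> skippedRuleIds = new ArrayList<>();
    }

    // Runs every rule on its own so that one failing rule cannot abort the whole check.
    static Result run(Map<String, Function<String, List<String>>> rules, String text) {
        Result result = new Result();
        for (Map.Entry<String, Function<String, List<String>>> entry : rules.entrySet()) {
            try {
                result.findings.addAll(entry.getValue().apply(text));
            } catch (RuntimeException e) {
                // Remember which rule failed and continue; real code should also log e.
                result.skippedRuleIds.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> rules = new LinkedHashMap<>();
        rules.put("DOUBLE_SPACE",
                t -> t.contains("  ") ? List.of("double space") : List.<String>of());
        // Deliberately fragile rule: throws on texts shorter than 4 characters.
        rules.put("FIRST_FOUR", t -> List.of("prefix: " + t.substring(0, 4)));

        Result r = run(rules, "a  "); // 3 characters, contains a double space
        System.out.println(r.findings);       // [double space]
        System.out.println(r.skippedRuleIds); // [FIRST_FOUR]
    }
}
```

With LanguageTool itself, the trace above shows the exception surfacing from the JLanguageTool.check(...) call in Jlangtool.java, so that call is where a try/catch would actually see it. Skipping silently hides the underlying bug, though, so recording the skipped rule and the offending text is probably worth the extra lines.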