Difference between languagetool and languagetool-en jars


(surekha) #1

I am working with the JLanguageTool spell checker and need spell checking only. I found this tool while searching for Java spell checkers, and it is very good.
I built a spell checker using

<dependency>
    <groupId>org.languagetool</groupId>
    <artifactId>languagetool</artifactId>
    <version>2.0.1</version>
</dependency>

AND

<dependency>
    <groupId>org.languagetool</groupId>
    <artifactId>language-en</artifactId>
    <version>4.3</version>
</dependency>

I have the following questions:

  1. Both work fine, but languagetool makes my project's jar much larger, although it is much faster than language-en.
  2. language-en checks only spelling, but languagetool also flags special cases like {TEST_WORD}, even though I enabled only SpellingCheckRule.

language-en does exactly what I want, but execution is slower than with languagetool.
What is the reason? Do I need to add any other dependencies?
Please help me.


(Daniel Naber) #2

2.0.1 is a very old version, please use the latest release, which is 4.4. Also, could you post the code you’re using to check text?


(surekha) #3

Thank you so much for your quick reply.
I am using the jar from here:
https://mvnrepository.com/artifact/org.languagetool/languagetool

Has this been renamed to languagetool-core (https://mvnrepository.com/artifact/org.languagetool/languagetool-core)?

Here is my code:

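import java.io.IOException;
import java.util.List;
import java.util.Locale;
import org.languagetool.JLanguageTool;
import org.languagetool.Language;
import org.languagetool.Languages;
import org.languagetool.rules.Rule;
import org.languagetool.rules.RuleMatch;
import org.languagetool.rules.spelling.SpellingCheckRule;

// 'source' is the String to check.
JLanguageTool langTool;
try {
    Language lang = Languages.getLanguageForLocale(Locale.US);
    langTool = new JLanguageTool(lang);
    // Disable every active rule that is not a spelling rule.
    for (Rule rule : langTool.getAllActiveRules()) {
        if (!(rule instanceof SpellingCheckRule)) {
            langTool.disableRule(rule.getId());
        }
    }
    List<RuleMatch> matches = langTool.check(source);
    // Print each message together with the flagged substring.
    for (RuleMatch match : matches) {
        System.out.println(match.getMessage() + "-"
                + source.substring(match.getFromPos(), match.getToPos()));
    }
} catch (IOException e) {
    e.printStackTrace();
}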

(Daniel Naber) #4

Please use version 4.4 (e.g. here). Your code looks okay, is it still not fast enough when using 4.4?


(surekha) #5

Yes. I ran the same logic on the same set of strings: with the languagetool jar it completed in 600 ms, whereas language-en took 3 sec.

Thank you.


(Daniel Naber) #6

The first few uses will take a bit of time, so you should check more than just a few words. If you need to check single words and re-create all the LT objects for every word, it will not be fast.
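
A minimal sketch of how such a comparison could be timed so that the slow first calls do not dominate the result. `WarmupTimer` and `timeAfterWarmup` are hypothetical names, not part of LanguageTool, and the helper uses only the Java standard library. The idea is to run the task a few times untimed, then time the remaining runs.

```java
final class WarmupTimer {
    /**
     * Runs the task warmupRuns times without timing it, then measuredRuns
     * more times, returning the elapsed nanoseconds of each measured run.
     */
    static long[] timeAfterWarmup(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // untimed: class loading, dictionary loading, JIT warm-up
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Placeholder task; in practice this would call check(...) on a
        // JLanguageTool instance that was created once, outside the task.
        Runnable task = () -> "some text to check".toUpperCase();
        long[] nanos = timeAfterWarmup(task, 5, 10);
        for (long n : nanos) {
            System.out.println(n / 1_000_000.0 + " ms");
        }
    }
}
```

For a real measurement the task would wrap `langTool.check(source)` (catching the `IOException`), with `langTool` built a single time before the call to `timeAfterWarmup`.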


(surekha) #7

Yes, by adding a timer I noticed that the first few uses take longer. I did not mean to say it is slow :) but compared to the languagetool jar it takes a bit more time, even though the logic and APIs are the same.

`00 min, 01 sec , 1865 millis
Possible spelling mistake found at 83: util

00 min, 00 sec , 426 millis
Possible spelling mistake found at 86: xlf
Possible spelling mistake found at 86: ProjectName
Possible spelling mistake found at 86: jdaABakdjkAFD
Possible spelling mistake found at 86: fkjad

00 min, 00 sec , 03 millis`


(surekha) #8

I need help with one more thing. The spell check fails on words of the form ${ProjectName}.
Should I use a regex to ignore such strings, or is there an API to ignore them?
I saw the addIgnoreTokens and acceptPhrases APIs, but I cannot hardcode values. I want to ignore anything matching a pattern like ${.*}


(Daniel Naber) #9

I think you need to ignore them yourself, there’s no API for that.
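
One way to "ignore them yourself" is to blank out the placeholders before the text is checked. The sketch below is an assumption about how that could look, using only `java.util.regex`. `PlaceholderMasker` and `mask` are hypothetical names, and the pattern assumes placeholders of the form `${...}` without nested braces. Each placeholder is replaced by the same number of spaces, so the positions reported for the remaining text still line up with the original string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderMasker {
    // Matches ${...} with no nested braces; adjust to the placeholder syntax in use.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{[^}]*\\}");

    /** Returns a copy of text in which every ${...} is overwritten with spaces. */
    static String mask(String text) {
        StringBuilder out = new StringBuilder(text);
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            for (int i = m.start(); i < m.end(); i++) {
                out.setCharAt(i, ' '); // same length, so offsets are preserved
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "Open ${ProjectName} and run the buld.";
        String masked = mask(source);
        System.out.println(masked);
        // 'masked' would then be passed to the checker instead of 'source'.
    }
}
```

Because the masked string has the same length as the original, `source.substring(match.getFromPos(), match.getToPos())` still returns the right word. The runs of spaces could trigger whitespace rules, which should not matter when only spelling rules are enabled.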


(surekha) #10

Ok. Thank you so much for your help.


(surekha) #11

Hi,
Is there any API to ignore special characters like _, $ etc.? The spell check fails on special characters.
My aim is that a file can contain anything, but the spell check should only flag tokens made up of letters. That would make my life easier.

Thank you.


(Daniel Naber) #12

No, you need to ignore those matches manually or filter the text before you send it to LanguageTool.
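
Ignoring matches manually could look roughly like the sketch below. It is a standard-library illustration and not a LanguageTool API. `MatchFilter`, `isPlainWord`, and the `SPECIAL` character set are hypothetical, and the rule it applies (keep a match only if the flagged span is all letters and does not touch one of the special characters) is one assumed reading of the requirement above.

```java
final class MatchFilter {
    // Characters that mark a neighbouring token as code-like.
    // This set is an assumption; extend it as needed.
    private static final String SPECIAL = "_${}";

    /**
     * Returns true if text[from, to) consists only of letters and is not
     * directly preceded or followed by one of the SPECIAL characters.
     */
    static boolean isPlainWord(String text, int from, int to) {
        if (from < 0 || to > text.length() || from >= to) {
            return false;
        }
        for (int i = from; i < to; i++) {
            if (!Character.isLetter(text.charAt(i))) {
                return false;
            }
        }
        boolean specialBefore = from > 0 && SPECIAL.indexOf(text.charAt(from - 1)) >= 0;
        boolean specialAfter = to < text.length() && SPECIAL.indexOf(text.charAt(to)) >= 0;
        return !specialBefore && !specialAfter;
    }

    public static void main(String[] args) {
        String source = "fix ${ProjectName} and my_util wrod";
        int from = source.indexOf("wrod");
        System.out.println(isPlainWord(source, from, from + 4));
    }
}
```

In the loop over `matches`, a match would be reported only when `MatchFilter.isPlainWord(source, match.getFromPos(), match.getToPos())` returns true.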


(surekha) #13

Thanks a lot for your help. Could you also reply on this.