LanguageTool takes the complete text and runs the grammar check on it as a whole instead of first splitting it into sentences. Because the text isn't split, the uppercase rule is only applied to the first word of the whole text: if the text contains two sentences, the rule isn't checked for the second one.
It works fine when I create a new web project and include the library, but when I include it in an existing project, it doesn't work. Looking at the code of SRXSentenceTokenizer, I see that it loads segmentation rules for sentence tokenization, but I can't find the rule file. Does anyone know the directory path of that file, if there is one?
On 2013-08-28 22:58, rahul [via LanguageTool User Forum] wrote:
> ... But when I include it in an existing project, it doesn't work.
The segmentation file is org/languagetool/resource/segment.srx inside
languagetool-core-2.2.jar. If the file is not found, you’ll get an
Exception. Could you post how you use LanguageTool in your code?
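Since the rule file lives inside the jar rather than in a directory, one quick check is to ask the class loader whether the running application can actually reach it. The sketch below is a standalone diagnostic, not part of LanguageTool; it assumes the resource path given above and uses the thread's context class loader, which in a servlet container is usually the one that sees the web application's libraries.

```java
import java.net.URL;

public class FindSegmentSrx {
    // Resource path taken from the reply above; ClassLoader paths have no leading slash.
    static final String SEGMENT_SRX = "org/languagetool/resource/segment.srx";

    // Returns the URL of the first copy of the resource visible to the given
    // class loader, or null if it is not on the classpath.
    static URL locate(ClassLoader loader, String resource) {
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        // In a web app, the context class loader is normally the one that sees WEB-INF/lib.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = FindSegmentSrx.class.getClassLoader();
        }
        URL url = locate(loader, SEGMENT_SRX);
        if (url == null) {
            System.out.println(SEGMENT_SRX + " is NOT on the classpath");
        } else {
            // For a jar entry the URL has the form jar:file:<path to jar>!/<entry>
            System.out.println("Found: " + url);
        }
    }
}
```

Running the same check (or calling locate from a servlet) in both the fresh project and the existing one should show whether the file is reachable in each, and from which jar.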
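import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.language.AmericanEnglish;
import org.languagetool.rules.RuleMatch;

// Load American English (these calls can throw IOException)
JLanguageTool enLangTool = new JLanguageTool(new AmericanEnglish());
enLangTool.activateDefaultPatternRules();
enLangTool.disableRule("MORFOLOGIK_RULE_EN_US"); // disable the spell check
List<RuleMatch> matches = enLangTool.check(textToAnalyze); // textToAnalyze: the String to check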
I also tried using SRXSentenceTokenizer on its own, with the same result: it doesn't split the text into sentences.
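import java.util.List;
import org.languagetool.language.AmericanEnglish;
import org.languagetool.tokenizers.SRXSentenceTokenizer;

SRXSentenceTokenizer senTokenizer = new SRXSentenceTokenizer(new AmericanEnglish());
List<String> sentences = senTokenizer.tokenize("This is test to sentence tokenizer. Does it tokenize properly? Let's check.");
for (String s : sentences) {
    System.out.println(s); // should print one sentence per line
}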
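To separate "the input is unusual" from "the LanguageTool setup is broken", it can help to compare against a tokenizer with no external dependencies. The sketch below uses the JDK's java.text.BreakIterator, a different technique from LanguageTool's SRX-based rules, as a baseline on the same test string; it is only a sanity check, not a replacement for SRXSentenceTokenizer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorBaseline {
    // Splits text into sentences with the JDK's BreakIterator (NOT LanguageTool's SRX rules).
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each segment keeps its trailing whitespace; trim it for display.
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This is test to sentence tokenizer. Does it tokenize properly? Let's check.";
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```

If the baseline splits the string into three sentences while SRXSentenceTokenizer returns the whole text as one item in the same project, that points at the environment (for example, the segmentation rules not being loaded) rather than at the text.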
I don't use Maven in my project; I have added all the required libraries to the classpath.
That's strange: your example that uses the sentence tokenizer directly
works for me. Maybe there's a conflict with other libraries in your
project. Can you list those dependencies? Is segment.jar on your
classpath? But I guess you will need to use a debugger to track down the
problem.
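Since a conflict with other libraries is suspected, one check that needs no debugger is to ask the class loader for every copy of the relevant files. This sketch assumes the resource path given earlier in the thread and uses org/languagetool/JLanguageTool.class as a stand-in for "the LanguageTool core jar"; both entries in the list are illustrative choices, not something the library requires. More than one copy of either would mean several jars (for example, two LanguageTool versions) are competing.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class FindDuplicates {
    // Lists every copy of a resource visible to the class loader. More than one
    // URL means several jars (or directories) provide the same file.
    static List<URL> allCopies(ClassLoader loader, String resource) throws IOException {
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = FindDuplicates.class.getClassLoader();
        }
        // Illustrative selection of entries to check: the SRX file and a core LanguageTool class.
        String[] resources = {
            "org/languagetool/resource/segment.srx",
            "org/languagetool/JLanguageTool.class"
        };
        for (String resource : resources) {
            List<URL> copies = allCopies(loader, resource);
            System.out.println(resource + ": " + copies.size() + " copies");
            for (URL url : copies) {
                System.out.println("  " + url);
            }
        }
    }
}
```

Zero copies means the jar is missing from that project's classpath; more than one means the class loader may pick up a different version than expected.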