Rules for SRXSentenceTokenizer

rahul · August 28, 2013, 8:58pm

Hi

LanguageTool runs the grammar check on the complete text without first splitting it into sentences. Because the text isn’t tokenized, only the first word of the whole text is checked by the uppercase rule. If the text contains two sentences, the uppercase rule isn’t checked for the second sentence.

It works fine when I create a new web project and include the library, but it doesn’t work when I include it in an existing project. Looking at the code for SRXSentenceTokenizer, I see that it loads segmentation rules for sentence tokenization, but I can’t find the rule file. Does anyone know the directory path of that file, if there is one?

Thank you.

dnaber · August 28, 2013, 9:06pm

On 2013-08-28 22:58, rahul [via LanguageTool User Forum] wrote:

> library. But when I include it in existing project, it doesn’t work.

The segmentation file is org/languagetool/resource/segment.srx inside
languagetool-core-2.2.jar. If the file is not found, you’ll get an
Exception. Could you post how you use LanguageTool in your code?

Regards
Daniel
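
One way to check whether the segmentation file mentioned above is visible from inside the affected project is to ask the class loader for it directly. The following is only a minimal sketch: the class name ClasspathCheck and the method locate are made up for illustration, and it assumes the thread context class loader is the one the application uses (which is typical for web projects, but not guaranteed). It only uses standard JDK calls and does not depend on LanguageTool itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of LanguageTool.
public class ClasspathCheck {

    // Returns every classpath location that provides the given resource.
    // An empty list means the resource is not visible to this class loader.
    static List<URL> locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Path taken from the post above; no leading slash for ClassLoader lookups.
        List<URL> hits = locate("org/languagetool/resource/segment.srx");
        if (hits.isEmpty()) {
            System.out.println("segment.srx is not visible on the classpath");
        }
        for (URL url : hits) {
            System.out.println("found: " + url);
        }
    }
}
```

If this prints more than one location, several jars provide the file, which may be worth looking at when the behaviour differs between projects.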

rahul · August 28, 2013, 9:19pm

I use the code below in my project:

//Load American English
enLangTool = new JLanguageTool(new AmericanEnglish());
enLangTool.activateDefaultPatternRules();
enLangTool.disableRule("MORFOLOGIK_RULE_EN_US"); //Disable Spell Check

matches = enLangTool.check(textToAnalyze);

I also tried using SRXSentenceTokenizer on its own, with the same result: it doesn’t split the text into sentences.

SRXSentenceTokenizer senTokenizer = new SRXSentenceTokenizer(new AmericanEnglish());
List<String> sentences = senTokenizer.tokenize("This is test to sentence tokenizer. Does it tokenize properly? Let's check.");
for (String s : sentences) {
    System.out.println(s);
}

I don’t use Maven in my project. I have included all required libraries in the classpath.

rahul · August 28, 2013, 9:21pm

It also doesn’t throw any exception.

dnaber · August 28, 2013, 9:44pm

That’s strange, your example that uses the sentence tokenizer directly
works for me. Maybe there’s a conflict with other libraries in your
project. Can you list those dependencies? Is segment.jar in your
classpath? But I guess you will need to use a debugger to find the
problem.

Regards
Daniel
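
Before reaching for a debugger, a quick way to look for the kind of library conflict suggested above is to print which jar a class is actually loaded from. This is a rough sketch using only standard JDK calls; the class name WhichJar and the method originOf are invented for illustration, and the default class name org.languagetool.tokenizers.SRXSentenceTokenizer is an assumption about the package layout that should be checked against the jar in use.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of LanguageTool.
public class WhichJar {

    // Describes where a class was loaded from. Classes that belong to
    // the JDK itself often have no code source at all.
    static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source, likely a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Assumed fully qualified name; pass another one as the first argument.
        String className = args.length > 0
                ? args[0]
                : "org.languagetool.tokenizers.SRXSentenceTokenizer";
        try {
            System.out.println(className + " <- " + originOf(Class.forName(className)));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Running it once in the new project and once in the existing one, and comparing the printed locations, may show whether a different or older jar is being picked up.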