
"-b" Flag Programmatically?


#1

If I’m using the JLanguageTool Java API, is there any way to treat single line breaks as paragraph boundaries (instead of requiring two line breaks), just like the command-line option -b? It is described here: http://wiki.languagetool.org/command-line-options


#2

Found it: https://languagetool.org/development/api/org/languagetool/tokenizers/SentenceTokenizer.html#setSingleLineBreaksMarksParagraph-boolean-
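
For readers unfamiliar with what the flag changes, here is a minimal sketch of the two paragraph interpretations. It is a toy model and not LanguageTool's implementation: the class `ParagraphSplitSketch` and the method `splitParagraphs` are hypothetical names made up for illustration, and only `\n` line breaks are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical helper, not part of the LanguageTool API.
final class ParagraphSplitSketch {

    // Splits text into paragraphs. If singleLineBreaksMarkParagraph is true,
    // every line break ends a paragraph (the "-b" interpretation); otherwise
    // only a run of two or more line breaks does.
    static List<String> splitParagraphs(String text, boolean singleLineBreaksMarkParagraph) {
        String separator = singleLineBreaksMarkParagraph ? "\\n+" : "\\n{2,}";
        List<String> paragraphs = new ArrayList<>();
        for (String part : text.split(separator)) {
            if (!part.isEmpty()) { // skip the empty piece produced by a leading line break
                paragraphs.add(part);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        System.out.println(splitParagraphs(text, false).size()); // prints 1
        System.out.println(splitParagraphs(text, true).size());  // prints 2
    }
}
```

With the flag off, the two lines stay in one paragraph; with it on, each line becomes its own paragraph. Text separated by a blank line is split into two paragraphs in both modes.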


#3

OK, I have a follow-up: this flag doesn’t seem to be working correctly. Here’s an example test:

@Test
public void testSetLineBreaksMarksParagraphTrue() throws IOException {
    AmericanEnglish english = new AmericanEnglish();
    english.getSentenceTokenizer().setSingleLineBreaksMarksParagraph(true);
    JLanguageTool lang = new JLanguageTool(english);

    // Two line breaks: passes fine, finds the wrong day of week and the missing capital first letter.
    List<RuleMatch> matches = lang.check("tuesday, march 3, 2018\n\nhello");
    assertEquals(2, matches.size());

    // One line break: why doesn't this pass when I have "setSingleLineBreaksMarksParagraph" set to true?
    // Only the first match is found.
    matches = lang.check("tuesday, march 3, 2018\nhello");
    assertEquals(2, matches.size());
}

(Daniel Naber) #4

Someone once added a special case to that rule. It will work if you end the first or second sentence with a dot. So it’s basically unrelated to the paragraph interpretation.


#5

I’m confused: how is it unrelated to paragraph interpretation? Check out this test case; it might help illustrate what I’m talking about. I apologize if I misunderstood your point.

Note that there is no period after any of these sentences, yet it finds the uppercase issue for the first set of assertions.

@Test
public void testSetLineBreaksMarksParagraphTrue() throws IOException {
    AmericanEnglish english = new AmericanEnglish();
    english.getSentenceTokenizer().setSingleLineBreaksMarksParagraph(true);
    JLanguageTool lang = new JLanguageTool(english);
    String expected = "This sentence does not start with an uppercase letter";

    // Two line breaks between sentences: passes, finds all three.
    List<RuleMatch> matches = lang.check("hello I am the first sentence\n\nhi I am the second sentence\n\nhi I am the third sentence");
    assertEquals(3, matches.size());
    assertEquals(expected, matches.get(0).getMessage());
    assertEquals(expected, matches.get(1).getMessage());
    assertEquals(expected, matches.get(2).getMessage());

    // One line break between sentences: fails, finds only the first.
    matches = lang.check("hello I am the first sentence\nhi I am the second sentence\nhi I am the third sentence");
    assertEquals(3, matches.size());
    assertEquals(expected, matches.get(0).getMessage());
    assertEquals(expected, matches.get(1).getMessage());
    assertEquals(expected, matches.get(2).getMessage());
}

(Daniel Naber) #6

I’m just saying that this is because of a special case added to the uppercase rule, not because the paragraph detection doesn’t work. The special case, it seems, was added so that the rule doesn’t complain about list items where the previous item doesn’t end with a dot. In your case, this prevents an error that seems useful.
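
To make the described special case concrete, here is a rough sketch of such a heuristic. It is an assumption-laden toy model built only to reproduce the behavior reported in this thread, not the code of LanguageTool's uppercase rule: the class `UppercaseRuleSketch` and its methods are hypothetical, and the treatment of blank lines is a guess chosen to match the test results in post #5.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not LanguageTool's actual rule.
final class UppercaseRuleSketch {

    // True if the trimmed text ends with '.', '?' or '!'.
    static boolean endsWithSentencePunctuation(String s) {
        String t = s.trim();
        if (t.isEmpty()) {
            return false;
        }
        char last = t.charAt(t.length() - 1);
        return last == '.' || last == '?' || last == '!';
    }

    // Returns the zero-based indexes of lines that start with a lowercase
    // letter and are NOT excused by the list-item special case.
    static List<Integer> flaggedLines(String text) {
        List<Integer> flagged = new ArrayList<>();
        String[] lines = text.split("\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty() || !Character.isLowerCase(line.charAt(0))) {
                continue;
            }
            // Assumed special case: a line directly below another line looks
            // like a list item when neither line ends with sentence punctuation.
            boolean looksLikeListItem = i > 0
                    && !lines[i - 1].trim().isEmpty()
                    && !endsWithSentencePunctuation(lines[i - 1])
                    && !endsWithSentencePunctuation(line);
            if (!looksLikeListItem) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        System.out.println(flaggedLines("hello one\nhi two\nhi three"));     // prints [0]
        System.out.println(flaggedLines("hello one\n\nhi two\n\nhi three")); // prints [0, 2, 4]
    }
}
```

In this model, adding a dot to the first or second line of the single-line-break input makes the second line get flagged again, which matches the workaround mentioned in post #4.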