Multiple whitespace rule tweaks


(Tiago F. Santos) #1

Hi @dnaber,
Currently, the rule that detects multiple whitespace characters correctly ignores indenting tabs and a single space at the start of a line, but it does not make the same exception for multiple spaces at the start of a line, so space indentation is still flagged.

I made a few tweaks to the rule and would like to know whether I can push them to the main class or whether I should restrict these changes to [pt]. The rule still fires on indentation in the very first line, but the exception works well for all other lines.

The changes are as follows:

@@ -57,6 +57,7 @@ public class MultipleWhitespaceRule extends Rule {
     List<RuleMatch> ruleMatches = new ArrayList<>();
     AnalyzedTokenReadings[] tokens = sentence.getTokens();
     boolean prevWhite = false;
+    boolean isLineBreakContinuation = false;
     int prevLen = 0;
     int prevPos = 0;
     //note: we start from token 1
@@ -65,8 +66,9 @@ public class MultipleWhitespaceRule extends Rule {
     while (i < tokens.length) {
       boolean tokenIsTab = tokens[i].getToken().equals("\t");
       boolean prevTokenIsLinebreak = tokens[i -1].isLinebreak();
+      isLineBreakContinuation = (prevTokenIsLinebreak || isLineBreakContinuation) && tokens[i].isWhitespace() && !tokenIsTab;
       if ((tokens[i].isWhitespace() ||
-          StringTools.isNonBreakingWhitespace(tokens[i].getToken())) && prevWhite && !tokenIsTab && !prevTokenIsLinebreak) {
+          StringTools.isNonBreakingWhitespace(tokens[i].getToken())) && prevWhite && !tokenIsTab && !prevTokenIsLinebreak && !isLineBreakContinuation) {
         int pos = tokens[i -1].getStartPos();
         while (i < tokens.length && (tokens[i].isWhitespace() ||
             StringTools.isNonBreakingWhitespace(tokens[i].getToken()))) {
@@ -46,13 +46,20 @@ public class MultipleWhitespaceRuleTest {
     assertEquals(0, matches.length);
     matches = rule.match(langTool.getAnalyzedSentence("Multiple tabs\t\tare okay"));
     assertEquals(0, matches.length);
+    matches = rule.match(langTool.getAnalyzedSentence("\n This is a test sentence..."));
+    assertEquals(0, matches.length);
+    matches = rule.match(langTool.getAnalyzedSentence("\n    This is a test sentence..."));
+    assertEquals(0, matches.length);
 
     // incorrect sentences:
     matches = rule.match(langTool.getAnalyzedSentence("This  is a test sentence."));
     assertEquals(1, matches.length);
     assertEquals(4, matches[0].getFromPos());
     assertEquals(6, matches[0].getToPos());
-    
+    matches = rule.match(langTool.getAnalyzedSentence("\n   This  is a test sentence."));
+    assertEquals(1, matches.length);
+    assertEquals(7, matches[0].getFromPos());
+    assertEquals(9, matches[0].getToPos());
     matches = rule.match(langTool.getAnalyzedSentence("This is a test   sentence."));
     assertEquals(1, matches.length);
     assertEquals(14, matches[0].getFromPos());
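
For anyone who wants to try the idea without a LanguageTool checkout, here is a minimal standalone sketch of the same flag logic. It is a simplification under stated assumptions: it walks plain characters instead of AnalyzedTokenReadings, the names IndentAwareWhitespaceSketch and findRepeatedWhitespace are hypothetical, and the offsets it reports are plain character indexes, which need not agree with the token positions used in the tests above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not part of LanguageTool.
public class IndentAwareWhitespaceSketch {

  // Treats regular whitespace and the non-breaking space as whitespace.
  private static boolean isWhite(char c) {
    return Character.isWhitespace(c) || c == '\u00A0';
  }

  /**
   * Returns [from, to) character ranges of repeated whitespace,
   * skipping tabs and space indentation that follows a line break.
   */
  public static List<int[]> findRepeatedWhitespace(String text) {
    List<int[]> matches = new ArrayList<>();
    boolean prevWhite = false;
    // Stays true while we are inside a run of spaces that began right after '\n'.
    boolean isLineBreakContinuation = false;
    int i = 0;
    while (i < text.length()) {
      char c = text.charAt(i);
      boolean isTab = c == '\t';
      boolean white = isWhite(c);
      boolean prevIsLinebreak = i > 0 && text.charAt(i - 1) == '\n';
      isLineBreakContinuation =
          (prevIsLinebreak || isLineBreakContinuation) && white && !isTab;
      if (white && prevWhite && !isTab && !prevIsLinebreak && !isLineBreakContinuation) {
        int from = i - 1;
        // Consume the whole whitespace run so it is reported once.
        while (i < text.length() && isWhite(text.charAt(i))) {
          i++;
        }
        matches.add(new int[] {from, i});
        continue;
      }
      prevWhite = white;
      i++;
    }
    return matches;
  }

  public static void main(String[] args) {
    for (int[] m : findRepeatedWhitespace("\n    Indented line with  a double space.")) {
      System.out.println(m[0] + "-" + m[1]);
    }
  }
}
```

In this sketch, text that begins with spaces and has no preceding line break is still reported, which mirrors the first-line limitation mentioned above.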

(Daniel Naber) #2

Thanks, feel free to commit. Don't forget to run all tests (mvn test) instead of just the language-specific tests for PT.


(Tiago F. Santos) #3

Thank you, Daniel. I will run the remaining tests now. Best regards.