
SENT_END and PARA_END


(Andriy) #1

Currently setting SENT_END on the last token makes some rules a bit flaky.
Consider these two sentences checked at https://languagetool.org/:

He pointed to it’s reddest area.

and

He pointed to it’s reddest area

The first one generates an error, while the second does not.
The reason is that a negate_pos=“yes” on the last token of a rule needs to take SENT_END into account, and many rules (if not most) do not. To make the rule work for a sentence that ends on that last word (with no final punctuation), you have to add “|SENT_END” to the postag attribute. That’s a bit tricky to remember.

The patch below adds both sentences as examples to the rule in grammar.xml to illustrate this:

What’s worse, the last token may also get PARA_END (I suspect you can’t trigger that from grammar.xml, but it happens on real texts via the command line or the REST API).
So, strictly speaking, any rule that has negate_pos on its last token needs “|SENT_END|PARA_END” added to that token’s postag attribute.
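Since the postag attribute is a regular expression, the suggested fix just appends two more alternatives. A minimal Java sketch with plain java.util.regex, assuming the postag regex has to match the whole tag (Matcher.matches()); the base regex “NN.*” and the helper name withMarkerTags are hypothetical and not taken from the rule in the patch:

```java
import java.util.List;
import java.util.regex.Pattern;

final class PostagRegexSketch {

  // Marker tags discussed in this thread; they show up as extra readings at the
  // end of a sentence (SENT_END) or of a paragraph (PARA_END).
  static final List<String> MARKER_TAGS = List.of("SENT_END", "PARA_END");

  // Hypothetical helper: appends the marker tags as extra regex alternatives.
  static String withMarkerTags(String posRegex) {
    return posRegex + "|" + String.join("|", MARKER_TAGS);
  }

  // Assumes whole-tag matching (Matcher.matches(), not find()).
  static boolean tagMatches(String posRegex, String tag) {
    return Pattern.compile(posRegex).matcher(tag).matches();
  }

  public static void main(String[] args) {
    String base = "NN.*"; // hypothetical base regex
    String extended = withMarkerTags(base);
    System.out.println(extended);                         // NN.*|SENT_END|PARA_END
    System.out.println(tagMatches(base, "SENT_END"));     // false
    System.out.println(tagMatches(extended, "SENT_END")); // true
    System.out.println(tagMatches(extended, "PARA_END")); // true
    System.out.println(tagMatches(extended, "NNS"));      // true
  }
}
```

Because alternation has the lowest precedence in Java regexes, appending to a base that already contains “|” (such as “JJ|NN”) needs no extra grouping.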

This may also apply to some Java rules (I noticed it with SENT_END while writing some of the Ukrainian Java rules, but I don’t think I even accounted for PARA_END).
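In a Java rule, one way the marker readings can bite is a hand-written negated POS check that tests each reading separately. The sketch below is hypothetical: Token is a stand-in for a token with several readings (LanguageTool’s AnalyzedTokenReadings is not modelled), the method names are illustrative, and it assumes the markers appear as extra readings tagged SENT_END or PARA_END (the dump in post #3 shows </S> as an extra reading of “ласка”).

```java
import java.util.List;
import java.util.regex.Pattern;

final class NegatedPosSketch {

  // Hypothetical stand-in for a token with several readings (one POS tag each).
  record Token(String text, List<String> posTags) {}

  static final List<String> MARKER_TAGS = List.of("SENT_END", "PARA_END");

  // Naive negation applied per reading: true if ANY reading fails the regex.
  // A marker reading such as SENT_END never matches "NN.*", so it flips the result.
  static boolean naiveNotPos(Token token, String posRegex) {
    Pattern p = Pattern.compile(posRegex);
    return token.posTags().stream().anyMatch(tag -> !p.matcher(tag).matches());
  }

  // Same check, but with the sentence/paragraph end markers ignored first.
  static boolean notPosIgnoringMarkers(Token token, String posRegex) {
    Pattern p = Pattern.compile(posRegex);
    return token.posTags().stream()
        .filter(tag -> !MARKER_TAGS.contains(tag))
        .anyMatch(tag -> !p.matcher(tag).matches());
  }

  public static void main(String[] args) {
    Token mid = new Token("area", List.of("NN"));
    Token last = new Token("area", List.of("NN", "SENT_END", "PARA_END"));
    System.out.println(naiveNotPos(mid, "NN.*"));            // false
    System.out.println(naiveNotPos(last, "NN.*"));           // true (marker flips it)
    System.out.println(notPosIgnoringMarkers(last, "NN.*")); // false (same as mid-sentence)
  }
}
```

Filtering the markers out before the check makes a token behave the same whether or not it happens to end the sentence or paragraph.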

diff --git a/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml b/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml
index d4e8d21..68603d2 100644
--- a/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml
+++ b/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml
@@ -2990,6 +2990,8 @@
                 </pattern>
                 <message>Did you mean <suggestion>its <match no="4"/> <match no="5"/></suggestion>?</message>
                 <example correction="its reddest area">For the painting, <marker>it's reddest area</marker> was in the upper left.</example>
+                <example correction="its reddest area">He pointed to <marker>it's reddest area</marker>.</example>
+                <example correction="its reddest area">He pointed to <marker>it's reddest area</marker></example>
             </rule>
             <!-- for it's .*/JJ|NN|NNS::word=for its::pivots=\1,its -->
             <rule id="FOR_ITS_NN" name="for its NN (possessive)">

(Yakov) #2

issue #1205

I think that SENT_START and SENT_END should be handled similarly.


(Andriy) #3

Here’s another interesting case. Sometimes the sentence gets a “\n” token with PARA_END after the SENT_END token. Here’s the AnalyzedSentence (tagged as part of a bigger text):

[<S> Псевдосервіс[Псевдосервіс/null],[,/null] будь[бути/verb:imperf:impr:s:2] ласка[ласка/noun:anim:f:v_naz:xp1,ласка/noun:inanim:f:v_naz:xp2,</S>] <P/> ]

Here “ласка” (the last word of “будь ласка”, “please”) gets SENT_END, but the sentence then has one more token, “\n”, marked as PARA_END. Interestingly, even though this last token is \n and isWhitespace() returns true for it, it is still returned by sentence.getTokensWithoutWhitespace(). So rules will see \n as a regular token.
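A Java rule could guard against the trailing “\n” itself. In the hypothetical sketch below, Token stands in for AnalyzedTokenReadings and its whitespace flag mirrors isWhitespace(); dropWhitespace and lastRealTokenIndex are illustrative names, not LanguageTool API.

```java
import java.util.Arrays;

final class TrailingNewlineSketch {

  // Hypothetical stand-in for AnalyzedTokenReadings: just the text and a whitespace flag.
  record Token(String text, boolean whitespace) {}

  // Second pass over the tokens a rule received: drops anything flagged as
  // whitespace, such as the trailing "\n" that carries PARA_END.
  static Token[] dropWhitespace(Token[] tokens) {
    return Arrays.stream(tokens).filter(t -> !t.whitespace()).toArray(Token[]::new);
  }

  // Index of the last non-whitespace token, or -1 if there is none.
  static int lastRealTokenIndex(Token[] tokens) {
    for (int i = tokens.length - 1; i >= 0; i--) {
      if (!tokens[i].whitespace()) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Tokens as they reach the rule in the example above (sentence-start token omitted).
    Token[] tokens = {
      new Token("Псевдосервіс", false),
      new Token(",", false),
      new Token("будь", false),
      new Token("ласка", false),
      new Token("\n", true)
    };
    System.out.println(lastRealTokenIndex(tokens));    // 3
    System.out.println(dropWhitespace(tokens).length); // 4
  }
}
```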