
AnnotatedTextBuilder issue?

(Rick Meyer) #1

We use LanguageTool against text that will be presented on a web page, so it naturally includes HTML elements. We use the AnnotatedTextBuilder to separate the text from the HTML. We noticed a bit of odd behavior recently though. If the following HTML is used:
<p>Magma doesn't taste good</p><p>I don't recommend it.</p>

With the <p> and </p> tags added as markup and everything else added as text, the text elements appear to be concatenated directly, so "goodI" is treated as a single word and flagged as misspelled.

Clearly there should be a period at the end of the first sentence in this case, but we have seen the same problem in other places where no punctuation was needed. This is just one example where I was able to reproduce the issue.

(Daniel Naber) #2

Yes, this is a problem that's not so easy to solve. Imagine the elements were <span> instead of <p>: concatenating the text would be correct then. So I cannot really offer a solution other than preparing the HTML so that block-level elements like <p> are followed by line breaks.
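
The suggestion above can be sketched in plain Java. This is only an illustration and not part of LanguageTool: the class `HtmlSegmenter`, its methods, and the short `BLOCK_TAGS` list are all hypothetical names, and the regex handles only simple tags (no comments, scripts, or entities). It splits the HTML into markup and text segments and records a paragraph break for every closing block-level tag, while inline tags such as <span> contribute nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a LanguageTool class.
final class HtmlSegmenter {

    /** One piece of the input: either a tag (markup) or plain text. */
    static final class Segment {
        final String content;      // the original characters
        final boolean isMarkup;    // true for tags, false for text
        final String interpretAs;  // what a tag should count as in the checked text

        Segment(String content, boolean isMarkup, String interpretAs) {
            this.content = content;
            this.isMarkup = isMarkup;
            this.interpretAs = interpretAs;
        }
    }

    // Assumption: a deliberately short list; extend it for real documents.
    private static final Set<String> BLOCK_TAGS =
            new HashSet<>(Arrays.asList("p", "div", "li", "h1", "h2", "h3", "br"));

    // Matches simple opening, closing, and self-closing tags only.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static List<Segment> segment(String html) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) {
                // Text between the previous tag and this one.
                out.add(new Segment(html.substring(pos, m.start()), false, ""));
            }
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            // Only the end of a block element (or a <br>) separates text.
            boolean breaks = BLOCK_TAGS.contains(name) && (closing || name.equals("br"));
            out.add(new Segment(m.group(), true, breaks ? "\n\n" : ""));
            pos = m.end();
        }
        if (pos < html.length()) {
            out.add(new Segment(html.substring(pos), false, ""));
        }
        return out;
    }

    /** The text a checker would see: tags replaced by their interpretAs value. */
    static String checkedText(List<Segment> segments) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            sb.append(s.isMarkup ? s.interpretAs : s.content);
        }
        return sb.toString();
    }
}
```

For the HTML from the first post, `checkedText(segment(...))` yields the two sentences separated by a blank line instead of "goodI", and <span> elements are still joined without a gap. To feed the segments into an AnnotatedTextBuilder, text segments would go to `addText` and markup segments to `addMarkup`. If your LanguageTool version has the two-argument `addMarkup(markup, interpretAs)` overload, passing `interpretAs` there should keep error positions aligned with the original HTML; please check the Javadoc of the version you use.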