To prevent LT from splitting a sentence on a word that contains a full stop, we use segment.srx (http://wiki.languagetool.org/customizing-sentence-segmentation-in-srx-rules).
The file \languagetool-core\src\main\resources\org\languagetool\resource\segment.srx is in the GitHub clone, but it is not in the daily snapshot. If I add a rule to segment.srx, how do I test it to make sure that the new rule is correct?
I see that the Ratel editor lets me “Apply the rules on input files for testing” (http://okapiframework.org/wiki/index.php?title=Ratel). Is that testing the reason for using an SRX editor?
LT has incorrect segmentation for the second sentence:
Correct: It’s a case of cats v. dogs.
Incorrect: It’s a case of Jones v. Smith.
Is this rule correct to prevent a break on the ‘v.’ in Jones v. Smith? (At this stage, I do not want to spend time downloading and learning how to use an SRX editor.)
<rule break="no">
  <beforebreak>\b[A-Z][a-z]+\sv\.\s[A-Z][a-z]+</beforebreak>
  <afterbreak></afterbreak>
</rule>
The SRX rules are cascading. Where should the rule go in the English section of segment.srx? At the end of the break="no" rules?
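
As a rough way to sanity-check the regex without an SRX editor, here is a minimal Python sketch. It assumes a simplified model of SRX matching: a rule applies at a position when beforebreak matches the text ending at that position and afterbreak matches the text starting there. It does not run LT's real segmenter, the helper name no_break_positions is hypothetical, and the second rule (the same pattern split across beforebreak and afterbreak) is an untested variant given only for comparison.

```python
import re

def no_break_positions(text, beforebreak, afterbreak=""):
    """Offsets where a break="no" rule would apply (simplified SRX model)."""
    # beforebreak must match text that ends exactly at the candidate offset
    before_re = re.compile("(?:" + beforebreak + r")\Z")
    # afterbreak must match text that starts exactly at the candidate offset
    after_re = re.compile(afterbreak)
    positions = []
    for i in range(len(text) + 1):
        if before_re.search(text[:i]) and after_re.match(text, i):
            positions.append(i)
    return positions

text = "It’s a case of Jones v. Smith."
after_v = text.index("v.") + 2  # offset immediately after "v."

# The rule as posted: everything in beforebreak, empty afterbreak
posted = no_break_positions(text, r"\b[A-Z][a-z]+\sv\.\s[A-Z][a-z]+")

# Hypothetical variant: the part following "v." moved into afterbreak
variant = no_break_positions(text, r"\b[A-Z][a-z]+\sv\.", r"\s[A-Z][a-z]+")

print(after_v in posted)   # False
print(after_v in variant)  # True
```

Under this model the posted rule only covers offsets inside “Smith”, not the offset right after ‘v.’, while the split variant covers exactly that offset. Whether either rule takes effect in LT would still depend on where it sits relative to the other rules in segment.srx.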