Example 'generator'

(Ruud Baars) #1

One of the most tedious tasks when working on rules is finding real-life examples and adding them.
Using LT as a local server, its API, and an enormous text file of sentences, one PHP program writes out the API output, a second one ‘flattens’ this into records, and a third one transforms these into prototype examples and adds them to grammar.xml as a comment in the appropriate rule.
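
For readers who want to try something similar, here is a minimal sketch of the three steps in Python. It is not the PHP code described above. It assumes one sentence is sent per request, that the local server listens on http://localhost:8081/v2/check, and that the JSON reply contains a `matches` list with `offset`, `length`, `rule.id` and `replacements`. The function names, the record layout and the rule id used in the usage note are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def check_sentence(text, language="nl", url="http://localhost:8081/v2/check"):
    """Step 1: send one sentence to a local LT server and return the parsed JSON.

    The URL and the default language are assumptions; adjust them to your setup.
    """
    data = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    with urllib.request.urlopen(url, data=data) as reply:
        return json.load(reply)


def flatten_matches(text, response):
    """Step 2: turn one API response into flat records, one per match.

    Offsets are assumed to index the submitted text directly, which holds
    when the text contains no characters outside the Basic Multilingual Plane.
    """
    records = []
    for match in response.get("matches", []):
        start = match["offset"]
        end = start + match["length"]
        records.append({
            "rule_id": match["rule"]["id"],
            "sentence": text,
            "before": text[:start],
            "marked": text[start:end],
            "after": text[end:],
            "suggestions": [r["value"] for r in match.get("replacements", [])],
        })
    return records


def prototype_example(record):
    """Step 3: render a record as a prototype <example> element."""
    correction = quoteattr("|".join(record["suggestions"]))
    return "<example correction={}>{}<marker>{}</marker>{}</example>".format(
        correction,
        escape(record["before"]),
        escape(record["marked"]),
        escape(record["after"]),
    )
```

With a hand-written response for "It is more then enough." (offset 11, length 4, one suggestion "than", hypothetical rule id THEN_THAN), `prototype_example` yields `<example correction="than">It is more <marker>then</marker> enough.</example>`. Writing that string into the right rule of grammar.xml as a comment is left out here.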

The number of examples is limited to 10% of the errors found: 10 when possible, but never more than 100. The shortest examples get priority, and examples with different marked areas are preferred too.
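
One possible reading of that selection rule, sketched in Python. The record layout (a dict with `sentence` and `marked` keys) and both function names are assumptions for illustration, and the original program may weigh length against variety differently.

```python
def example_limit(found):
    """10% of the errors found, at least 10 when available, never more than 100."""
    ten_percent = (found + 9) // 10  # integer ceiling of found / 10
    return min(found, max(10, min(ten_percent, 100)))


def select_examples(records):
    """Pick the shortest sentences, preferring ones whose marked text is new."""
    limit = example_limit(len(records))
    by_length = sorted(records, key=lambda r: len(r["sentence"]))

    distinct, repeats, seen = [], [], set()
    for record in by_length:
        key = record["marked"].lower()
        if key in seen:
            repeats.append(record)   # same marked text as a shorter example
        else:
            seen.add(key)
            distinct.append(record)  # first (shortest) example for this marked text

    # Variety first, then fill up with the remaining shortest sentences.
    return (distinct + repeats)[:limit]
```

So 50 hits give 10 examples, 300 hits give 30, and 5000 hits give 100. This version puts variety of the marked text ahead of pure length, which is only one way to combine the two criteria.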

It is far from perfect. There are some assumptions in it, and the way I change the xml is certainly not professional programming. But if anyone is interested, feel free to contact me.

(Tiago F. Santos) #2

This would need to be an opt-in feature, but I would almost bet that sensitive information (even if not necessarily private) would end up in the permanent git record and eventually be distributed via daily builds and/or standard releases. How does one mitigate that?

(Lodewijk Arie van Brienen) #3

Most sensitive information requires more than one sentence for context, so I think you could simply use a ‘distributed spread’ when selecting the sentences to be used, ‘destroying’ the sensitive information in the process.

(Ruud Baars) #4
  • If there is sensitive info, it is also in the corpus.
  • The examples have to be edited anyway, because most will be too long or will contain multiple errors.
  • You could easily remove the commented examples before distribution.
  • If you think it is not useful, just don’t use it.
    And a great plus: it shows false positives as well, which is a good way to improve the rule with exceptions.

(Ruud Baars) #5

Do you feel like helping to get more real life examples in the grammar.xml? In that case I will submit a version soon.

(Tiago F. Santos) #6


Not at all. This is great. It is great to make sure that there are no significant regressions.
I had misunderstood and thought this would be fed with the online queries, which could be exploited to hinder the project.

(Ruud Baars) #7

You can check it out in the current Dutch grammar.xml. There are a lot of comments in there, each marked with <!-- possible examples

I am working through those from the top downwards, leaving at least 5 examples (when available).
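
As mentioned earlier in the thread, the commented examples could be removed before distribution. A minimal sketch of that in Python, assuming every generated comment starts with `<!-- possible examples` and ends at the next `-->`. The function name is hypothetical, and the text-based approach is a simplification that does not parse the XML.

```python
import re

# Matches one whole "possible examples" comment, including the indentation
# before it and the line break after it. XML comments cannot contain "--",
# so a non-greedy match up to the first "-->" covers exactly one comment.
POSSIBLE_EXAMPLES = re.compile(
    r"[ \t]*<!--\s*possible examples.*?-->[ \t]*\n?",
    re.DOTALL,
)


def strip_possible_examples(xml_text):
    """Return the grammar text without the generated example comments."""
    return POSSIBLE_EXAMPLES.sub("", xml_text)
```

Other comments are left alone, so hand-written notes in the rules survive. Run it on a copy of grammar.xml first and diff the result.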