Ideas for better regression checks

I’m looking for ideas on how our nightly regression check could be improved. I can’t promise that any of this will actually be implemented, because it might turn out to be a lot of work, but please let me know how you’d like the test results to be displayed. We’re not limited to static HTML: it could be a small web app that offers filtering and whatever else you can think of.

As a reminder, here’s what it looks like now. It’s just the colored output of the Linux diff command:
https://internal1.languagetool.org/regression-tests//20190422/result_nl_20190422.html

Group the results by rule id and subid. Add a summary report, so one can concentrate on the most frequently occurring errors.
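
To make the grouping idea a bit more concrete, here is a rough Python sketch. It assumes the diff has already been parsed into one record per changed match; the record layout (`rule_id`, `sub_id`) and the function name `summarize` are made up for illustration and are not how the current script stores anything.

```python
from collections import Counter

def summarize(matches):
    """Count changed matches per (rule id, subid), most frequent first."""
    counts = Counter((m["rule_id"], m["sub_id"]) for m in matches)
    return counts.most_common()

# Hypothetical parsed records; the real data would come from the nightly diff.
matches = [
    {"rule_id": "RULE_A", "sub_id": "1"},
    {"rule_id": "RULE_A", "sub_id": "1"},
    {"rule_id": "RULE_A", "sub_id": "2"},
    {"rule_id": "RULE_B", "sub_id": "1"},
    {"rule_id": "RULE_A", "sub_id": "1"},
]

for (rule_id, sub_id), n in summarize(matches):
    print(f"{rule_id}[{sub_id}]: {n} changes")
```

The summary table could then link each row to the full list of diffs for that rule, so the most frequent changes are looked at first.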
Compute a checksum of the LT output for every input line. Allow the output to be marked as a true positive. In that case, as long as the checksum does not change, do not report it again.
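
The checksum idea could look roughly like the sketch below. It is only an illustration under some assumptions: results are available as (input line, LT output) pairs, SHA-256 is used as the checksum, and the set of accepted checksums is kept in memory here, although it would have to be stored somewhere between nightly runs. The names `checksum` and `unreviewed` are hypothetical.

```python
import hashlib

def checksum(input_line, lt_output):
    """Hash an input line together with the LT output produced for it."""
    h = hashlib.sha256()
    h.update(input_line.encode("utf-8"))
    h.update(b"\0")  # separator so line and output cannot run together
    h.update(lt_output.encode("utf-8"))
    return h.hexdigest()

def unreviewed(results, accepted):
    """Keep only results whose checksum has not been marked as accepted."""
    return [(line, out) for line, out in results
            if checksum(line, out) not in accepted]

# Someone marked this result as a true positive in an earlier run.
accepted = {checksum("Dit is een test.", "RULE_A[1] at 0-3")}

# Tonight's results: one unchanged, one whose LT output changed.
results = [
    ("Dit is een test.", "RULE_A[1] at 0-3"),
    ("Nog een zin.", "RULE_B[1] at 4-7"),
]

for line, out in unreviewed(results, accepted):
    print(f"{line} -> {out}")
```

Because the LT output is part of the hash, an accepted result shows up again as soon as the output for that line changes, which is what we would want for a regression check.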