Multithreaded LT optimization (take 3)

With a fork-join pool the order of rule execution may differ between runs, but I thought we had code that removes overlapping errors using deterministic logic, so it should not have mattered.
Performance can also vary with the number of cores and with runtime conditions (especially when running in the cloud). My benchmarks were performed on an i7 with 8 logical cores (4 physical).
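
To illustrate why execution order should not matter when overlap removal is deterministic, here is a minimal sketch. It is not LT's actual code: the `Match` class, the `removeOverlaps` method, and the tie-breaking policy (earlier start, then longer span, then rule id) are all assumptions made for the example. The idea is to sort matches by a total order that does not depend on arrival order before dropping overlaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class DeterministicOverlapFilter {

    /** Hypothetical stand-in for a rule match (not LT's own class). */
    static final class Match {
        final String ruleId;
        final int from; // inclusive start offset
        final int to;   // exclusive end offset

        Match(String ruleId, int from, int to) {
            this.ruleId = ruleId;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return ruleId + "[" + from + "," + to + ")";
        }
    }

    // Assumed policy: earlier start first, then longer span, then rule id.
    // Because this order ignores arrival order, the result is reproducible.
    static final Comparator<Match> ORDER = Comparator
            .comparingInt((Match m) -> m.from)
            .thenComparingInt(m -> -m.to)
            .thenComparing(m -> m.ruleId);

    /** Keeps the first match in ORDER and drops any later match that overlaps a kept one. */
    static List<Match> removeOverlaps(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(ORDER);
        List<Match> kept = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (Match m : sorted) {
            if (m.from >= lastEnd) {
                kept.add(m);
                lastEnd = m.to;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>(Arrays.asList(
                new Match("RULE_A", 0, 5),
                new Match("RULE_B", 3, 8),
                new Match("RULE_C", 8, 12),
                new Match("RULE_D", 0, 5)));
        // Shuffling simulates rules finishing in a different order on each run.
        for (int seed = 0; seed < 5; seed++) {
            Collections.shuffle(matches, new Random(seed));
            System.out.println(removeOverlaps(matches));
        }
    }
}
```

Every shuffled run prints the same list. If the real filter instead keeps whichever overlapping match arrived first, the result would depend on thread scheduling, which is one place to look when results differ between runs.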