To get the most out of my corpus, I would like to split it into two parts: one containing only the sentences with a mistake, and one containing only the sentences without. The purpose of this is to find mistakes that have not been detected yet.
Trying to get this done using the local web server is too slow. Running 8 concurrent instances of the PHP program that feeds sentences to the server only gets the server up to 5 CPUs, and results in just a few MB processed per day. And there is 20 GB to do.
LT’s command line is much faster, but it does not output full sentences, nor can it split the corpus into two parts.
Is anyone able to create such a utility?
Maybe it is even possible to store the sentences that have a mistake in separate files per rule/subrule, or in just one file, with the rule and subrule prefixed to each sentence?
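To illustrate what I mean, here is a minimal sketch of the splitting logic in Python. The `check` function is a placeholder: in a real utility it would query LanguageTool (e.g. via its HTTP API or by wrapping the command line) and return the matching rule/subrule IDs for a sentence; here it is just a hypothetical stub so the partitioning idea is clear.

```python
from typing import Callable, Iterable, List, Tuple

# A "checker" takes a sentence and returns a list of (rule_id, sub_id)
# pairs for the rules that matched; an empty list means no mistake found.
Checker = Callable[[str], List[Tuple[str, str]]]


def split_corpus(
    sentences: Iterable[str], check: Checker
) -> Tuple[List[str], List[str]]:
    """Partition sentences into (with_mistake, without_mistake).

    Sentences that trigger at least one rule go into the first list,
    prefixed with "ruleId/subId<TAB>" so the triggering rule is visible
    in front of the sentence; the rest go into the second list.
    """
    with_mistake: List[str] = []
    without_mistake: List[str] = []
    for sentence in sentences:
        matches = check(sentence)
        if matches:
            for rule_id, sub_id in matches:
                with_mistake.append(f"{rule_id}/{sub_id}\t{sentence}")
        else:
            without_mistake.append(sentence)
    return with_mistake, without_mistake


# Hypothetical stand-in for a real LanguageTool query, for demonstration:
def fake_check(sentence: str) -> List[Tuple[str, str]]:
    return [("EN_A_VS_AN", "1")] if "a apple" in sentence else []


bad, good = split_corpus(["I ate a apple.", "All fine here."], fake_check)
# bad  -> ["EN_A_VS_AN/1\tI ate a apple."]
# good -> ["All fine here."]
```

With the rule ID in front, the output can later be sorted or split into one file per rule with standard tools.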