Done (https://github.com/languagetool-org/languagetool/commit/ea3c64ba3fbd46b884456c71b079370a584ca7ff). Thanks @tiagosantos.
For English, CompoundRule.java “Checks that compounds (if in the list) are not written as separate words.” The list of terms is in compounds.txt.
Many of the terms in ‘Hiphenated Compound Extractor 1.0 - EN-GB Example.ods’ can also be written as separate words. For example, ‘air-cooled’ is correct as an adjective, but as a noun (air) followed by a verb (cooled), the words are separate: “After the air cooled, moisture began to form on the sides of the vessel”. I did not add such terms to compounds.txt.
compounds.txt is applicable to all varieties of English. Thus, I did not add terms such as ‘aero-engine’, which can be spelled as ‘aero engine’ in AmE (and possibly other varieties of English).
For the terms up to ‘cow-lick’, I often checked the NOW corpus (English-Corpora: NOW), but the process was very slow (the best part of 2 days). After that, I did not check the terms as carefully. If I found a single counter-example in which the term could be written with a space, I did not add the term.
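The counter-example rule above could be partly automated. The following is only a rough sketch of the idea, not what I actually did (I checked by hand): the function names `spaced_form_pattern` and `select_terms` and the two sample sentences are made up for illustration, and a real check would need a large corpus. Finding no counter-example in a small sample does not prove that the spaced form is never valid.

```python
import re


def spaced_form_pattern(term):
    """Build a regex that matches the term written with spaces instead of hyphens."""
    parts = [re.escape(part) for part in term.split("-")]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def select_terms(candidates, sentences):
    """Return (kept, rejected).

    kept: terms with no spaced counter-example in the given sentences.
    rejected: maps each rejected term to the first counter-example found.
    """
    kept = []
    rejected = {}
    for term in candidates:
        pattern = spaced_form_pattern(term)
        counter_example = next((s for s in sentences if pattern.search(s)), None)
        if counter_example is None:
            kept.append(term)
        else:
            rejected[term] = counter_example
    return kept, rejected


# Hypothetical sample data; a real run would read sentences from a corpus.
candidates = ["air-cooled", "cow-lick"]
sentences = [
    "After the air cooled, moisture began to form on the sides of the vessel.",
    "The engine is air-cooled.",
]

kept, rejected = select_terms(candidates, sentences)
print(kept)
print(rejected)
```

Here ‘air-cooled’ is rejected because the first sentence uses ‘air cooled’ as noun plus verb, while ‘cow-lick’ is kept only because neither sample sentence contains the spaced form.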
Hiphenated Compound Extractor 1.0 - EN-GB Example-mfu-comments.ods (85.5 KB)
The file ‘Hiphenated Compound Extractor 1.0 - EN-GB Example-mfu-comments.ods’ shows the terms from Column C from ‘Hiphenated Compound Extractor 1.0 - EN-GB Example.ods’, the terms that I put in compounds.txt, and a comment or a counter-example for each term that I did not put in compounds.txt.
I expect that there will be some false-positive warnings in the regression test, and I will correct those errors tomorrow.
As an aside, this task took a long time, so I am unlikely to devote more time to LT until the new year.