The rule could be improved. I checked COCA: most instances of ‘English speaking’ without a hyphen are incorrect, and all instances of ‘French speaking’ should have a hyphen. (There are no instances of ‘Portuguese speaking’!)
The rule is in \resource\en\compounds.txt. The list of candidate languages for the rule is long: http://www.nationsonline.org/oneworld/language_code.htm. Rather than adding a long list of words to compounds.txt, I think a grammar rule would be the better method. The languages could be specified in grammar.xml in an entity definition, in the same way that we have entity definitions for weekdays and for months.
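Roughly what I have in mind, as an untested sketch. The entity name `languages`, the rule id, the rule name, and the message text are all placeholders I made up, and the language list is deliberately short:

```xml
<!-- In the DOCTYPE block at the top of grammar.xml, next to the existing
     weekday and month entities. "languages" is a hypothetical entity name. -->
<!ENTITY languages "English|French|German|Spanish|Portuguese">

<!-- In the rules section. The id, name, and message are placeholders. -->
<rule id="LANGUAGE_SPEAKING_HYPHEN" name="Hyphen in 'English-speaking'">
  <pattern>
    <!-- First token: any language name from the entity, matched as a regexp. -->
    <token regexp="yes">&languages;</token>
    <!-- Second token: the literal word "speaking". -->
    <token>speaking</token>
  </pattern>
  <message>This compound is usually hyphenated: <suggestion>\1-\2</suggestion>.</message>
  <example correction="English-speaking">She lives in an <marker>English speaking</marker> country.</example>
  <example>She lives in an English-speaking country.</example>
</rule>
```

Since not every unhyphenated instance on COCA is wrong, a pattern this simple would probably produce some false alarms (for example when ‘speaking’ is a verb after the language name), so it would likely need exceptions or a following-noun condition before it could go in.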
@danielnaber, what do you think is the best method?
The entity approach sounds fine. We probably shouldn’t list thousands of languages, as it might slow down the regular expression (I haven’t tested that, though).