Roman Numeral Centuries
Should LanguageTool catch centuries written with Roman numerals? Other languages use varying century formats:
The Industrial Revolution occurred in the 19th century. (English)
The Industrial Revolution occurred in the XIX century. (Polish)
The Industrial Revolution occurred in the XIXe siècle. (French)
The Industrial Revolution occurred in the siglo XIX. (Spanish)
A relatively common typo I've seen creep into English texts is "XIXth century".
I currently use this Regex to catch many of them:
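As a rough illustration only (this is not the regex referred to above), a check of this kind could be sketched in Python as follows. The pattern, the helper names `roman_to_int`, `ordinal`, and `suggest`, and the choice to accept a trailing English suffix or French "e" are all assumptions made for the example.

```python
import re

# Hypothetical pattern (an assumption, not the regex from the post):
# a Roman numeral, an optional English ordinal suffix or French "e",
# then "century" or "centuries".
ROMAN_CENTURY = re.compile(
    r"\b(?P<numeral>[IVXLC]+)(?:st|nd|rd|th|e)?\s+(?P<noun>[Cc]entur(?:y|ies))\b"
)

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer (no validity checking)."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        # A smaller value before a larger one is subtracted (IX = 9).
        total += -value if value < following else value
    return total


def ordinal(n: int) -> str:
    """Return the English ordinal form of n, e.g. 19 -> '19th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def suggest(text: str) -> list[tuple[str, str]]:
    """Return (matched text, suggested English form) pairs."""
    return [
        (m.group(0), f"{ordinal(roman_to_int(m.group('numeral')))} {m.group('noun')}")
        for m in ROMAN_CENTURY.finditer(text)
    ]


print(suggest("The Industrial Revolution occurred in the XIXth century."))
```

This sketch is case-sensitive on the numeral and will also match a lone "I" or "C" directly before "century", so a real rule would likely need extra filtering.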
Ordinal Form or Typed Out
Another style choice is whether the century is written as an ordinal number or fully spelled out (the latter typically in more formal/academic texts):
The Roman Empire fell in the 6th century. (English)
The Roman Empire fell in the sixth century. (English, Formal)
I typically use this Regex to spot the ordinal form:
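For illustration, and not as the regex referred to above, a minimal Python sketch of spotting the ordinal form might look like this. The pattern and the function name `find_ordinal_centuries` are assumptions made for the example.

```python
import re

# Hypothetical pattern: digits, an English ordinal suffix,
# then "century" or "centuries".
ORDINAL_CENTURY = re.compile(r"\b\d+(?:st|nd|rd|th)\s+[Cc]entur(?:y|ies)\b")


def find_ordinal_centuries(text: str) -> list[str]:
    """Return every 'Nth century' style match found in text."""
    return ORDINAL_CENTURY.findall(text)


print(find_ordinal_centuries("The Roman Empire fell in the 6th century."))
```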
And to spot the typed out versions, I use this one:
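Again as an illustration rather than the regex referred to above, the spelled-out form could be matched with a fixed word list. The list below stops at "twenty-first", which is an arbitrary cutoff chosen for the example, and `find_spelled_centuries` is a hypothetical name.

```python
import re

# Hypothetical word list; the cutoff at "twenty-first" is an assumption.
SPELLED = (
    "first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth|"
    "eleventh|twelfth|thirteenth|fourteenth|fifteenth|sixteenth|"
    "seventeenth|eighteenth|nineteenth|twentieth|twenty-first"
)
SPELLED_CENTURY = re.compile(
    rf"\b(?:{SPELLED})\s+centur(?:y|ies)\b", re.IGNORECASE
)


def find_spelled_centuries(text: str) -> list[str]:
    """Return every spelled-out century match found in text."""
    return [m.group(0) for m in SPELLED_CENTURY.finditer(text)]


print(find_spelled_centuries("The Roman Empire fell in the sixth century."))
```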
Centuries joined by "and" or "to" also need to be handled:
It is estimated that there were eight to ten thousand gondolas during the 17th and 18th centuries in Venice.
In the 13th to 14th centuries, Egypt exported a tremendous amount of sugar to Europe.
Denmark revolutionized its agricultural sector between the late-18th and mid-19th centuries.
\d+(st|nd|rd|th) (and|to) (mid-)*\d+(st|nd|rd|th) [Cc]entur(y|ies)
There may be a few other edge cases as well, such as "early" or "late" being used with the centuries.
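One possible way to cover those qualifiers is to allow an optional "early", "mid", or "late" before either ordinal. The Python sketch below is a variation on the pattern above, not a tested LanguageTool rule; the names `QUALIFIER`, `ORDINAL`, and `find_century_ranges`, and the decision to accept either a hyphen or a space after the qualifier, are assumptions.

```python
import re

# Optional qualifier before an ordinal, joined by a hyphen or a space
# (assumed variants: "late-18th" and "late 18th").
QUALIFIER = r"(?:(?:early|mid|late)[- ])?"
ORDINAL = r"\d+(?:st|nd|rd|th)"

CENTURY_RANGE = re.compile(
    rf"\b{QUALIFIER}{ORDINAL}\s+(?:and|to)\s+{QUALIFIER}{ORDINAL}"
    r"\s+[Cc]entur(?:y|ies)\b"
)


def find_century_ranges(text: str) -> list[str]:
    """Return every 'Nth and/to Mth century' style match found in text."""
    return [m.group(0) for m in CENTURY_RANGE.finditer(text)]


print(find_century_ranges(
    "Denmark revolutionized its agricultural sector between "
    "the late-18th and mid-19th centuries."
))
```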