I'm pretty sure that our Wikipedia snapshot contains a large text corpus. Can I extract the entire database into a readable text format? If so, how?
Yes. You can download the Wikipedia dump (e.g. from http://dumps.wikimedia.org/enwiki/latest/) and then parse the XML. There are various Wikipedia XML parsing libraries for different programming languages.
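If you'd rather not pull in a dedicated library, here is a minimal sketch of the parsing step using only Python's standard library. It assumes the dump follows the usual MediaWiki export layout (`<page>` elements containing a `<title>` and a `<revision><text>`); the helper names `local_name`, `iter_pages`, and `open_dump` are made up for illustration, and the file name in the comment is an assumption about what you downloaded. Note that what comes out is raw wiki markup, so getting clean plain text still needs a further stripping step.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    Uses iterparse so the file is read incrementally rather than loaded whole.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the page's contents to keep memory use low


def open_dump(path):
    """Open a bz2-compressed dump for reading (hypothetical helper)."""
    return bz2.open(path, "rb")


# Tiny in-memory sample standing in for a real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Some '''wiki''' markup.</text></revision>
  </page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(SAMPLE)):
    print(title, "->", text)

# With a real download (file name is an assumption):
# with open_dump("enwiki-latest-pages-articles.xml.bz2") as f:
#     for title, text in iter_pages(f):
#         ...
```

Reading the compressed file as a stream avoids unpacking the whole dump to disk first, which matters because the English dump is very large.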