Official Google Blog: Find out what’s in a word, or five, with the Google Books Ngram Viewer
Since 2004, Google has digitized more than 15 million books worldwide. The datasets we’re making available today to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.
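To make "phrases of up to five words with counts of how often they occurred in each year" concrete, here is a rough sketch in Python. It is an illustration only. The `(year, text)` input format, the plain whitespace tokenization, and the helper names `ngrams`, `count_by_year` and `series` are all hypothetical. None of them describe Google's actual pipeline or the layout of the released files.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous run of n tokens as a space-joined phrase."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def count_by_year(books, max_n=5):
    """Return a Counter mapping (phrase, year) -> occurrences.

    Phrases of 1..max_n words are counted. `books` is a hypothetical
    iterable of (year, text) pairs; splitting on whitespace stands in
    for real tokenization.
    """
    counts = Counter()
    for year, text in books:
        tokens = text.split()
        for n in range(1, max_n + 1):
            for phrase in ngrams(tokens, n):
                counts[(phrase, year)] += 1
    return counts


def series(counts, phrase):
    """Return [(year, count), ...] for one phrase, sorted by year."""
    return sorted((year, c) for (p, year), c in counts.items() if p == phrase)


books = [
    (1900, "the quick brown fox"),
    (1901, "the quick red fox"),
]
counts = count_by_year(books)
print(series(counts, "the quick"))  # [(1900, 1), (1901, 1)]
print(series(counts, "brown fox"))  # [(1900, 1)]
```

A per-phrase series like the one `series` returns is roughly what you would plot to see usage wax and wane. For a real comparison you would likely want to normalize each year's count by that year's total, because the number of books varies from year to year.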
The Ngram Viewer lets you graph and compare phrases from these datasets over time, showing how their usage has waxed and waned over the years. One of the advantages of having data online is that it lowers the barrier to serendipity: you can stumble across something in these 500 billion words and be the first person ever to make that discovery.
As much as I like to rib Google from the armchair over Chrome OS and the App Store, it’s stuff like this that makes this an exceptionally interesting time in our history: a company the size of Google can afford to let brilliance go to work in every little corner of the organization. They’re a great model for humanities research right now.