Google's Ngram Viewer allows anyone to see how often words and phrases appear in millions of books published over the course of centuries.
"The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time," wrote the Harvard team of scholars responsible for the analysis that powers the viewer.
According to the scholars, users can search through more than 5.2 million books – about 4 percent of all published books.
If you're into the science, you can read the report. Once you register, it's a free download.
The scholars call their approach "Culturomics": applying computing power to large amounts of data in order to better understand human culture. In this instance, they selected a subset of the more than 15 million books digitized by Google in order to quantify the use of language in the literature.
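For a sense of scale, the book counts quoted so far can be put through some back-of-the-envelope arithmetic. This is only a rough sketch: the variable names are mine, "about 4 percent" is treated as exactly 0.04, and "more than 15 million" is treated as a floor, so the results are approximations rather than figures from the scholars.

```python
# Figures quoted in the article; names are illustrative only.
books_in_corpus = 5.2e6      # "more than 5.2 million books"
share_of_all_books = 0.04    # "about 4 percent of all published books" (approximate)
books_digitized = 15e6       # "more than 15 million books" (treated as a lower bound)

# If 5.2 million books is about 4 percent, the implied total of published books:
implied_books_published = books_in_corpus / share_of_all_books

# Share of Google's digitized books that made it into the corpus.
# Because 15 million is a floor, this is an upper bound on the share.
share_of_digitized = books_in_corpus / books_digitized

print(f"Implied books ever published: roughly {implied_books_published / 1e6:.0f} million")
print(f"Share of digitized books in the corpus: at most about {share_of_digitized:.0%}")
```

In other words, the 4 percent figure implies on the order of 130 million published books, and the corpus is roughly a third of what Google had digitized.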
"The resulting corpus contains over 500 billion words, in English (361 billion), French (45 billion), Spanish (45 billion), German (37 billion), Chinese (13 billion), Russian (35 billion) and Hebrew (2 billion)," according to the paper published in the scholarly journal Science.
Taking the tool for a spin on California-related topics, here are some quick and dirty results. I used it mainly to compare words and phrases, and as a lens into the time periods in which they most often appear.
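To make "comparing words and phrases" concrete, here is a minimal sketch of the kind of calculation involved: divide a phrase's yearly count by the total number of words counted for that year, then look for the year where that share peaks. The function names and all of the numbers below are invented for illustration, and the normalization is my assumption about how such a comparison would work, not a description of the viewer's internals.

```python
def relative_frequency(phrase_counts, total_counts):
    """Return {year: phrase count / total words that year} for years with data."""
    return {
        year: phrase_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


def peak_year(frequencies):
    """Return the year in which the relative frequency is highest."""
    return max(frequencies, key=frequencies.get)


# Invented toy data: total words per year and raw phrase counts per year.
total_counts = {1900: 1_000_000, 1950: 2_000_000, 2000: 4_000_000}
phrase_counts = {
    "San Francisco": {1900: 40, 1950: 100, 2000: 160},
    "Los Angeles": {1900: 10, 1950: 120, 2000: 280},
}

for phrase, by_year in phrase_counts.items():
    freq = relative_frequency(by_year, total_counts)
    year = peak_year(freq)
    print(f"{phrase}: peaks in {year} at {freq[year]:.4%} of all words")
```

Normalizing by the yearly total matters because far more books are published in later years; raw counts alone would make almost every term look like it is on the rise.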
Comparing how often some California city names appear in the literature predictably shows San Francisco and Los Angeles dominating the landscape, but the links below the visualization turn up some very interesting results.
Take this book, which bears a copyright date of 1879 and the title "The Elite Directory for San Francisco and Oakland." It includes this passage:
The Bostonian can go back with his lineage to the ragged refugees who landed on Plymouth Rock; the New Yorker can trace his aristocratic blood direct to the Dutch market gardener knocking about among the cabbage patches of Manhattan Island; the Virginian is proud of his pedigree direct from the gentle dame sold on the auction-block in Jamestown for plug tobacco; the Louisianan can still see, beneath his tingling finger-tips, the tinge of the Creole tide; the Carolinian tells of the Huguenot parent driven from pillar to post; the Washingtonian can prate of the beauty and the chivalry developed by the politician's potent touch, and Kentucky's proud flesh, as we all know, is nothing but blue grass; but where in the name of reason, and research, is the fountain-head of California refinement and respectability?
Yosemite dominates the landscape of references to California parks and outdoor destinations in this example.
When it comes to mountain ranges, the Rockies appear more frequently than the Sierra Nevada range.
References to Californios, the name given to the original Hispanic settlers of early California, appear more often in the literature than references to the zoot suit fashion popular with young Mexican-Americans in Los Angeles during World War II.
If you throw Chicano into the mix, it dwarfs all the other terms. It's also interesting to note that both terms, Californio and Chicano, really start to show up in books published after 1960.
Finally, references to Hollywood pop up around 1920 and increase over time, while Opera has maintained a pretty steady presence since 1800.
There are a number of things to keep in mind while playing with the Ngram Viewer, and any conclusions drawn from it must take the limitations of the data into careful account. Still, it's an addictive tool that's very useful for browsing through books published in different eras.