n-Grams and Fashion: Term Frequency in Vogue
One of the earliest projects I completed at Yale was a term frequency viewer for the fashion magazine Vogue. I used the excellent open-source software Bookworm, written by Ben Schmidt and others. I was lucky to have at hand over 400,000 pages of Vogue magazine, originally digitized and OCR'd as part of the ProQuest Vogue Archive product. The task was to transform these XML files into the raw text that would be ingested into Bookworm – and to figure out how to architect the system so that readers could click through to individual articles on ProQuest's systems for closer reading after they discover a trend that interests them. (These links would only resolve on campuses that subscribe to the ProQuest product, of course – Vogue is under several layers of copyright and license protection.)
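As a rough illustration of that transformation step, here is a minimal Python sketch. It is not the code used for the project. The element names (RecordID, Title, PubDate, FullText), the link template, and the sample record are all invented for illustration. The output layout (one tab-separated line of text per article plus one line of JSON metadata carrying the click-through link) is only an assumption about what an ingest step might want, so check it against the Bookworm documentation before relying on it.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical link pattern; the real click-through URL scheme is not given here.
LINK_TEMPLATE = "https://example.org/docview/{record_id}"


def record_to_rows(xml_string):
    """Turn one article's XML into (text_line, metadata_line).

    Element names (RecordID, Title, PubDate, FullText) are assumptions
    made for illustration, not the actual ProQuest schema.
    """
    root = ET.fromstring(xml_string)
    record_id = root.findtext("RecordID", default="").strip()
    title = root.findtext("Title", default="").strip()
    date = root.findtext("PubDate", default="").strip()
    # Collapse all runs of whitespace so the article fits on a single line.
    text = " ".join(root.findtext("FullText", default="").split())

    text_line = f"{record_id}\t{text}"
    metadata_line = json.dumps({
        "id": record_id,
        "title": title,
        "date": date,
        "link": LINK_TEMPLATE.format(record_id=record_id),
    })
    return text_line, metadata_line


# Invented sample record, only to show the shape of the input.
sample = """<Record>
  <RecordID>12345</RecordID>
  <Title>Furs for Winter</Title>
  <PubDate>1950-11-01</PubDate>
  <FullText>Mink remains
  the favorite   fur this season.</FullText>
</Record>"""

text_line, metadata_line = record_to_rows(sample)
print(text_line)
print(metadata_line)
```

Keeping the article identifier in both the text line and the metadata line is what lets a term frequency result be traced back to a link for closer reading.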
Here are some examples of interesting patterns the tool reveals – click on each image to go to the interactive tool:
Fur and Animal Pelts
The mid-century obsession with mink stands out in stark relief here. And although shearling sounds archaic to my ears, it does not emerge until the 1970s. (Blame Uggs...)
Sexy does not appear in print until the 1960s. Pretty, by contrast, takes an early lead around the turn of the century before plummeting in the 1920s.