Big Data Analysis meets the Liberal Arts: Will Lit Crit Ever Be the Same?

Friday, May 23rd, 2014

As a tech B2B writer here at McBru, I write a good deal about Big Data, the enormous information flows derived from sensors, social media feeds, and device-to-device communications. For example, in an average airplane there are more than 50,000 sensors constantly monitoring everything from electrical flows to air quality. These sensors create between 5 and 6 petabytes of data per flight (a petabyte is a million gigabytes). Sifting through this information to detect patterns and anomalies is the job of big data analytics, a major new frontier for technical research.

But not so many years ago, while in university, I studied French and English literature and read hundreds of novels, scores of poems, and piles and piles of learned treatises on my way toward a graduate degree. We didn’t have the concept of big data back then, but I can see now that I was acting as my own big data analytics solution as I powered through all those pages in search of “actionable intelligence.”

So I shouldn’t be surprised that cutting-edge literary theory has embraced big data analytics to render new insights into the old-school study of literature. I have chanced across a number of articles recently that describe how the established literary canon has yielded new secrets after being processed through the algorithms of big data analysis.

  • In “Shakespeare’s Data,” in May’s print edition of Wired magazine, Clive Thompson describes how two PhD students at the Stanford Literary Lab fed the content of 2,958 19th-century novels through a series of big data analytics tools. One interesting pattern to emerge was that, as the century progressed, words describing action and body parts became more prevalent. The researchers concluded that increasing urbanization during the 19th century brought people closer together physically, so that people’s bodies and actions became increasingly difficult to ignore. Seemingly, after the Industrial Revolution, no one was far from the madding crowd.
  • In “The Data-Mining’s The Thing: Shakespeare Takes Center Stage In The Digital Age” from Fast Company, Neil Ungerleider writes that the “same techniques used by businesses to analyze web content and by marketers to target audiences […] have big ramifications for Shakespeare–and have helped settle long-standing academic arguments.” Officials at the Folger Shakespeare Library fed portions of the Bard’s plays through rhetorical analysis tools and data-mining techniques to discover distinct linguistic similarities between the tragedy Othello and Shakespeare’s comedies. In particular, the comedy Twelfth Night recycles a number of linguistic conventions and themes found in Othello.
  • Shakespeare, Herman Melville and today’s hip-hop artists were on the mind of data scientist Matt Daniels. He wanted to determine how the vocabulary of hip-hop artists stacked up against these two giants of literature. Using a research methodology called token analysis, Daniels compared 35,000-word data sets from the writings of Shakespeare, Melville, and 85 hip-hop performers (he used the first 5,000 words of seven of Shakespeare’s plays, the first 35,000 words of Moby Dick, and 35,000 words from the lyrics of published songs by the 85 performers in question). The biggest vocabulary? Somewhat surprisingly, the rapper Aesop Rock came out on top with 7,392 unique words in his data set. Melville was certainly near the head of the class, with 6,022 unique words, while that slouch Shakespeare was closer to the middle of the pack with 5,170 unique words in his data set.
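
For the curious, the unique-word comparison in the last item can be roughed out in a few lines of Python. This is only a minimal sketch, not Daniels’s actual method: the function name `unique_word_count` is made up for illustration, and the tokenizing rule (lowercase everything, then treat any run of letters and apostrophes as one word) is my own assumption. The 35,000-word limit is the only detail taken from the description above.

```python
import re


def unique_word_count(text, limit=35_000):
    """Count distinct words among the first `limit` words of `text`.

    Assumption: a "word" is any run of ASCII letters or apostrophes,
    compared case-insensitively. A real study would likely use a more
    careful tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    sample = tokens[:limit]  # keep only the first `limit` words
    return len(set(sample))  # a set discards repeated words


if __name__ == "__main__":
    line = "Call me Ishmael. Call me what you will."
    print(unique_word_count(line))
```

Capping every author at the same number of words is what makes the comparison fair: a longer text would otherwise have more chances to introduce new words.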

Big data analysis will probably never dislodge more traditional literary theory from the classroom, but it can help tease out unexpected patterns and linguistic relationships and offer insights into language and themes that are invisible to more conventional critiques. And it’s kind of cool to realize that every book in your library is in fact, for better or worse, a big data flow.