Interpreting Lit by Computer

Tracy and Hepburn in “Desk Set”


The political news these days is so grim that I’m taking a break and turning my attention to a Washington Post article I saw a couple of months ago about how “researchers have quantified what makes us love Harry Potter.” My negative reaction upon encountering the headline didn’t dissipate as I read the article, but the piece gives me the opportunity to share a very funny episode in David Lodge’s campus novel Small World (1984).

According to the researchers, our emotional journey through a novel can be “analyzed, quantified and graphed”:

In a fascinating new study, researchers use machine-learning techniques to analyze 1,327 literary works — including Romeo and Juliet, Frankenstein and Harry Potter — and reveal what exactly it is about popular stories that makes us love them most.

The researchers appear to have been inspired by Kurt Vonnegut, who used to diagram our emotional ride through “Cinderella,” Hamlet, the Bible, and other such stories. He didn’t have the advantage of computers, however:

The research draws on a kind of glossary of emotion they created by crowdsourcing emotional ratings for 10,000 of the most common words in the English language. Words such as “death,” “rape,” “cancer” and “die” rank at the bottom of the scale, while words like “love,” “laugh” and “happiness” are at the top. 

The researchers use the glossary to create a snapshot of more than a thousand literary works, mostly fiction, available from the free digital library Project Gutenberg. The result is thousands of graphs of what Andrew Reagan, one of the researchers, calls “the emotional experience of the reader.”
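
For the curious, here is a rough sketch of what this kind of word scoring might look like mechanically. It is only an illustration of the general idea, not the researchers' code: the ratings, the 1-to-9 scale, the names TOY_RATINGS, tokenize and emotional_arc, and the choice to cut the text into equal segments are all assumptions made up for this example.

```python
import re

# Hypothetical ratings on an assumed 1 (bleak) to 9 (joyful) scale.
# These numbers are invented for illustration; they are not the study's data.
TOY_RATINGS = {
    "death": 1.5, "die": 1.7, "cancer": 1.6,
    "love": 8.4, "laugh": 8.2, "happiness": 8.5,
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def emotional_arc(text, segments=3, ratings=TOY_RATINGS):
    """Cut the text into `segments` roughly equal runs of words and return
    the mean rating of the rated words in each run.

    A run that contains no rated words yields None.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    words = tokenize(text)
    arc = []
    for i in range(segments):
        start = i * len(words) // segments
        end = (i + 1) * len(words) // segments
        scores = [ratings[w] for w in words[start:end] if w in ratings]
        arc.append(sum(scores) / len(scores) if scores else None)
    return arc


if __name__ == "__main__":
    story = ("they laugh and love then death and cancer "
             "then love and happiness")
    print(emotional_arc(story, segments=3))
```

Run on the toy sentence, the middle segment scores far lower than the first and last, the fall-then-rise shape the article calls “man-in-a-hole.” Note how little of the sentence the score actually looks at: every unrated word is simply ignored.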

While they discovered that “individual stories may have very complex emotional arcs,” looking at the data from a broader angle revealed that

there were six types that fit 85 percent of the books they had analyzed…

Roughly one-third of the stories were either rags-to-riches stories, in which the emotional arc rises through the bulk of the story, or the opposite, riches-to-rags stories, in which it broadly falls. Romeo and Juliet and many of Shakespeare’s tragedies show up in this second category.

The researchers also graphed

–“man-in-a-hole” stories (Vonnegut’s phrase), “where the emotional arc of a story falls, then rises.” The Sherlock Holmes stories fit this pattern;
–Icarus stories “in which the emotional arc rises, then falls”;
–Cinderella stories, “where emotions rise, then fall, then rise again”; and
–Oedipus stories, a “fall-rise-fall pattern.”

The article concludes with the researchers saying, “there is much more work to be done” (now there’s a familiar story arc) and envisioning future projects that will examine

the popularity of story arcs across cultures and time. But to investigate whether certain story types are more popular than others, they analyze how often stories with certain emotional arcs are downloaded from Project Gutenberg, and find that stories with the Icarus, Oedipus and Man-in-a-hole arcs are downloaded most.

This research project is a successor, not only to Vonnegut, but also to the structuralism of the 1970s and 1980s. But while Vonnegut at least had a whimsical sense of humor about what he was up to, structuralism was a dry-as-dust endeavor that generally led people to that overwhelming reaction, “Who cares?” (a.k.a. “so what!”). Reducing literature to story arcs is like reducing love to evolutionary biology or God to a special gene: there may be a smidgeon of insight, but the most important part of the endeavor gets lost in the process.

Vonnegut at least was tickled by jolting juxtapositions, which are of a piece with his surreal fiction. Once turned into an academic science, however, such pattern finding becomes Wordsworth’s “we murder to dissect.”

Which leads me to the murder in Lodge’s 1984 novel. The perpetrators are two “computational stylistics” English professors at a fictional English university, the victim an author from the “angry young men” school who is no longer young. I quote at length to give you the full flavor. The author describes being introduced to Robin Dempsey’s research:

“Anyway,” [Dempsey] went on, “when we heard that the University was going to give you an honorary degree, we decided to make yours the first complete corpus in our tape archive.” “What does that mean?” I said. “It means,” he said, holding up a flat metal canister rather like the sort you keep film spools in, “It means that every word you’ve ever published is in here.” His eyes gleamed with a kind of manic glee, like he was Frankenstein, or some kind of wizard, as if he had me locked up in that flat metal box. Which, in a way, he had.

Thanks to Dempsey, the author learns that the most common word in his fiction (other than articles, conjunctions, etc.) is “grease” and its various forms and applications:

I didn’t believe him at first. I laughed in his face. Then he pressed a button and the machine began listing all the phrases in my works in which the word grease appears in one form or another. There they were, streaming across the screen in front of me, faster than I could read them, with page references and line numbers. The greasy gloom, the roads greasy with rain, the grease-stained cuff, the greasy jam butty, his greasy smile, the grease-smeared table, the greasy small change of their conversation, even, would you believe it, his body moved in hers like a well-greased piston. I was flabbergasted, I can tell you. My entire oeuvre seemed saturated in grease. I’d never realized I was so obsessed with the stuff. Dempsey was chortling with glee, pressing buttons to show what my other favorite words were. Grey and grime were high on the list, I seem to remember. I seemed to have a penchant for depressing words beginning with a hard ‘g.’ Also sink, smoke, feel, struggle, run and sensual.

Now the murderous effect. The author becomes so self-conscious about his predictability that he can’t write anymore:

Robin Dempsey gave me a printout of the whole thing, popped it into a folder and gave it to me to take home. “A little souvenir of the day,” he was pleased to call it. Well, I took it home, read it on the train, and the next morning, when I sat down at my desk and tried to get on with my novel, I found I couldn’t. Every time I wanted an adjective, greasy would spring into my mind. Every time I wrote he said, I would scratch it out and write he groaned or he laughed, but it didn’t seem right—but when I went back to he said, that didn’t seem right either, it seemed predictable and mechanical. Robin and [co-researcher] Josh had really fucked me up between them. I’ve never been able to write fiction since.

Literary interpretation for me is a human endeavor, a way to simultaneously find meaning in the work and in the world. Diagramming emotional journeys doesn’t begin to do justice to the complex relationships we develop with Harry Potter, Elizabeth Bennet, Jane Eyre, and all the others.

So much for “fascinating new studies” that “reveal what exactly it is about popular stories that makes us love them most.”



