All Your Texts Are Belong To Us - Hacking Literature With Perl

In 1996, Don Foster correctly identified Joe Klein as the author of the bestselling political novel "Primary Colors," bringing instant notoriety to himself and to the sub-branch of statistical natural language processing he called 'literary forensics.' Since then, Internet search engines, open source databases, and enormous digitization efforts like Project Gutenberg have made it easier than ever to unleash computers on text, with fascinating results.

This talk will show how to apply algorithms from fields as diverse as graph theory, signal processing, and information retrieval to literary texts, unlocking their secrets without forcing the programmer to do any actual reading. Whether tracing thematic connections, figuring out whether an author really wrote a given passage, or creating Cliff's Notes-like summaries of long novels, computers can fake a surprising degree of literary acumen. Many of the half-forgotten natural language processing techniques from the 1970s turn out to be perfectly suited to literary analysis, and quite simple to implement.

Come see our open source literary toolkit in action, learn about clever ways to play with natural language, and help bring us closer to the goal of replacing the graduate student in literature with a small Perl script.

You liberal artsy types thought you were immune from outsourcing, didn't you? Via email from Mikey.

But just wait until the Perl script comes up for tenure...

Posted by: Steve Bates on July 6, 2004 9:21 PM

Sounds to me like the old saying needs to be updated: If an infinite number of Perl scripts were typing on an infinite number of computers, one of them could reproduce Hamlet. :-)

Posted by: William Hughes on July 7, 2004 4:30 AM