Everyone's a historian now

How the Internet - and you - will make history deeper, richer, and more accurate.

By Stephen Mihm
May 25, 2008

UNTIL RECENTLY, IF you were a historian and you wanted to write a fresh account of, say, the Battle of Leyte Gulf in World War II, research was a pretty straightforward business. You would pack your bags and head to the National Archives, and spend months looking for something new in the official combat reports.

Today, however, you might first do something very different: Get online and pull up any of the unofficial websites of the ships that participated in the battle - the USS Pennsylvania, for example, or the USS Washington. Lovingly maintained by former crew members and their descendants, these sites are sprawling, loosely organized repositories of photographs, personal recollections, transcribed log books, and miniature biographies of virtually every person who served on board the ship. Some of these sites even include contact information for surviving crew members and their relatives - perfect for tracking down new diaries, photographs, and letters.

Online gathering spots like these represent a potentially radical change to historical research, a craft that has changed little for decades, if not centuries. By aggregating the grass-roots knowledge and recollections of hundreds, even thousands of people, "crowdsourcing," as it's increasingly called, may transform a discipline that has long been defined and limited by the labors of a single historian toiling in the dusty archives.

Some venerable research institutions are already starting to harness the power of crowds in an organized way. The Library of Congress recently launched a project on the photo-sharing site Flickr that invites visitors to identify and analyze photographs in its collection, while the National Archives, working in partnership with a for-profit company, is inviting people to do the same to online versions of its documents. And a growing number of projects are taking the logical next step, creating "raw archives" of photographs and documents for momentous events: Sept. 11, for example, or Hurricane Katrina.

"When a historian writes about a particular period of American history, he has a few hundred pages to do so, and things inevitably get left out," says Daniel Cohen, director of the Center for History and New Media at George Mason University. Projects that employ crowds of researchers and writers, he says, "allow for a wider array of details and perspectives that a single master narrative doesn't allow."

So far, only a handful of professional historians have begun to exploit crowdsourcing, which remains a relatively crude tool for gathering and organizing knowledge. But as the power of crowds meets the practice of history, these online repositories represent a remarkable change not only in how historical materials are gathered and organized, but, perhaps most important, in how deeply and broadly the past can be understood.

. . .

The closest thing to crowdsourcing that most people have encountered is Wikipedia, the site that relies on the unpaid labor of thousands of volunteers to write and edit encyclopedia-style entries on a variety of topics. Many entries are historical in nature, and the scope of the collaboration is staggering: By one estimate, the entry on Franklin Delano Roosevelt depended on the labor of well over 500 people posting more than a thousand edits in a four-year period.

But Wikipedia does not actually produce new knowledge. In fact the site bans "original research," which means that entries have to distill what has already been published on a given topic.

It's easy to imagine Wikipedia's open-ended, collaborative spirit being harnessed to produce fresh historical information, and in the last year, a growing number of sites have emerged to do just that. Perhaps the best known is the pilot photo-identification project launched last year by the Library of Congress, home to some 12 million photographs - of which half are of limited use to researchers because they haven't been fully identified.

Late last year, the Library of Congress posted several thousand of its photographs on Flickr and asked the public for help: What is this? Who is this? When was it taken? Curator Helena Zinkham, who oversaw the program, was stunned to discover how quickly the gaps were filled by amateur enthusiasts - and in some cases, people with firsthand recollections.

This was particularly the case where the images attracted the attention of a particular group of enthusiasts: military aviation buffs, for example, or aficionados of early baseball. One collection depicted early-20th-century boxers, many without vital information - perhaps just a last name, like "Wells."

"By the time the conversation was done," Zinkham says, "we were able to tell Matt Wells from Bombardier Billy Wells."

Other famed photo collections have started to follow suit. The Powerhouse Museum in Sydney, Australia, is also using Flickr, uploading its massive collection of glass plate negatives depicting early life in Australia. Similar projects aimed at genealogists attempt to turn mystery family photos over to the digital masses for identification.

As archivists submit to the wisdom of crowds, they're finding that the result is both richer and more complicated than mere accurate identification. George Oates, a senior program manager at Flickr who oversaw the Library of Congress project, believes that it's somewhat unusual for a single person to make a precise identification. Instead, it's more common for many people to contribute small pieces of information: related stories, links, anecdotes, and opinions about the subject of the photograph.

Take one of the color photographs from World War II that the Library of Congress posted on Flickr. The Library knew that it depicted a woman assembling the bombardier nose of a Flying Fortress, or B-17. Any historian tempted to take the photograph as a reflection of "reality" would do well to read the comments now posted on the website, which convincingly reveal the ways the photograph was staged - the model's makeup, nail polish, and prominently displayed wedding ring - as well as even more esoteric features of the photo, such as the number of flash bulbs used for the shot and the function of the woman's curious red cap. Thanks to these comments, a photograph that might seem like a revealing glimpse of women at work during wartime becomes instead a reflection of the methods of home-front propaganda.

This remarkable enthusiasm hasn't been lost on the private sector. A new company, Footnote, was established last year, hoping to make money by charging for access to documents scanned from the collections of the National Archives, with which it signed a controversial agreement last year. The company has digitized some 33 million pages of documents so far, and hopes to have 50 million available by the end of the year.

Footnote has some "teaser" features available for free on its site, the most notable of which is its Interactive Vietnam Wall project, which permits visitors to zero in on a particular name and attach a recollection, a document, a photograph, or a link to other names on the wall. Footnote hopes that by providing scaffolding on which visitors might hang historical information, an elaborate interconnected archive will emerge. More broadly, Footnote is hoping to build its own archive by inviting users to post documents and photographs in their possession.

Beyond just asking online volunteers for help compiling and sorting information, a handful of historians have been asking: What if we used this approach to capture history as it happens?

Several years ago, in the aftermath of Sept. 11, George Mason University history professor Roy Rosenzweig set up a site where people could post photographs, videos, documents, e-mails, and recollections of that day and its aftermath.

The site, now known as the September 11 Digital Archive, was so successful that the Library of Congress selected it as its first significant "digital acquisition." Rosenzweig died last year, but projects that he set up with his successor, Daniel Cohen, continue to set the standard for online archives. Their most recent success is the Hurricane Digital Memory Bank, now a premier online archive of materials relating to Hurricane Katrina.

It's hard to know how historians will use these sites. Cohen believes that the sheer quantity of material has an importance unto itself: The Sept. 11 site was able to amass a quarter million private photographs relating to the tragedy of a single day. That granularity, he argues, has its advantages. "There's a quality that comes with the quantity," he says.

. . .

Absent a watchful librarian or archivist, it's natural to wonder about the reliability of the information posted by the crowd. In the case of the 9/11 archive, for example, Cohen recalls that a handful of photos posted to the site had been digitally doctored. After some debate, he decided to keep them in the collection. "Just because it's digital doesn't mean you check your brain at the door," he says. "Plus, there are lies, forgeries, and false things in regular paper archives."

But what of crowds and the actual interpretation of history? How do we know that the people contributing things know what they're talking about? Rosenzweig had raised this issue in a famous article titled "Can History Be Open Source?" in which he evaluated the accuracy of historical entries on Wikipedia. He found that while the style of the entries - much less the grammar - left something to be desired, the facts often checked out.

Helena Zinkham believes that crowdsourcing has the power to be self-regulating. "At times, people who have 'identified' photos have been wrong, but other people have weighed in and disputed the identification," she said. "Because the comments stay with the photograph, there's a degree of certainty about the information. It's not like a book, which might have hidden the uncertainty. You get to see the whole line of thinking, and you can draw your own conclusion."

As history opens itself up to the wisdom of crowds, it is exposing an interesting fact about the profession: Often, amateurs know much more detail about a particular event than the academic historians who interpret it for posterity. Take Civil War battles, for example. A historian of the Civil War might be able to turn to crowds of amateur historians in order to obtain documents, clarify details, and otherwise harness the knowledge of those individuals who are exclusively preoccupied with the minutiae of Pickett's Charge on the third day of the Battle of Gettysburg.

Cohen sees the potential for partnerships between the lone professional historian and crowds of helpers, particularly as the quantity of historical material increases. It's possible, for example, for a historian of Colonial America to read every document written by the founders of the Massachusetts Bay Colony (though such a task would still be time-consuming). It's altogether another thing for a historian of modern America to tackle the vast output of the Bush White House. "One person can't read it," explains Cohen, "but a hundred or thousand could read individual documents and tag them with keywords."

Though Cohen welcomes what he terms a "multiplicity" of historical perspectives, he still thinks that there will always be a place for the individual historian in weaving all those disparate strands into a coherent narrative. "Having the crowd on your side is a good thing at certain stages of the research and publication process," says Cohen. "But at other times, historians will still want to be by themselves, sitting at their computer screen, using their own words to knit things together and make sense of the past."

Stephen Mihm is a history professor at the University of Georgia and author of "A Nation of Counterfeiters" (Harvard, 2008).
