Q. You just published a paper in Science analyzing word use from 5 million books written since the 1500s. What did you hope to learn from that kind of analysis?
Lieberman-Aiden: We had been studying the evolution of language and culture and [wanted to] see whether it was possible to create quantitative accounts of the evolution of language.
Q. Your first major problem was that
Lieberman-Aiden: We needed a data set that was going to be interesting enough to support interesting work, but not so interesting as to be illegal.
Q. So, your solution was to analyze the words individually, rather than as entire texts — but doesn’t that mean there is no context, just isolated words?
Michel: We have short phrases, groups of words. If you’re interested in “Englishmen” and you want to find how people talk about it, you can see two words before and two words after. You can’t study long passages, but nonetheless, we do have some local context.
Q. Critics have complained about other bugs as well. What’s your response to them?
Lieberman-Aiden: People should try to maintain the long view. The fact that there are things we still want to do, improvements that we can still envision, is not a bug. It’s a feature of the fact that this is an interesting way of looking at the world and I think that a lot of those features are going to be enabled in the coming years.
Q. Is it possible to learn anything about human health from this data?
Lieberman-Aiden: There is the possibility of doing historical epidemiology [studying the causes and distribution of disease]. If you type in “influenza,” you can see that there are big jumps in the frequency of the word “influenza” [over time, tied to major outbreaks].
Q. You’ve said this kind of data might be used in the study of humanities the way genetic data is used in biology.
Lieberman-Aiden: Biology went through a transformation in the last couple of decades, a transformation similar to what we anticipate happening in this particular corner of the humanities, where all of a sudden, data sets appear and the quantitative analysis of those data sets starts to illuminate phenomena in completely new ways.
Q. You’ve made the raw data available to the public with a simple graphing program at ngrams.googlelabs.com. That site is quite addictive.
Lieberman-Aiden: It’s been incredibly gratifying to see the extent to which it’s addicted people. There were more than a million queries in the first 24 hours. [It was named] one of the best time-wasters on the Internet. Once you start, it’s very, very hard to stop. . . . This was actually a huge source of delays. The difficult state of data delayed us by a year, and the captivating nature of the results delayed us by six months. We would show up to meetings saying we have to get stuff done on the paper. Oh, but we have to check this first, and three hours later, we’d have done 15 analyses [and not worked on the paper at all].
Q. So, have you two broken your addictions yet?
Michel: I haven’t broken mine.
Q. With the public now running queries on your program, have any interesting new questions popped up that you didn’t think to ask?
Lieberman-Aiden: One of my favorites is if you look at the phrases “save the world” versus “save the country”: since World War II, “save the world” has become more prominent than “save the country,” where the reverse had been true earlier. We live in a more global world. Saving the country sounds a little bit provincial, whereas it might have sounded expansive to the ears of someone living in 1850.
Q. It also seems like it might inspire people to do more research, to find out why things change at particular times.
Michel: I have certainly tried to look up many more historical facts during this year than I have during the previous 10 years.
Lieberman-Aiden: This is a fun portal to history.
Interview was edited and condensed. Karen Weintraub can be reached at firstname.lastname@example.org.