The Boston Globe

Harvard project to scan millions of medical files

Harvard scientists are building a powerful computer system that will use artificial intelligence to scan the private medical files of 2.5 million people at local hospitals, as part of a government-funded effort to find the genetic roots of asthma and other diseases.

The $20 million project -- which would probe more deeply and more quickly into medical records than human researchers are capable of -- is designed to find links between patients' DNA and illnesses. Although the effort could raise concerns about privacy, researchers say the new program, called "I2B2" (for "Informatics for Integrating Biology and the Bedside"), would respect the strict guidelines set out in federal and state laws, and could be a powerful tool for many kinds of research.

Hospitals gather huge amounts of information from patients each day -- from blood tests to chest X-rays and brain scans. For decades, researchers have pored through these records and gleaned insights that have helped millions of Americans. Now, the Harvard team hopes to put far more information at the fingertips of researchers, and to speed the process with sophisticated automation.

Scientists said the Harvard work and similar efforts elsewhere raise the stakes in the nation's shift to electronic medical records.

With mounting examples of personal financial information being compromised, work such as this will have to be done with extreme care. Scientists also said, however, that if the project is successful, it would be widely copied -- and it could mean that studies that now take months or years could be done in weeks or even minutes.

"If we could use routine clinical care to generate new findings without having to do multimillion-dollar studies, that would be a true change in the way medical discovery is done," said Dr. Isaac Kohane, an associate professor at Harvard Medical School who is one of the project's directors. "We want to use the healthcare system as a living laboratory."

All of the records -- from patients at Massachusetts General Hospital, Brigham and Women's Hospital, and several other Partners HealthCare hospitals -- are protected by multiple layers of security designed to prevent private medical information from being released, the scientists said. None of the information will be sold, said John Glaser, the project's other director and the chief information officer for Partners HealthCare.

Funding for the five-year I2B2 project began in the fall of 2004; researchers are now getting the first hints of success and are forming plans to contact patients.

The first study to be carried out under the project is an effort to understand the genetic roots of asthma, which afflicts about 20 million Americans. For reasons that are not well understood, some asthma patients do not respond well to the usual treatments and suffer repeated, frightening attacks that send them to the emergency room, said Dr. Scott Weiss, a scientist at the Channing Laboratory at Brigham and Women's Hospital who is leading the asthma team.

Weiss said he thinks there is a genetic signature shared by these severe asthma patients. And if doctors had a way to identify these patients ahead of time, he said, it would be a tremendous boon.

"Instead of relying on trial and error over two years, you could learn this upfront, saving the patient a lot of agony, and the physician a lot of time and effort," Weiss said.

However, Weiss said, traditional methods to study this problem are time-consuming and expensive, often prohibitively so. First, he would need to recruit hundreds or thousands of asthma patients, using everything from signs on a subway to ads in asthma newsletters. Then, he said, researchers would need to sort through the volunteers to get the right mix of people for scientific study, and follow them for years to get reliable medical histories.

In theory, much of the information a researcher like Weiss needs may already be sitting in patient files. But typically it would take a doctor to carefully review each file -- looking for clues here and there, resolving seeming contradictions -- to draw out key information, such as why a patient has been hospitalized in the past and whether he or she smokes.

This is how Kohane and the other scientists hope to use artificial intelligence. For example, the computer will use a technique called "natural language processing" to determine whether a patient is a smoker, according to Dr. Shawn Murphy, an assistant professor of neurology at Harvard Medical School. The computer, he said, is being programmed to seek out many phrases -- such as "smoker" or "no bad habits" -- and then weigh them, in their context, to come to a conclusion.

"Just to make that one decision, it takes a lot of computer power," Murphy said.
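The article does not describe how the I2B2 software actually weighs those phrases. The following is a minimal, hypothetical Python sketch of the general idea Murphy describes: look for cue phrases in a free-text note, check the few words before each one for a negation such as "not" or "denies," and add up the weighted evidence. The phrase list, the weights, the three-word negation window, and the function names `score_sentence` and `classify_note` are all illustrative assumptions, not the team's rules.

```python
import re

# Hypothetical cue phrases and weights: positive values suggest the patient
# smokes, negative values suggest not. Illustrative only, not I2B2's list.
CUES = [
    ("current smoker", 3.0),
    ("non-smoker", -3.0),
    ("nonsmoker", -3.0),
    ("quit smoking", -2.0),
    ("no bad habits", -1.5),
    ("packs per day", 2.0),
    ("smoker", 2.0),
    ("smokes", 2.0),
    ("smoking", 1.0),
]
# Words that flip a cue's meaning when they appear shortly before it.
NEGATIONS = {"no", "not", "never", "denies"}
WINDOW = 3  # assumed number of preceding words checked for a negation


def score_sentence(sentence: str) -> float:
    """Sum cue weights in one sentence, flipping the sign of negated cues."""
    text = sentence.lower()
    masked = text  # matched spans are blanked so shorter cues cannot re-match
    total = 0.0
    # Longest phrases first, so "current smoker" is counted instead of "smoker".
    for phrase, weight in sorted(CUES, key=lambda c: -len(c[0])):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, masked):
            preceding = text[: m.start()].split()[-WINDOW:]
            negated = any(word in NEGATIONS for word in preceding)
            total += -weight if negated else weight
        # Replace each match with spaces of equal length to keep offsets aligned.
        masked = re.sub(pattern, lambda m: " " * len(m.group()), masked)
    return total


def classify_note(note: str) -> str:
    """Label a free-text note 'smoker', 'non-smoker', or 'unknown'."""
    # Score sentence by sentence so a negation cannot leak into the next one.
    sentences = re.split(r"[.;!?]+", note)
    total = sum(score_sentence(s) for s in sentences)
    if total > 0:
        return "smoker"
    if total < 0:
        return "non-smoker"
    return "unknown"


if __name__ == "__main__":
    for note in ["Current smoker, about 2 packs per day.",
                 "Patient is not a smoker.",
                 "Seen for asthma follow-up."]:
        print(note, "->", classify_note(note))
```

For example, `classify_note("Patient is not a smoker.")` returns `"non-smoker"`, because "not" appears within three words before the cue "smoker" and flips its weight. Scoring each sentence separately keeps a note such as "Does not drink. Smoker." from being misread as a denial of smoking.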

In one sense, the project is quite risky, because it will work only if the researchers can coax the computer to extract fairly accurate information from files that were not designed to be used for research.

But an early test of one of the artificial intelligence techniques found that the computer was able to deduce the main reason for a hospitalization almost as accurately as a doctor, missing only a few times out of 200 case files, Weiss said. Now the team is working on software to extract smoking histories and other key information.

Weiss said he plans to have a working database of asthma patients by fall, including all the information extracted by the computer. After testing and refining this, he said, he hopes to begin approaching doctors by next spring to see whether their patients will participate and give blood.

There are other teams in I2B2 working on genetic studies of hypertension, type 2 diabetes, and Huntington's disease, a devastating degenerative disease of the brain.

The same set of tools, though, could speed clinical research into many areas, Murphy said.

For example, he said, if the software can determine what medications a patient is taking and what health problems the patient is having, the computer system could be told to search the entire patient population for side effects of new drugs, alerting the team if it found an unusual pattern.

In theory, such studies could be done in minutes and the team could set up thousands to run constantly, Murphy said.

For the genetics work, though, such instant studies are not possible because patients' DNA is not in their files. But the I2B2 team is exploring the possibility of setting up a large DNA bank.

Doctors or nurses would ask patients who come through the Partners HealthCare system if they are willing to have a small amount of blood drawn and kept for future research, said Kohane, who is also chairman of the informatics program at Children's Hospital Boston.

If such a bank were established as part of I2B2, it would allow scientists like Weiss to do an entire study -- recruiting patients, getting their medical history, analyzing their DNA, and finding a result that might improve asthma treatment -- with unprecedented speed and without troubling a single patient.

But setting up such a bank would raise thorny logistical, technical, legal, and ethical questions, said Dr. Pearl O'Rourke, director of human research affairs for Partners HealthCare. O'Rourke said that Partners has come to see such "biobanks" as a vital research tool, and that she has been involved in intensive discussions of the issue over the past six months. She will meet with the I2B2 team this month, she said.

The data used by I2B2 are protected by a number of methods, according to members of the team. They are scrambled with advanced encryption and are password protected. They are kept behind a firewall that prevents outsiders from gaining access. The healthcare data are kept separate from the patients' identities. And nobody can get access to detailed information without first submitting a lengthy description of the research project and receiving approval from the Partners review board.

This body ensures that human subjects are protected and that no unauthorized medical information is ever released.

Dr. Norman Fost, a bioethicist at the University of Wisconsin at Madison, who is not involved in the project, said that I2B2 does not raise privacy issues that are not already raised by electronic medical records.

One advantage of electronic medical records, he said, is that it is easier to monitor who is gaining access to them, and to look for patterns of unusual activity.

Other medical centers around the country, particularly the Mayo Clinic, with locations in Minnesota, Arizona, and Florida, are working to develop better ways to use patient records to speed research. But the National Institutes of Health chose Boston as the place to see how much computers can extract from health records. The grant was given with the understanding that all the new software will be freely available to the nation's hospitals.

"Ultimately," Kohane said, "the public will have to decide: Do they want research done this way or not?"

Gareth Cook can be reached at
