Journalists find stories in all sorts of places. I have spent much of my journalism career, both in newsrooms and classrooms, finding stories in data.
That’s how I found myself crunching data for California Watch, a nonprofit investigative news operation that is part of the Center for Investigative Reporting.
Reporters Lance Williams and Christina Jewett have spent more than a year investigating the billing practices of Prime Healthcare Services. I was drafted to analyze millions of rows of Medicare patient data to look for patterns and trends that might emerge.
In the fall of 2010, I was living in Lisbon on a four-month Fulbright professorship, lecturing to students and journalism professionals all over Portugal. One day, I got an e-mail from Mark Katches, the editorial director of California Watch, asking if I could be persuaded to help the reporters with the Prime investigation and a couple of other investigative projects.
Mark and I had known each other since our days on the board of directors of the Investigative Reporters & Editors organization. He knew I had done some heavy-lifting data projects during my 20 years with The Miami Herald, before I started teaching at the Walter Cronkite School of Journalism and Mass Communication at Arizona State University in 1996. And I certainly respected his stellar track record as the leader of numerous prize-winning newspaper investigative projects before he joined California Watch and the Center for Investigative Reporting.
I was delighted to accept Mark’s invitation. I’m a strong believer that journalism professors should remain active in the profession, both to keep their skills sharp and to be credible role models to students. And I also believe in the importance of watchdog journalism in our democracy as a way to expose problems and right wrongs. Working with California Watch was a great way to satisfy both needs.
My specialty in investigative reporting is the analysis of large government datasets. This technique, called precision journalism, involves using computer software to reveal patterns in the data. Before journalists began using such tools, investigative stories often relied on anecdotes and claims from sources. Precision journalism, on the other hand, can produce actual evidence from the patterns in the data. For example, as a reporter in Miami, I analyzed the damage patterns from Hurricane Andrew to prove how weakened building codes had magnified the disaster, and I studied thousands of arrest records to uncover major flaws in the criminal justice system.
The California Watch investigation of Medicare billing at hospitals in the state was particularly interesting to me for a variety of reasons. As a citizen and taxpayer – and a soon-to-be Medicare enrollee – I am as concerned as anyone that the nation’s medical system should remain solvent and be administered properly. As an investigative reporter, I was aware that many problems – from waste to outright fraud – had been found in Medicare billing in other states. And as a data analyst, I was eager to tackle the challenge of parsing tens of millions of records.
Much of the work I did for California Watch involved independent testing of tips about billing practices being passed by various sources to Christina and Lance. Sources always have their own reasons for talking with reporters, so Mark insisted on independent confirmation before publishing any stories based on such claims.
Luckily, California’s Office of Statewide Health Planning and Development has gathered an extensive and well-documented collection of public data about every patient seen in a hospital or emergency room. Great care has been taken to mask the identity of every patient, but the collective patterns in the database of all these anonymous records reveal a lot about how different hospitals are diagnosing and treating patients and billing insurers or the government.
The analysis was done with industrial-grade database software called SAS, which I have been using for such projects for more than 20 years. Using SAS requires writing programs that tell the computer how to read the raw data and how to produce answers to our questions.
Over the past year, I wound up writing about 120 SAS programs to plow through more than 50 million records, finding patterns that showed Prime hospitals were diagnosing certain conditions at rates significantly higher than most other hospitals. And in an effort to go beyond the specific conditions our sources were telling us about, I wrote a program that uncovered a number of other high-rate conditions that bear further investigation.
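For readers curious about the shape of such a comparison, here is a minimal sketch in Python rather than SAS. It is not one of the programs written for the investigation. The record layout, the `group_of` function, the placeholder codes and the `min_ratio` threshold are all assumptions made for illustration, and a simple ratio stands in for any real test of statistical significance.

```python
from collections import defaultdict

def diagnosis_rates(records, group_of):
    """Return {group: {code: share of that group's cases carrying the code}}.

    records  -- iterable of (hospital_id, list_of_diagnosis_codes), one per case
    group_of -- hypothetical function mapping a hospital_id to a group label
    """
    cases = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for hospital, diagnoses in records:
        group = group_of(hospital)
        cases[group] += 1
        for code in set(diagnoses):  # count each code once per case
            hits[group][code] += 1
    return {
        group: {code: n / cases[group] for code, n in by_code.items()}
        for group, by_code in hits.items()
    }

def high_rate_conditions(rates, target, baseline, min_ratio=2.0):
    """List (code, target_rate, baseline_rate) where the target group's rate
    is at least min_ratio times the baseline group's rate."""
    flagged = []
    for code, rate in rates.get(target, {}).items():
        base = rates.get(baseline, {}).get(code, 0.0)
        if base > 0 and rate / base >= min_ratio:
            flagged.append((code, rate, base))
    # Largest ratio first, so the most unusual conditions lead the list.
    return sorted(flagged, key=lambda row: row[1] / row[2], reverse=True)

# Invented records with placeholder codes, purely to exercise the functions.
sample_records = [
    ("H1", ["DX_A", "DX_B"]),
    ("H1", ["DX_A"]),
    ("H2", ["DX_B"]),
    ("H2", ["DX_B"]),
    ("H2", ["DX_A"]),
    ("H2", []),
]
sample_rates = diagnosis_rates(
    sample_records, lambda h: "chain" if h == "H1" else "other"
)
flagged = high_rate_conditions(sample_rates, "chain", "other")
print(flagged)
```

Scanning every code this way, instead of only the ones a source has named, is what lets an analysis surface conditions nobody tipped you to. A flagged code is a lead to check, not a finding.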
For one analysis, I used SAS to count how often more than 6,400 diagnosis codes were used in the cases of about 750,000 individual patients, creating a huge database of more than 14.6 gigabytes.
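A count like that can be sketched in a few lines of Python, again as an illustration and not the SAS program actually used. The column names (`patient_id`, `diagnosis_codes`), the semicolon separator and the placeholder codes are assumptions; the real state files are laid out differently. The point of the sketch is that rows are read one at a time, so the tally fits in memory even when the raw file does not.

```python
import csv
import io
from collections import Counter

def count_diagnosis_codes(lines, code_column="diagnosis_codes", sep=";"):
    """Tally how many patient rows mention each diagnosis code.

    lines -- any iterable of CSV text lines with a header row, such as an
             open file; rows are consumed one at a time.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # A set keeps a code repeated within one row from being counted twice.
        codes = {c.strip() for c in row[code_column].split(sep) if c.strip()}
        counts.update(codes)
    return counts

# Invented rows with placeholder codes, standing in for a real data file.
sample = io.StringIO(
    "patient_id,diagnosis_codes\n"
    "1,DX_A;DX_B\n"
    "2,DX_A\n"
    "3,\n"
)
counts = count_diagnosis_codes(sample)
print(counts.most_common(2))
```

With a real file, the same function could be handed the object returned by `open(path, newline="")` in place of the in-memory sample.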
This kind of evidence would be impossible to gather from a warehouse full of file drawers filled with millions of pieces of paper; finding the telltale patterns in a mountain of documents is beyond the human attention span. But in this age of electronic public records, seasoned reporters who know how to use powerful computer tools can see not only the trees, but the whole forest. As an investigative reporter, I find it wonderful to use such tools to uncover problems that otherwise might remain hidden. But as a taxpayer, I often wish government agencies were doing the same kind of analysis.
Stephen K. Doig is a Pulitzer Prize-winning investigative reporter and the Knight Chair in journalism at Arizona State University’s Walter Cronkite School of Journalism and Mass Communication.