NICAR 2015: Data from scratch — How to crowdsource data

We know data tells us a lot. We write programs to automate data scraping. We spend hours creating data visualizations that help readers see what they need to see. We use data to make claims and generate stories that are reliable and have impact.

Data is important and we seem to be surrounded by it. But that's not quite true. Sometimes, there is no data.

A session at NICAR that really resonated with me was Data from Scratch: When data doesn’t exist, led by Griff Palmer, Ricardo Brom and Lisa Pickoff-White. Pickoff-White shared her experience building PriceCheck, a crowdsourced project that KQED launched last year to answer the question “How much does health care cost?”

The team wanted to compare and contrast the costs of certain procedures or services with and without insurance. The biggest problem was that contracts between patients and their insurance providers were confidential, so no one could get to the information. Except the patient, that is.

Source: blogs.kqed.org

So they crowdsourced the data. KQED got hundreds of users to enter personal information about their insurance benefits (see above) on the site, then built a searchable database of procedures and their respective costs. For example, a particular chest x-ray within 50 miles of my hometown in the Bay Area costs $107, but one patient paid $36 because her insurance covered the rest.

Strategies

For this project, Pickoff-White mentioned specific strategies they used to make sure they got enough accurate and useful data to “make apples-to-apples comparisons.” Here are seven strategies and tactics she used to get clean data that you can apply to your own projects:

  • Get the ball rolling. This is pretty simple. The team used social media and their on-air broadcast presence to take advantage of the trust that people already had in KQED.
  • Report during the process. Here is a list of stories the team wrote when they found particularly large disparities in costs with and without insurance. This kept their information relevant and allowed them to push updates to users as they continued to ask for data.
  • Ask for one thing at a time. For instance, they would push a request for information on chest x-ray costs and then later another on mammograms. By splitting up the procedures they were asking for, they could target specific people in a wide range of patients and get all the information on one thing at a time.


Source: blogs.kqed.org

  • Do the hard work for the users. The team found that users were making mistakes when filling out the survey on their health benefits, so they implemented autocomplete for the procedures they were asking about and used Google Places to standardize the names of medical care providers.
  • Explain benefits to users. A lot of times, they found that patients didn’t know how to properly read their benefits. This was an opportunity for them to help their target audience learn and contribute their information at the same time.
  • Use common sense. The data reporters had a general understanding of health benefits and costs, so they were able to pick out careless mistakes. For example, Pickoff-White noticed that $3,592.50 was unusually steep for a certain procedure. She contacted the user, who corrected the mistake to $359.25. This led to the next strategy, in which they would...
  • Ask for contact information. They included a space for the user to enter an email address, so the team could follow up and correct errors exactly like the one above.
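
If you are collecting prices this way, part of the "common sense" check can be automated. What follows is a minimal sketch in Python, not KQED's actual code: the function name flag_suspect_prices, the field names, the 5x threshold and the sample submissions (apart from the $3,592.50 figure from Pickoff-White's example) are all assumptions made up for illustration. It flags any submission whose price is far from the median for the same procedure, so a reporter knows whom to email.

```python
from statistics import median


def flag_suspect_prices(submissions, ratio=5.0, min_reports=3):
    """Return submissions priced far above or below the median for their procedure.

    ratio and min_reports are illustrative defaults, not values from the session.
    """
    # Group reported prices by procedure name.
    by_procedure = {}
    for s in submissions:
        by_procedure.setdefault(s["procedure"], []).append(s["price"])

    flagged = []
    for s in submissions:
        prices = by_procedure[s["procedure"]]
        if len(prices) < min_reports:
            continue  # too few reports to say what "normal" looks like
        mid = median(prices)
        if mid > 0 and (s["price"] > mid * ratio or s["price"] < mid / ratio):
            flagged.append(s)
    return flagged


# Hypothetical submissions; the procedure name and email addresses are invented.
submissions = [
    {"procedure": "example procedure", "price": 310.00, "email": "a@example.com"},
    {"procedure": "example procedure", "price": 345.00, "email": "b@example.com"},
    {"procedure": "example procedure", "price": 380.00, "email": "c@example.com"},
    {"procedure": "example procedure", "price": 3592.50, "email": "d@example.com"},
]

flagged = flag_suspect_prices(submissions)
for s in flagged:
    print(f"Follow up with {s['email']} about ${s['price']:,.2f}")
```

A flag here only means "worth a follow-up." A reporter still has to contact the user, as Pickoff-White did, because an unusually high price can also be real.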


“Some data is better than no data”

Pickoff-White admits it’s hard to determine how much data is enough to mean something. But because each patient’s experience is so specific, the database displays every case separately, which reflects the transparency the team aims for. Its main function is to let someone search the database and see costs, and cost disparities, for someone else in a similar situation. Their goal is to get as much data as possible, but not necessarily to generalize every patient’s experience into one whopping conclusion.

Here is the PowerPoint from the rest of the Data from Scratch session at NICAR.

About the author

Ashley Wu

Undergraduate Fellow

Designing, developing and studying journalism at Northwestern. Also constantly scouting the campus for free food.
