NICAR 2015: Data from scratch — How to crowdsource data

We know data tells us a lot. We write programs to automate data scraping. We spend hours creating data visualizations that help readers see what they need to see. We use data to make claims and generate stories that are reliable and have impact.

Data is important and we seem to be surrounded by it. But that's not quite true. Sometimes, there is no data.

A session at NICAR that really resonated with me was Data from Scratch: When data doesn’t exist, led by Griff Palmer, Ricardo Brom and Lisa Pickoff-White. Pickoff-White shared her experience building PriceCheck, a crowdsourced project that KQED launched last year to answer the question “How much does health care cost?”

The team wanted to compare and contrast the costs of certain procedures or services with and without insurance. The biggest problem was that contracts between patients and their insurance providers were confidential, so no one could get to the information. Except the patient, that is.

Source: blogs.kqed.org

So they crowdsourced the data. KQED got hundreds of users to enter personal information about their insurance benefits (see above) on the site, then built a searchable database of procedures and their respective costs. For example, a particular chest x-ray within 50 miles of my own hometown in the Bay Area costs $107, but one patient paid $36 because her insurance covered the rest.

Strategies

For this project, Pickoff-White mentioned specific strategies they used to make sure they got enough accurate and useful data to “make apples-to-apples comparisons.” Here are seven strategies and tactics she used to get clean data that you can apply to your own projects:

  • Get the ball rolling. This is pretty simple. The team used social media and their on-air broadcast presence to take advantage of the trust that people already had in KQED.
  • Report during the process. Here is a list of stories the team wrote when they found particularly large disparities in costs with and without insurance. This kept their information relevant and allowed them to push updates to users as they continued to ask for data.
  • Ask for one thing at a time. For instance, they would push a request for information on chest x-ray costs and then, later, another on mammograms. By splitting up the procedures they asked about, they could target the specific patients who had that procedure and collect as much information as possible on one thing at a time.


  • Do the hard work for the users. The team found that users were making mistakes when filling out the survey on their health benefits, so they added autocomplete for procedure names and used Google Places to standardize the names of medical care providers.
  • Explain benefits to users. Often, they found that patients didn’t know how to properly read their benefits. This was an opportunity for them to help their target audience learn and contribute their information at the same time.
  • Use common sense. The data reporters had a general understanding of health benefits and costs, so they were able to pick out careless mistakes. For example, Pickoff-White noticed that $3,592.50 was unusually steep for a certain procedure. She contacted the user, who corrected the mistake to $359.25. This led to the next strategy in which they would...
  • Ask for contact information. They included a space for the user to enter an email address so the team could follow up and correct errors exactly like the one above.
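
The "use common sense" check can be partly automated. Here is a minimal Python sketch, not KQED's actual code: it flags any submission whose price is far from the median for the same procedure, so a reporter knows whom to email. The field names (procedure, price, email), the cutoff ratio of 5 and every sample row except the $3,592.50 entry are made up for illustration.

```python
from statistics import median


def flag_suspect_prices(submissions, ratio=5.0):
    """Return submissions whose price is far from the median for their procedure.

    submissions: list of dicts with hypothetical keys "procedure", "price", "email".
    ratio: assumed cutoff; a price more than `ratio` times above or below
           the procedure's median is flagged for a human to review.
    """
    # Group every reported price by procedure name.
    prices_by_procedure = {}
    for row in submissions:
        prices_by_procedure.setdefault(row["procedure"], []).append(row["price"])

    flagged = []
    for row in submissions:
        typical = median(prices_by_procedure[row["procedure"]])
        too_high = row["price"] > typical * ratio
        too_low = row["price"] < typical / ratio
        if typical > 0 and (too_high or too_low):
            flagged.append(row)
    return flagged


# Illustrative rows only; emails and most prices are invented.
sample = [
    {"procedure": "procedure A", "price": 310.00, "email": "a@example.com"},
    {"procedure": "procedure A", "price": 350.00, "email": "b@example.com"},
    {"procedure": "procedure A", "price": 3592.50, "email": "c@example.com"},
    {"procedure": "procedure A", "price": 400.00, "email": "d@example.com"},
]

for row in flag_suspect_prices(sample):
    print("Follow up with", row["email"], "about", row["price"])
```

A check like this only surfaces candidates. Someone still has to contact the user, which is why collecting an email address matters.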


“Some data is better than no data”

Pickoff-White admits it’s hard to determine how much data is enough to mean something. But because each patient’s experience is so specific, the database displays every case separately, which reflects the transparency the team aims for. Its main function is to let someone search the database and see costs, and cost disparities, for someone else in a similar situation. The goal is to collect as much data as possible, not necessarily to generalize every patient’s experience into one whopping conclusion.

Here is the PowerPoint from the rest of the Data from Scratch session at NICAR.

About the author

Ashley Wu

Undergraduate Fellow

Designing, developing and studying journalism at Northwestern. Also constantly scouting the campus for free food.
