MozFest 2014: Gotta lotta analog data? Crowdsourcing may make it useful for you and fun for readers

When we think of data, we almost always think of computers. But when it comes to data that was created before the digital era (handwritten notes, ancient maps or printed documents, for example), nothing beats human eyes to quantify and verify. And when many human eyes are needed, journalists have the option to crowdsource their data.

At MozFest this weekend, Mike Tigas of ProPublica and Jeremy B. Merrill of The New York Times facilitated a session that touched upon four projects leading the movement in crowdsourcing data. Here's a look at a few projects and why people want to get involved:

Free the Files

ProPublica’s Free the Files tool began crowdsourcing in 2012, when it asked users to “free a file” about political TV ads by recording the advertiser, agency and gross dollar amount spent on an ad. Since the project’s launch during the 2012 elections, about 18,000 of more than 43,000 documents have been “freed.”

Free the Files asks users to verify information that computers can't automatically ascertain from scanned files.

It takes two reports of identical information to verify a file’s data, according to Merrill. Once this occurs, the file’s information has been freed, and ProPublica publishes the information.
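The two-matching-reports rule can be sketched in a few lines of Python. This is only an illustration of the idea, not ProPublica's actual code: the field names (`advertiser`, `agency`, `gross_amount`), the `normalize` helper and the choice to ignore case and stray whitespace are all assumptions made for the example.

```python
from collections import Counter

# Per the session: two identical reports are enough to verify a file.
REQUIRED_MATCHES = 2


def normalize(report):
    """Return a hashable, comparable form of one user's report.

    Hypothetical field names; lowercasing and trimming are assumptions
    so that trivially different entries still count as identical.
    """
    return (
        report["advertiser"].strip().lower(),
        report["agency"].strip().lower(),
        round(float(report["gross_amount"]), 2),
    )


def verified_value(reports, required=REQUIRED_MATCHES):
    """Return the first normalized report seen `required` times, else None."""
    counts = Counter()
    for report in reports:
        key = normalize(report)
        counts[key] += 1
        if counts[key] >= required:
            return key
    return None


# Example: the second and third users agree, so the file is "freed".
reports = [
    {"advertiser": "Acme PAC", "agency": "Media Co", "gross_amount": "1200"},
    {"advertiser": "Acme PAC", "agency": "Media Co", "gross_amount": "1500.00"},
    {"advertiser": "acme pac ", "agency": "media co", "gross_amount": 1500},
]
print(verified_value(reports))
```

Requiring agreement between independent users, rather than trusting any single entry, is what lets a newsroom publish the result without checking every document by hand.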

NYPL's Building Inspector

The New York Public Library crowdsourced maps in 2013 via a project called Building Inspector. The tool asks users to identify colors, realign building boundaries and input addresses of 1850s New York City. By relying on the input of users to verify information, the data stored in these scanned images of discolored maps can be used with modern cartographic tools.


Additional historical information from New York City has made its way to the crowdsourcing platform via The New York Times’ Madison project. Madison asks users to click one of several buttons to indicate whether the depicted image is an advertisement or some variation of one. Although the digitization of The Times’ archives has largely focused on editorial content, advertisements can be a telling indication of history as well. With Madison, The Times intends to unlock this data.

Madison takes a look at the ads, not articles, that exist in old New York Times newspapers.


Another crowdsourcing data tool, CrowData, was released this year and is available on GitHub. CrowData comes out of an initiative by La Nación called VozData, which uses the public to verify records released about political spending, much like ProPublica does with Free the Files.

Why do readers contribute to data projects?

Free the Files and VozData incorporate gamification into verification by asking users to log in and displaying a leaderboard of the top contributors. With gamification, “you make people more engaged with some data that they otherwise would maybe never have looked at,” Tigas said.

Projects like Free the Files are alternatives to paying many people for hours of work poring over data. Because they rely on free work from the public, the verifiers must be interested in the information and believe that the news company is going to do something beneficial with the data.

Merrill said that an ideal case for crowdsourcing data is one that provides an exchange that goes both ways. “Where you learn something meaningful about the world that you didn’t know before by doing this,” he said.

Others feel a personal connection to the projects. “There are people who are really interested in their neighborhood histories,” Tigas said. “And so they get something out of [Building Inspector] because they learn what their neighborhood looked like.”

“There’s a whole segment of readers that are really interested in being involved with the news that they read,” he said.

About the author

Mallory Busch

Undergraduate Fellow
