MozFest 2014: Gotta lotta analog data? Crowdsourcing may make it useful for you and fun for readers

When we think of data, we almost always think of computers. But when it comes to data that was created before the digital era (handwritten notes, ancient maps or printed documents, for example), nothing beats human eyes to quantify and verify it. And when many human eyes are needed, journalists have the option to crowdsource their data.

At MozFest this weekend, Mike Tigas of ProPublica and Jeremy B. Merrill of The New York Times facilitated a session that touched upon four projects leading the movement in crowdsourcing data. Here's a look at those projects and why people want to get involved:

Free the Files

ProPublica’s Free the Files tool began crowdsourcing back in 2012, when it asked users to “free a file” about political TV ads by recording the advertiser, the agency and the gross dollar amount spent on an ad. Since the project’s launch during the 2012 elections, about 18,000 of more than 43,000 documents have been “freed.”

Free the Files asks users to verify information that computers can't automatically ascertain from scanned files.

It takes two reports of identical information to verify a file’s data, according to Merrill. Once this occurs, the file’s information has been freed, and ProPublica publishes the information.
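The two-matching-reports rule can be sketched in a few lines of Python. This is only an illustration of the idea, not ProPublica's actual code: the `Report` fields are named after the three values users record, while `normalize`, `freed_value` and the `required_matches` parameter are hypothetical names, and the decision to ignore letter case and extra spaces when comparing reports is an assumption.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class Report(NamedTuple):
    # Hypothetical fields, named after the values the article says users record.
    advertiser: str
    agency: str
    gross_amount: int  # whole dollars


def normalize(report: Report) -> Report:
    """Collapse extra whitespace and lowercase text fields (an assumed matching rule)."""
    return Report(
        advertiser=" ".join(report.advertiser.split()).lower(),
        agency=" ".join(report.agency.split()).lower(),
        gross_amount=report.gross_amount,
    )


def freed_value(reports: Sequence[Report], required_matches: int = 2) -> Optional[Report]:
    """Return the first normalized report submitted `required_matches` times, else None."""
    counts: Counter = Counter()
    for report in reports:
        key = normalize(report)
        counts[key] += 1
        if counts[key] >= required_matches:
            return key  # enough identical reports: the file is "freed"
    return None  # not enough agreement yet, so the file stays unverified


# Three users transcribe the same scanned file; the second mistypes the amount.
submissions = [
    Report("Acme PAC", "Media Buyers Inc", 12000),
    Report("Acme PAC", "Media Buyers Inc", 21000),
    Report(" acme pac", "media buyers inc", 12000),
]
result = freed_value(submissions)
```

With these made-up submissions, the first two reports disagree on the amount, so the file is only freed once the third report matches the first. Requiring agreement between independent users is what lets a single typo pass through without being published.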

NYPL's Building Inspector

The New York Public Library crowdsourced maps in 2013 via a project called Building Inspector. The tool asks users to identify colors, realign building boundaries and input addresses of 1850s New York City. By relying on the input of users to verify information, the data stored in these scanned images of discolored maps can be used with modern cartographic tools.


Additional historical information from New York City has made its way to the crowdsourcing platform via The New York Times’ Madison project. Madison asks users to click one of several buttons to indicate whether the depicted image is an advertisement or some variation of one. Although the digitization of The Times’ archives has largely focused on editorial content, advertisements can be a telling indication of history as well. With Madison, The Times intends to unlock this data.

Madison takes a look at the ads, not articles, that exist in old New York Times newspapers.


Another crowdsourcing data tool, CrowData, was released this year and is available on GitHub. CrowData comes out of an initiative by La Nación called VozData, which uses the public to verify records released about political spending, much like ProPublica does with Free the Files.

Why do readers contribute to data projects?

Free the Files and VozData incorporate gamification into verification by asking users to log in and displaying a leaderboard of the top contributors. With gamification, “you make people more engaged with some data that they otherwise would maybe never have looked at,” Tigas said.

Projects like Free the Files are alternatives to paying many people for hours of work poring over data. Because they rely on free work from the public, the verifiers must be interested in the information and believe that the news company is going to do something beneficial with the data.

Merrill said that an ideal case for crowdsourcing data is one that provides an exchange that goes both ways. “Where you learn something meaningful about the world that you didn’t know before by doing this,” he said.

Others feel a personal connection to the projects. “There are people who are really interested in their neighborhood histories,” Tigas said. “And so they get something out of [Building Inspector] because they learn what their neighborhood looked like.”

“There’s a whole segment of readers that are really interested in being involved with the news that they read,” he said.

About the author

Mallory Busch

Undergraduate Fellow
