Questions and consequences when publishing public data

Over the past few months something unusual has happened to public data projects: they’ve made national headlines.

For journalists, the best-known project was the gun permit holder map that the Journal News in White Plains, New York, published late last year, featuring the names and addresses of all registered gun owners in two New York counties.

The map was controversial and inspired journalists and journalism pundits to weigh in on the project’s virtues and faults before it was ultimately removed late last month.

The controversy — especially in light of recent and proposed legislation — got us thinking about how newsroom developers should best handle public data. What solutions are best suited to deal with data that is potentially invasive? Are there differences when dealing with data online versus in print? And what repercussions might news organizations face following controversial publishing of public data?

Questions that come with data

At its core, publishing data requires editorial judgment not all that different from the judgment journalists have honed in print over the past few decades.

“Every data set is like a human source,” said Derek Willis, an interactive news developer at The New York Times. “You weigh whether to publish the information you get from it, in what context and to what end.”

Still, there are some unique questions that come with data and digital distribution.

Of course, there’s the issue of permanence. Stories and data last much longer online than they do in print, and they can follow the people named in the data for years to come, with potentially negative effects.

There’s also the issue of accuracy. Rich Gordon, a Knight Lab co-founder and former digital director at the Miami Herald, recalls working on projects for the print edition of the Herald years ago in which every line of data printed was double-checked by a person.

Online, Gordon contends, there’s a greater tendency to present all data of a particular set. That tendency allows for much more depth, but the volume makes it difficult to double-check for accuracy.

It also represents a shift from looking for stories within data to data being the story. That shift isn’t necessarily problematic, but it does make journalists less likely to find mistakes or inaccuracies in the data.

“Because we spent a lot of time with the data in search of the news before we published, we were more likely to find trouble with the data,” Gordon said.

In fact, data accuracy was one of the reasons the Journal News’ publisher cited for taking the map down, according to the publisher’s note that announced the removal. It also appears to be one of the reasons cited for removing a similar database that the Roanoke Times introduced and subsequently took down back in 2007, according to a note from the publisher of that paper.

More challenging than mere accuracy for large data sets is that what’s accurate one day might be inaccurate the next. That, again, was a factor in the Journal News’ decision to pull down the map.

The mug shot data dilemma

The changing nature of accuracy was one of the key concerns the New Products Development Team at the Tampa Bay Times faced when developing a mug shot site back in 2009, said Matt Waite, who was part of the team and today is a journalism professor at the University of Nebraska.

“You have to ask yourself, how long is your data valid,” Waite said, “how long is it good.”

Waite and his colleagues didn’t have a reliable answer to that question when it came to arrest records. Though the record of each arrest was unlikely to change, it seemed unfair to publish an arrest record and then neglect to follow the case through the court system simply because it was a challenge they couldn’t handle programmatically, he said.

Out of concern for privacy and a desire to avoid building a “background check tool” for the Tampa area, the team decided to take some steps to protect the privacy of those arrested.

Waite and his team told web crawlers not to scrape the mug shot pages. They did it twice, in fact: first in the site’s robots.txt file and again in the HTML of the individual pages. Then they came up with a way to house the names of arrestees in the JavaScript for each page, a non-standard way to handle names and one not likely to be picked up by bots, Waite said.
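The three layers described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Times’ actual code: the `/mugshots/` path, the function names and the `arrestee-name` element id are all hypothetical. The `robots.txt` rule and the `noindex` meta tag are standard crawler directives; writing the name in from a script is one way to keep it out of the page’s markup.

```python
import json
from html import escape

# Hypothetical URL prefix for the pages that should stay out of search indexes.
MUGSHOT_PATH = "/mugshots/"


def robots_txt(path: str = MUGSHOT_PATH) -> str:
    """Layer 1: ask all well-behaved crawlers to skip the mug shot pages."""
    return f"User-agent: *\nDisallow: {path}\n"


def render_page(name: str, charge: str) -> str:
    """Build one page whose name element is empty in the markup.

    The name appears only inside the script that fills the element in.
    """
    # json.dumps yields a valid JavaScript string literal; escaping "<" as well
    # keeps a value like "</script>" from closing the script element early.
    js_name = json.dumps(name).replace("<", "\\u003c")
    return f"""<!DOCTYPE html>
<html>
<head>
  <!-- Layer 2: per-page directive for crawlers that fetch the page anyway. -->
  <meta name="robots" content="noindex, noarchive">
</head>
<body>
  <h1 id="arrestee-name"></h1>
  <p>{escape(charge)}</p>
  <script>
    // Layer 3: the name is written in by script, not placed in the markup.
    document.getElementById("arrestee-name").textContent = {js_name};
  </script>
</body>
</html>
"""
```

Crawlers that execute JavaScript would still see the name, so the third layer is a deterrent rather than a guarantee.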

As a final protection both against publishing inaccurate data and against creating an undue burden on those arrested, all photos are deleted from the site after 60 days.
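A retention rule like this one is simple to express in code. The sketch below assumes each record carries a `booked_on` date and that the 60 days are counted from that date; the article does not say which date the team used, and the field and function names are hypothetical.

```python
from datetime import date, timedelta

# Retention window from the article; counting from the booking date is an assumption.
RETENTION = timedelta(days=60)


def is_expired(booked_on: date, today: date) -> bool:
    """Return True once a record is more than 60 days old."""
    return today - booked_on > RETENTION


def purge(records: list[dict], today: date) -> list[dict]:
    """Return only the records that are still inside the retention window."""
    return [r for r in records if not is_expired(r["booked_on"], today)]
```

Running `purge` on a schedule, for example once a day, would keep the published set within the window without anyone having to remember to delete records by hand.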

Waite’s team demonstrates that data and the potential invasion of private citizens’ privacy are challenges, but not insurmountable ones. Creativity allows you to illustrate a story without trampling on privacy concerns.

“You have options as a developer,” said Ben Welsh, a database producer at the Los Angeles Times.

For example, the Memphis Commercial Appeal publishes a database of handgun carry permit holders, including full name, city, and zip code. The information provided is not all that different from what the Journal News presented.

The difference is that the database is searchable and doesn’t ever appear in one piece. It works well and searches return broad results. When I enter “Graff” in the last name field, for example, it returns not only exact matches, but also Pendergraff and DeGraffenreaid.
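The search behavior described above is consistent with a case-insensitive substring match on the last name. The sketch below is a guess at that behavior, not the Commercial Appeal’s implementation; the `last_name` field, the function name and the minimum query length are assumptions. The minimum length is added here so that an empty or one-letter query cannot return most of the database at once.

```python
def search_last_names(records: list[dict], query: str, min_length: int = 3) -> list[dict]:
    """Return records whose last name contains the query, ignoring case."""
    q = query.strip().lower()
    # Refuse very short queries so the data never comes back in one piece.
    if len(q) < min_length:
        return []
    return [r for r in records if q in r["last_name"].lower()]
```

With this rule, a search for “Graff” matches Graff, Pendergraff and DeGraffenreaid, because each of those names contains the query.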

Another potential solution is to carefully choose what data to publish, which is exactly what the Commercial Appeal did with its decision to publish zip codes, but not addresses.
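One way to enforce that kind of choice is an explicit list of publishable fields, so that anything not on the list is left out by default. This is a minimal sketch; the field names are hypothetical and simply mirror the full name, city and zip code mentioned above.

```python
# Fields cleared for publication; street addresses are deliberately absent.
PUBLIC_FIELDS = ("full_name", "city", "zip_code")


def to_public(record: dict) -> dict:
    """Copy only the approved fields, dropping everything else in the record."""
    return {field: record[field] for field in PUBLIC_FIELDS}
```

Listing what may be published, rather than what must be hidden, means a new column added to the source data stays private until someone decides otherwise.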

And therein lies the challenge — “threading the needle,” as Welsh said. The idea is to provide enough information to be useful, but not so much that you’re invading the privacy of ordinary citizens.


Consequences for the news industry

The consequences for journalists and others who publish data that the public deems reckless are real. Soon after the Journal News published its map, lawmakers in New York passed legislation that allows permit holders to request confidentiality. Just last week, a group of lawmakers in Maine tried to pressure the Bangor Daily News into withdrawing a request for concealed weapon permit records.

Also last week, a Florida lawmaker introduced a bill that would require all websites to remove mug shots within 15 days of being notified that an arrest did not result in a conviction. The bill was reportedly inspired by a so-called extortion mug shot site, but it makes no distinction between those sites and traditional news sites.

“If news organizations want to separate themselves from the mug shot racket they need to be conscientious about how they handle public data,” Waite said.

It’s a good lesson, and one that journalists can heed with some creativity and, perhaps, restraint.

The real key in publishing data, as in other journalism, is to add context and nuance to it.

“To republish something without any insight or analysis is a low form of journalism,” Welsh said. “With data, as with all things in journalism, we should strive not to be stenographers.”

About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
