Preserving interactive news projects with Newseum, OpenNews and Pop Up Archive

Photo by Ted Han during the #apparchive designathon at Newseum with OpenNews and Pop Up Archive

On Sunday, March 2, Knight-Mozilla OpenNews, the Newseum and Pop Up Archive hosted a one-day conference focused on solving a fairly new problem: How to preserve the new breed of complex interactive projects that are becoming more prevalent in news. While print newspapers are relatively well-preserved, we as an industry do a poor job of preserving interactive databases and online data visualizations, and they are in danger of being lost to history.

Inside newsrooms, these interactive databases are sometimes called “news applications,” but don’t be confused. They’re interactive databases published on the web, not something you buy on your smartphone. Think Dollars for Docs, not Flipboard or Zite.

We were among a few dozen people who attended the meeting. Preserving interactive databases isn’t as easy as storing a digital copy. They’re far more complex than a printed newspaper, with technical requirements and external dependencies that make preservation anything but straightforward.

The conference split into small groups to start with some basic questions. One group tackled best practices. Another talked about how external dependencies like the Google Maps API could be handled, while another asked who might be willing to pay for archiving efforts, and how to make archiving inexpensive enough that cash-strapped newsrooms can do it. Our group, consisting of Elaine Ayo, Mohammed Haddad, Tyler Fisher, Jacob Harris, Scott Klein, Roger Macdonald, Mike Tigas and Marcos Vanetta, was tasked with answering the very basic question, “What is a news app and what is one made of?” Our goal was to define the components of a news app to better facilitate the conversation around what is worth preserving, what needs to be virtualized, and what it might take to archive one.

The conceptual model we took as an inspiration was the Open Systems Interconnection Model, usually called “The OSI Model,” one of the frameworks that makes the low-level networking bits of the Internet work without a lot of coordination. We attempted to come up with a way to describe news apps using OSI-like “layers,” with infrastructure at the bottom and audience at the top. Like the OSI Model, we conceived each layer as talking exclusively to the layer above and below it. But the metaphor broke down. We found that too many parts of news applications worth preserving — the code we write, the processes we define — talk to lots of layers at once.

So we ditched the layers idea and started to think about interdependent, non-hierarchical categories. We defined six of them, each with artifacts, attributes and preservation requirements.

A quick draft of the model described below is available in this repo.

Our draft model includes six categories:

  • The Code Category includes the software that runs in production as well as the software used to acquire, parse and analyze the data and any libraries the newsroom wrote for its own use.
  • The Data Category might better be called the Input Category. It includes data (raw and cleaned), metadata and data structure artifacts like the data dictionaries, reporting material and more.
  • The Story Category, also called the Output Category, includes the narrative stories that went along with the app, APIs published with the app, multimedia, UX, visual design, information architecture, annotations and documentation.
  • The Infrastructure Category, which is something to be simulated more than preserved, includes the Internet itself (bandwidth), web browsers, web servers, operating systems, programming languages and frameworks, external display APIs, vendor libraries and dependencies, and database management systems.
  • The Process Category includes code documentation, code history (git), data transformation diaries, data diaries and documentation, documents describing the cultural context, story edits, data sources such as FOIA letters and general writing about process (e.g., a nerd blog).
  • The Response Category includes user comments, site metrics, awards won, user behavior metrics, logs, media coverage, tweets and other social media mentions, as well as real-world impact.

Every category also has actors, the people who perform tasks in that category. When archiving, ask who those actors were and what decisions they made. Save versions of each artifact to show how it changed over time. And of course, provide documentation for all of these things.
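
To make the categories a little more concrete, here is a minimal sketch of how an archive manifest might be checked against them. It is not part of our draft: the `Artifact` fields, the function names and the sample file names are all hypothetical, chosen only to mirror the six categories plus the advice about actors, versions and documentation.

```python
from dataclasses import dataclass, field
from typing import List

# The six categories named in the draft model.
CATEGORIES = ("code", "data", "story", "infrastructure", "process", "response")


@dataclass
class Artifact:
    """One archived item. Field names are illustrative, not from the draft."""
    name: str
    category: str
    actors: List[str] = field(default_factory=list)    # who worked on it
    versions: List[str] = field(default_factory=list)  # saved snapshots over time
    documented: bool = False

    def __post_init__(self) -> None:
        # Reject anything that does not fit one of the six categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def missing_categories(manifest: List[Artifact]) -> List[str]:
    """Return the categories that have no artifact in the manifest."""
    present = {a.category for a in manifest}
    return [c for c in CATEGORIES if c not in present]


def needs_attention(manifest: List[Artifact]) -> List[str]:
    """Return names of artifacts lacking actors, versions or documentation."""
    return [a.name for a in manifest
            if not (a.actors and a.versions and a.documented)]


# Hypothetical manifest for an imaginary news app.
manifest = [
    Artifact("scraper.py", "code", ["developer"], ["v1", "v2"], True),
    Artifact("raw_payments.csv", "data", ["reporter"]),
]
print(missing_categories(manifest))
print(needs_attention(manifest))
```

A check like this only tells you which categories are empty and which artifacts lack context. Deciding what is actually worth preserving in each category remains a human judgment.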


Perhaps this all seems laborious or trivial, but knowing exactly what goes into and comes out of a news application is fundamental to understanding how to preserve one.

The model is not so much a way to think about how to build news apps — though it certainly does strongly imply something about how they’re built — as it is a way to understand them as human-made objects, and how to break them down in order to preserve them. Separating “code” from “infrastructure” and “data” is not all that helpful when building, but preserving each category requires separate and intentional efforts, different skills and technologies.

Throughout the day, we continued to return to Adrian Holovaty’s chicagocrime.org, a groundbreaking news application that is now lost to the world. What do we want to know in 2014 about that app? What would we want to know in 2034? It’s not just the code that Adrian wrote or the map itself, though his reverse engineering of the Google Maps Flash API was one of its great innovations when it first came out. We want to know about his process. We want to know the infrastructure on which he built the app (which makes his use of Google Maps even more impressive). We want to know how it was designed and how the user interactions worked. We want to know the impact it had and who responded to it.

With a defined model of news applications, it becomes clear that archiving a news app is about more than just making sure the app still exists on the web. Things like oral histories and screencasts will likely be required to tell future news developers and historians how this kind of journalism came to be and why we made the decisions we made.

This is just a first stab at the model. Our draft is the result of a few hours’ effort and we’ve posted it to GitHub. We hold no monopoly on the idea. Feel free to fork it and send us pull requests and to open issues to give us better ideas.

Eventually, we hope this model can serve as a document for understanding how to preserve a news application, and can start a conversation about how to tackle each part. Preserving our work for future generations is crucial. Just as we can look at a New York Herald issue from the middle of the Civil War, even though the Herald itself and everybody associated with it are long gone, we hope that future news nerds can look at the work we do long after we’re gone. The challenge is great, but meeting it is monumentally important.
