Preserving interactive news projects with Newseum, OpenNews and Pop Up Archive

Photo by Ted Han during the #apparchive designathon at Newseum with OpenNews and Pop Up Archive

On Sunday, March 2, Knight-Mozilla OpenNews, the Newseum and Pop Up Archive hosted a one-day conference focused on solving a fairly new problem: How to preserve the new breed of complex interactive projects that are becoming more prevalent in news. While print newspapers are relatively well-preserved, we as an industry do a poor job of preserving interactive databases and online data visualizations, and they are in danger of being lost to history.

Inside newsrooms, these interactive databases are sometimes called “news applications” — but don’t be confused. They’re interactive databases published on the web, not something you buy on your smartphone. Think Dollars for Docs, not not Flipboard or Zite.

We were among a few dozen attendees who attended the meeting. Preserving interactive databases isn’t as easy as storing a digital copy. They’re far more complex than a printed newspaper, with technical requirements and external dependencies that make preservation anything but straightforward.

The conference split into small groups to start with some basic questions. One group tackled best practices. Another talked about how external dependencies like the Google Maps API could be handled, while another asked who might be willing to pay archiving efforts, and how to make it inexpensive so cash-strapped newsrooms can do it. Our group, consisting of Elaine Ayo, Mohammed Haddad, Tyler Fisher, Jacob Harris, Scott Klein, Roger Macdonald, Mike Tigas and Marcos Vanetta was tasked with answering the very basic question, “What is a news app and what is one made of?” Our goal was to define the components of a news app to better facilitate the conversation around what is worth preserving, what needs to be virtualized, and what it might take to archive one.

The conceptual model we took as an inspiration was the Open Systems Interconnection Model, usually called “The OSI Model,” one of the frameworks that makes the low-level networking bits of the Internet work without a lot of coordination. We attempted to come up with a way to describe news apps using OSI-like “layers,” with infrastructure at the bottom and audience at the top. Like the OSI Model, we conceived each layer as talking exclusively to the layer above and below it. But the metaphor broke down. We found that too many parts of news applications worth preserving — the code we write, the processes we define — talk to lots of layers at once.

So we ditched the layers idea and started to think about interdependent, non-hierarchical categories. We defined six of them, each with artifacts, attributes and preservation requirements.

Find a quick draft of the described model below in this repo.

Our draft model includes six categories:

  • The Code Category includes the software that runs in production as well as the software used to acquire, parse and analyze the data and any libraries the newsroom wrote for its own use.
  • The Data Category might better be called the Input Category. It includes data (raw and cleaned), metadata and data structure artifacts like the data dictionaries, reporting material and more.
  • The Story Category, also called the Output Category includes the narrative stories that went along with the app, APIs published with the app, multimedia, UX, visual design, information architecture, annotations and documentation.
  • The Infrastructure Category, which is something to be simulated more than preserved, includes the Internet itself (bandwidth), web browsers, web servers, operating systems, programming languages and frameworks, external display APIs, vendor libraries and dependencies and database management systems
  • The Process Category includes code documentation, code history (git), data transformation diaries, data diaries and documentation, documents describing the cultural context, story edits, data sources such as FOIA letters and general writing about process (e.g., a nerd blog)
  • The Response Category includes user comments, site metrics, awards won, user behavior metrics, logs, media coverage, tweets and other social media mentions, as well as real-world impact
  • Every category has actors, or people who perform tasks on this category. When archiving, ask who those actors were and what decisions they made. Save versions of each artifact to show how something transformed over time. And of course, provide documentation for all of these things.


Perhaps this all seems laborious or trivial, but knowing exactly what goes into and comes out of a news application is fundamental to understanding how to preserve one.

The model is not so much a way to think about how to build news apps — though it certainly does strongly imply something about how they’re built — as it is a way to understand them as human-made objects, and how to break them down in order to preserve them. Separating “code” from “infrastructure” and “data” is not all that helpful when building, but preserving each category requires separate and intentional efforts, different skills and technologies.

Throughout the day, we continued to return to Adrian Holovaty’s chicagocrime.org, a groundbreaking news application that is now lost to the world. What do we want to know in 2014 about that app? What would we want to know in 2034? It’s not just the code that Adrian wrote or the map itself, though his reverse engineering of the Google Maps Flash API was one of its great innovations when it first came out. We want to know about his process. We want to know the infrastructure on which he built the app (indeed, making his use of Google Maps even more impressive). We want to know about how it was designed, how the user interactions worked. We want to know the impact it had and who responded to it.

With a defined model of news applications, it becomes clear that archiving a news app is about more than just making sure the app still exists on the web. Things like oral histories and screencasts will likely be required to tell future news developers and historians how this kind of journalism came to be and why we made the decisions we made.

This is just a first stab at the model. Our draft is the result of a few hours’ effort and we’ve posted it to GitHub. We hold no monopoly on the idea. Feel free to fork it and send us pull requests and to open issues to give us better ideas.

Eventually, we hope this model can serve as a document for understanding how to preserve a news application, and to start a conversation about how to tackle each part. Preserving our work for future generations is crucial. Just as we can look at a New York Herald issue from the middle of the Civil War, even though the Herald itself and everybody associated with it has long since died, we hope that future news nerds can look at the work we do long after we’re gone. The challenge is great, but monumentally important.

About the author

Tyler Fisher

Undergraduate Fellow

Latest Posts

  • Building a Community for VR and AR Storytelling

    In 2016 we founded the Device Lab to provide a hub for the exploration of AR/VR storytelling on campus. In addition to providing access to these technologies for Medill and the wider Northwestern community, we’ve also pursued a wide variety of research and experimental content development projects. We’ve built WebVR timelines of feminist history and looked into the inner workings of ambisonic audio. We’ve built virtual coral reefs and prototyped an AR experience setting interviews...

    Continue Reading

  • A Brief Introduction to NewsgamesCan video games be used to tell the news?

    When the Financial Times released The Uber Game in 2017, the game immediately gained widespread popularity with more than 360,000 visits, rising up the ranks as the paper’s most popular interactive piece of the year. David Blood, the game’s lead developer, said that the average time spent on the page was about 20 minutes, which was substantially longer than what most Financial Times interactives tend to receive, according to Blood. The Uber Game was so successful that the Financial...

    Continue Reading

  • With the 25th CAR Conference upon us, let’s recall the first oneWhen the Web was young, data journalism pioneers gathered in Raleigh

    For a few days in October 1993, if you were interested in journalism and technology, Raleigh, North Carolina was the place you had to be. The first Computer-Assisted Reporting Conference offered by Investigative Reporters & Editors brought more than 400 journalists to Raleigh for 3½ days of panels, demos and hands-on lessons in how to use computers to find stories in data. That seminal event will be commemorated this week at the 25th CAR Conference, which...

    Continue Reading

  • Prototyping Augmented Reality

    Something that really frustrates me is that, while I’m excited about the potential AR has for storytelling, I don’t feel like I have really great AR experiences that I can point people to. We know that AR is great for taking a selfie with a Pikachu and it’s pretty good at measuring spaces (as long as your room is really well lit and your phone is fully charged) but beyond that, we’re really still figuring...

    Continue Reading

  • Capturing the Soundfield: Recording Ambisonics for VR

    When building experiences in virtual reality we’re confronted with the challenge of mimicking how sounds hit us in the real world from all directions. One useful tool for us to attempt this mimicry is called a soundfield microphone. We tested one of these microphones to explore how audio plays into building immersive experiences for virtual reality. Approaching ambisonics with the soundfield microphone has become popular in development for VR particularly for 360 videos. With it,...

    Continue Reading

  • Audience Engagement and Onboarding with Hearken Auditing the News Resurrecting History for VR Civic Engagement with City Bureau Automated Fact Checking Conversational Interface for News Creative Co-Author Crowdsourcing for Journalism Environmental Reporting with Sensors Augmented Reality Visualizations Exploring Data Visualization in VR Fact Flow Storytelling with GIFs Historical Census Data Information Spaces in AR/VR Contrasting Forms Of Interactive 3D Storytelling Interactive Audio Juxtapose Legislator Tracker Storytelling with Augmented Reality Music Magazine Navigating Virtual Reality Open Data Reporter Oscillations Personalize My Story Photo Bingo Photojournalism in 3D for VR and Beyond Podcast Discoverability Privacy Mirror Projection Mapping ProPublica Illinois Rethinking Election Coverage SensorGrid API and Dashboard Sidebar Smarter News Exploring Software Defined Radio Story for You Storyline: Charts that tell stories. Storytelling Layers on 360 Video Talking to Data Visual Recipes Watch Me Work Writing and Designing for Chatbots
  • Prototyping Spatial Audio for Movement Art

    One of Oscillations’ technical goals for this quarter’s Knight Lab Studio class was an exploration of spatial audio. Spatial audio is sound that exists in three dimensions. It is a perfect complement to 360 video, because sound sources can be localized to certain parts of the video. Oscillations is especially interested in using spatial audio to enhance the neuroscientific principles of audiovisual synchrony that they aim to emphasize in their productions. Existing work in spatial......

    Continue Reading

Storytelling Tools

We build easy-to-use tools that can help you tell better stories.

View More