Pop Up Archive's Anne Wootton and Bailey Smith on born-digital audio, search and transcriptions

"Obligatory question: what shows do y’all love to listen to? Give ‘em to me. The more obscure the better."

"Oh my god, Miranda. There's this podcast, Serial, that is so good."

Unless you've been ignoring the future-of-journalism chatter completely, chances are you've begun to tire of the whole "podcasts were dead and now they are back" discourse. This story is everywhere and it’s not exactly accurate, as they never really were "dead." That said, if you are focused solely on podcasts-as-the-silver-bullet solving for the future of radio, you are missing the larger picture.

Talking to Anne Wootton and Bailey Smith is like having a coffee with your favorite librarians. The Pop Up Archive co-founders are passionate about searchable sound and developing a future for audio-on-the-web. From a recent newsletter:

“Sound is opaque: you can’t picture audio, or skim the words in a recording. Its opacity makes it hard to share — there’s no visual content to latch on to. The text and images added to audio pages are labor intensive to create, and often fail to capture crucial content within audio.

"The fact is: text is the medium of the web. But audio isn’t text. There’s no clear path for audiences to find even the most compelling audio.”

Anne (CEO) and Bailey (CTO) met and developed the concept for their startup at the UC-Berkeley School of Information. Pop Up Archive was a Knight News Challenge winner in 2012, and is supported by Knight Foundation, the National Endowment for the Humanities, 500 Startups, Bloomberg Beta, and Funders Club.

In the following conversation, which wraps up our tranche of audio-on-the-web research, Anne, Bailey and I talk about working with radio stations, archival and born-digital content, a future for audio files and stories, podcasts, structured metadata, transcription, and much more.

(Discussion has been edited for length and clarity.)


Pop Up Archive's co-founders Bailey Smith and Anne Wootton

What is the question that no one asks you, but secretly you wish that someone would?

BAILEY: Actually, I have one, but it is a thing that I like to talk about that gets on my nerves. "Virality." Everyone wants more clicks and more listens and I just feel like there has to be a new metric of engagement that is more meaningful than that. Especially for audio. The Serial podcast is getting something on the order of a million listens on average, or downloads, who knows what that even means. I don't know if that's the threshold of “viral”.

ANNE: This is like what Alex mentioned in his Q&A: a million downloads for Serial is awesome, but viral YouTube videos have billions of views. It's just a completely different world. That’s not necessarily a bad thing: only one or two million people watch each episode of FX’s Louie, which seems pretty successful to me.

BAILEY: I checked Charlie Bit My Finger. About 800 million views.

ANNE: Also, that's *Charlie Bit My Finger*. It's a different thing.

BAILEY: Yeah, it has no substance! What we're talking about is like art or journalism; these are really important modes of expression. I feel disgusted when people talk about assigning these binary metrics of listens or clicks. We need a new metric that is more substantive.

MIRANDA: I don't know how much this concept has permeated the radio and audio community, but there's been a shift in thinking around metrics which is now focused on more meaningfully accounting for a listener's time. Understanding "time," I think, is a far more valuable measure, especially for audio content. Time is the big ask and it is really hard to skim an audio story.

BAILEY: Even more valuable than time would be the “network effects” of a story. How do we track how the things we listen to are shared and discussed? There's no way to track this, but how many times have we talked about who did it in Serial in the past month? And how many other conversations exist around that piece of media? Having a “cultural moment”, that's the real effect that we want to measure.

Given Pop Up Archive, can you talk about the technology that could intersect with impact and value analytics for audio?

ANNE: For one thing, as you said, it's hard to skim audio, right? It's a lot easier if you have text. That’s a mantra we're constantly repeating. We're not doing transcription for the sake of perfection or perfect accuracy. We're doing it for the sake of search, and you don't need perfect accuracy for search or for skimming.

It's a frustrating conversation. People hear the word “transcription” and expect to read grammatical text, as if it had been checked by an editor. That's not the world we’re preparing for… It's more interesting to look at groups of people online, influencers on Twitter for example: what they talk about, the links that they're sharing. Audio is just starting to become a part of that conversation. The tools we're building to help people parse audio give some sense of what the audio is about. Like, people are talking about this one show, but what is that show talking about? It’s not text, so we don’t know.
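To make Anne's point about search concrete, here is a minimal Python sketch of why search can tolerate an imperfect transcript. It is an illustration only, not Pop Up Archive's actual pipeline: the `SEGMENTS` sample data, the `search` function and the 0.8 similarity cutoff are all assumptions made for this example, and the fuzzy matching comes from Python's standard `difflib` module. Each segment carries a start time, so a hit can point a listener to the right moment in the audio even when the recognizer misspelled a word.

```python
import difflib
import re

# Hypothetical machine transcript: (start_seconds, text) pairs,
# including the kind of spelling errors automatic transcription produces.
SEGMENTS = [
    (0.0, "welcome back to the show today we talk about pod casts"),
    (12.5, "maya angelou discusses i know why the caged bird sings"),
    (31.0, "the local politican gave decades of interviews"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def search(query, segments, cutoff=0.8):
    """Return start times of segments in which every query word
    has at least one close (not necessarily exact) match."""
    query_words = tokenize(query)
    if not query_words:
        return []
    hits = []
    for start, text in segments:
        words = tokenize(text)
        if all(
            difflib.get_close_matches(word, words, n=1, cutoff=cutoff)
            for word in query_words
        ):
            hits.append(start)
    return hits


print(search("politician", SEGMENTS))   # [31.0]
print(search("caged bird", SEGMENTS))   # [12.5]
```

With this sample data, a query for "politician" still finds the segment transcribed as "politican", which is the sense in which search does not need perfect accuracy.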

What is a typical engagement with a station like, as you work with them and discuss specific efforts to improve their web experiences?

ANNE: We're having conversations with stations where they are realizing that people want to come to their website and press play and just keep listening. They want whatever just played on the station, and they want to be able to choose what’s interesting. They want to be able to skip.

All of this is very hard to do. It takes a lot of curation and structured data about audio that has previously not existed. It requires a lot more content than what’s created for a typical 24-hour broadcast period.

So, we're getting requests to help stations with their evergreen content that is now born digital or has been digitized. Fortunately, we get to come in and help make all of those stories searchable. This way they can more easily find and showcase past content for today’s audiences.

BAILEY: To date, these efforts have been focused on video, television, and movie content primarily. Now, [public media] stations are realizing that there is a huge appetite for stories produced 10, 15 or 20 years ago.

MIRANDA: It is strange that news organizations are having such a hard time re-positioning themselves as knowledge centers about their communities. Are anyone’s archives set up for searchability and content reuse? Every look-back package involves many human-hours and elbow grease.

ANNE: The reality is that stations are chipping away at a problem that didn't even exist 10 or 15 years ago.

Take the obituary scenario: a local politician dies, and a station has decades' worth of interviews and coverage of this person, but it's all tied up in the tacit knowledge of the people at that station who helped do that reporting and produce those pieces, if they're even still there. Finding and packaging those stories to present online to serve the volume of audience interest requires a lot of human effort.

By contrast, when Maya Angelou died this summer, The LA Times linked to an interview that aired on WFMT Chicago in 1970 in which she talked about "I Know Why The Caged Bird Sings," only a couple of days after it had been published. The response was incredible. After collecting dust in Chicago for decades, it was finally digitized, transcribed and indexed on the web by Pop Up Archive, so that it could be discovered and loved this summer.

The problem is that, even today, a station will broadcast a two-way interview in the morning with a celebrity or news figure and then it is almost instantly lost because they don’t have the workflows in place to publish it online, let alone publish it in a way that's easily found and discovered.

People ask about the platform or distribution channel that could be the silver bullet, if there even is one. What we are realizing is that audio-on-the-web, whether it is a podcast or some other kind of audio file, is and increasingly will be a discrete packet that pops up on Twitter, or in a BuzzFeed listicle, or in any number of podcasts.

BAILEY: They really got that right in the name, a podcast file really is “a pod.”

ANNE: Well, the iPod was also “a pod” kind of thing.

BAILEY: Their intention is irrelevant.

[laughter]

ANNE: From the beginning, our role has been to add as much description to audio files on the web as possible, whether it's through human-generated metadata or, more recently, through automatic transcription and semantic analysis and tagging. We add as much text as possible because that is how the Internet discovers things.

We're excited about a future where entering search terms into Google returns audio results, where you search for a quote and the audio is the first format returned. That's what we think about a lot, and so do our customers. They have discrete audio, or even video, components and we want that content to be indexed and surfaced regardless of the channel.

What is hard about publishing audio, and born-digital content, to the web?

BAILEY: Even though we talk a lot about archives, we actually think that the search solution we’re developing is the same as the archiving solution.

People are really bad at planning for the future: they’re bad at saving money and they’re bad at saving their archives. Ideally, as soon as a piece is done, as soon as it is finished, that's the best moment to make it searchable and, inherently, that's when it is archived.

In working with text analysis and with key words, we're making searchable audio that can be published to your site and indexed by Google. That's the whole point. Just by making audio searchable, you're making it possible to find it in like 10 years, to find what was in the audio file. The two solutions are really complementary and will solve many problems for this community.

ANNE: To Bailey's point, you can’t do much with a file if you don't know what's inside it. Who's going to listen to dozens, hundreds, or even thousands of hours of sound?

When it comes to public preservation, we've been working with the Internet Archive from the very beginning. We provide stations, individuals, universities, and all of our customers with a method for backing up and storing their audio through a noncommercial partner. That’s the beauty of archive.org: it promises to be there 10 years from now. It's exciting to see people posting on listservs about the best places to save their work. That decision will always be a personal one, whether you have hard-disk backup or use some cloud solution, but to see people talking about the Internet Archive as a preferred, public-facing solution and comparing that to more commercial solutions like SoundCloud… in terms of the long game for content creators, it's a toss-up.

There was a Huffington Post article recently about the podcasts you should be following (they, and everybody else, have been publishing this story recently), and in it they use the SoundCloud embeddable player for every example. And then, at the bottom of the article, they point the reader to download each podcast from iTunes or Stitcher. Clearly there was some decision, conscious or unconscious, to treat iTunes and Stitcher as the two channels for downloading podcasts, while SoundCloud was the way to listen in the moment.

I think that's interesting. It proves once again that there is no single solution.

BAILEY (to Anne): Remember when you gave me such a hard time because I didn't have the iOS podcast app on my phone because I hated it so much? Over the years, I've cobbled together so many different methods of consuming the media that I like. I have the "This American Life" app, the "Planet Money" app, the Radiolab app, the NPR news app, the PRX app… And now I’ve returned to subscribing to podcasts the old-fashioned way: iTunes.

MIRANDA: The iOS podcast app and iTunes are just the worst. Now, Overcast won me over because, in your app settings, it says "Overcast not for you? Support independent developers … ” and suggests other management solutions. I really like that. Plus the management features are pretty great.

ANNE: There are real “discovery” opportunities for audio right now. Yes, SoundCloud is one of them, but you're on point to note that SoundCloud has a particularly robust community of musicians and music lovers over audio storytellers and spoken word. PRX’s platform will only continue to grow in its role supporting independent producers and helping their stories get discovered.

The necessary formula for a podcast to become successful, like Radiolab for example, still requires a very particular confluence of events. In some ways, if audio is really going from niche to mainstream, then audio as we're describing it right now is only represented by public media and podcaster communities. Those are small communities. For any mainstream media that has to change, whether it's Justin Bieber getting discovered on YouTube or Lorde on SoundCloud, right now being mentioned on This American Life is pretty much the way that successful audio producers and shows make it.

Do you think there is a new way to connect listeners to new content, as well as other listeners?

ANNE: This is where structured metadata plays a significant role. The trick is to strike a balance between identifying what a show or series is about vs. identifying its mood and its tone. Human recommendation and high-quality content are critical: it's not because of structured metadata that Serial got popular. There is a significant human component when it comes to creating the kind of structured, descriptive, machine-readable data that can be used to surface audio content for new and different audiences.

MIRANDA: Ok, has anyone ever made jokes to you about being librarians?

ANNE: Are you kidding me?! I almost went to library science school. Bailey and I were in the minority, but we pushed for a self-directed archival and library science crash course for ourselves at the Berkeley School of Information.

MIRANDA: This has been such an amazing conversation. I could talk to you two for hours about all of the audio story friction points.

Ok, seriously, what are your favorite shows?

ANNE: Oh, yeah, Serial is not my only answer.

BAILEY: My very favorite is, and they just changed their name, "The Heart."

MIRANDA: I love that show.

BAILEY: I've learned so much from listening to that show. Two of my very favorite podcasts are The Heart, which is sort of explaining the world of sexuality. And also Planet Money, explaining the world of economics. You know, we could combine them.

[laughter]

MIRANDA: When I first came across The Heart it was still called "Audio Smut." I remember sending it to a friend noting it as “so dirty, and so not dirty all at the same time.” Ha!

BAILEY: It's just such a surprising and interesting show. I always get excited about anything that can show me a part of the world that I just wouldn't have had access to otherwise.

ANNE: Oh, this is going to be awful, but I'll offer the perfect counter to that — I’ve been listening to Startup because it describes an experience I’ve navigated step-by-step, verbatim. I find myself cringing as I listen because as opposed to learning something about a world I know nothing about, this show is about what we have done all year. Particularly as the CEO of an audio startup, it's like, I just can't help myself.

