Since joining Knight Lab as student fellows in April, Michael Martinez and I have been thinking about podcasting technology and online audio in hopes that a project idea would emerge. We're obviously not alone. The last 12 months have seen the rise of highly popular shows like Serial and advanced mobile or in-car "podcatching" platforms. Just last week, podcasts marked another milestone when President Obama appeared on Marc Maron's WTF podcast. Unfortunately, a handful of problems still plague podcasting; many of those challenges were covered by previous student fellow Neil Holt.
Even after reading widely on issues affecting podcasts, we had quite a bit of trouble settling on an angle of attack. It was difficult to choose a relevant, bite-sized project that we could accomplish and that would be useful. We looked into building ID3 editors, developing Chrome extensions, designing new podcatchers, and more. Eventually, we settled on building a platform that primarily helps with the discoverability and categorization of podcasts.
Our (currently nameless) project creates topical podcast feeds composed of episodes from many other sources. The general goal is to let podcast listeners subscribe to a topic instead of a series, so that they are presented with episodes from series and producers they might otherwise have overlooked. We collect existing podcast feeds and pick out each episode in each feed. Then, the episodes are categorized by topic area and saved into our database of episodes. When a user searches for a topic, we query the database to construct a new podcast feed containing episodes that are all related to that topic. Users should also be able to subscribe to the topical podcast feeds in their mobile podcatchers.
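To make that flow concrete, here is a minimal sketch of the four steps (pick out episodes, categorize, save, rebuild a feed by topic). It is not our actual code: it uses only the Python standard library with an in-memory SQLite database rather than Django, the `TOPIC_KEYWORDS` table and all function names are hypothetical, and the categorizer is a naive keyword match standing in for real classification.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical topic vocabulary, for illustration only.
TOPIC_KEYWORDS = {
    "politics": {"election", "president", "senate"},
    "technology": {"software", "startup", "internet"},
}

def parse_episodes(feed_xml):
    """Pick out each episode (<item>) from an RSS 2.0 feed string."""
    channel = ET.fromstring(feed_xml).find("channel")
    series = channel.findtext("title", default="")
    episodes = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        episodes.append({
            "series": series,
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "audio_url": enclosure.get("url") if enclosure is not None else "",
        })
    return episodes

def categorize(episode):
    """Return the topics whose keywords appear in the title or description."""
    text = f"{episode['title']} {episode['description']}".lower()
    words = set(re.findall(r"[a-z']+", text))
    return sorted(t for t, kws in TOPIC_KEYWORDS.items() if words & kws)

def save_episodes(conn, episodes):
    """Store one row per (episode, topic) pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episode "
        "(series TEXT, title TEXT, audio_url TEXT, topic TEXT)"
    )
    for ep in episodes:
        for topic in categorize(ep):
            conn.execute(
                "INSERT INTO episode VALUES (?, ?, ?, ?)",
                (ep["series"], ep["title"], ep["audio_url"], topic),
            )
    conn.commit()

def build_topic_feed(conn, topic):
    """Query stored episodes for a topic and emit a new RSS feed string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Episodes about {topic}"
    rows = conn.execute(
        "SELECT series, title, audio_url FROM episode "
        "WHERE topic = ? ORDER BY rowid",
        (topic,),
    )
    for series, title, audio_url in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{series}: {title}"
        ET.SubElement(item, "enclosure", url=audio_url, type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")

# Made-up sample feed with one matching and one non-matching episode.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example Show</title>
<item><title>The election special</title>
<description>We talk about the senate race.</description>
<enclosure url="http://example.com/ep1.mp3" type="audio/mpeg"/></item>
<item><title>Cooking pasta</title><description>Dinner ideas.</description>
<enclosure url="http://example.com/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

conn = sqlite3.connect(":memory:")
save_episodes(conn, parse_episodes(SAMPLE_FEED))
print(build_topic_feed(conn, "politics"))
```

Storing one row per episode and topic keeps the topic lookup to a single query, at the cost of duplicating episode data. A real schema would more likely use separate episode and topic tables joined many-to-many.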
After finding our project direction, we set about choosing some technologies. Michael and I had very little experience with databases and had primarily used Node.js and Express to build web applications in the past. We looked into a few technologies, including Parse, Flask, and others. With some encouragement, we decided to learn Django and use it for this project. It took a while to understand, but with some guidance from other engineers and student fellows at Knight Lab, we were able to start building.
There are still a lot of questions to answer on both the engineering and design sides of the project.
On the engineering side, the biggest area for improvement is the categorization of episodes. We are working out how to use keyword extraction and other natural language processing techniques to classify episodes more accurately. On the design side, we would like to test the project with podcast listeners to learn which functions matter most, since we need a better understanding of how to maximize the project’s value and utility for potential users. We’re a long way from a completed project, but we’ll keep going. Check back in for updates.