Introducing Neighborhood Buzz

Neighborhood Buzz, Chicago, O'Hare, Science tweets

As social media have become a regular part of daily life, people have wondered what they can learn about themselves and their communities from the millions of messages posted online—especially on Twitter, because it is so public and so conversational. Many projects in this space begin by selecting tweets for analysis based on who tweeted or specific terms used in the tweets. Students in our Fall 2012 Innovation in Journalism and Technology class wanted instead to explore what could be learned by grouping tweets based on their geolocation. Building on that first prototype, the Knight Lab staff has developed the idea a bit further, leading to today's release of Neighborhood Buzz.

In short, Neighborhood Buzz is an experiment in summarizing the topics of conversation on Twitter by neighborhood in 20 cities across the country. Neighborhood Buzz continually collects tweets, assigns them to a neighborhood and categorizes them topically. When you visit a city page, Buzz shows you the number of geolocated tweets in that city in each category over the last week. If you click on a neighborhood on the map, the numbers adjust to reflect only the tweets geolocated in that neighborhood. Or, you can click on a topical category in the list, which displays a heat map overlaid on the city map showing how much each neighborhood tweets about that topic. (More on that below.) You can also click on the arrow next to each category to see a sampling of tweets in that category for the selected city or neighborhood.

The original student team focused on Chicago and used a simple mathematical algorithm (rounding to grid coordinates) to assign tweets to neighborhoods. While we were developing this new version of Neighborhood Buzz, the Code for America Louisville fellows released a fun project called Click that 'Hood. As part of that project, they've collected a trove of neighborhood maps for cities around the world. Their maps and our decision to use PostGIS for our database made it easy to add many more cities to the project. (Unfortunately for our friends outside the United States, for technical reasons involving Twitter's streaming API, we had to use a geographic filter that only processes tweets geocoded somewhere in the U.S.)
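
To make the two assignment approaches concrete, here is a minimal Python sketch. It is an illustration only, not the code behind Buzz: the cell size, the function names and the idea of passing neighborhoods as simple lists of (longitude, latitude) vertices are all assumptions, and the pure-Python ray-casting test stands in for the point-in-polygon query that a spatial database such as PostGIS would run.

```python
import math


def grid_cell(lat, lon, cell_deg=0.01):
    """Round a coordinate down to the index of its grid cell.

    cell_deg is a hypothetical cell size in degrees, not a value from Buzz.
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def point_in_polygon(lon, lat, ring):
    """Ray-casting test. ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(lon, lat, neighborhoods):
    """Return the name of the first polygon containing the point, else None.

    neighborhoods maps a name to a single ring of (lon, lat) vertices.
    """
    for name, ring in neighborhoods.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None
```

Grid rounding is quick but ignores real boundaries, so two tweets in the same cell always land in the same bucket even if a neighborhood line runs between them. The polygon test follows whatever shape the map provides, which is why having real neighborhood maps matters.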

We started with all of the neighborhood maps collected by Click that 'Hood. After a little experimentation, we found that there just weren't enough geocoded tweets per neighborhood in many of the cities for statistical analysis. We looked at the totals and saw a pretty natural break after the top twenty cities, so we decided to limit our project to those. (Technically, we have 15 cities; four of the five boroughs of New York City; and Los Angeles County, including neighborhoods of the city of Los Angeles.)

Once we sort tweets into neighborhoods, we use a topical classifier to assign them to one of nine categories. The classifier provides a score for each category reflecting its "confidence" that the tweet belongs in that category. We assign the tweet to whichever category has the highest score.
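
The "highest score wins" rule can be sketched in a few lines of Python. This is a hedged illustration rather than our classifier: the function name, the category names and the scores are made up, and returning None for a tweet with no scores is an assumption about how such tweets might be represented.

```python
def assign_category(scores):
    """Pick the category with the highest confidence score.

    scores maps a category name to a number. An empty mapping means the
    classifier produced no scores, so the tweet gets no category.
    """
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical scores for a single tweet.
example = {"politics": 0.62, "art": 0.21, "science": 0.17}
print(assign_category(example))
```

Note that the rule says nothing about how confident the winner is: a tweet whose top score barely beats the runner-up is assigned just as firmly as one with a clear favorite.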

In developing Neighborhood Buzz to this point, we find two continuing challenges. The primary problem is that tweets are simply hard to classify using traditional text analysis methods. They are chatty, full of abbreviations and slang, and of course, just short. About one-third of the tweets we attempt to classify don't get any scores at all, and so aren't shown in Buzz. Additionally, classifiers must be trained using labeled texts, and the texts we had available for this purpose were of a different kind altogether (news stories). For these reasons, you'll sometimes see quirks such as the word 'party' often leading a tweet to be put in the 'politics' category.

The second challenge is that, at present, only a very small number of tweets are geocoded. A couple of random samples indicate that about 1.5% of all tweets have a location associated with them. Many of those are geocoded because they are sent by a third-party service, such as Foursquare or Instagram, so it's a bit harder to say that we know what people are "talking about" in a certain neighborhood. (Also, Instagram's default text, "Just posted a photo," results in a disproportionate number of tweets being labeled as "Art.")

And, getting back to the heat map: any time one uses a heat map to summarize data points, it's easy to simply reproduce a population density map. Computing the per-neighborhood population was outside the scope of our project, and Twitter users are not evenly distributed among the general public. So, we looked for a different way to normalize the data. In our current implementation, we compute the percentage of all tweets in the given category that came from a given neighborhood. In practice, it's rare that any one neighborhood is the source of more than 20% of all tweets in a category, so the maps are often a fairly uniform color.
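
The normalization described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the (neighborhood, category) pair format and the sample data are hypothetical, not taken from the Buzz codebase.

```python
from collections import Counter


def neighborhood_shares(tweets, category):
    """Percentage of a category's tweets that came from each neighborhood.

    tweets is an iterable of (neighborhood, category) pairs.
    Returns an empty dict when the category has no tweets at all.
    """
    counts = Counter(hood for hood, cat in tweets if cat == category)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hood: 100.0 * n / total for hood, n in counts.items()}


# Made-up sample: four "sports" tweets spread over three neighborhoods.
sample = [
    ("A", "sports"), ("A", "sports"), ("B", "sports"), ("C", "sports"),
    ("A", "art"),
]
print(neighborhood_shares(sample, "sports"))
```

Because every share is a fraction of the same category total, a neighborhood that simply tweets more about everything will still tend to show a higher value, and with dozens of neighborhoods in a city most shares end up small and close together.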

We’re releasing Neighborhood Buzz now so people can use it and tell us what they think. Our plans for future development will depend on what we hear. We've had some conversations internally about other approaches to aggregating and summarizing the tweets in a neighborhood. I'm sure we'll have other projects for which we’ll continue working on the general challenge of categorizing tweets, and some of those advances may make their way back into Buzz. In the meantime, try it and tell us what you think!

About the author

Joe Germuska

Chief Nerd

Joe runs Knight Lab’s technology, professional staff and student fellows. Before joining us, Joe was on the Chicago Tribune News Apps team. Also, he hosts a weekly radio show on WNUR-FM – Conference of the Birds.
