Behind the dialect map interactive: How an intern created The New York Times' most popular piece of content in 2013

NYT's most popular piece of content in 2013, “How Y’all, Youse and You Guys Talk,” generates a personalized dialect map by comparing your answers with more than 350,000 survey responses collected in 2013.

How do you create the most popular piece of content of the year at one of the nation’s most prestigious news outlets?

Well, for starters, study or consider careers in politics, law, and philosophy before eventually deciding that statistics is for you. Then go to grad school and, while you're there, dig into some intriguing data that Harvard researchers had published 10 years earlier, apply some stats and smart algorithms, post your work online, and wait for The New York Times to call.

That’s not the whole story, of course, but it’s the rough run-up to how Josh Katz ended up as an intern at the Times last fall and eventually created (with graphics editor Wilson Andrews) the newspaper’s most popular piece of content in 2013: “How Y’all, Youse and You Guys Talk.”

“I’d enjoyed the news as a consumer,” Katz said, “but I'd never really pictured myself as being a part of the journalism world.”

“I’d always had an interest in data visualization and finding a way of communicating results graphically,” he said. “What I didn’t realize is that that is essentially a lot of what they do at Times graphics, so it was really a perfect fit.”

Katz’s personal journey to the Times is a fun one, but the story of the technology behind the popular project is just as good.

The Harvard Dialect Survey maps created by researchers in 2003.

Last March, Katz was a grad student in the Department of Statistics at North Carolina State University and had recently decided he wanted to look more closely at an interesting set of data he’d seen 10 years prior: the Harvard Dialect Survey.

The study was based on the responses of more than 50,000 people to 122 questions on dialect, and had been presented by the researchers (Bert Vaux and Scott Golder) as a series of colored points on a map. While the data was interesting, Katz wanted to show a more elegant “smoothed estimate” of the same data.

Using the k-nearest neighbor algorithm and kernel density estimation, he turned the Harvard data into a series of maps most of us would call heat maps.
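For readers curious about the mechanics, here is a minimal Python sketch of the general idea. It is a hypothetical illustration, not Katz's code. It assumes each survey response is a (longitude, latitude) point with an answer attached, finds the k respondents nearest to a map location, and weights their answers with a Gaussian kernel so that closer neighbors count more. The function names, the data format, and the use of plain Euclidean distance on coordinates are all assumptions made for the example.

```python
import heapq
import math
from collections import defaultdict


def knn_kernel_share(respondents, query, k=10, bandwidth=1.0):
    """Estimate the share of each answer around one map location.

    respondents: list of ((lon, lat), answer) pairs (hypothetical format)
    query: (lon, lat) location to estimate
    k: how many nearby respondents to use
    bandwidth: width of the Gaussian kernel, in coordinate units
    """
    # k-nearest neighbors: the k respondents closest to the query point,
    # using plain Euclidean distance on lon/lat (a simplifying assumption).
    nearest = heapq.nsmallest(k, respondents, key=lambda r: math.dist(r[0], query))

    # Kernel weighting: closer neighbors count more than distant ones.
    weights = defaultdict(float)
    for point, answer in nearest:
        distance = math.dist(point, query)
        weights[answer] += math.exp(-0.5 * (distance / bandwidth) ** 2)

    total = sum(weights.values())
    if total == 0.0:
        # No respondents, or every neighbor is too far away to carry weight.
        return {}
    return {answer: w / total for answer, w in weights.items()}


def smooth_grid(respondents, answer, lons, lats, k=10, bandwidth=1.0):
    """Share of `answer` at every grid cell: rows follow lats, columns lons."""
    return [
        [knn_kernel_share(respondents, (lon, lat), k, bandwidth).get(answer, 0.0)
         for lon in lons]
        for lat in lats
    ]
```

Evaluating the estimate at every point on a grid turns scattered survey points into a continuous surface that can be drawn as a heat map; k and the bandwidth control how smooth that surface looks.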

In June he posted those maps on the North Carolina State University website and on RStudio.com, a community site for R developers.

By August the graphics desk at the Times had discovered them and invited him to New York for an internship starting in September.

Though satisfied with the work he’d done with the data thus far, Katz had also come up with a plan to verify and update the data and turn it into a quiz.

A map from Katz's smoothing project based on the Harvard dialect data.

To do this, he’d need to whittle the original set of 122 questions down to a manageable number. He’d also need to figure out whether dialects in the United States had changed over the previous 10 years.

Drawing on suggestions from the online community, he came up with 20 additional questions he thought would help him detect changes in dialect, built a survey of more than 140 questions (the original 122 plus his 20 new ones), and posted it on RStudio.com.

“One of the great things about doing this online was that you get all of this instant feedback and a lot of people have great suggestions,” Katz said.

Of the 140 questions asked, a good portion didn’t tell Katz much.

Pancakes or flapjacks? Everyone says pancakes, Katz said. So that question and about 120 others were thrown out.
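The article doesn't say exactly how Katz scored the questions, but the pancakes example points to a common yardstick. In the hypothetical Python sketch below, a question counts as telling when knowing someone's answer tells you something about their region, measured as the mutual information between region and answer. A question that everyone answers the same way scores zero no matter how the regions are drawn. The (region, answer) data format and the function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy, in bits, of a Counter of answer frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def mutual_information(responses):
    """How much an answer reveals about region: H(answer) - H(answer | region).

    responses: list of (region, answer) pairs for a single question
    """
    overall = Counter(answer for _, answer in responses)
    by_region = defaultdict(Counter)
    for region, answer in responses:
        by_region[region][answer] += 1

    n = len(responses)
    # Average within-region uncertainty, weighted by each region's size.
    conditional = sum(
        sum(counts.values()) / n * entropy(counts) for counts in by_region.values()
    )
    # Clamp at zero to absorb tiny negative floating-point rounding errors.
    return max(0.0, entropy(overall) - conditional)


def most_telling(questions, keep):
    """Return the names of the `keep` highest-scoring questions.

    questions: {question_name: [(region, answer), ...]} (hypothetical format)
    """
    return sorted(questions, key=lambda q: mutual_information(questions[q]),
                  reverse=True)[:keep]
```

Ranking every candidate question this way and keeping only the top scorers is one plausible route from about 140 questions to a short quiz, though this sketch ignores practical issues such as regions with very few respondents.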

To find the most telling questions, though, Katz surveyed 350,000 people, asking not just about dialect but also about their age, gender, and where they lived.

The key to the project was Katz’s stats background.

“Getting from the point reference data to having a continuous estimate is really the backbone of the quiz,” Katz said. “There’s this statistical underpinning to the whole project.”

With the most telling questions in hand, Katz and Andrews set about building the app you see on the Times site. They used D3 and the HTML canvas element to visualize and render the maps. The three cities you get at the end of the quiz are plucked from a database of 150 and are simply the cities whose residents are most likely to answer the questions the way you did.
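The city matching is described here only in outline, so the following is a hedged Python sketch of one way it could work, not the Times' implementation. It assumes a hypothetical table giving, for each city, the estimated share of residents choosing each answer to each question. A city's score multiplies the shares for the user's answers (summing their logarithms), which treats the questions as independent, a simplifying assumption. The three highest-scoring cities are returned.

```python
import heapq
import math


def top_matching_cities(city_shares, user_answers, n=3, floor=1e-6):
    """Return the n cities whose residents most likely answer as the user did.

    city_shares: {city: {question: {answer: share}}} (hypothetical format)
    user_answers: {question: answer} chosen by the quiz taker
    floor: stand-in probability when a city has no data for an answer,
           so a single missing value doesn't rule a city out entirely
    """
    def log_likelihood(city):
        shares = city_shares[city]
        # Adding log-probabilities is the same as multiplying probabilities;
        # it treats questions as independent (a simplifying assumption).
        return sum(
            math.log(max(shares.get(question, {}).get(answer, 0.0), floor))
            for question, answer in user_answers.items()
        )

    return heapq.nlargest(n, city_shares, key=log_likelihood)
```

Because the score is a sum over questions, a city that matches most of your answers can still rank highly even if it misses on one or two.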

There are some interesting wrinkles to the project. For one, it’s not designed to predict where you grew up, Katz said. The quiz simply shows you the region of the country where the dialect most closely matches your own. (Seriously, Bryant Gumbel, take note.)

And while that wrinkle may disappoint some users, it also helps make the project intriguing.

“In a way it’s more interesting,” Katz said. “For a lot of people the quiz will show them where their parents grew up.”

Interestingly, the quiz almost didn’t make it onto the Times’ site.

Katz had pitched an enthusiastic group of editors on the project earlier in his internship, but by mid-December, with Katz’s time winding down, the quiz still wasn’t up.

The quiz finally went live on December 21, and by the end of the year it had become the site’s most popular piece of content for 2013.

“I’m pretty blown away by the response to the whole thing,” Katz said. But he can understand the project’s success.

“Dialect is all about people’s sense of identity — ‘this is who I am, this is where I come from,’” he said.

But beyond sentimentality or being able to identify your roots, it’s an entertaining feature.

“At the end of the day it’s fun,” he said.

Internship complete, Katz will start as a staff editor with the Times’ new data journalism project in the next few weeks.


About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
