Behind the dialect map interactive: How an intern created The New York Times' most popular piece of content in 2013

The NYT's most popular piece of content in 2013, “How Y’all, Youse and You Guys Talk,” generates a personalized dialect map by comparing a user's answers with data from more than 350,000 survey responses collected in 2013.

How do you create the most popular piece of content of the year at one of the nation’s most prestigious news outlets?

Well, for starters, study or consider careers in politics, law, and philosophy before eventually deciding that statistics is for you. Then apply to grad school and, while you're there, dig into some intriguing data that Harvard researchers had published 10 years prior, apply some stats and smart algorithms, post your work online, then wait for The New York Times to call.

That’s not the whole story, of course, but it’s the rough run-up to how Josh Katz ended up an intern at the Times last fall and eventually created (with graphics editor Wilson Andrews) the newspaper’s most popular piece of content in 2013: “How Y’all, Youse and You Guys Talk.”

“I’d enjoyed the news as a consumer,” Katz said, “but I'd never really pictured myself as being a part of the journalism world.”

“I’d always had an interest in data visualization and finding a way of communicating results graphically,” he said. “What I didn’t realize is that that is essentially a lot of what they do at Times graphics, so it was really a perfect fit.”

Katz’s personal journey to the Times is a fun one, but the story of the technology behind the popular project is just as good.

The Harvard Dialect Survey maps created by researchers in 2003.

Last March Katz was a grad student in the Department of Statistics at North Carolina State University and had recently decided he wanted to look more closely at an interesting set of data he’d seen 10 years prior, the Harvard Dialect Survey.

The study was based on the responses of more than 50,000 people to 122 questions on dialect, and had been presented by the researchers (Bert Vaux and Scott Golder) as a series of colored points on a map. While the data was interesting, Katz wanted to show a more elegant “smoothed estimate” of the same data.

Using the k-nearest neighbor algorithm and kernel density estimation (more detail here), he turned the Harvard data into a series of what most of us would call heat maps.
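For readers curious what a “smoothed estimate” can look like in code, here is a minimal Python sketch of the general idea. It is not Katz's implementation, which this article does not detail. It estimates, at any point on a map, the share of nearby respondents who gave a particular answer, using the k nearest survey points weighted by a Gaussian kernel. The function name, the bandwidth rule and the toy coordinates are all assumptions made for illustration.

```python
import math


def smoothed_share(query, points, k=3):
    """Estimate the share of respondents near `query` who gave answer 1.

    `points` is a list of ((x, y), answer) pairs with answer 0 or 1.
    Only the k nearest points are used. Each is weighted by a Gaussian
    kernel whose bandwidth is the distance to the k-th nearest point
    (an illustrative choice, not taken from the original project).
    """
    # Pair every survey point with its distance to the query, keep the k closest.
    nearest = sorted(
        (math.dist(query, loc), answer) for loc, answer in points
    )[:k]
    # Fall back to 1.0 if all k neighbors sit exactly on the query point.
    bandwidth = nearest[-1][0] or 1.0
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d, _ in nearest]
    weighted_hits = sum(w * answer for w, (_, answer) in zip(weights, nearest))
    return weighted_hits / sum(weights)


# Toy data: two respondents in the west answered 1, two in the east answered 0.
toy_points = [((0, 0), 1), ((1, 0), 1), ((10, 0), 0), ((11, 0), 0)]
west = smoothed_share((0, 0), toy_points, k=2)
east = smoothed_share((10.5, 0), toy_points, k=2)
middle = smoothed_share((5, 0), toy_points, k=4)
```

Evaluating a function like this over a grid of map locations, rather than only where respondents live, is what turns scattered points into a continuous surface.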

In June he posted those maps on the North Carolina State University website and on a community site for R developers.

By August the graphics desk at the Times had discovered them and invited him to New York for an internship starting in September.

Though satisfied with the work he’d done with the data thus far, Katz had also come up with a plan to verify and update the data and turn it into a quiz.

A map from Katz's smoothing project based on the Harvard dialect data.

To do this he’d need to whittle down the original set of 122 questions into a manageable number. He’d also need to figure out if dialects in the United States had changed over the last 10 years.

Using suggestions from the online community, he came up with 20 additional questions he thought would help him detect changes in dialect, built a survey of more than 140 questions (the original 122 plus his 20 new ones), and posted it online.

“One of the great things about doing this online was that you get all of this instant feedback and a lot of people have great suggestions,” Katz said.

Of the 140 questions asked, a good portion didn’t tell Katz much.

Pancakes or flapjacks? Everyone says pancakes, Katz said. So that question and about 120 others were thrown out.

To find the most telling questions, though, Katz surveyed 350,000 people not just on dialect, but also on age, gender, and where they lived.
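The article does not describe how the uninformative questions were identified, so the following Python sketch is only an assumed first-pass filter, not Katz's method. It scores a question by the Shannon entropy of its answers: a question where nearly everyone says “pancakes” scores close to zero and can be dropped. A fuller approach would also check whether answers vary by location, which this sketch does not attempt. The function name and the sample answers are made up for illustration.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of a list of answers to one question.

    0.0 means every respondent gave the same answer; higher values mean
    the answers are more evenly split among the options.
    """
    counts = Counter(answers)
    total = len(answers)
    return -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )


# Hypothetical responses, invented for this example.
unanimous = answer_entropy(["pancakes"] * 10)
evenly_split = answer_entropy(["y'all", "you guys"] * 5)
```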

The key to the project was Katz’s stats background.

“Getting from the point reference data to having a continuous estimate is really the backbone of the quiz,” Katz said. “There’s this statistical underpinning to the whole project.”

With the most-telling questions in hand, Katz and Andrews set about building the app you see on the Times site. They used D3 and the canvas element to visualize and render the maps. The three cities you get at the end of the quiz are plucked from a database of 150 and are simply the cities where residents are most likely to answer questions like you did.
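The article does not say how the quiz scores cities, so this Python sketch is just one plausible reading of “most likely to answer questions like you did.” It assumes each city has an estimated probability for every answer to every question, multiplies the probabilities of the user's answers (by summing logs), and returns the top three. The city labels, question keys and probabilities are hypothetical.

```python
import math


def most_similar_cities(user_answers, city_probs, top_n=3):
    """Rank cities by how likely their residents are to give the user's answers.

    `user_answers` maps question -> chosen answer.
    `city_probs` maps city -> question -> answer -> estimated probability.
    """
    scores = {}
    for city, probs in city_probs.items():
        # Sum log-probabilities; the small floor avoids log(0) for
        # answers a city has never been observed to give.
        scores[city] = sum(
            math.log(max(probs[question].get(answer, 0.0), 1e-9))
            for question, answer in user_answers.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical probabilities for four placeholder cities.
city_probs = {
    "City A": {"q1": {"y'all": 0.8, "you guys": 0.2}, "q2": {"coke": 0.7, "soda": 0.3}},
    "City B": {"q1": {"y'all": 0.1, "you guys": 0.9}, "q2": {"coke": 0.1, "soda": 0.9}},
    "City C": {"q1": {"y'all": 0.5, "you guys": 0.5}, "q2": {"coke": 0.5, "soda": 0.5}},
    "City D": {"q1": {"y'all": 0.6, "you guys": 0.4}, "q2": {"coke": 0.2, "soda": 0.8}},
}
top_three = most_similar_cities({"q1": "y'all", "q2": "coke"}, city_probs)
```

Treating the questions as independent, as this sketch does, keeps the arithmetic simple but is itself an assumption.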

There are some interesting wrinkles to the project. For one, it’s not designed to predict where you grew up, Katz said. The quiz simply shows you the region of the country where the dialect most closely matches your own. (Seriously, Bryant Gumbel, take note.)

And while that wrinkle may disappoint some users, it also helps make the project intriguing.

“In a way it’s more interesting,” Katz said. “For a lot of people the quiz will show them where their parents grew up.”

Interestingly, the quiz almost didn’t make it onto the Times’ site.

Katz had pitched an enthusiastic group of editors on the project earlier in his internship, but by mid-December, with Katz’s time winding down, the quiz still wasn’t up.

On December 21 the quiz was posted and by the end of the year had become the site’s most popular piece of content for 2013.

“I’m pretty blown away by the response to the whole thing,” Katz said. But he can understand the project’s success.

“Dialect is all about people’s sense of identity — ‘this is who I am, this is where I come from,’” he said.

But beyond sentimentality or being able to identify your roots, it’s an entertaining feature.

“At the end of the day it’s fun,” he said.

Internship complete, Katz will start as a staff editor with the Times’ new data journalism project in the next few weeks.


About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
