Behind the dialect map interactive: How an intern created The New York Times' most popular piece of content in 2013

The NYT's most popular piece of content in 2013, “How Y’all, Youse and You Guys Talk,” generates a personalized dialect map by comparing a user's answers with data from more than 350,000 survey responses collected in 2013.

How do you create the most popular piece of content of the year at one of the nation’s most prestigious news outlets?

Well, for starters, study or consider careers in politics, law, and philosophy before eventually deciding that statistics is for you. Then apply to grad school and, while you're there, dig into some intriguing data that Harvard researchers had published 10 years prior, apply some stats and smart algorithms, post your work online, then wait for The New York Times to call.

That’s not the whole story of course, but it’s the rough run-up to how Josh Katz ended up an intern at the Times last fall and eventually created (with graphics editor Wilson Andrews) the newspaper’s most popular piece of content in 2013: “How Y’all, Youse and You Guys Talk.”

“I’d enjoyed the news as a consumer,” Katz said, “but I'd never really pictured myself as being a part of the journalism world.”

“I’d always had an interest in data visualization and finding a way of communicating results graphically,” he said. “What I didn’t realize is that that is essentially a lot of what they do at Times graphics, so it was really a perfect fit.”

Katz’s personal journey to the Times is a fun one, but the story of the technology behind the popular project is just as good.

The Harvard Dialect Survey maps created by researchers in 2003.

Last March Katz was a grad student in the Department of Statistics at North Carolina State University and had recently decided he wanted to look more closely at an interesting set of data he’d seen 10 years prior, the Harvard Dialect Survey.

The study was based on the responses of more than 50,000 people to 122 questions on dialect, and had been presented by the researchers (Bert Vaux and Scott Golder) as a series of colored points on a map. While the data was interesting, Katz wanted to show a more elegant “smoothed estimate” of the same data.

Using the k-nearest neighbor algorithm and kernel density estimation (more detail here), he turned the Harvard data into a series of what most of us would call heat maps.
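To give a rough feel for what a “smoothed estimate” means here, the sketch below combines the two ideas named above: it looks up the k nearest survey responses to a spot on the map and weights them with a Gaussian kernel, so closer respondents count for more. This is only an illustration, not Katz's actual code; the function name, the flat (x, y) coordinates, and the `k` and `bandwidth` defaults are all assumptions.

```python
import math


def smoothed_share(point, responses, answer, k=3, bandwidth=1.0):
    """Estimate the share of people near `point` who gave `answer`.

    `responses` is a list of ((x, y), answer) tuples. Coordinates are
    treated as flat map units (an assumption made for simplicity).
    """
    # Pair every response with its distance to the query point,
    # then keep only the k closest ones.
    nearest = sorted(
        ((math.dist(point, location), given) for location, given in responses),
        key=lambda pair: pair[0],
    )[:k]

    # Gaussian kernel: weight falls off smoothly with distance.
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d, _ in nearest]
    total = sum(weights)
    if total == 0:
        return 0.0

    # Weighted share of the neighbors who gave the answer of interest.
    matching = sum(w for w, (_, given) in zip(weights, nearest) if given == answer)
    return matching / total
```

Evaluating a function like this on a grid of points, once per answer, yields a continuous surface that can be colored as a heat map instead of a scatter of individual dots.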

In June he posted those maps on the North Carolina State University website and on RStudio.com, a community site for R developers.

By August the graphics desk at the Times had discovered them and invited him to New York for an internship starting in September.

Though satisfied with the work he’d done with the data thus far, Katz had also come up with a plan to verify and update the data and turn it into a quiz.

A map from Katz's smoothing project based on the Harvard dialect data.

To do this he’d need to whittle down the original set of 122 questions into a manageable number. He’d also need to figure out if dialects in the United States had changed over the last 10 years.

Using suggestions from the online community, he came up with 20 additional questions he thought would help him determine changes in dialect, built a survey of more than 140 questions (the original 122 plus his 20 new ones), and posted it on RStudio.com.

“One of the great things about doing this online was that you get all of this instant feedback and a lot of people have great suggestions,” Katz said.

Of the 140 questions asked, a good portion didn’t tell Katz much.

Pancakes or flapjacks? Everyone says pancakes, Katz said. So that question and about 120 others were thrown out.

To find the most telling questions, though, Katz surveyed 350,000 people not just on dialect, but also on age, gender, and where they lived.
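The pancakes example hints at why some questions are useless: if nearly everyone gives the same answer, the answer says nothing about where you are from. One simple way to put a number on that, offered here as an assumption rather than as the method Katz used, is the Shannon entropy of the answers. A score near zero means almost everyone agrees.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of a list of answers to one question.

    0.0 means everyone gave the same answer; higher values mean the
    answers are more evenly split. This is a first-pass filter only:
    it does not check whether the split follows geography.
    """
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A question that passes this filter would still need to vary by region to be useful, which is where the location data collected in the survey comes in.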

The key to the project was Katz’s stats background.

“Getting from the point reference data to having a continuous estimate is really the backbone of the quiz,” Katz said. “There’s this statistical underpinning to the whole project.”

With the most-telling questions in hand, Katz and Andrews set about building the app you see on the Times site. They used D3 and the canvas element to visualize and render the maps. The three cities you get at the end of the quiz are plucked from a database of 150 and are simply the cities where residents are most likely to answer questions like you did.
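The “three cities” step can be pictured as a ranking problem. The sketch below assumes each city has an estimated probability for every answer to every question, and scores a city by how likely its residents would be to give the same set of answers as the user, treating questions as independent. The data layout, the function name, and the small `floor` value for unseen answers are all hypothetical; the Times' actual scoring is not described here.

```python
import math


def top_cities(user_answers, city_answer_probs, n=3, floor=1e-6):
    """Return the n cities whose residents most likely answer like the user.

    user_answers: {question: answer}
    city_answer_probs: {city: {question: {answer: probability}}}
    """
    scores = {}
    for city, probs in city_answer_probs.items():
        log_score = 0.0
        for question, answer in user_answers.items():
            # Probability that someone in this city gives the user's answer;
            # the floor avoids log(0) for answers never seen in a city.
            p = probs.get(question, {}).get(answer, 0.0)
            log_score += math.log(max(p, floor))
        scores[city] = log_score

    # Highest log-likelihood first.
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Summing logarithms instead of multiplying raw probabilities keeps the scores numerically stable when a quiz has many questions.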

There are some interesting wrinkles to the project. For one, it’s not designed to predict where you grew up, Katz said. The quiz simply shows you the region of the country where the dialect most closely matches your own. (Seriously, Bryant Gumbel, take note.)

And while that wrinkle may disappoint some users, it also helps make the project intriguing.

“In a way it’s more interesting,” Katz said. “For a lot of people the quiz will show them where their parents grew up.”

Interestingly, the quiz almost didn’t make it onto the Times’ site.

Katz had pitched an enthusiastic group of editors on the project earlier in his internship, but by mid-December, with Katz’s time winding down, the quiz still wasn’t up.

On December 21 the quiz was posted and by the end of the year had become the site’s most popular piece of content for 2013.

“I’m pretty blown away by the response to the whole thing,” Katz said. But he can understand the project’s success.

“Dialect is all about people’s sense of identity — ‘this is who I am, this is where I come from,’” he said.

But beyond sentimentality or being able to identify your roots, it’s an entertaining feature.

“At the end of the day it’s fun,” he said.

Internship complete, Katz will start as a staff editor with the Times’ new data journalism project in the next few weeks.


About the author

Ryan Graff

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
