TwXray taken to task on Twitter

Last week the Knight Lab released a beta version of twXray and received some tough Twitter feedback.

We knew, of course, that twXray was fallible, but it’s another thing to have it out in the world.

At any rate, here is a rundown of where it stumbled:

You can trick twXray with a tweet that takes a common word and applies it in a unique context. A tweet about Tom Cruise may be categorized as travel based on the word “cruise.” Likewise, a tweet about a movie being “faithful” to the book is likely to end up in the religion category.

Typically, “cruise” and “faithful” are very reliable indicators of travel and religion, which causes twXray to give its categorization of those tweets extra weight. But in these cases the words were used in a way that twXray failed to recognize.
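To make the “cruise” problem concrete, here is a minimal sketch of a word-count classifier. It is not twXray’s actual code: the training sentences, the function names, and the smoothed word-probability scoring are all assumptions chosen for illustration. Because “cruise” only ever appears in the travel examples, a tweet about Tom Cruise lands in travel.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data, not twXray's real corpus.
TRAINING = [
    ("book a caribbean cruise for your summer vacation", "travel"),
    ("cruise ship deals and cheap flights", "travel"),
    ("the new movie opens in theaters friday", "entertainment"),
    ("actor stars in a new action movie", "entertainment"),
]

def train(examples):
    """Count how often each word appears in each category."""
    word_counts = defaultdict(Counter)
    for text, category in examples:
        word_counts[category].update(text.split())
    return word_counts

def score(text, word_counts):
    """Sum smoothed log word probabilities for each category."""
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for category, counts in word_counts.items():
        total = sum(counts.values())
        scores[category] = sum(
            math.log((counts[w] + 1) / (total + len(vocab)))
            for w in text.split()
        )
    return scores

def classify(text, word_counts):
    """Return the category with the highest score."""
    scores = score(text, word_counts)
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("tom cruise", model))  # travel
```

The classifier has never seen “tom,” so the only evidence it has is “cruise,” and that word points firmly at travel.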

Beyond certain words tricking the technology, twXray also had a tough time with users who tweet in a single niche. One user, @newsstandpromos, tweeted almost entirely about the magazine industry and the subjects it covers (Ben Bernanke, Lady Gaga, the Red Sox, food, etc.), which was enough to confuse twXray. TwXray inferred that @newsstandpromos was tweeting about those subjects rather than about how they are covered in magazines, which again resulted in mistakes.

All that said, the technology works fairly well, and the problems with twXray are small and could be fixed in subsequent iterations, said Shawn O’Banion, the Ph.D. student who built the core technology.

How might we do that?

“A feedback feature on the site would allow users to say ‘yes’ this would be a good categorization or ‘no’ it wasn't,” said O’Banion. “The classifier could learn from this feedback to improve the categorization. My guess is this would help learn slang and abbreviations used in Twitter and also learn the text used by that specific user.”
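The feedback loop O’Banion describes could look something like the sketch below. This is an assumption about one possible design, not a feature that exists: the `FeedbackClassifier` class, its method names, and the optional `corrected` argument (a user supplying the right category after saying “no”) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class FeedbackClassifier:
    """Word-count classifier that absorbs yes/no feedback (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)

    def learn(self, text, category):
        """Add the words of one example to a category's counts."""
        self.word_counts[category].update(text.lower().split())

    def classify(self, text):
        """Pick the category with the highest smoothed log score."""
        vocab = {w for counts in self.word_counts.values() for w in counts}

        def log_score(category):
            counts = self.word_counts[category]
            total = sum(counts.values())
            return sum(
                math.log((counts[w] + 1) / (total + len(vocab)))
                for w in text.lower().split()
            )

        return max(self.word_counts, key=log_score)

    def feedback(self, text, predicted, is_correct, corrected=None):
        """A 'yes' reinforces the prediction; a 'no' is learned from
        only if the user also supplies a corrected category (assumed)."""
        if is_correct:
            self.learn(text, predicted)
        elif corrected is not None:
            self.learn(text, corrected)

clf = FeedbackClassifier()
clf.learn("book a cruise vacation", "travel")
clf.learn("new movie in theaters", "entertainment")

tweet = "tom cruise on the red carpet"
before = clf.classify(tweet)  # travel
clf.feedback(tweet, before, is_correct=False, corrected="entertainment")
after = clf.classify("tom cruise walks the red carpet")  # entertainment
print(before, after)
```

A single correction is enough to flip a similar tweet in this toy setup; a real system would need many more examples before a user’s vocabulary had that much influence.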

Regardless, twXray has been a useful demonstration of what’s possible with Twitter data and has produced some interesting feedback that will further our exploration of text classification.

One pleasant surprise: twXray speaks a bit of Spanish.

Check out the results for @sersuarezr and you’ll see why twXray was able to interpret his Spanish tweets. Various words are the same across languages, particularly words that give a clear indication of a tweet’s general topic. The word “startup,” for example, is a solid clue that a tweet is about technology, and it happens to be the same in both English and Spanish. The same goes for proper nouns like “Yankees,” “Beatles,” and “iPad.”

The results were good enough for @sersuarezr to tweet: “Falta afinar un poco, pero esta curiosa.” Translation: “It’s a bit out of tune, but interesting nonetheless.”

Apart from words that are the same across languages, however, twXray’s Spanish might also have benefited from the manner in which its database was built.

To build the twXray database of correctly classified tweets, hundreds of tweets had to be tagged. Typically this would be done by hand, but O’Banion skipped the hand tagging by finding tweets that linked to news stories, following those links, and then taking the tags from the linked article and applying them to twXray’s interpretation of the tweet.

“The classifier is trained on the story content, not the tweets that link to the content,” said O’Banion. “It might be possible that some of the stories I scraped are in Spanish or have Spanish words in them which is how it can understand them.”

The result of that process: it didn’t matter if a tweet was in English or not so long as it linked to a tagged (in English) news story. TwXray relies on the overlap between terms in the stories and terms in the tweets, since what it’s actually building is a categorization of terms in the stories. Using the tweets and stories, the machine would automatically learn the Spanish word for tourists, for example, and associate “turistas” with the travel tag.
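The term-overlap idea can be sketched in a few lines. The stories, tags, and function names below are hypothetical stand-ins, and the simple vote count is an assumption rather than the scoring twXray actually uses. Each term in a story is associated with that story’s tags, and a tweet is classified by whichever tag its overlapping terms vote for most.

```python
from collections import Counter, defaultdict

# Hypothetical scraped stories: (story text, tags taken from the article).
STORIES = [
    ("Miles de turistas visitan Madrid este verano", ["travel"]),
    ("Airlines add flights for holiday travel", ["travel"]),
    ("Startup raises funding for new iPad app", ["technology"]),
]

def build_term_index(stories):
    """Map each term to a count of the tags on stories that contain it."""
    index = defaultdict(Counter)
    for text, tags in stories:
        for term in set(text.lower().split()):
            index[term].update(tags)
    return index

def classify_tweet(tweet, index):
    """Let every term the tweet shares with the stories vote for its tags."""
    votes = Counter()
    for term in tweet.lower().split():
        if term in index:
            votes.update(index[term])
    if not votes:
        return None  # no overlap with any story, so no guess
    return votes.most_common(1)[0][0]

index = build_term_index(STORIES)
print(classify_tweet("Los turistas llegan a la playa", index))     # travel
print(classify_tweet("Mi startup lanza una app para iPad", index)) # technology
print(classify_tweet("Buenos dias", index))                        # None
```

The sketch splits on whitespace only and ignores punctuation, so it illustrates the overlap principle rather than a production tokenizer.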

All told, twXray is an interesting demonstration of what’s possible with Twitter. It might also be an interesting component of something like a content recommendation engine (which is actually how the technology was originally conceived), in which news consumers enter their Twitter handles so that a website can recommend content that would interest them. We’ve also deployed modified components to show what politicians tweet about most and what a politician’s followers tweet about most.

About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.

