TwXray taken to task on Twitter

Last week the Knight Lab released a beta version of twXray and received some tough Twitter feedback.

We knew, of course, that twXray was fallible, but it’s another thing to have it out in the world.

At any rate, here’s a rundown of where it stumbled:

You can trick twXray with a tweet that uses a common word in an unusual context. A tweet about Tom Cruise may be categorized as travel based on the word “cruise.” Likewise, a tweet about a movie being “faithful” to the book is likely to end up in the religion category.

Typically, “cruise” and “faithful” are very reliable indicators of travel and religion, which causes twXray to give its categorization of those tweets extra weight. But in these cases the words were used in a way that twXray failed to recognize.
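To see how one heavily weighted word can swamp the rest of a tweet, here is a minimal Python sketch. It is not twXray's actual classifier: the weights, the category names, and the `categorize` function are all invented for illustration.

```python
from collections import Counter

# Hypothetical term weights, made up for this example.
# Strong indicator words get a large weight for their usual category.
TERM_WEIGHTS = {
    "cruise": {"travel": 3.0},
    "faithful": {"religion": 3.0},
    "movie": {"entertainment": 1.0},
    "book": {"entertainment": 0.5},
}

def categorize(tweet):
    """Return the category with the highest summed term weight, or None."""
    scores = Counter()
    for raw in tweet.lower().split():
        word = raw.strip(".,!?\"'")
        for category, weight in TERM_WEIGHTS.get(word, {}).items():
            scores[category] += weight
    if not scores:
        return None
    return scores.most_common(1)[0][0]

print(categorize("Tom Cruise was great in that movie"))
print(categorize("The movie was faithful to the book"))
```

In both tweets the weaker "entertainment" words are outvoted by a single strong indicator word, which is the kind of mistake described above.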

Beyond certain words tricking the technology, twXray also had a tough time with users who tweet in a single niche. One user, @newsstandpromos, tweeted almost entirely about the magazine industry and the subjects it covers (Ben Bernanke, Lady Gaga, the Red Sox, food, etc.), which was enough to confuse twXray. It inferred that @newsstandpromos was tweeting about those subjects rather than about how they are covered in magazines, which again resulted in mistakes.

All that said, the technology works fairly well, and the problems with twXray are small and could be fixed in subsequent iterations, said Shawn O’Banion, the Ph.D. student who built the core technology.

How might we do that?

“A feedback feature on the site would allow users to say ‘yes’ this was a good categorization or ‘no’ it wasn't,” said O’Banion. “The classifier could learn from this feedback to improve the categorization. My guess is this would help learn slang and abbreviations used in Twitter and also learn the text used by that specific user.”
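One way such a feedback loop could work, sketched in Python. This is an assumption about the general approach, not a description of anything twXray implements: the `FeedbackClassifier` class, its method names, and the fixed step size are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,!?:;\"'").lower() for w in text.split())
    return [w for w in words if w]

class FeedbackClassifier:
    """Toy classifier whose term weights are nudged by yes/no feedback."""

    def __init__(self):
        # weights[word][category] -> float, starts empty
        self.weights = defaultdict(lambda: defaultdict(float))

    def predict(self, tweet):
        scores = Counter()
        for word in tokenize(tweet):
            for category, weight in self.weights.get(word, {}).items():
                scores[category] += weight
        return max(scores, key=scores.get) if scores else None

    def feedback(self, tweet, category, correct, step=1.0):
        """Raise weights on a 'yes', lower them on a 'no'."""
        delta = step if correct else -step
        for word in tokenize(tweet):
            self.weights[word][category] += delta

clf = FeedbackClassifier()
tweet = "Tom Cruise stars in a new movie"
clf.feedback(tweet, "travel", correct=False)        # user says "no"
clf.feedback(tweet, "entertainment", correct=True)  # user says "yes"
print(clf.predict("Cruise movie tonight"))
```

Because the weights are updated per word, feedback on one user's tweets would also pick up that user's slang and abbreviations, as O’Banion suggests.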

Regardless, TwXray has been a useful demonstration of what’s possible with Twitter data and has presented some interesting feedback that will further our exploration in the process of classifying text.

One pleasant surprise: twXray speaks a bit of Spanish.

Check out the results for @sersuarezr and you’ll see why twXray was able to interpret his Spanish tweets. Various words are the same across languages, particularly words that give a clear indication of a tweet’s general topic. The word “startup,” for example, is a solid clue that a tweet is about technology, and it happens to be the same in both English and Spanish. Likewise for proper nouns like “Yankees,” “Beatles,” and “iPad.”

The results were good enough for @sersuarezr to tweet: “Falta afinar un poco, pero esta curiosa.” Translation: “It’s a bit out of tune, but interesting nonetheless.”

Apart from words that are the same across languages, however, twXray might also have benefited from the manner in which its database was built.

To build the twXray database of correctly classified tweets, hundreds of tweets had to be tagged. Typically this would be done by hand, but O’Banion skipped the hand tagging by finding tweets that linked to news stories, following those links, and then taking the tags from the linked article and applying them to twXray’s interpretation of the tweet.

“The classifier is trained on the story content, not the tweets that link to the content,” said O’Banion. “It might be possible that some of the stories I scraped are in Spanish or have Spanish words in them which is how it can understand them.”

The result of that process: it didn’t matter whether a tweet was in English, so long as it linked to a tagged (in English) news story. TwXray relies on the overlap between terms in the stories and terms in the tweets, since what it’s actually building is a categorization of terms in the stories. Using the tweets and stories, the machine would automatically learn the Spanish word for tourists, for example, and associate “turistas” with the travel tag.
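The idea of training on tagged stories and then classifying tweets by term overlap can be sketched in a few lines of Python. The two sample stories, their tags, and the `train` and `classify` functions are made up for this example; the real pipeline scraped linked news articles and is not shown here.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,!?:;\"'").lower() for w in text.split())
    return [w for w in words if w]

def train(tagged_stories):
    """Count, for each term, how many stories with each tag contain it."""
    term_tags = defaultdict(Counter)
    for text, tags in tagged_stories:
        for term in set(tokenize(text)):
            for tag in tags:
                term_tags[term][tag] += 1
    return term_tags

def classify(tweet, term_tags):
    """Pick the tag whose story terms overlap most with the tweet."""
    scores = Counter()
    for term in tokenize(tweet):
        scores.update(term_tags.get(term, {}))
    return scores.most_common(1)[0][0] if scores else None

# Hypothetical stories; the tags are in English, the text need not be.
STORIES = [
    ("Los turistas llenan las playas as tourists book summer flights", ["travel"]),
    ("Startup raises funding for a new iPad app", ["technology"]),
]

model = train(STORIES)
print(classify("Muchos turistas en la ciudad", model))
print(classify("Mi startup favorita", model))
```

The classifier never needs to know which language a term belongs to. It only needs the term to have appeared in a story that carried a tag.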

All told, twXray is an interesting demonstration of what’s possible with Twitter. It might also be a useful component of something like a content recommendation engine (which is actually how the technology was originally conceived), in which news consumers enter their Twitter handle so that a website can recommend content that would interest them. We’ve also deployed modified components to show what politicians tweet about most and what a politician’s followers tweet about most.

About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
