TwXray taken to task on Twitter

Last week the Knight Lab released a beta version of twXray and received some tough Twitter feedback.

We knew, of course, that twXray was fallible, but it’s another thing to have it out in the world.

At any rate, a rundown of where it stumbled:

You can trick twXray with a tweet that takes a common word and uses it in an unusual context. A tweet about Tom Cruise may be categorized as travel based on the word “cruise.” Likewise, a tweet about a movie being “faithful” to the book is likely to end up in the religion category.

Typically, “cruise” and “faithful” are very reliable indicators of travel and religion, so twXray gives those categories extra weight when it sees the words. In these tweets, however, the words were used in a way that twXray failed to recognize.
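To make the failure concrete, here is a minimal sketch of a keyword-weight classifier. It is not twXray's actual code, which this post does not describe; the `WEIGHTS` table, its numbers, and the `classify` function are all assumptions made up for illustration.

```python
# Toy keyword-weight classifier. The words and weights are invented
# for illustration and are not taken from twXray.
WEIGHTS = {
    "cruise": {"travel": 3.0},
    "faithful": {"religion": 3.0},
    "movie": {"entertainment": 1.0},
    "book": {"entertainment": 0.5},
}

def classify(tweet):
    """Return the category with the highest summed word weight, or None."""
    scores = {}
    for word in tweet.lower().split():
        word = word.strip(".,!?\"'")
        for category, weight in WEIGHTS.get(word, {}).items():
            scores[category] = scores.get(category, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

print(classify("Tom Cruise was great in that movie"))   # travel
print(classify("The movie was faithful to the book"))   # religion
```

Because one heavily weighted word outvotes the weaker clues around it, both tweets land in the wrong category, which is the same kind of mistake described above.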

Beyond certain words tricking the technology, twXray also had a tough time with users who tweet in a single niche. One user, @newsstandpromos, tweeted almost entirely about the magazine industry and the subjects it covers (Ben Bernanke, Lady Gaga, Red Sox, food, etc.), which was enough to confuse twXray. It inferred that @newsstandpromos was tweeting about those subjects rather than about how they are covered in magazines, which again resulted in mistakes.

All that said, the technology works fairly well, and the problems with twXray are small and could be fixed in subsequent iterations, said Shawn O’Banion, the Ph.D. student who built the core technology.

How might we do that?

“A feedback feature on the site would allow users to say ‘yes’ this was a good categorization or ‘no’ it wasn’t,” said O’Banion. “The classifier could learn from this feedback to improve the categorization. My guess is this would help learn slang and abbreviations used in Twitter and also learn the text used by that specific user.”
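One way such a feedback loop might work is sketched below. This is an assumption about a possible design, not a description of anything twXray does: the `weights` table, the `record_feedback` function, and the fixed `step` size are all hypothetical.

```python
from collections import defaultdict

# weights[word][category] -> score. Starts empty in this sketch.
weights = defaultdict(lambda: defaultdict(float))

def tokens(tweet):
    """Lowercase the tweet and strip simple punctuation from each word."""
    cleaned = (w.strip(".,!?\"'").lower() for w in tweet.split())
    return [w for w in cleaned if w]

def score(tweet, category):
    """Sum the learned weights of the tweet's words for one category."""
    return sum(weights[w][category] for w in tokens(tweet))

def record_feedback(tweet, category, correct, step=1.0):
    """Nudge word weights up on a 'yes' and down on a 'no'."""
    delta = step if correct else -step
    for w in tokens(tweet):
        weights[w][category] += delta

# A user says "no" to a travel label on a Tom Cruise tweet,
# and another says "yes" to a travel label on an actual cruise tweet.
record_feedback("Tom Cruise stars in a new movie", "travel", correct=False)
record_feedback("Booked a cruise to Alaska", "travel", correct=True)
```

After these two votes, "cruise" on its own is neutral for travel, while "tom" counts against it and "alaska" counts for it. With enough feedback, the surrounding words start to separate the two senses.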

Regardless, TwXray has been a useful demonstration of what’s possible with Twitter data and has presented some interesting feedback that will further our exploration in the process of classifying text.

One pleasant surprise: twXray speaks a bit of Spanish.

Check out the results for @sersuarezr and you’ll see why twXray was able to interpret his Spanish tweets. Various words are the same across languages, particularly words that give a clear indication of a tweet’s general topic. The word “startup,” for example, is a solid clue that a tweet is about technology, and it happens to be the same in both English and Spanish. Likewise for proper nouns like “Yankees,” “Beatles,” and “iPad.”

The results were good enough for @sersuarezr to tweet: “Falta afinar un poco, pero esta curiosa.” Translation: “It’s a bit out of tune, but interesting nonetheless.”

Apart from words that are the same across languages, however, twXray might also have benefited from the manner in which its database was built.

To build the twXray database of correctly classified tweets, hundreds of tweets had to be tagged. Typically this would be done by hand, but O’Banion skipped the hand tagging by finding tweets that linked to news stories, following those links, and then taking the tags from the linked article and applying them to twXray’s interpretation of the tweet.
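The labeling shortcut can be sketched roughly as follows. This is an illustration under stated assumptions, not O’Banion’s code: the `STORY_TAGS` lookup stands in for actually following each link and reading the article's tags, and the example URLs are made up.

```python
import re

# Hypothetical data: tags already attached to news stories, keyed by URL.
# A real pipeline would fetch each link and read the tags from the page.
STORY_TAGS = {
    "http://example.com/yankees-win": ["sports"],
    "http://example.com/new-ipad": ["technology"],
}

URL_PATTERN = re.compile(r"https?://\S+")

def label_tweets(tweets, story_tags):
    """Pair each tweet with the tags of the first tagged story it links to."""
    labeled = []
    for tweet in tweets:
        for url in URL_PATTERN.findall(tweet):
            if url in story_tags:
                labeled.append((tweet, story_tags[url]))
                break
    return labeled

tweets = [
    "Big game last night http://example.com/yankees-win",
    "No link in this one",
]
print(label_tweets(tweets, STORY_TAGS))
```

Tweets without a link to a tagged story are simply skipped, so no one has to tag them by hand.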

“The classifier is trained on the story content, not the tweets that link to the content,” said O’Banion. “It might be possible that some of the stories I scraped are in Spanish or have Spanish words in them which is how it can understand them.”

The result of that process: it didn’t matter whether a tweet was in English, so long as it linked to a news story tagged in English. TwXray relies on the overlap between terms in the stories and terms in the tweets, since what it’s actually building is a categorization of terms in the stories. Using the tweets and stories, the machine would automatically learn the Spanish word for tourists, for example, and associate “turistas” with the travel tag.
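A rough sketch of that term-overlap idea, assuming a simple word-count model: the `STORIES` data, the `train` and `classify` functions, and the scoring rule are hypothetical and chosen only to show how a Spanish word that appears in tagged stories can end up tied to an English tag.

```python
from collections import Counter, defaultdict

# Hypothetical tagged stories: (story text, tag).
STORIES = [
    ("Los turistas llenan las playas de Cancun", "travel"),
    ("Tourists and turistas crowd the airport this summer", "travel"),
    ("The startup raised money for its new iPad app", "technology"),
]

def train(stories):
    """Count how often each term appears in stories carrying each tag."""
    term_counts = defaultdict(Counter)
    for text, tag in stories:
        term_counts[tag].update(text.lower().split())
    return term_counts

def classify(tweet, term_counts):
    """Pick the tag whose story terms overlap most with the tweet's words."""
    words = tweet.lower().split()
    scores = {tag: sum(counts[w] for w in words)
              for tag, counts in term_counts.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

model = train(STORIES)
print(classify("Muchos turistas en Madrid", model))  # travel
print(classify("Mi startup favorita", model))        # technology
```

The model never learns that a tweet is in Spanish. It only counts terms, so any word that shows up in tagged stories carries that tag with it.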

All told, twXray is an interesting demonstration of what’s possible with Twitter. It might also be a useful component of something like a content recommendation engine (which is actually how the technology was originally conceived): news consumers would enter their Twitter handle, and a website could then recommend content likely to interest them. We’ve also deployed modified components to show what politicians tweet about most and what a politician’s followers tweet about most.

About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
