TwXray taken to task on Twitter

Last week the Knight Lab released a beta version of twXray and received some tough Twitter feedback.

We knew, of course, that twXray was fallible, but it’s another thing to have it out in the world.

At any rate, a rundown of where it stumbled:

You can trick twXray with a tweet that takes a common word and applies it in a unique context. A tweet about Tom Cruise may be categorized as travel based on the word “cruise.” Likewise, a tweet about a movie being “faithful” to the book is likely to end up in the religion category.

Typically, “cruise” and “faithful” are very reliable indicators of travel and religion, which leads twXray to give its categorization of tweets containing them extra weight. But in these cases the words were used in ways that twXray failed to recognize.
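The post doesn’t show twXray’s internals, but the “extra weight” behavior is characteristic of a naive-Bayes-style bag-of-words classifier. A toy sketch (the corpora, category names, and smoothing choices are all invented here for illustration) shows how one strong indicator word can drag a tweet into the wrong category:

```python
import math
from collections import Counter

# Toy corpora invented for illustration; twXray's real training data and
# model are not described in the post. This assumes a naive-Bayes-style
# scorer with add-one smoothing and uniform category priors.
TRAINING = {
    "travel":   ["book a cruise to alaska", "cruise deals", "cruise lines", "alaska cruise"],
    "religion": ["a faithful congregation", "faithful sunday sermon", "stay faithful"],
    "movies":   ["new movie trailer", "the movie adaptation opens friday"],
}

COUNTS = {cat: Counter(w for doc in docs for w in doc.split())
          for cat, docs in TRAINING.items()}
VOCAB = {w for c in COUNTS.values() for w in c}

def classify(tweet):
    """Pick the category whose word counts best explain the tweet."""
    scores = {}
    for cat, counts in COUNTS.items():
        total = sum(counts.values())
        scores[cat] = sum(math.log((counts[w] + 1) / (total + len(VOCAB)))
                          for w in tweet.lower().split())
    return max(scores, key=scores.get)

# "cruise" is such a reliable travel indicator that it drags the
# Tom Cruise tweet into the travel category, as the post describes.
print(classify("tom cruise movie"))          # travel
print(classify("so faithful to the book"))   # religion
```

The single heavily weighted word dominates the sum of log-probabilities, which is exactly the failure mode described above.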

Beyond certain words tricking the technology, twXray also had a tough time with users who tweet in a single niche. One user, @newsstandpromos, tweeted almost entirely about the magazine industry and the subjects it covers—Ben Bernanke, Lady Gaga, Red Sox, food, etc.—which was enough to confuse twXray. TwXray inferred @newsstandpromos was tweeting about those subjects rather than how they are covered in magazines, which again resulted in mistakes.

All that said, the technology works fairly well, and the problems with twXray are small and could be fixed in subsequent iterations, said Shawn O’Banion, the Ph.D. student who built the core technology.

How might we do that?

“A feedback feature on the site would allow users to say ‘yes’ this was a good categorization or ‘no’ it wasn’t,” said O’Banion. “The classifier could learn from this feedback to improve the categorization. My guess is this would help learn slang and abbreviations used in Twitter and also learn the text used by that specific user.”
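One way such a feedback loop could work — the class, update rule, and sample tweets here are our invention, not O’Banion’s implementation — is to fold confirmed tweets back into per-category word counts, so the classifier gradually picks up a user’s slang and abbreviations:

```python
from collections import Counter, defaultdict

# Hypothetical sketch of the feedback feature described above. "Yes"
# feedback reinforces a category's word counts and "no" feedback dampens
# them; over time this would teach the classifier user-specific slang.
class FeedbackClassifier:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def feedback(self, tweet, category, correct):
        delta = 1 if correct else -1
        for word in tweet.lower().split():
            # Never let a count go negative.
            self.counts[category][word] = max(0, self.counts[category][word] + delta)

    def best_category(self, tweet):
        words = tweet.lower().split()
        scores = {cat: sum(c[w] for w in words) for cat, c in self.counts.items()}
        return max(scores, key=scores.get) if scores else None

clf = FeedbackClassifier()
clf.feedback("gr8 deals on flights rn", "travel", True)   # user clicked "yes"
clf.feedback("faithful sunday sermon", "religion", True)
print(clf.best_category("gr8 flights rn"))  # travel
```

After one confirmation, the slang tokens “gr8” and “rn” already count toward travel — the kind of user-specific learning O’Banion guesses the feedback would enable.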

Regardless, TwXray has been a useful demonstration of what’s possible with Twitter data and has surfaced some interesting feedback that will inform our continued exploration of text classification.

One pleasant surprise: twXray speaks a bit of Spanish.

Check out the results for @sersuarezr and you’ll see why twXray was able to interpret his Spanish tweets. Various words are the same across languages, particularly words that give a clear indication of a tweet’s general topic. The word “startup,” for example, is a solid clue that a tweet is about technology, and it happens to be the same in both English and Spanish. Likewise for proper nouns like “Yankees,” “Beatles,” and “iPad.”

The results were good enough for @sersuarezr to tweet: “Falta afinar un poco, pero esta curiosa.” Translation: “It needs a little fine-tuning, but it’s intriguing.”

Apart from words that are the same across languages, however, twXray’s Spanish ability may also stem from the manner in which its database was built.

To build the twXray database of correctly classified tweets, hundreds of tweets had to be tagged. Typically this would be done by hand, but O’Banion skipped the hand tagging by finding tweets that linked to news stories, following those links, and then taking the tags from the linked article and applying them to twXray’s interpretation of the tweet.

“The classifier is trained on the story content, not the tweets that link to the content,” said O’Banion. “It might be possible that some of the stories I scraped are in Spanish or have Spanish words in them which is how it can understand them.”

The result of that process: it didn’t matter whether a tweet was in English, so long as it linked to a tagged (in English) news story. TwXray relies on the overlap between terms in the stories and the tweets, since what it’s actually building is a categorization of terms in the stories. Using the tweets and stories, the machine would automatically learn the Spanish word for tourists, for example, and associate “turistas” with the travel tag.
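The pipeline O’Banion describes — follow a tweet’s link, scrape the story, and reuse the story’s tags as labels — can be sketched as follows. The fetching/scraping step is stubbed out, and the function name and sample stories are assumptions for illustration:

```python
import re
from collections import Counter, defaultdict

# Sketch of the distant-supervision training step described above: the
# classifier is trained on story text keyed by the story's own tags, not
# on the tweets that link to it. Resolving shortened URLs and scraping
# the articles is omitted; the sample stories are invented.
def build_training_counts(labeled_stories):
    """labeled_stories: (story_text, tags) pairs scraped from tweet links."""
    counts = defaultdict(Counter)
    for text, tags in labeled_stories:
        words = re.findall(r"[\w']+", text.lower())
        for tag in tags:
            counts[tag].update(words)
    return counts

labeled_stories = [
    # If a scraped travel story happens to contain Spanish words, those
    # words get tied to the travel tag automatically.
    ("Cruise bookings rise as turistas return to Alaska ports", ["travel"]),
    ("Sunday sermon draws a faithful crowd in the city", ["religion"]),
]
counts = build_training_counts(labeled_stories)
print(counts["travel"]["turistas"])  # 1
```

Because the labels ride along with whatever language appears in the scraped stories, Spanish terms can enter the model without anyone tagging Spanish tweets by hand.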

All told, twXray is an interesting demonstration of what’s possible with Twitter. It might also become a component of something like a content recommendation engine (which is actually how the technology was originally conceived), in which readers enter their Twitter handle and a website recommends content likely to interest them. We’ve also deployed modified components to show what politicians tweet about most and what a politician’s followers tweet about most.

About the author

Ryan Graff

Communications and Outreach Manager, 2011-2016

Journalism, revenue, whitewater, former carny. Recently loving some quality time @KelloggSchool.
