Semantic APIs: what to consider when picking a text analysis tool

Today, our online experiences are richer and more interconnected than ever. This is in part due to third-party services that expose Application Programming Interfaces, or APIs for short. APIs allow computer systems to speak with each other and exchange information. Facebook and Twitter’s APIs, for example, allow Twitter to repost your Facebook updates, and vice versa.

At the Knight Lab, we often make use of semantic APIs. These APIs will usually take text in the form of a file or URL, and tell you something about it. At their most basic, they might identify which part of a webpage is actually the article (and not the surrounding markup or the advertisements). Semantic APIs offer many other services, but one of the most common identifies the topic of the text you submit, and/or finds “named entities” in the text: people, places, organizations, nations, and other similar things.
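
The request/response shape below is a purely illustrative sketch of what such a service might return; the endpoint URL, field names, and scores are invented for this example and do not correspond to any particular API.

```python
import requests

# Hypothetical entity-extraction endpoint; every real service defines its own
# URL, authentication scheme, and response fields.
API_URL = "https://api.example.com/v1/entities"

text = "The Knight Lab at Northwestern University builds storytelling tools."

response = requests.post(API_URL, json={"text": text}, timeout=10)
response.raise_for_status()

# An illustrative response might look like:
# {"entities": [
#   {"text": "Knight Lab", "type": "Organization", "relevance": 0.91},
#   {"text": "Northwestern University", "type": "Organization", "relevance": 0.88}
# ]}
for entity in response.json().get("entities", []):
    print(entity["text"], entity["type"], entity.get("relevance"))
```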

Given that there are several APIs that offer similar services, how can we (and you) choose between them?

Case 1: When the API is either right or wrong

Let's say you want to know whether or not a thing (say, an entity) occurs in a document. Then, you can compare entity-extraction APIs based on:

  • False positives, type 1 – does it identify as entities things that are not entities?
  • False positives, type 2 – does it identify entities that don't even appear in the document?
  • False negatives – does it tell you that an entity is not present when it is?
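
As a concrete illustration, here is a minimal sketch of how you might tally these three error types for one API's output on one document, assuming you have a hand-labeled "gold" set of entities for that document. The function name, the data structures, and the simple case-insensitive string matching are all illustrative choices, not part of any particular API.

```python
def tally_entity_errors(api_entities, gold_entities, document_text):
    """Count type-1 false positives, type-2 false positives, and false
    negatives for one document, using naive case-insensitive matching."""
    api = {e.lower() for e in api_entities}
    gold = {e.lower() for e in gold_entities}
    doc = document_text.lower()

    true_pos = len(api & gold)              # real entities the API found
    false_pos_type1 = len(api - gold)       # flagged, but not actually entities
    false_pos_type2 = len({e for e in api if e not in doc})  # not even in the text
    false_neg = len(gold - api)             # real entities the API missed

    return true_pos, false_pos_type1, false_pos_type2, false_neg

# Invented example: one API result set checked against an invented gold set.
# (Type-2 false positives are also counted among the type-1 false positives.)
tp, fp1, fp2, fn = tally_entity_errors(
    api_entities=["Knight Lab", "Chicago", "Python"],
    gold_entities=["Knight Lab", "Northwestern University"],
    document_text="The Knight Lab at Northwestern University builds tools.",
)
```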


Of course, APIs also assign a 'type' to each entity, based on the API's own ontology, which may or may not match your needs. Think of it as the difference between the Dewey Decimal System and the Library of Congress system; they're both valid ways of breaking things down, yet they are different. So, you can also compare APIs based on:

  • Entity types – Does it identify the entity types you're interested in?
  • Miscategorization – Does it identify entity type correctly, according to the API's ontology? If not, this could lead to both false positives and false negatives.


These concepts are usually summarized in terms of precision, recall, and accuracy. Some APIs may have better precision, others better recall. If you don’t care whether you miss some things as long as the results you do get are always right, then you care about precision. If you don’t care whether you get some incorrect results as long as you find all of the correct results, then you care about recall. If you care about both, then accuracy is the better measure for you. In any case, classifier performance is simple to calculate.
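
As a quick sketch of that arithmetic, assuming you have already tallied true positives, false positives, false negatives, and (where they are meaningful) true negatives for an API against a hand-labeled test set:

```python
def precision(tp, fp):
    """Of everything the API returned, what fraction was correct?"""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Of everything the API should have found, what fraction did it return?"""
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(tp, tn, fp, fn):
    """Of every decision (return or skip), what fraction was correct?"""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Invented example: an API that found 40 real entities, flagged 10 spurious
# ones, missed 20, and correctly ignored 30 non-entities.
print(precision(40, 10))         # 0.8
print(recall(40, 20))            # 0.666...
print(accuracy(40, 30, 10, 20))  # 0.7
```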

Case 2: Finding the most relevant subset

Let's say that an API takes your documents and tells you, for each document, what keywords (for example) are present, and how relevant they are. Suppose also that you want the n best-matched keywords for each document.

Then, for any single API, you could choose the best-matched keywords in one of five ways:

  1. Make n a constant – usually 1, 3, 5, or 10 – and always take the top n keywords. This leads to problems when the API returns fewer results than the value you set for n, or when many of the top results have very low relevance.
  2. Identify a threshold, t, for relevancy, and take the keywords with a relevance higher than t. This could return no keywords, or a much larger list of keywords than you'd prefer, of course.
  3. Identify a relevancy threshold, t, but take no more than n keywords. This could return nothing.
  4. Identify a relevancy threshold, t, and a lower-limit, l, taking at least l keywords. This could lead to a set far larger than you'd prefer.
  5. Identify a relevancy threshold, t, a lower-limit, l, and a maximum number, m. Take at least l and no more than m keywords; within those limits, keep only keywords whose relevance exceeds t (a sketch of this strategy appears after the list).
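
Here is a minimal sketch of that fifth, most general strategy; setting l to 0 and m to the length of the list recovers the simpler variants. It assumes the API's response has already been parsed into (keyword, relevance) pairs; the function and variable names are illustrative.

```python
def select_keywords(scored_keywords, t=0.5, l=3, m=10):
    """Take at least l and at most m keywords; within those limits, keep
    only keywords whose relevance is at least the threshold t.

    scored_keywords: list of (keyword, relevance) pairs from one API.
    """
    ranked = sorted(scored_keywords, key=lambda pair: pair[1], reverse=True)
    selected = [kw for kw, score in ranked[:m] if score >= t]
    # If the threshold left fewer than l keywords, pad with the best-scoring
    # keywords regardless of threshold.
    if len(selected) < l:
        selected = [kw for kw, _ in ranked[:l]]
    return selected

# Invented example: t excludes "mapping" and "Chicago"; l and m are satisfied.
select_keywords(
    [("journalism", 0.92), ("data", 0.81), ("mapping", 0.40), ("Chicago", 0.12)],
    t=0.5, l=2, m=3,
)  # -> ["journalism", "data"]
```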


So which of these five methods works best? It isn't clear that one is inherently superior to the others, and in fact each method may have different strengths and weaknesses (which may, in turn, be different for a different API). To really answer that question, you'd have to test each of these methods for various values of l, m, and t against a given API, and assess each configuration with respect to:

  • False positives, type 1 – does it identify as keywords things that are not appropriate for use as keywords?
  • False positives, type 2 – does it identify keywords that don't even appear in the document?
  • Inflated ranking/relevance – are any keywords included in the selected subset only because their relevance or ranking score is higher than it should be?
  • False negatives – does it entirely miss keywords it should find?
  • Lowball ranking/relevance – does an unduly low ranking or relevance score push a keyword that should be inside the subset out of it?
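
One way to make that assessment systematic is a small parameter sweep: run each candidate configuration over a set of hand-labeled documents and average a score for it. The sketch below reuses the select_keywords function from the previous section and scores each selection with F1 against the gold keywords; both that scoring choice and all of the names are illustrative, not features of any API.

```python
from itertools import product

def f1_against_gold(selected, gold_keywords):
    """One reasonable scoring choice: F1 of a selected subset vs. gold keywords."""
    selected, gold_keywords = set(selected), set(gold_keywords)
    if not selected or not gold_keywords:
        return 0.0
    overlap = len(selected & gold_keywords)
    p = overlap / len(selected)
    r = overlap / len(gold_keywords)
    return 2 * p * r / (p + r) if p + r else 0.0

def sweep_configurations(documents, gold, thresholds, lower_limits, maximums):
    """Average the per-document score for every (t, l, m) combination.

    documents: {doc_id: [(keyword, relevance), ...]} as returned by one API.
    gold:      {doc_id: set of hand-picked "correct" keywords}.
    """
    results = {}
    for t, l, m in product(thresholds, lower_limits, maximums):
        scores = [
            f1_against_gold(select_keywords(pairs, t=t, l=l, m=m), gold[doc_id])
            for doc_id, pairs in documents.items()
        ]
        results[(t, l, m)] = sum(scores) / len(scores)
    return results
```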


Naturally, things get more complicated when you want to compare two different APIs, particularly ones made by different companies. Relevance is probably calibrated differently for each API. Furthermore, the best subset criteria for one API may not be the best subset criteria for another API. For example, one API may be able to consistently determine the three most relevant entities, but only get an average of five of the top ten correct. Another might get seven of the top ten correct, but only one or two of the top three. Either strength could be useful, depending on what you're trying to accomplish.

To make matters worse, we still don't know what relevance is. Many APIs return something like relevance, but none explain where the number actually comes from.

The biggest problem with this speculation, of course, is that ultimately, the APIs we use are black boxes. We could do a study that rigorously compared these APIs, and the week after we finish, one of them might change its algorithm without even telling anyone. The result is that APIs are best assessed when you need to make a decision, and perhaps periodically re-assessed if you wish to improve performance. That means that any solution to this conundrum will likely involve creating tools that make the assessments we’re already doing more rigorous, faster, and/or easier.
