Smaph API Documentation

December 2016 - The Smaph API is still an experimental service. Do not rely on it for production. Do not rely on it for availability or quality. Do not rely on it for anything other than experiments!

Welcome to the API documentation of Smaph - the Entity Linking system for queries and very short text.

Introduction

Smaph does entity linking on web queries and very short text: it disambiguates query terms by linking each of them to its unambiguous meaning, represented as an entity in a knowledge base. To do so, it piggybacks on a search engine and uses its results to perform entity disambiguation. As the piggyback search engine, we use Google Custom Search Engine (CSE). In order to use the Smaph API, you will have to build a CSE and pass its identification data with each query. Each call to the Smaph API currently triggers three queries to the CSE. This number may vary without prior notice. Smaph does not store any information sent to it, including CSE credentials and the call body.

By calling this API you give Smaph permission to call the Google CSE on your behalf.

Google CSE currently offers a limited number of free calls. When the limit is reached, Smaph will stop working.
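
Since each Smaph call currently consumes three CSE queries, you can estimate how many Smaph calls fit into a given CSE quota. The sketch below is only illustrative: the function name and the quota figure are our assumptions, not part of the Smaph API, and the number of CSE queries per call may change.

```python
def smaph_calls_within_quota(cse_quota: int, cse_queries_per_call: int = 3) -> int:
    """Return how many Smaph calls fit into `cse_quota` CSE queries.

    `cse_queries_per_call` defaults to 3, the number of CSE queries that
    one Smaph call triggers at the time of writing (it may vary).
    """
    if cse_quota < 0:
        raise ValueError("cse_quota must be non-negative")
    if cse_queries_per_call <= 0:
        raise ValueError("cse_queries_per_call must be positive")
    return cse_quota // cse_queries_per_call


# Hypothetical quota of 90 CSE queries (not an actual Google figure).
print(smaph_calls_within_quota(90))  # prints 30
```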

Setup

Setting up Google CSE

  1. Go to Google CSE and sign in.
  2. Create a Google CSE:
    • Click on New search engine
    • Leave the Sites to search field blank.
    • As language, select English.
    • Type any search engine name, e.g. "Smaph_piggyback".
    • Open Advanced Options.
    • As Schema.org type, insert any type, e.g. "Thing" (we will remove this later).
    • Click on Create.
  3. On the left bar, click on Edit search engine, select your search engine, and click Setup.
    • At the bottom of the page, in the Restrict Pages using Schema.org Types section, remove the schema type you inserted before and click the Update button.
  4. In the setup page, click on the Search Engine ID button and take note of the ID (it is something in the form 012345678901234567890:abcdefghilm). We will refer to it as <CSE_ID>.

Enabling the Google API

  1. Log into the Google Developer Console.
  2. In the upper bar of the page, click on Create Project.
    • Give a name to the project, e.g. Smaph-Piggyback.
    • Click on Create.
    • In the left panel, click on Credentials.
    • Click on Create credentials, then on API key.
    • Take note of the generated key. It is a 40-character string. From now on, we will refer to it as <GOOGLE_KEY>.
  3. Point your browser to the Custom Search API page and click Enable.
  4. You may test whether the Google CSE API is working by pointing your browser to: (replace <GOOGLE_KEY> and <CSE_ID> with their values)
https://www.googleapis.com/customsearch/v1/?key=<GOOGLE_KEY>&cx=<CSE_ID>&q=barack%20obama
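
If you prefer to build the test URL programmatically, the following minimal sketch assembles it with the Python standard library. The function name cse_test_url is ours, and the key and ID values are placeholders; only the endpoint and the key, cx and q parameters come from the URL above.

```python
from urllib.parse import quote, urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1/"


def cse_test_url(google_key: str, cse_id: str, query: str) -> str:
    """Build the Google CSE test URL for the given credentials and query."""
    params = {"key": google_key, "cx": cse_id, "q": query}
    # quote_via=quote encodes spaces as %20 (as in the URL above) rather than "+".
    return CSE_ENDPOINT + "?" + urlencode(params, quote_via=quote)


# Placeholder credentials: replace them with your own <GOOGLE_KEY> and <CSE_ID>.
print(cse_test_url("GOOGLE_KEY", "CSE_ID", "barack obama"))
```

Opening the printed URL in a browser should return a JSON document with search results if the key and the search engine are set up correctly.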

Registering to Smaph

  1. Register to the Smaph VRE by clicking on Create account on the right panel of this page.
  2. Log into the Smaph VRE and, on the left panel, click the Show button to get your Smaph API authentication token. We will refer to it as <SMAPH_TOKEN>.
  3. Good! You have everything in place to issue a query to Smaph. To disambiguate the query armstrong moon landing, point your browser to: (replace the placeholders with their actual values)
https://smaph.d4science.org/smaph/annotate?gcube-token=<SMAPH_TOKEN>&google-cse-id=<CSE_ID>&google-api-key=<GOOGLE_KEY>&q=armstrong%20moon%20landing

Calling the Smaph API

The API endpoint is:

https://smaph.d4science.org/smaph/annotate

API request parameters

The Smaph API accepts the following GET HTTP parameters. Values must be encoded in UTF-8.

  • q (required) the text to disambiguate.
  • gcube-token (required) the Smaph authentication token (you can find it on the left panel after logging into the VRE).
  • google-cse-id (required) the Custom Search Engine ID (see the Setup section)
  • google-api-key (required) the Google API key.
  • annotator (optional) the annotator algorithm to use. Possible values are default, ef, ar, coll, or greedy:
    • default is the algorithm that returns annotations and offers the best compromise between speed and result quality, according to our datasets. It is currently set to SMAPH-3. This is the suggested value for the annotator parameter;
    • ef (Entity Filter, alias smaph-1) is the simplest annotator: entities are gathered and filtered through a binary classifier. They are not linked to specific terms of the input text;
    • ar (Annotation Regressor, alias smaph-s) generates a set of candidate entities and evaluates how likely they are to be referred to by terms in the input text. This method returns a binding between terms and entities;
    • coll (Collective Disambiguation, alias smaph-2) performs a collective disambiguation of the query. This method returns a binding between terms and entities;
    • greedy (Greedy Disambiguation, alias smaph-3) iteratively builds the solution by greedily choosing which annotation to add, if any. This method returns a binding between terms and entities.
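
As an illustration of how the parameters above fit together, the following sketch builds a request URL with the Python standard library. The function name smaph_url is ours. We also assume that the annotator parameter takes one of the five short names listed above; whether the aliases (smaph-1, smaph-s, ...) are accepted as values is not covered here.

```python
from urllib.parse import quote, urlencode

SMAPH_ENDPOINT = "https://smaph.d4science.org/smaph/annotate"
# Short annotator names listed in this documentation.
ANNOTATORS = {"default", "ef", "ar", "coll", "greedy"}


def smaph_url(query, smaph_token, cse_id, google_key, annotator=None):
    """Build the GET URL for a Smaph annotate call.

    `annotator` is optional; when omitted, the parameter is not sent at all.
    """
    if annotator is not None and annotator not in ANNOTATORS:
        raise ValueError(f"unknown annotator: {annotator!r}")
    params = {
        "gcube-token": smaph_token,
        "google-cse-id": cse_id,
        "google-api-key": google_key,
        "q": query,
    }
    if annotator is not None:
        params["annotator"] = annotator
    # Values are percent-encoded as UTF-8; spaces become %20.
    return SMAPH_ENDPOINT + "?" + urlencode(params, quote_via=quote, encoding="utf-8")


# Placeholder credentials: replace them with your own values.
print(smaph_url("armstrong moon landing", "SMAPH_TOKEN", "CSE_ID", "GOOGLE_KEY"))
```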

API Response

The response is formatted in JSON. It contains an annotations field, which is an array where each annotation has the following fields:

  • begin the index of the first character (inclusive) in the input text that mentions this entity (not returned by the ef annotator).
  • end the index of the last character (exclusive) in the input text that mentions this entity (not returned by the ef annotator).
  • wid the unique Wikipedia identifier of this entity.
  • title the unique Wikipedia title of this entity.
  • url the URL pointing to the Wikipedia page about this entity.
  • score a confidence score for this annotation.
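
Because begin is inclusive and end is exclusive, the two indices map directly onto a Python slice of the input text. The sketch below shows this on a hand-written response body. The wid and score values in it are made-up placeholders, not real Smaph output, and we assume the indices count characters in the same way as Python string indexing.

```python
import json


def mentions(text, response_body):
    """Return (mention, title, score) triples from a Smaph JSON response.

    `mention` is None when begin/end are absent (as with the ef annotator).
    """
    triples = []
    for ann in json.loads(response_body)["annotations"]:
        if "begin" in ann and "end" in ann:
            # begin is inclusive, end is exclusive: exactly a Python slice.
            mention = text[ann["begin"]:ann["end"]]
        else:
            mention = None
        triples.append((mention, ann["title"], ann["score"]))
    return triples


# Hypothetical response body, for illustration only.
SAMPLE_BODY = json.dumps({"annotations": [
    {"begin": 0, "end": 9, "wid": 1, "title": "Neil Armstrong",
     "url": "https://en.wikipedia.org/wiki/Neil_Armstrong", "score": 0.9},
    {"begin": 10, "end": 22, "wid": 2, "title": "Moon landing",
     "url": "https://en.wikipedia.org/wiki/Moon_landing", "score": 0.8},
]})

for mention, title, score in mentions("armstrong moon landing", SAMPLE_BODY):
    print(mention, "->", title, score)
```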

The HTTP status code will be 200 if everything went well. In other cases, an error message is returned in the body of the HTTP response.
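
A client can rely on the status code to tell success from failure. The sketch below uses Python's urllib, which raises HTTPError for error status codes, and surfaces the message found in the response body. The function name call_smaph and its opener parameter are ours. We also assume the body is UTF-8 text; the exact format of the error message is not specified here.

```python
import json
import urllib.error
import urllib.request


def call_smaph(url, opener=urllib.request.urlopen):
    """Fetch a Smaph URL and return the decoded JSON response.

    `opener` defaults to urllib.request.urlopen; it can be replaced,
    e.g. for testing without network access.
    Raises RuntimeError carrying the error message sent by the server.
    """
    try:
        with opener(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # On failure, the body of the HTTP response holds the error message.
        message = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Smaph returned HTTP {err.code}: {message}") from err
```

On success, call_smaph(url)["annotations"] is the array of annotations described above.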

How does it work? / What paper should I cite?

The internal functioning of Smaph is detailed in the paper "A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries" by Marco Cornolti, Massimiliano Ciaramita, Paolo Ferragina, Hinrich Schuetze, and Stefan Rued. The paper appeared in the Proceedings of the 25th World Wide Web Conference (WWW2016).

To cite the paper, please use the following BibTeX entry:

@inproceedings{smaph,
 author = {Cornolti, Marco and Ferragina, Paolo and Ciaramita, Massimiliano and R\"{u}d, Stefan and Sch\"{u}tze, Hinrich},
 title = {A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries},
 booktitle = {Proceedings of the 25th International Conference on World Wide Web},
 series = {WWW '16},
 year = {2016},
 isbn = {978-1-4503-4143-1},
 pages = {567--578},
 url = {http://dx.doi.org/10.1145/2872427.2883061},
 doi = {10.1145/2872427.2883061},
 publisher = {International World Wide Web Conferences Steering Committee},
}

Questions, issues, and bug reports

You can post any question to the Smaph VRE or contact the author.

Access the SMAPH VRE with your SoBigData Gateway credentials.