SWAT API Documentation

SWAT is an entity-salience system that identifies, on the fly, the semantic focus of a document, expressed by its Salient Wikipedia Entities. The core of this technology is a broad set of syntactic and semantic features, extracted from the input document and then fed to a classifier previously trained on millions of examples extracted from the New York Times Annotated Corpus. An experimental GUI of the system is available at https://swat.d4science.org/.

Registering to the service

The service is hosted by the D4Science Infrastructure. To obtain access, you need to register to the TagMe VRE and get your authorization token by clicking on the Show button in the left panel. With each request to the API, you must pass this authorization token as the gcube-token URL parameter.
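As a minimal sketch of what passing the token as a URL parameter looks like, the snippet below builds the request URL with Python's standard library. The helper name build_url and the placeholder token are illustrative assumptions, not part of the service; the endpoint is the salience endpoint described in the next section.

```python
from urllib.parse import urlencode

# Endpoint documented in the next section of this page.
SALIENCE_ENDPOINT = "https://swat.d4science.org/salience"


def build_url(token, endpoint=SALIENCE_ENDPOINT):
    """Append the authorization token as the gcube-token query parameter.

    build_url is a hypothetical helper written for this example.
    """
    if not token:
        raise ValueError("a gcube-token is required")
    return endpoint + "?" + urlencode({"gcube-token": token})


# "my-token" is a placeholder, not a real token.
print(build_url("my-token"))
```

HTTP libraries such as requests can add the parameter for you (through the params argument), so a helper like this is only needed when the URL is assembled by hand.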

How to get entity salience in a document

You can use SWAT by calling its API through an HTTP POST request at:

https://swat.d4science.org/salience

This endpoint accepts a JSON object as input (sent as the payload of the POST request) and returns a JSON object. The input object accepts the following key-value pairs; content is required, title is optional:

Key      Description                                             Type
content  The textual content of the input document (required)    string
title    The document's title (optional)                         string
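The input object described above could be assembled with a small helper such as the one below. This is only a sketch: make_payload is a hypothetical name, and rejecting an empty content string is a choice made for this example, not a documented rule of the service.

```python
import json


def make_payload(content, title=None):
    """Return the JSON request body for the salience endpoint.

    `content` is required and `title` is optional, as in the table above.
    Refusing empty content is an assumption of this sketch.
    """
    if not isinstance(content, str) or not content.strip():
        raise ValueError("'content' must be a non-empty string")
    document = {"content": content}
    if title is not None:
        document["title"] = title
    return json.dumps(document)


payload = make_payload("Barack Obama was in Pisa for a flying visit.",
                       title="Obama travels.")
print(payload)
```

Leaving title out of the object when it is not supplied keeps the payload limited to the keys the table lists.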

 


A Python example

This is a working piece of Python code that queries SWAT:

import json
import requests

MY_GCUBE_TOKEN = 'copy your gcube-token here!'

document = {
    "title": "Obama travels.",
    "content": 'Barack Obama was in Pisa for a flying visit.'
}

url = 'https://swat.d4science.org/salience'
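# Send the document as the JSON payload; the token goes in the URL query string.
response = requests.post(url,
                         data=json.dumps(document),
                         params={'gcube-token': MY_GCUBE_TOKEN})

# Pretty-print the JSON object returned by SWAT.
print(json.dumps(response.json(), indent=4))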

Response format

The response is a JSON object with the following structure:

 

{
    'status'                        # str

    'annotations':
       {
           'wiki_id'                # int
           'wiki_title'             # str
           'salience_class'         # int
           'salience_score'         # float
           'spans':                 # (where the entity is mentioned in content)
                [
                    {
                        'start'     # int (character-offset, included)
                        'end'       # int (character-offset, not included)
                    }
                ]
       }

    'title'                         # str
    'content'                       # str
}
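To illustrate how the fields above fit together, here is a hedged sketch that lists the entities of a parsed response together with the text of their mentions. The function name salient_entities is hypothetical, and the sample dictionary is hand-written with invented scores rather than taken from real SWAT output. Because the structure above does not say whether annotations holds one object or a list of them, the sketch accepts both.

```python
def salient_entities(response):
    """Return (wiki_title, salience_score, mentions) tuples, highest score first.

    `response` is the parsed JSON object. Mentions are recovered by slicing
    `content` with each span's start (included) and end (not included).
    """
    annotations = response.get("annotations", [])
    if isinstance(annotations, dict):  # tolerate a single annotation object
        annotations = [annotations]
    content = response.get("content", "")
    results = []
    for ann in annotations:
        mentions = [content[span["start"]:span["end"]]
                    for span in ann.get("spans", [])]
        results.append((ann["wiki_title"], ann["salience_score"], mentions))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hand-written sample; the scores are invented for illustration only.
sample = {
    "content": "Barack Obama was in Pisa for a flying visit.",
    "annotations": [
        {"wiki_title": "Pisa", "salience_score": 0.4,
         "spans": [{"start": 20, "end": 24}]},
        {"wiki_title": "Barack Obama", "salience_score": 0.9,
         "spans": [{"start": 0, "end": 12}]},
    ],
}

for title, score, mentions in salient_entities(sample):
    print(title, score, mentions)
```

Sorting by salience_score is a presentation choice of this example; salience_class is left untouched because its possible values are not described here.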

 


Credits and Reference

To learn more about how SWAT works, see the paper "SWAT: A System for Detecting Salient Wikipedia Entities in Texts" by Marco Ponza, Paolo Ferragina and Francesco Piccinno.