Compass API documentation

Last Updated: Jan 24, 2019

Table of contents

1 Input and output
- 1.1 Requests
- 1.2 Responses
  - 1.2.1 Errors
- 1.3 Data types
2 Authentication and access
- 2.1 Permissions
- 2.2 Tokens and authentication
3 Compass API Endpoints
4 Tokens
- 4.1 POST /login/
- 4.2 GET /tokens/
- 4.3 GET /tokens/<token>/
- 4.4 DELETE /tokens/<token>/
5 Projects
- 5.1 GET /projects/
- 5.2 POST /projects/
- 5.3 GET /projects/<project_id>/
- 5.4 PUT /projects/<project_id>/
- 5.5 DELETE /projects/<project_id>/
6 Documents
- 6.1 GET /projects/<project_id>/p/documents/
- 6.2 POST /projects/<project_id>/p/documents/
7 Classifiers
8 Classify
- 8.1 POST /projects/<project_id>/p/classify/
9 Project Topics
10 [DEPRECATED] Historical Topics
- 10.1 [DEPRECATED] GET /projects/<project_id>/p/topics_history/
11 [DEPRECATED] Project Messages
12 [DEPRECATED] Project Statistics
- 12.1 [DEPRECATED] GET /projects/<project_id>/p/stats/
13 Project Message Usage
14 [DEPRECATED] Project Configuration
15 Accounts
- 15.1 GET /accounts/
- 15.2 POST /accounts/
- 15.3 GET /accounts/<account_id>/
- 15.4 PUT /accounts/<account_id>/
- 15.5 DELETE /accounts/<account_id>/
16 Users
- 16.1 GET /users/
- 16.2 POST /users/
- 16.3 GET /users/<email>/
- 16.4 PUT /users/<email>/
- 16.5 DELETE /users/<email>/
- 16.6 PUT /users/<email>/password/
- 16.7 POST /users/<email>/password/reset/
17 Permissions
- 17.1 GET /permissions/
- 17.2 POST /permissions/
- 17.3 GET /permissions/<permission_id>/
- 17.4 DELETE /permissions/<permission_id>/

Input and output

Requests

Note: Ensure that all calls to the Compass API include a trailing slash in the endpoint URL.

For GET/DELETE requests, parameters should go in the URL as query parameters. For PUT/POST requests, parameters should almost always go in the request body (exceptions for particular endpoints are noted in their documentation); they can be formatted as HTML forms (in which case the Content-Type header should be set to "application/x-www-form-urlencoded") or as JSON (in which case the Content-Type header should be set to "application/json").
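The parameter rules above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper name build_request and the host compass.example are assumptions made for the example, not part of the Compass API. The function only builds a request object and does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_request(url, method, params=None, as_json=True):
    """Build (but do not send) a request that follows the rules above."""
    params = params or {}
    headers = {}
    data = None
    if method in ("GET", "DELETE"):
        # GET/DELETE: parameters travel in the URL as query parameters.
        if params:
            url = url + "?" + urllib.parse.urlencode(params)
    elif as_json:
        # PUT/POST with a JSON body.
        data = json.dumps(params).encode("utf-8")
        headers["Content-Type"] = "application/json"
    else:
        # PUT/POST with an HTML-form body.
        data = urllib.parse.urlencode(params).encode("utf-8")
        headers["Content-Type"] = "application/x-www-form-urlencoded"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

For example, build_request("https://compass.example/api/projects/", "GET", {"status": "active"}) places status=active in the query string and leaves the body empty.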

Responses

Response bodies are JSON-encoded (except for certain 50X server errors).

Successful responses use the following HTTP codes:

200 (OK) - request was successful
201 (Created) - request was successful and a new object was created
204 (No Content) - request was successful and the object was deleted

Some API methods return a "paginated" list of objects (for example, getting the list of messages in a topic). A paginated list is a JSON object with four keys:

count (integer) - total number of items (in all pages combined)
next (string) - URL of next page of items (null if there is no next page)
previous (string) - URL of previous page of items (null if there is no previous page)
results (JSON list of objects) - the items in the current page (each page contains up to 20 items)
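
A client can walk a paginated list by following the next URL until it is null. The sketch below assumes a caller-supplied function fetch_page (a hypothetical name) that performs the authenticated GET for a URL and returns the decoded JSON object.

```python
def iter_results(first_url, fetch_page):
    """Yield every item of a paginated list, one page at a time.

    fetch_page is a hypothetical callable: it takes a URL and returns the
    decoded paginated object (count, next, previous, results).
    """
    url = first_url
    while url is not None:
        page = fetch_page(url)
        for item in page["results"]:
            yield item
        # "next" is null (None once decoded) on the last page.
        url = page["next"]
```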

Errors

Error responses use the following HTTP codes:

400 (Bad Request) - request was invalid (e.g., a required parameter was missing)
401 (Unauthorized) - request was not authenticated (authorization token or session cookie was either not provided or not valid)
403 (Forbidden) - request was authenticated but user does not have permission to perform the action
404 (Not Found) - requested URL doesn't exist
405 (Method Not Allowed) - HTTP method specified in the request is not allowed on specified URL
500 (Internal Server Error) - other unexpected error

In cases where there is an error, in addition to the HTTP code the response will include some information about the error. This information will usually be a JSON object but could also be a JSON string.

Below are some typical examples of error responses.

For a 401 (Unauthorized), when the request is not authenticated:

{
   "detail": "Authentication credentials were not provided."
}

For a 400 (Bad Request), when trying to create a project while specifying no name and a nonexistent account:

{
    "name": [
         "This field is required."
    ],
    "account": [
         "Invalid pk 'xxx' - object does not exist."
   ]
}
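
Because an error body may be either a JSON object or a JSON string, client code has to handle both shapes. The following is a minimal sketch; the function name error_messages and the "field: message" output format are choices made for this example only.

```python
import json


def error_messages(body):
    """Flatten an error response body into a list of readable messages."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        # A bare JSON string (or any other non-object value).
        return [str(payload)]
    messages = []
    for field, value in payload.items():
        if isinstance(value, list):
            # Field-level validation errors, as in the 400 example above.
            messages.extend(f"{field}: {item}" for item in value)
        else:
            messages.append(f"{field}: {value}")
    return messages
```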

Data types

The documentation refers to several data types for parameters:

number - a numerical value (integer or floating point)
integer - an integer value
string - a string (in responses, this will be JSON-encoded)
boolean - a boolean value (in responses and JSON request bodies, true or false; in query parameters, True or False; in HTML forms, f/F/false/False/FALSE/0 are false and anything else is true)
time - a representation of a time (in responses, this will be a string in ISO format, such as 2014-08-17T12:45:01.22+00:00; in requests, it can be in this ISO format or a number of seconds since the UNIX epoch)
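
The boolean conventions differ by transport, which is easy to get wrong. The sketch below restates them in Python; both function names are hypothetical, and the second function only mirrors the HTML-form rule described above so that a client can predict how a value will be read.

```python
# Form values that are read as false, per the rule above.
FALSE_FORM_VALUES = {"f", "F", "false", "False", "FALSE", "0"}


def bool_for_query(value):
    """Encode a Python bool for use as a query parameter (True or False)."""
    return "True" if value else "False"


def form_value_is_true(raw):
    """Predict how an HTML-form value is read: anything not listed is true."""
    return raw not in FALSE_FORM_VALUES
```

In JSON request bodies no special handling is needed, since json.dumps already serializes Python booleans as true and false.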

Authentication and access

Permissions

A permission allows a particular user to do certain things with respect to a particular account.

The permission levels are:

read: the user can view account information and can view projects owned by the account
readwrite: the user can view account information and can view, modify, and create projects owned by the account
manage: the user can view and modify account information, and can view, modify, and create projects owned by the account

In addition to permissions on accounts, there is a special status called "site admin". A user who is a site admin is allowed to do everything. Site admins are the only users who are allowed to create accounts and users.
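
The permission levels can be summarized as a lookup table. This is only an illustration of the rules described above: the action names (view_account, modify_projects, and so on) are hypothetical labels invented for the example and are not API values.

```python
# Hypothetical action names; the levels are the ones listed above.
ALLOWED_ACTIONS = {
    "read": {"view_account", "view_projects"},
    "readwrite": {"view_account", "view_projects",
                  "modify_projects", "create_projects"},
    "manage": {"view_account", "modify_account", "view_projects",
               "modify_projects", "create_projects"},
}


def can(level, action, site_admin=False):
    """Return True if a user with this permission level may do the action."""
    if site_admin:
        # Site admins are allowed to do everything.
        return True
    return action in ALLOWED_ACTIONS.get(level, set())
```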

Tokens and authentication

Each user can obtain a token that can be used to authenticate to the API. Currently, each user can have at most one token. All tokens currently expire two weeks after their creation, or they can be deleted manually. A user's token is also reset if their password is reset or changed.

For an API request to be authenticated, it should include an "Authorization" header, whose value should be the string "Token " followed by the user's token. For example, if the token is "74e6d14dbde4303fe9864cb77306cc36394de460", then the value of the Authorization header should be "Token 74e6d14dbde4303fe9864cb77306cc36394de460".
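
In Python, the header described above could be built like this (the function name auth_header is an assumption made for the example):

```python
def auth_header(token):
    """Return the Authorization header for a given token string."""
    # The value is the literal word "Token", a space, then the token.
    return {"Authorization": "Token " + token}
```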

To obtain your token, log in with your username and password. If you already have a token, the login response will include your existing token; otherwise, it will include a newly generated token.

Compass API Endpoints

The following endpoints/methods are provided by the Compass API.

Note: Ensure that all calls to the Compass API include a trailing slash in the endpoint URL.

To make API requests, the endpoints listed here should be prefixed with "https://<compass-url.tld>/api".
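
A small helper can apply the prefix and guarantee the trailing slash at the same time. This is a sketch; the function name api_url and the host compass.example are placeholders for your own Compass installation.

```python
def api_url(host, endpoint):
    """Join a Compass host and an endpoint path into a full API URL."""
    # Normalize so the path has exactly one leading and one trailing slash.
    path = "/" + endpoint.strip("/") + "/"
    return "https://" + host + "/api" + path
```

For example, api_url("compass.example", "projects") and api_url("compass.example", "/projects/") both give "https://compass.example/api/projects/".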

Tokens

A token authenticates a particular user to the API. Currently, each user has at most one token at a time.

A token object in the API has the following fields:

token (string): the token string itself, to be included in authorization header on HTTP requests
user (string): the username of the user whose token this is
expiration (time): the time at which the token will expire

POST /login/

Log in with a username and password to get a token.

Permission required: none
Required body parameters:
- username (string)
- password (string)
Optional parameters: none

Response: JSON object with three keys:

token (string): the user's token string
user (JSON object): user object
expiration (time): the time at which the token will expire

GET /tokens/

Get a list of tokens. This will only include your tokens, unless you are a site admin.

Permission required: none
Required parameters: none
Optional query parameters:
- user (string): user (email address) to list tokens for

Response: paginated list of token objects

GET /tokens/<token>/

Get an existing token.

Permission required: you must be the user who owns the token
Required parameters: none
Optional parameters: none

Response: token object

DELETE /tokens/<token>/

Delete a token.

Permission required: you must be the user who owns the token
Required parameters: none
Optional parameters: none

Response: (empty)

Projects

A project is one set of data (classifiers, documents, messages and their associated information and analysis).

A project object in the API has the following fields:

url (string): project URL (containing randomly-generated unique ID for the project)
name (string): name for the project (not necessarily unique)
account (string): unique ID of the account that owns this project
description (string): a description of the project, or notes about it, etc.
language (string): language code for the project's messages
status (string): "active" or "inactive" (an inactive project can still be viewed but will not process any new messages)
creator (string): username of the user who created the project
created (time): time at which the project was created

GET /projects/

Get the list of all the projects the user has access to.

Permission required: none
Required parameters: none
Optional query parameters:
- account (string): ID of a single account whose projects to list
- status (string): only list projects that have this status ("active" or "inactive")
Response: paginated list of project objects

POST /projects/

Create a project.

Permission required: write
Required body parameters:
- name (string): name for the project
- account (string): ID of the account that will own the project
- language (string): language code for the project's messages
Optional body parameters:
- description (string): description, notes, etc.
- status (string): "active" (default) or "inactive"

Response: new project object

GET /projects/<project_id>/

Get a project.

Permission required: read
Required parameters: none
Optional parameters: none

Response: project object

PUT /projects/<project_id>/

Update a project.

Permission required: write
Required parameters: none
Optional body parameters:
- name (string): name for the project
- description (string): description, notes, etc.
- status (string): "active" or "inactive" (an inactive project can still be viewed but will not process any new messages)

Response: updated project object

DELETE /projects/<project_id>/

Delete a project.

Permission required: write
Required parameters: none
Optional parameters: none

Response: (empty)

Documents

A document is an individual unit of text that can be used in the classification workflow. A document object in the API has the following fields:

url (string): an API endpoint url to access the document object
text (string): text content of the document, used to create the project’s domain space
id (integer): a unique document ID
label (string): an optional label name (if a document is a part of the labeled set)
dataset (string): name of the dataset this document belongs to, whether manually created or auto-generated
language (string): the two-letter language code of the language of the document (e.g., “en”)

Typically, labeled documents are put into a dataset and used to train supervised classifiers in Compass. Unlabeled documents can also be added, for the purposes of setting the vector space used in supervised classifier building or building other types of classifiers.

GET /projects/<project_id>/p/documents/

Get the list of documents in the project. The response is paginated; the first page contains up to 20 documents.

Permission required: read
Required parameters: none
Optional query parameters:
- label (string): documents can be filtered by label
- dataset (string): used for retrieving documents belonging to a specific dataset

Response: paginated list of document objects

POST /projects/<project_id>/p/documents/

Add a document or multiple documents. To add multiple documents at a time, specify a list of JSON objects instead of a single JSON object.

Permission required: write permission
Required body parameters:
- text (string): the text content of the document, in UTF-8 encoding
Optional body parameters:
- label (string): the class the document belongs to
- dataset (string): the name of the collection of documents this document belongs to

Response: paginated list of document objects
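
The single-versus-multiple rule for this endpoint can be sketched as below. The helper name documents_payload is hypothetical; it assumes that every document in one call shares the same label and dataset, which is a simplification made for the example.

```python
def documents_payload(texts, label=None, dataset=None):
    """Build the body for adding one document or several documents."""
    docs = []
    for text in texts:
        doc = {"text": text}
        if label is not None:
            doc["label"] = label
        if dataset is not None:
            doc["dataset"] = dataset
        docs.append(doc)
    # One document is sent as a single object, several as a list of objects.
    return docs[0] if len(docs) == 1 else docs
```

The returned value would then be serialized with json.dumps and sent with the Content-Type header set to "application/json".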

Classifiers

Compass provides three types of classifiers: voting, topic-based, and sentiment. The voting classifier is a supervised classifier trained on a collection of labeled documents. The topic-based classifier is a semi-supervised classifier; it relies on topics defined by the user and does not require training. A collection of domain documents is still needed by the topic-based classifier to create the semantic space on which to operate. The sentiment classifier relies on the semantic space only for domain expansion, and domain documents are necessary only for that case.

Depending on the type of the classifier, a classifier object is defined as follows.

A classifier object for the voting classifier:

url (string): unique URL for the classifier
name (string): a unique name for classifier
type (string): a type of the classifier (“voting”)
status (string): can be “active” or “inactive”. Only “active” classifiers can classify incoming messages.
building_state (string): either "building" (the classifier is under construction) or "ready" (the classifier is ready to classify incoming messages)
topics (list): a list of topic objects (aka topics aka classes) for the classifier
info (dictionary):
- num_topics (integer): maximum number of topics a message can be classified into, or the special value "ALL", which will return a classification into each topic.
- threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1.
dataset (string): a name of the dataset containing documents used to build the classifier
created (timestamp): when classifier was first created
last_update (timestamp): when classifier was last updated

A classifier object for the topic-based classifier:

url (string): unique URL for the classifier
name (string): a unique name for classifier
type (string): a type of the classifier (“topic_based”)
status (string): can be “active” or “inactive”. Only “active” classifiers can classify incoming messages.
building_state (string): either "building" (the classifier is under construction) or "ready" (the classifier is ready to classify incoming messages)
topics (list): a list of topic objects (aka topics aka classes) and all their info for the classifier
info (dictionary):
- num_topics (integer): maximum number of topics a message can be classified into, or the special value "ALL", which will return a classification into each topic.
- threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1.
- topics (list): a list of topics (title and definition) for the classifier
dataset (string): a name of the dataset containing documents used to build the classifier
created (timestamp): when classifier was first created
last_update (timestamp): when classifier was last updated

A classifier object for the sentiment classifier:

url (string): unique URL for the classifier
name (string): a unique name for classifier
type (string): a type of the classifier ("sentiment_combined", "sentiment_split", or "sentiment_custom")
status (string): can be “active” or “inactive”. Only “active” classifiers can classify incoming messages.
building_state (string): either "building" (the classifier is under construction) or "ready" (the classifier is ready to classify incoming messages)
topics (list): a list of topic objects (aka classes) for the classifier
info (dictionary):
- combined_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1. Applies to the classifier of type "sentiment_combined"
- negative_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between -1 and 0. Applies to the classifier of type "sentiment_split"
- positive_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1. Applies to the classifier of type "sentiment_split"
- topics (list): a list of names and corresponding bands for the custom-defined sentiment topics. Applies to the sentiment classifier of type "sentiment_custom"
- wordlist (list): a list of words and corresponding sentiment scores to be used by the classifier. Applies only to the sentiment classifier of type "sentiment_custom"
- domain_expansion (boolean): whether domain expansion is turned on or off
- max_expansion_terms (integer): specifies a maximum desired number of domain-expanded sentiment terms. Default is 200 (for each polarity if applicable)
dataset (string): a name of the dataset containing documents used to build the classifier
created (timestamp): when classifier was first created
last_update (timestamp): when classifier was last updated

GET /projects/<project_id>/p/classifiers/

Get the list of classifiers in the project.

Permission required: read
Required parameters: none
Optional query parameters:
- name (string): you can filter by name to get a specific classifier

Response: paginated list of classifier objects.

POST /projects/<project_id>/p/classifiers/

Create a new classifier.

Permission required: write permission

Required parameters:

name (string): a string that uniquely identifies the classifier within the project
type (string): either "voting", "topic_based", "sentiment_combined", "sentiment_split", or "sentiment_custom"
status (string): either "active" or "inactive"

Optional parameters vary slightly by the type of the classifier.

Optional body parameters for the voting classifier:

info (dictionary): one or more of the following:
- num_topics (integer): maximum number of topics a message can be classified into, or the special value "ALL", which will attempt to classify the message into each topic. Default value is 1. If a number greater than 1 is provided, the system will return the “top X” labels above the threshold.
- threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. Must be a number between 0 and 1. Default value is 0.4.
dataset (string): if dataset name is specified, classifier will be built on documents belonging to that dataset (rather than all labeled documents)

{
     "name": "my first voting classifier",
     "status": "active",
     "type": "voting",
     "dataset": "restaurant-reviews-training-set",
     "info": {
           "num_topics": 2,
           "threshold": 0.5
     }
}
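
To make the interaction of num_topics and threshold concrete, the sketch below reproduces the "top X labels above the threshold" selection on the client side. It is an illustration only, not the server's implementation: the function name select_topics is hypothetical, the treatment of a score exactly equal to the threshold is an assumption, and the special value "ALL" is not modeled.

```python
def select_topics(scores, num_topics=1, threshold=0.4):
    """Keep at most num_topics topics whose score reaches the threshold.

    scores maps topic name to confidence. Treating a score equal to the
    threshold as passing is an assumption of this sketch.
    """
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    kept = [(name, score) for name, score in ranked if score >= threshold]
    return kept[:num_topics]
```

With scores of 0.9, 0.55, and 0.3 and the sample settings above (num_topics 2, threshold 0.5), the first two topics are kept and the third is dropped.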

Optional body parameters for the topic-based classifier:

info (dictionary):
- topics (list): a list of title/info dictionaries that define the topics to assign (you may omit this section and create the topics individually after the classifier has been created; see Classifier Topics for details)

Sample parameters for topic-based classifier

{
    "name": "food and drink",
    "status": "active",
    "type": "topic_based",
    "info": {
        "num_topics": 2,
        "threshold": 0.76,
        "dataset": "restaurant reviews",
        "topics": [
            {"title": "breakfast", "info": "bagel AND (schmear OR cream cheese)"},
            {"title": "coffee", "info": "drink AND coffee AND NOT latte"},
            {"title": "lunch", "info": "falafel OR chickpea fritter sandwich"},
            {"title": "dessert", "info": "cake OR pie OR ice cream"}
        ]
    }
}

Optional body parameters for the sentiment classifier:

info (dictionary): one or more of the following:
- combined_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a positive or a negative topic. Must be a number between 0 and 1 (default value is 0.4). Applies to the sentiment classifier of type "sentiment_combined".
- negative_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a negative topic. Must be a number between -1 and 0 (default value is -0.4). Applies to the sentiment classifier of type "sentiment_split".
- positive_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a positive topic. Must be a number between 0 and 1 (default value is 0.4). Applies to the sentiment classifier of type "sentiment_split".
- topics (list): a list of names and corresponding bands for the custom-defined sentiment topics. The bands must fall within [-1, 1] range and can overlap. Applies to the sentiment classifier of type "sentiment_custom"
- wordlist (list): a list of words and corresponding sentiment scores to be used by the classifier. The sentiment scores must be within [-5, 5] range. Applies to the sentiment classifier of type "sentiment_custom"
- domain_expansion (boolean): whether domain expansion is turned on or off, defaulted to off
- max_expansion_terms (integer): specifies a maximum desired number of domain-expanded sentiment terms. Default is 200 (for each polarity if applicable)

Sample parameters for sentiment classifier of type "sentiment_combined"

{
    "name": "Sentiment Classifier - combined",
    "type": "sentiment_combined",
    "status": "active",
    "info": {
        "combined_threshold": 0.5,
        "domain_expansion": true
    }
}

Sample parameters for sentiment classifier of type "sentiment_split"

{
    "name": "Sentiment Classifier - split",
    "type": "sentiment_split",
    "status": "active",
    "info": {
        "negative_threshold": -0.3,
        "positive_threshold": 0.5,
        "domain_expansion": true
    }
}

Sample parameters for sentiment classifier of type "sentiment_custom"

{
    "name": "My Custom Sentiment List",
    "type": "sentiment_custom",
    "status": "active",
    "info": {
        "topics": {
            "bad": {"min": -1, "max": 0},
            "good": {"min": 0, "max": 1}
        },
        "wordlist": {
            "superb": 5,
            "excellent": 4,
            "mighty fine": 3,
            "mediocre": 2,
            "meh": 1,
            "poor": -1,
            "sucky": -2,
            "deplorable": -3,
            "miserable": -4,
            "execrable": -5
         }
    }
}
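
Before submitting a "sentiment_custom" classifier, the documented ranges can be checked locally. The sketch below assumes the dictionary layout used in the sample above (topics as name-to-band, wordlist as word-to-score); the function name validate_custom_info is hypothetical.

```python
def validate_custom_info(info):
    """Return a list of problems found in a sentiment_custom info dict."""
    problems = []
    for name, band in info.get("topics", {}).items():
        low, high = band["min"], band["max"]
        # Bands must fall within [-1, 1]; overlap between bands is allowed.
        if not (-1 <= low <= high <= 1):
            problems.append(f"topic {name!r}: [{low}, {high}] is not a valid band within [-1, 1]")
    for word, score in info.get("wordlist", {}).items():
        # Sentiment scores must be within [-5, 5].
        if not (-5 <= score <= 5):
            problems.append(f"word {word!r}: score {score} is outside [-5, 5]")
    return problems
```

An empty list means the info dictionary respects both ranges.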

Response: a classifier object. While the classifier is being built or rebuilt, its building_state is set to "building" and its status to "inactive". Check the classifier's URL periodically until building_state has changed to "ready" and status to "active". If something went wrong during construction of the classifier, building_state is "error" and the info field contains information on the error.
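
The periodic check can be sketched as a polling loop. The function below assumes a caller-supplied fetch_classifier callable (a hypothetical name) that performs the authenticated GET on the classifier's URL and returns the decoded classifier object; the interval and the maximum number of checks are arbitrary defaults chosen for the example.

```python
import time


def wait_until_built(fetch_classifier, interval=5.0, max_checks=60, sleep=time.sleep):
    """Poll a classifier until its building_state is "ready"."""
    for _ in range(max_checks):
        classifier = fetch_classifier()
        state = classifier["building_state"]
        if state == "ready":
            return classifier
        if state == "error":
            # The info field carries details about the failure.
            raise RuntimeError(f"classifier build failed: {classifier.get('info')}")
        sleep(interval)
    raise TimeoutError("classifier was still building after the last check")
```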

GET /projects/<project_id>/p/classifiers/<classifier_id>/

Retrieve the specific classifier’s information.

Permission required: read permission
Required parameters: none
Optional query parameters: none

Response: a classifier object.

PUT /projects/<project_id>/p/classifiers/<classifier_id>/

Once the classifier has been created, some of its information can be updated.

Permission required: write permission
Required parameters: none
Optional body parameters:
- name (string): a string that uniquely identifies the classifier for the project
- status (string): specify the activity status
- info (dictionary with one or more of the following keys) - see description in the POST /projects/<project_id>/p/classifiers/ section

Response: an updated classifier object.

DELETE /projects/<project_id>/p/classifiers/<classifier_id>/

Delete a classifier.

Permission required: write permission
Required parameters: none
Optional query parameters: none

Response: none.

POST /projects/<project_id>/p/classifiers/<classifier_id>/rebuild/

Rebuild a classifier so that it picks up a new or updated set of domain documents and/or training documents (if applicable), a new or changed set of labels (for the voting classifier), or changed configuration parameters (for sentiment classifiers). The following classifiers allow rebuilding:

Voting Classifier
Topic-based Classifier
Sentiment-Combined Classifier with Domain Expansion flag on
Sentiment-Split Classifier with Domain Expansion flag on
Sentiment-Custom Classifier with Domain Expansion flag on

While the classifier is being rebuilt, its building_state is set to 'building'; the current version of the classifier still accepts messages for classification until rebuild is complete.

Permission required: write permission
Required parameters: none
Optional parameters:
- dataset (string): a string that uniquely identifies the dataset that contains domain or training documents. A classifier can be rebuilt with a new dataset.
- domain_expansion (boolean): applies to sentiment classifiers only. As part of the rebuild, this flag can be turned on for an existing sentiment classifier so that the sentiment is informed by the project's domain-specific words.
- max_expansion_terms (integer): applies only to sentiment classifiers with the domain_expansion flag on. As part of the rebuild, the maximum desired number of domain-expanded sentiment terms can be changed.

Response: an updated classifier object.

POST /projects/<project_id>/p/classifiers/<classifier_id>/test/

Once created, a classifier can be tested for accuracy. The input is a set of text-label pairs, where the label values represent the "truth". The output of this endpoint is the overall accuracy and a list of classification values for each of the classifier's topics. Accuracy is defined as the ratio of text items classified into the correct topic to the total number of text items submitted.

As a best practice, submit the text items to be classified in lists of around 1000.

Permission required: write permission

Required parameters:

text (string): a to-be-classified text
label (string): the label into which the text should be classified if classification is done correctly
language (string): mandatory language value

Optional parameters:

source_id (string): user can optionally supply an external id to cross-reference with a separate system of record

Response: a dictionary with two elements:

accuracy (float): overall accuracy number (between 0 and 1)
messages (JSON): a list of all the messages submitted. Each message in the list has its original attributes (text, label, source_id, language) plus an added topics list, with {name, id, source, score} elements for each of the classifier's known labels, sorted in descending order by score
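
The accuracy figure can be recomputed from the messages list, which is useful when breaking results down further (for example, per label). The sketch below assumes that a message counts as correctly classified when its top-scoring topic (the first entry of the sorted topics list) has the same name as its label; that reading of "classified into a correct topic" is an assumption, as is the function name.

```python
def accuracy_from_messages(messages):
    """Fraction of messages whose top-scoring topic matches the label."""
    if not messages:
        return 0.0
    correct = sum(
        1
        for message in messages
        # topics is sorted by score, so index 0 is the best match.
        if message["topics"] and message["topics"][0]["name"] == message["label"]
    )
    return correct / len(messages)
```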

Domain Expansion for Sentiment Classifiers

This section applies to sentiment classifiers only. With sentiment classifiers, users have the option of turning on a domain_expansion flag. Domain expansion extends the standard (generic) sentiment with sentiment that is informed by the project's domain-specific words.

All three sentiment classifiers - Sentiment_combined, Sentiment_split, Sentiment_custom - have the option of turning the domain_expansion flag on. By default the domain_expansion flag is off.

Domain expansion leverages the domain documents in the project to find additional sentiment-bearing terms. The user must have documents in the project to use domain expansion. If 'dataset' is specified, the system uses that dataset to build the domain-specific sentiment. If 'dataset' is not specified, the system uses all documents in the project. If there are no documents, the user receives a warning that domain_expansion cannot be turned on.

Domain expansion can be turned on at classifier creation, or it can be turned on later for a classifier that had domain expansion turned off.

Sample parameters to create a sentiment_combined classifier with domain expansion on

{
    "name": "Sentiment Combined, expansion on",
    "type": "sentiment_combined",
    "info": {
        "domain_expansion": true,
        "dataset": "restaurant reviews"
    }
}

Sample output that shows terms in the domain-expanded output

{
    "info": {
        "domain_expansion": true,
        "domain_terms": {
            "challenge": -1.74,
            "more": 1.87,
            "takeout": -2.35,
            "tip": -2.11,
            "happy hour": 3.2,
            "oysters": -1.43,
            ...
        }
    }
}

If desired, the domain_terms can be extracted by using a GET endpoint, and once edited the list can be updated by issuing a PUT.

It is possible for the system to generate words that carry both negative and positive sentiment. Those are put in the term_conflicts element, and are not considered by the classifier. Conflicted terms may be reviewed and moved into the domain_terms element with an appropriate sentiment score.

Domain expansion can be turned off by setting domain_expansion to false.
Note: once domain_expansion is turned off, the list of domain terms is deleted.

Classifier Topics for topic-based classifier

This section applies to topic-based classifiers only. Topics for voting classifiers are derived directly from the labels on the training documents and cannot be modified once the classifier has been created.

POST /projects/<project_id>/p/classifiers/<classifier_id>/topics/

Create a new topic for an existing topic-based classifier.

Permission required: write permission
Required parameters:
- title (string): a string that uniquely identifies the topic
- info (string): a string with the topic specification, enclosed in double quotation marks (see topic syntax)
Optional parameters:
- status (string): either active or inactive (default: active)
- blocking (string): either True or False (default: False)

Response: the newly created topic object.

PUT /projects/<project_id>/p/topics/<topic_id>/

Modify an existing topic associated with a topic-based classifier.

Permission required: write permission
Required parameters: one or more of the following
- title (string): a string that uniquely identifies the topic
- info (string): a string with the topic specification (see topic syntax)
- status (string): either active or inactive (default: active)

Response: the modified topic object.

DELETE /projects/<project_id>/p/topics/<topic_id>/

Delete an existing topic associated with a topic-based classifier.
Permission required: write permission
Required parameters: none
Response: none

Topic Syntax

This section describes how to construct topics for the topic-based classifier. A topic specification is a string composed of terms (operands) linked by Boolean operators. There are three operators: AND, OR, and NOT, with their usual meaning; they must appear in all-uppercase. Operands may be single words or multi-word phrases. The string must not contain any punctuation (including quotation marks).

Specifications may range from the very simple (two terms linked by a single operator) to the arbitrarily complex (with parentheses demarcating embedded clauses). Here are some examples:

coffee OR tea
chips AND dip
card AND (debit OR credit)
(bagel AND (schmear OR cream cheese)) OR oatmeal

A simple rule of thumb is: use parentheses whenever you change operators. These two expressions are perfectly unambiguous and do not require parentheses:

coffee OR tea OR water OR soda
drink coffee AND NOT latte

The following expression, on the other hand, is incorrect because it is ambiguous:

ale OR beer AND NOT porter

Parentheses are required to distinguish the two possible interpretations:

(ale OR beer) AND NOT porter
ale OR (beer AND NOT porter)

The operands (words or phrases) always match concepts or semantic classes rather than literal words. It is not possible to restrict operands to match words in a document literally.
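
The parenthesization rule of thumb above can be checked on the client before a specification is submitted. The sketch below is only an illustration of that rule, not the parser the service uses: the function name mixes_operators is made up for this example, and it flags a specification when AND and OR both appear at the same parenthesis level.

```python
import re

BINARY_OPERATORS = {"AND", "OR"}


def mixes_operators(spec):
    """Return True if AND and OR both occur at the same parenthesis level."""
    tokens = re.findall(r"\(|\)|[^\s()]+", spec)
    seen = [set()]  # binary operators seen so far at each nesting depth
    for token in tokens:
        if token == "(":
            seen.append(set())
        elif token == ")":
            if len(seen) == 1:
                raise ValueError("unbalanced ')'")
            seen.pop()
        elif token in BINARY_OPERATORS:
            seen[-1].add(token)
            if len(seen[-1]) > 1:
                return True
    if len(seen) != 1:
        raise ValueError("unbalanced '('")
    return False


for spec in ("coffee OR tea OR water OR soda",
             "drink coffee AND NOT latte",
             "ale OR beer AND NOT porter",
             "(ale OR beer) AND NOT porter"):
    print(spec, "->", "ambiguous" if mixes_operators(spec) else "ok")
```

NOT is treated as a modifier of the operand that follows it, so "AND NOT" counts as a single use of AND.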

Classifiers/Chain

By default, when a project has multiple classifiers, each incoming message is classified by all of them independently. If desired, messages can instead be classified conditionally by chaining two or more classifiers together. A classifier chain defines a path of classifiers that will process a message depending on its classifications into particular topics. After a message has been processed by one classifier, it may be sent to another classifier.

A classifier chain is defined per classifier: each classifier entry maps topic(s) to their target classifier(s). If a classifier is the target for a topic anywhere in the configuration, it will only process messages that are chained to it.

Classifiers and topics are specified by name. Classifiers must only be specified once, or not at all. A classifier whose name does not appear in the chain specification will classify all messages. Topics not present in the topic list do not receive any chaining, but are still active. The same classifier may be the target for more than one topic. The 'UNCLASSIFIED' topic works as any other topic and may chain to other classifiers.

POST /projects/<project_id>/p/classifiers/chain/

Create (or update) the chain specification for the classifiers in the project. Only one chain specification is allowed per project.

Permission required: write permission
Required parameters: none
Optional parameters:
- status (string): "active" (default) or "inactive"
- paths (JSON object): specify the chaining path for the classifiers in the chain

Here's a code snippet for the paths parameter:

{
    "Features Classifier": {
        "Display": "Display Sentiment Classifier",
        "Keyboard": "Keyboard Sentiment Classifier"
    },
    "Risk Classifier": {
        "UNCLASSIFIED": "Scoring Classifier"
    }
}

Response: a created or updated classification chain object with these attributes:

status (string): "active" (default) or "inactive"
paths (JSON object): the chaining path for the classifiers in the chain
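
To make the chaining rules concrete, here is a small sketch that inspects a paths object on the client side. It assumes, as in the example above, that every topic maps to a single classifier name; the helper names chain_targets and next_classifier are hypothetical and not part of the API.

```python
# The paths object from the example above
paths = {
    "Features Classifier": {
        "Display": "Display Sentiment Classifier",
        "Keyboard": "Keyboard Sentiment Classifier",
    },
    "Risk Classifier": {
        "UNCLASSIFIED": "Scoring Classifier",
    },
}


def chain_targets(paths):
    """Classifiers that are a target somewhere, so they only see chained messages."""
    return {target for topics in paths.values() for target in topics.values()}


def next_classifier(paths, classifier, topic):
    """Classifier that a topic chains to, or None if the topic does not chain."""
    return paths.get(classifier, {}).get(topic)


print(sorted(chain_targets(paths)))
print(next_classifier(paths, "Features Classifier", "Display"))
print(next_classifier(paths, "Features Classifier", "Battery"))  # not in the topic list
```

Classifiers that are not in the result of chain_targets (here, Features Classifier and Risk Classifier) continue to classify all messages.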

Classify

Classification is accomplished by posting to one of two endpoints - /projects/<project_id>/p/classify or /projects/<project_id>/p/messages. Posting to the /classify endpoint does not save the message presented, whereas posting to /messages does. Both endpoints return in near-real time a response object for each message that will contain a topics element with a list of the topics predicted for the message. Additionally, posting to /messages invokes periodic unsupervised clustering.

POST /projects/<project_id>/p/classify/

Get classification(s) for a message without saving it to the database.

Permission required: write permission
Required parameters:
- text (string): a to-be-classified text
Optional parameters:
- source_id (string): user-supplied ID to be used for reference purposes, for example to link the text classification results to a corresponding record in an external (source) system

Response: a message object with the topics list filled in; if none of the classifier’s predictions met the threshold, the special label UNCLASSIFIED will be returned. Each entry in the topics list has these four attributes:

id: the internal identifier of the topic (the last digit of the corresponding topic URL in the classifier definition)
source (string): the name of the Classifier that assigned the topic
name (string): one of the labels (class names) from the Classifier, or UNCLASSIFIED if the message did not match any of them with a confidence at or above the threshold
score (float): a value between 0 and 1 that indicates the strength of the match between the text and the topic; for UNCLASSIFIED topics the score will be 0

Also, if classifier chaining is active, a topic will have a next element that holds the topic assigned by the next classifier in the chain.

next (JSON object): another entry in the topic list as described above
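
A client will typically want to follow the next elements to see the full path a message took through a chain. The sketch below does that for a hand-written response fragment; the IDs, classifier names, topic names, and scores are invented for illustration, and the helper chain_labels is not part of the API.

```python
def chain_labels(topic):
    """Return (source, name, score) for a topics entry and each nested next entry."""
    labels = []
    while topic is not None:
        labels.append((topic["source"], topic["name"], topic["score"]))
        topic = topic.get("next")  # absent when the chain ends here
    return labels


# Hypothetical topics list, shaped like the attributes described above
topics = [
    {
        "id": 7, "source": "Features Classifier", "name": "Display", "score": 0.91,
        "next": {"id": 15, "source": "Display Sentiment Classifier",
                 "name": "Negative", "score": 0.64},
    },
    {"id": 0, "source": "Risk Classifier", "name": "UNCLASSIFIED", "score": 0},
]

for entry in topics:
    steps = chain_labels(entry)
    print(" -> ".join(f"{source}: {name} ({score})" for source, name, score in steps))
```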

For details about submitting messages via the /projects/<project_id>/p/messages/ endpoint, see the Project Messages section.

Project Topics

A topic is a category into which messages are classified. A topic is defined by zero or more clusters. A topic object in the API has the following fields:

url (string): unique URL for the topic
title (string): title for the topic
status (string): one of:
- "active" (activated by user),
- "inactive" (deactivated by user), or
- "pending" (typically used for topics generated by the system and not yet activated or deactivated; pending topics are removed when the next set of clusters is generated)
Only active and pending topics get messages categorized into them.
blocking (boolean): whether this is a blocking topic (a topic can be marked as blocking to indicate, for example, that it is spam; messages in blocking topics are not shown in any other topics that they would otherwise appear in)
created (time): time at which the topic was created
info (JSON object): arbitrary metadata (empty by default)
clusters (JSON list of objects): list of clusters in the topic
measurements (JSON object): information about rates of incoming messages (see the statistics section for details)
messages_url (string): URL for the messages in the topic

GET /projects/<project_id>/p/topics/

Get the list of topics.

Permission required: read

Required parameters: none

Optional query parameters:

status (string): "active", "inactive", or "pending" (default is to include all statuses)
blocking (boolean): include only blocking topics or only non-blocking topics (default is to include all topics)
stats (boolean): include volume and sentiment statistics for each topic. If this parameter is "true", you can further configure the time_buckets with these parameters:
- interval (number): number of seconds in each bucket (default is 300 (5 minutes))
- num_buckets (integer): number of buckets to report information for (default is 120)
- at_time (time): end of period to report information for (default is the current time)

Response: paginated list of topic objects
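
Since GET parameters go in the URL, the optional parameters above end up in the query string. The sketch below builds such a URL with the standard library; the host name and project ID are placeholders, the helper topics_url is hypothetical, and writing booleans as lowercase "true"/"false" is an assumption based on the wording of the stats parameter.

```python
from urllib.parse import urlencode

BASE_URL = "https://compass.example.com"  # placeholder host


def topics_url(project_id, **params):
    """Build the topic-list URL, adding a query string only when needed."""
    # Assumption: booleans are sent as lowercase "true" / "false"
    query = {key: (str(value).lower() if isinstance(value, bool) else value)
             for key, value in params.items()}
    url = f"{BASE_URL}/projects/{project_id}/p/topics/"
    return f"{url}?{urlencode(query)}" if query else url


# Active topics with statistics in twelve 10-minute buckets
print(topics_url(12, status="active", stats=True, interval=600, num_buckets=12))
```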

GET /projects/<project_id>/p/topics/<id>/

Get a topic.

Permission required: read
Required parameters: none
Optional query parameters:
- stats (boolean): include volume and sentiment statistics for each topic. If this parameter is "true", you can further configure the time_buckets with these parameters:
  - interval (number): number of seconds in each bucket (default is 300 (5 minutes))
  - num_buckets (integer): number of buckets to report information for (default is 120)
  - at_time (time): end of period to report information for (default is the current time)

Response: topic object

[DEPRECATED] GET /projects/<project_id>/p/topics/<id>/messages/

Get the list of messages in a topic.

Permission required: read
Required parameters: none
Optional query parameters:
- outliers (boolean): include non-central messages (messages that are closer to this topic than any other topic but that aren't especially central to this topic)
- any of the parameters for GET /projects/<project_id>/p/messages/

Response: paginated list of message objects

centrality (number): a positive or negative floating-point number that indicates how "central" the message is to the topic; negative numbers indicate "outliers"

PUT /projects/<project_id>/p/topics/<id>/

Update a topic.
Permission required: write
Required parameters: none

Optional body parameters:

title (string): title for the topic
status (string): "active" or "inactive"
blocking (boolean): whether this should be a blocking topic
info (JSON object): topic specification for topic-based classifiers only

Response: updated topic object

DELETE /projects/<project_id>/p/topics/<id>/

Delete a topic.
Permission required: write
Required parameters: none
Optional parameters: none
Response: (empty)

[DEPRECATED] Historical Topics

A read-only list of topics from the project's history, regardless of their status. Objects returned have the same fields as a regular topic.

[DEPRECATED] GET /projects/<project_id>/p/topics_history/

Get a list of topics from the project's history. Without query parameters, all topics are returned in reverse chronological order.

Permission required: read
Required parameters: none

Optional query parameters:

In addition to the query parameters for regular topics, four query parameters are available. They all take a timestamp argument in ISO 8601 format: YYYY-MM-DD(THH:mm:ss)*. All times are UTC.

earliest_created (time): topics that were created on or after this time
latest_created (time): topics that were created on or before this time
earliest_ended (time): topics that became inactive on or after this time
latest_ended (time): topics that became inactive on or before this time

Response: paginated list of topic objects
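
The timestamp arguments can be produced from ordinary datetime values. The following sketch converts timezone-aware datetimes to UTC, formats them with date and time parts, and URL-encodes them; the helper name iso_utc and the sample dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso_utc(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:mm:ss in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


start = datetime(2019, 1, 1, tzinfo=timezone.utc)
# A local time two hours ahead of UTC is converted before formatting
end = datetime(2019, 1, 24, 20, 30, tzinfo=timezone(timedelta(hours=2)))

query = urlencode({"earliest_created": iso_utc(start),
                   "latest_created": iso_utc(end)})
print(query)
```

urlencode percent-encodes the colons in the time part, which keeps the query string valid regardless of how strict the receiving server is.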

[DEPRECATED] Project Messages

A message is a single unit of text, along with some associated data. Messages are submitted via the API.

A message object in the API has the following fields:

url (string): unique URL for the message
text (string): text of the message
language (string): code for language that the message is in (all messages within a project must be the same language)
source_id (string): ID of the message on the original platform
timestamp (time): time at which the message was written
blocked (boolean): whether this message belongs to any blocked topic
weight (number):
info (JSON object): arbitrary metadata; for Twitter messages, this contains information from the original source
sentiment (number): a confidence measure (positive or negative) indicative of the sentiment expressed by the message

[DEPRECATED] POST /projects/<project_id>/p/messages/

Submit a message or multiple messages for classification, while saving them to the database. To add multiple messages at a time, specify a list of JSON objects instead of a single JSON object (adding multiple messages at a time is probably not possible when using HTML forms instead of JSON).

Permission required: write

Required body parameters (on each message):

text (string): the text of the message
timestamp (time): time at which the message was written

Optional body parameters (on each message):

source_id (string): ID of the message on the original platform
language (string): code for language that the message is in
info (JSON object): other information to include with the message (this is for reference only, not searchable or analyzed)

Optional query parameters:

replace (boolean): the messages added in this request will replace all other messages that match the other query parameters
any of the query parameters allowed for GET /projects/<project_id>/p/messages/: specify which existing messages to replace

Response: message object or list of message objects. If at least one classifier is present, the message object(s) will have the topics list filled in, with one entry for each of the topics predicted. Each entry in the topics list has these four attributes:

id: the internal identifier of the topic (the last digit of the corresponding topic URL in the classifier definition)
source (string): the name of the Voting Classifier that assigned the topic
name (string): one of the labels (class names) from the Classifier training set, or UNCLASSIFIED if the message did not match any of them with a confidence at or above the threshold
score (float): a value between 0 and 1 that indicates the strength of the match between the text and the topic; for UNCLASSIFIED topics the score will be 0

[DEPRECATED] GET /projects/<project_id>/p/messages/

Get the list of messages.

Permission required: read
Required parameters: none
Optional query parameters:
- earliest (time): only include messages whose timestamp is this time or later
- latest (time): only include messages whose timestamp is this time or earlier
- older_than_id (number): only include messages older than the specified ID (i.e., with a lower ID, added to the system longer ago)
- newer_than_id (number): only include messages newer than the specified ID (i.e., with a higher ID, added to the system more recently)
Response: paginated list of message objects

[DEPRECATED] GET /projects/<project_id>/p/messages/<id>/

Get a message.
Permission required: read
Required parameters: none
Optional parameters: none

Response: message object

[DEPRECATED] DELETE /projects/<project_id>/p/messages/<id>/

Delete a message.
Permission required: write
Required parameters: none
Optional parameters: none

Response: (empty)

[DEPRECATED] Project Statistics

[DEPRECATED] GET /projects/<project_id>/p/stats/

Get statistics about the number of messages in the project.

Permission required: read
Required parameters: none

Optional query parameters:

interval (number): number of seconds in each bucket (default is 300 (5 minutes))
num_buckets (integer): number of buckets to report information for (default is 120)
at_time (time): end of period to report information for (default is the current time)

Response: JSON object with two keys:

measurements
- count (integer): total number of messages
- velocity (number): rate of incoming messages (messages per hour)
- acceleration (number): rate of velocity change (messages per hour per hour)
time_buckets
- start_time (time): beginning of period for which buckets are being reported (default is 10 hours ago)
- end_time (time): end of period for which buckets are being reported
- interval (integer): number of seconds that each bucket represents
- counts (JSON list of numbers): number of messages in each bucket, from earliest to latest
- sentiment_counts (JSON list of JSON dictionaries): each dictionary has three key/value pairs giving the total weight of negative, neutral, and positive messages received in each bucket, from earliest to latest

If the number of buckets requested is greater than the number of buckets that exist, some of the buckets will be null.
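
The default window follows from the default parameters: 120 buckets of 300 seconds each cover 36,000 seconds, which is the 10 hours mentioned for start_time. The sketch below reproduces that arithmetic and shows one way to total a counts list that contains null buckets. It assumes the buckets are contiguous and end exactly at end_time; the function name and the sample counts are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def bucket_starts(end_time, interval=300, num_buckets=120):
    """Start time of each bucket, earliest first, for a window ending at end_time."""
    start_time = end_time - timedelta(seconds=interval * num_buckets)
    return [start_time + timedelta(seconds=interval * i) for i in range(num_buckets)]


end = datetime(2019, 1, 24, 12, 0, tzinfo=timezone.utc)
starts = bucket_starts(end)
print(len(starts), "buckets covering", end - starts[0])

# Hypothetical counts list; None stands for a null bucket in the JSON response
counts = [4, None, 9]
total = sum(count for count in counts if count is not None)
print("messages counted:", total)
```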

Project Message Usage

Message usage objects keep track of the number of messages used by the project.

A message usage object in the API has the following fields:

url (string): URL for the message usage object
start_time (time): start time of the day that this object counts messages for (days start and end at noon, including the start point and not the end point)
count (integer): number of messages recorded
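
Because usage days run from noon to noon, a message sent in the morning is counted under the previous day's object. The sketch below works out which start_time a given timestamp falls under. The helper usage_day_start is hypothetical, and it takes noon in the timestamp's own timezone, which is an assumption since the timezone is not stated here.

```python
from datetime import datetime, time, timedelta, timezone


def usage_day_start(moment):
    """Noon that opens the usage day containing moment (start included, end excluded)."""
    noon = datetime.combine(moment.date(), time(12), tzinfo=moment.tzinfo)
    return noon if moment >= noon else noon - timedelta(days=1)


morning = datetime(2019, 1, 24, 11, 59, tzinfo=timezone.utc)
at_noon = datetime(2019, 1, 24, 12, 0, tzinfo=timezone.utc)
print(usage_day_start(morning).isoformat())
print(usage_day_start(at_noon).isoformat())
```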

GET /projects/<project_id>/p/usage/

Get the list of message usage objects.
Permission required: read
Required parameters: none

Optional query parameters:

source (string): only show message usage for this source
earliest (time): only show information from this time or later
latest (time): only show information from this time or earlier

Response: list of JSON objects, one per month, each with four keys:

start_time (time) - start time of the month (noon on the first day of the month)
month_name (string) - English name of the month
total_counts (JSON object) - keys are source types and values are message counts
days (list of JSON objects) - list of message usage objects for the individual days and source types in the month

GET /projects/<project_id>/p/usage/<id>/

Get message usage information for a particular source and day.
Permission required: read
Required parameters: none
Optional parameters: none

Response: message usage object

[DEPRECATED] Project Configuration

Note: This area of the API is deprecated. A configuration is a user-modifiable aspect of a project's settings - for example, the number of clusters to make in each partition. Usually the defaults are good and you will not need to change them.

A configuration object in the API has the following fields:

key (string): the name of the configuration
value (JSON object): the information stored for the configuration

The current configurable options are:

assoc_axes - number of dimensions in the vectors that are used to represent terms and messages
assoc_min_tokens - minimum number of tokens (word instances) required to build a new assoc space
assoc_max_tokens - maximum number of tokens (word instances) used to build a new assoc space
assoc_min_messages - minimum number of messages required to build a new assoc space
partition_min_messages - minimum number of messages required to build a new partition
partition_max_messages - maximum number of messages used to build a new partition
number_of_clusters - number of clusters to keep per partition
number_of_discardable_clusters - number of clusters (topics) from the current partition that will be replaced by new topics when the next partition is built
partition_min_lifetime - any partitions created more than this many seconds ago will be automatically and periodically deleted unless they contain any active topics
assoc_do_not_build_within - do not build a new assoc space within this many seconds of having built one
assoc_force_build_after - if it has been more than this many seconds since the last assoc space was built, make a new one even if there aren't enough messages/tokens
partition_do_not_build_within - do not build a new partition within this many seconds of having built one
partition_force_build_after - if it has been more than this many seconds since the last partition was built, make a new one even if there aren't enough messages
labels_for_clustering - a list of topic titles or IDs (specified in a Voting Classifier) to identify messages to be used in the default clustering mechanism

[DEPRECATED] GET /projects/<project_id>/p/config/

Get all of the configuration keys and values.
Permission required: read
Required parameters: none
Optional parameters: none

Response: JSON object with two keys:

default (JSON list of objects): list of configuration objects with their default values (including ones whose values in this project are overridden to something else)
overridden (JSON list of objects): list of configuration objects whose values have been set by a user
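
To obtain the values actually in effect, a client can merge the two lists, letting overridden entries take precedence. The sketch below shows this for a hand-written response; the numeric values are invented and are not the real defaults, and the helper effective_config is not part of the API.

```python
def effective_config(response):
    """Map each configuration key to the value currently in effect."""
    values = {item["key"]: item["value"] for item in response["default"]}
    # Values set by a user take precedence over the defaults
    values.update({item["key"]: item["value"] for item in response["overridden"]})
    return values


# Hypothetical response body; the numbers are placeholders
response = {
    "default": [
        {"key": "number_of_clusters", "value": 20},
        {"key": "partition_min_messages", "value": 1000},
    ],
    "overridden": [
        {"key": "number_of_clusters", "value": 35},
    ],
}

print(effective_config(response))
```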

[DEPRECATED] GET /projects/<project_id>/p/config/<key>/

Get a configuration value.
Permission required: read
Required parameters: none
Optional parameters: none

Response: configuration object

[DEPRECATED] PUT /projects/<project_id>/p/config/<key>/

Update a configuration value.
Permission required: write
Required body parameters:
- value (JSON object or list or number): value to store for this configuration key
Optional parameters: none

Response: updated configuration object

[DEPRECATED] DELETE /projects/<project_id>/p/config/<key>/

Unset a key in the project configuration. This will restore its default value.
Permission required: write
Required parameters: none
Optional parameters: none

Response: (empty)

Accounts

An account is a billable entity and has certain account-wide settings (it doesn't actually have any yet).

An account object in the API has the following fields:

url (string): account URL (contains the unique ID)
id (string): unique randomly-generated ID for the account
name (string): name for the account (not necessarily unique)

GET /accounts/

Get the list of accounts. If you are a site admin, this will include all accounts; otherwise, it will only include accounts that you have permission on.

Permission required: none
Required parameters: none
Optional parameters: none

Response: paginated list of account objects

POST /accounts/

Create a new account.

Permission required: site admin
Required body parameters:
- name (string): a name for the account
Optional parameters: none
Response: new account object

GET /accounts/<account_id>/

Get an account.

Permission required: read (in the future this might change to manage, if accounts store information other than name)
Required parameters: none
Optional parameters: none

Response: account object

PUT /accounts/<account_id>/

Update an account.

Permission required: manage
Required parameters: none
Optional body parameters:
- name (string): name for the account

Response: updated account object

DELETE /accounts/<account_id>/

Delete an account.

Permission required: manage
Required parameters: none
Optional parameters: none

Response: (empty)

Users

A user represents one person using the system.

A user object in the API has the following fields:

url (string): URL for the user (contains the email address)
email (string): email address (this is the equivalent of a username)
name (string): full name
default_account (string or null): ID of the user's default account
must_reset_password (boolean): whether the user is currently required to change their password before doing anything else
admin (boolean): whether the user is a site admin
permissions (JSON list of objects): list of permissions the user has

GET /users/

Get the list of users. If you are a site admin, this will include all users; otherwise, it will include yourself and any users who have permission on any account that you have manage permission on.

Permission required: none
Required parameters: none
Optional parameters: none

Response: paginated list of user objects

POST /users/

Create a user.

Permission required: site admin
Required body parameters:
- name (string): full name of the person
- email (string): email address
Optional body parameters:
- default_account (string) (default null): ID of account that should be user's default
- admin (boolean) (default false): whether the user should be a site admin

Response: new user object

GET /users/<email>/

Get a user.

Permission required: you must be this user OR you must have manage permission on some account that this user has permission on
Required parameters: none
Optional parameters: none

Response: user object
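
Because the email address is part of the URL path, characters such as "+" or "@" are safest when percent-encoded. The sketch below shows one way to build the path with the standard library. The host name and the helper user_url are placeholders, and whether the server requires the encoded form is an assumption, not something stated in this documentation.

```python
from urllib.parse import quote

BASE_URL = "https://compass.example.com"  # placeholder host


def user_url(email):
    """Build the user URL with the email percent-encoded as one path segment."""
    return f"{BASE_URL}/users/{quote(email, safe='')}/"


print(user_url("jane.doe+test@example.com"))
```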

PUT /users/<email>/

Update a user.

Permission required: you must be this user
Required parameters: none
Optional body parameters:
- name (string): full name of the person
- email (string): email address
- default_account (string): ID of account that should be user's default
Optional body parameters for site admin only:
- admin (boolean): whether this user should be a site admin

Response: updated user object

DELETE /users/<email>/

Delete a user.

Permission required: site admin
Required parameters: none
Optional parameters: none

Response: (empty)

PUT /users/<email>/password/

Change a user's password. This also resets the user's API token.

Permission required: you must be this user
Required body parameters:
- password (string): new password
- old_password (string): old password
Optional parameters: none

Response: JSON object with three keys:

token (string): the user's new token string
user (JSON object): user object
expiration (time): the time at which the new token will expire

POST /users/<email>/password/reset/

Reset a user's password to a temporary password. This also deletes the user's API token. When the user next logs in, they will not be permitted to do (most) things until they change their password to something else.

Permission required: site admin
Required parameters: none
Optional parameters: none

Response: JSON object with one key:

password (string): the user's temporary password

Permissions

Permissions allow users to do certain things to certain accounts. A user can have at most one permission on each account.

A permission object in the API has the following fields:

url (string): URL for the permission
user (string): email address of the user who has the permission
account (string): ID of the account that the user has permission on
level (string): the permission level that the user has on the account ("read", "readwrite", or "manage"); for descriptions of the levels, refer to the API Guide.

GET /permissions/

Get the list of permissions on accounts that you have manage permission on.

Permission required: manage
Required parameters: none
Optional query parameters:
- user (string): user (email address) to list permissions for
- level (string): only include permissions with this level ("read", "readwrite", or "manage")
Response: paginated list of permission objects

POST /permissions/

Give a user permission on an account.
Permission required: manage
Required body parameters:
- user (string): user (email address) to give permission on the account
- account (string): ID of account to give the user permission on
- level (string): what permission to give the user on the account

Response: new permission object
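
Since level accepts only three values, it can be worth validating it before the request is sent. The sketch below builds the JSON body for this endpoint and rejects unknown levels early; the helper permission_body, the email address, and the account ID are made up for illustration.

```python
import json

LEVELS = ("read", "readwrite", "manage")


def permission_body(user, account, level):
    """Return the JSON body for granting a permission, checking the level first."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    return json.dumps({"user": user, "account": account, "level": level})


# Hypothetical user and account ID
print(permission_body("jane.doe@example.com", "a1b2c3", "readwrite"))
```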

GET /permissions/<permission_id>/

Get a permission object.

Permission required: manage OR you must be this user
Required parameters: none
Optional parameters: none

Response: permission object

DELETE /permissions/<permission_id>/

Remove a permission.

Permission required: manage
Required parameters: none
Optional parameters: none

Response: (empty)