Compass API documentation

Last Updated: Jan 24, 2019

Input and output

Requests

Note: Ensure that all calls to the Compass API include a trailing slash in the endpoint URL.

For GET/DELETE requests, parameters should go in the URL as query parameters. For PUT/POST requests, parameters should almost always go in the request body (exceptions for particular endpoints are noted in their documentation); they can be formatted as HTML forms (in which case the Content-Type header should be set to "application/x-www-form-urlencoded") or as JSON (in which case the Content-Type header should be set to "application/json").
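As a rough illustration of the request conventions above, the following Python sketch builds (without sending) a request that puts parameters in the query string for GET/DELETE and in a JSON body for PUT/POST. The host compass.example.com and the helper name build_request are placeholders chosen for this example, not part of the Compass API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; substitute the URL of your own Compass installation.
BASE_URL = "https://compass.example.com/api"


def build_request(method, endpoint, params=None):
    """Build (but do not send) a request that follows the conventions above."""
    # Every endpoint URL must end with a trailing slash.
    url = BASE_URL + "/" + endpoint.strip("/") + "/"
    data = None
    headers = {}
    if params:
        if method in ("GET", "DELETE"):
            # GET/DELETE: parameters travel as query parameters.
            url += "?" + urlencode(params)
        else:
            # PUT/POST: parameters travel in the body, here encoded as JSON.
            data = json.dumps(params).encode("utf-8")
            headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


get_req = build_request("GET", "projects", {"status": "active"})
post_req = build_request("POST", "/projects/", {"name": "demo"})
```

The same helper could send form-encoded bodies instead by swapping json.dumps for urlencode and setting the Content-Type header to "application/x-www-form-urlencoded".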

Responses

Response bodies are JSON-encoded (except for certain 50X server errors).

Successful responses use the following HTTP codes:

  • 200 (OK) - request was successful

  • 201 (Created) - request was successful and a new object was created

  • 204 (No Content) - request was successful and the object was deleted 

Some API methods return a "paginated" list of objects (for example, getting the list of messages in a topic). A paginated list is a JSON object with four keys:

  • count (integer) - total number of items (in all pages combined)

  • next (string) - URL of next page of items (null if there is no next page)

  • previous (string) - URL of previous page of items (null if there is no previous page)

  • results (JSON list of objects) - the items in the current page (each page contains up to 20 items) 
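A minimal sketch of walking a paginated list by following the "next" URLs is shown here. The fetch_page argument is a hypothetical caller-supplied function that takes a URL and returns the decoded JSON object for that page; canned pages stand in for real HTTP responses.

```python
def iter_results(first_url, fetch_page):
    """Yield every item of a paginated list, one page at a time."""
    url = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page["results"]
        # "next" is null (None once decoded) on the last page.
        url = page["next"]


# Canned pages keyed by made-up URLs, standing in for real responses.
pages = {
    "page1": {"count": 3, "next": "page2", "previous": None,
              "results": [{"id": 1}, {"id": 2}]},
    "page2": {"count": 3, "next": None, "previous": "page1",
              "results": [{"id": 3}]},
}
items = list(iter_results("page1", pages.__getitem__))
```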

Errors

Error responses use the following HTTP codes:

  • 400 (Bad Request) - request was invalid (e.g., a required parameter was missing)

  • 401 (Unauthorized) - request was not authenticated (authorization token or session cookie was either not provided or not valid)

  • 403 (Forbidden) - request was authenticated but user does not have permission to perform the action

  • 404 (Not Found) - requested URL doesn't exist

  • 405 (Method Not Allowed) - HTTP method specified in the request is not allowed on specified URL

  • 500 (Internal Server Error) - other unexpected error

In cases where there is an error, in addition to the HTTP code the response will include some information about the error. This information will usually be a JSON object but could also be a JSON string.

Below are some typical examples of error responses.

For a 401 (Unauthorized), when the request is not authenticated:

{ "detail": "Authentication credentials were not provided." }

For a 400 (Bad Request), when trying to create a project while specifying no name and a nonexistent account:

{ "name": [ "This field is required." ], "account": [ "Invalid pk 'xxx' - object does not exist." ] }
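Because an error body may be either a JSON object or a JSON string, client code usually has to handle both shapes. The sketch below flattens a decoded error body into readable lines; the function name error_messages is an illustration only.

```python
def error_messages(body):
    """Flatten a decoded error body into a list of readable strings."""
    # The body is sometimes a bare JSON string rather than an object.
    if isinstance(body, str):
        return [body]
    messages = []
    for field, value in body.items():
        # A field maps either to one message or to a list of messages.
        values = value if isinstance(value, list) else [value]
        for item in values:
            messages.append(f"{field}: {item}")
    return messages


unauthorized = error_messages(
    {"detail": "Authentication credentials were not provided."}
)
bad_request = error_messages({
    "name": ["This field is required."],
    "account": ["Invalid pk 'xxx' - object does not exist."],
})
```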

Data types

The documentation refers to several data types for parameters:

  • number - a numerical value (integer or floating point)

  • integer - an integer value

  • string - a string (in responses, this will be JSON-encoded)

  • boolean - a boolean value (in responses and JSON request bodies, true or false; in query parameters, True or False; in HTML forms, f/F/false/False/FALSE/0 are false and anything else is true)

  • time - a representation of a time (in responses, this will be a string in ISO format, such as 2014-08-17T12:45:01.22+00:00; in requests, it can be in this ISO format or a number of seconds since the UNIX epoch)
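The encoding rules for query parameters can be sketched as follows: booleans become True or False, and times are sent as seconds since the UNIX epoch (one of the two accepted request formats). The helper name to_query_value is hypothetical.

```python
from datetime import datetime, timezone


def to_query_value(value):
    """Encode a Python value as a query-parameter string."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, datetime):
        # Use timezone-aware datetimes; naive ones are read as local time.
        return str(value.timestamp())
    return str(value)


one_minute = datetime(1970, 1, 1, 0, 1, 0, tzinfo=timezone.utc)
encoded = [to_query_value(True), to_query_value(one_minute), to_query_value(5)]
```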

Authentication and access

Permissions

A permission allows a particular user to do certain things with respect to a particular account.

The permission levels are:

  • read: the user can view account information and can view projects owned by the account

  • readwrite: the user can view account information and can view, modify, and create projects owned by the account

  • manage: the user can view and modify account information, and can view, modify, and create projects owned by the account

In addition to permissions on accounts, there is a special status called "site admin". A user who is a site admin is allowed to do everything. Site admins are the only users who are allowed to create accounts and users.

Tokens and authentication

Each user can obtain a token that can be used to authenticate to the API. Currently, each user can have at most one token. All tokens currently expire two weeks after their creation, or they can be deleted manually. A user's token is also reset if the user's password is reset or changed.

For an API request to be authenticated, it should include an "Authorization" header, whose value should be the string "Token " followed by the user's token. For example, if the token is "74e6d14dbde4303fe9864cb77306cc36394de460", then the value of the Authorization header should be "Token 74e6d14dbde4303fe9864cb77306cc36394de460".

To obtain your token, log in with your username and password. If you already have a token, the login response will include your existing token; otherwise, it will include a newly generated token.
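For illustration, the Authorization header described above could be assembled like this in Python (the helper name auth_headers is made up for the example):

```python
def auth_headers(token):
    """Return the Authorization header for a Compass API token."""
    # The value is the literal string "Token ", then the token itself.
    return {"Authorization": "Token " + token}


headers = auth_headers("74e6d14dbde4303fe9864cb77306cc36394de460")
```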

Compass API Endpoints

The following endpoints/methods are provided by the Compass API.

Note: Ensure that all calls to the Compass API include a trailing slash in the endpoint URL.

To make API requests, the endpoints listed here should be prefixed with "https://<compass-url.tld>/api"

Tokens

A token authenticates a particular user to the API. Currently, each user has at most one token at a time.

A token object in the API has the following fields:

  • token (string): the token string itself, to be included in authorization header on HTTP requests

  • user (string): the username of the user whose token this is

  • expiration (time): the time at which the token will expire

POST /login/

Log in with a username and password to get a token.

  • Permission required: none

  • Required body parameters:

    • username (string)

    • password (string)

  • Optional parameters: none

Response: JSON object with three keys:

  • token (string): the user's token string

  • user (JSON object): user object

  • expiration (time): the time at which the token will expire
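A minimal sketch of the login exchange, assuming a form-encoded body: it builds the POST /login/ request without sending it and reads the token out of a response body. The base URL, credentials, and sample response text are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(base_url, username, password):
    """Build (but do not send) a form-encoded POST /login/ request."""
    body = urlencode({"username": username, "password": password})
    return Request(
        base_url + "/login/",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def token_from_login(response_text):
    """Extract the token string from a login response body."""
    return json.loads(response_text)["token"]


# Placeholder host and credentials.
req = build_login_request("https://compass.example.com/api", "alice", "s3cret")
# Made-up response text with the three documented keys.
sample = '{"token": "abc123", "user": {}, "expiration": "2014-08-17T12:45:01.22+00:00"}'
token = token_from_login(sample)
```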

GET /tokens/

Get a list of tokens. This will only include your tokens, unless you are a site admin.

  • Permission required: none

  • Required parameters: none

  • Optional query parameters:

    • user (string): user (email address) to list tokens for

Response: paginated list of token objects

GET /tokens/<token>/

Get an existing token.

  • Permission required: you must be the user who owns the token

  • Required parameters: none

  • Optional parameters: none

Response: token object 

DELETE /tokens/<token>/

Delete a token.

  • Permission required: you must be the user who owns the token

  • Required parameters: none

  • Optional parameters: none

Response: (empty)

Projects

A project is one set of data (classifiers, documents, messages and their associated information and analysis).

A project object in the API has the following fields:

  • url (string): project URL (containing randomly-generated unique ID for the project)

  • name (string): name for the project (not necessarily unique)

  • account (string): unique ID of the account that owns this project

  • description (string): a description of the project, or notes about it, etc.

  • language (string): language code for the project's messages

  • status (string): "active" or "inactive" (an inactive project can still be viewed but will not process any new messages)

  • creator (string): username of the user who created the project

  • created (time): time at which the project was created

GET /projects/

Get the list of all the projects the user has access to.

  • Permission required: none

  • Required parameters: none

  • Optional query parameters:

    • account (string): ID of a single account whose projects to list

    • status (string): only list projects that have this status ("active" or "inactive")

Response: paginated list of project objects

POST /projects/

Create a project.

  • Permission required: write

  • Required body parameters:

    • name (string): name for the project

    • account (string): ID of the account that will own the project

    • language (string): language code for the project's messages

  • Optional body parameters:

    • description (string): description, notes, etc.

    • status (string): "active" (default) or "inactive"

Response: new project object

GET /projects/<project_id>/

Get a project.

  • Permission required: read

  • Required parameters: none

  • Optional parameters: none

Response: project object

PUT /projects/<project_id>/

Update a project.

  • Permission required: write

  • Required parameters: none

  • Optional body parameters:

    • name (string): name for the project

    • description (string): description, notes, etc.

    • status (string): "active" or "inactive" (an inactive project can still be viewed but will not process any new messages)

Response: updated project object

DELETE /projects/<project_id>/

Delete a project.

  • Permission required: write

  • Required parameters: none

  • Optional parameters: none

Response: (empty)

Documents

A document is an individual unit of text that can be used in the classification workflow.  A document object in the API has the following fields:

  • url (string): an API endpoint url to access the document object

  • text (string): text content of the document, used to create the project’s domain space

  • id (integer): a unique document ID

  • label (string): an optional label name (if a document is a part of the labeled set)

  • dataset (string): name of the dataset this document belongs to, whether manually created or auto-generated

  • language (string): the two-letter language code of the language of the document (e.g., “en”)

Typically, labeled documents are put into a dataset and used to train supervised classifiers in Compass. Unlabeled documents can also be added, for the purposes of setting the vector space used in supervised classifier building or building other types of classifiers.

GET /projects/<project_id>/p/documents/

Get the list of documents in the project. The response is paginated, so this returns the first page (up to 20 documents).

  • Permission required: read

  • Required parameters: none

  • Optional query parameters:

    • label (string): documents can be filtered by label

    • dataset (string): used for retrieving documents belonging to a specific dataset

Response: paginated list of document objects

POST /projects/<project_id>/p/documents/

Add a document or multiple documents. To add multiple documents at a time, specify a list of JSON objects instead of a single JSON object.

  • Permission required: write permission

  • Required body parameters:

    • text (string): the text content of the document, in UTF-8 encoding

  • Optional body parameters:

    • label (string): the class the document belongs to

    • dataset (string): the name of the collection of documents this document belongs to

Response: paginated list of project objects
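The single-object versus list-of-objects convention for adding documents can be sketched as below. This assumes that label and dataset travel inside each JSON object alongside text; the helper name documents_body and the sample texts are illustrative only.

```python
import json


def documents_body(labeled_texts, dataset=None):
    """Encode (text, label) pairs as a JSON body for adding documents.

    A label of None leaves that document unlabeled.
    """
    documents = []
    for text, label in labeled_texts:
        document = {"text": text}
        if label is not None:
            document["label"] = label
        if dataset is not None:
            document["dataset"] = dataset
        documents.append(document)
    # One document is sent as a single object, several as a list of objects.
    payload = documents[0] if len(documents) == 1 else documents
    return json.dumps(payload)


body = documents_body(
    [("Great bagels", "positive"), ("Cold coffee", None)],
    dataset="restaurant reviews",
)
```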

Classifiers

Compass provides three types of classifiers: voting, topic-based, and sentiment. The voting classifier is a supervised classifier trained on a collection of labeled documents. The topic-based classifier is a semi-supervised classifier; it relies on topics defined by the user and does not require training. A collection of domain documents is still needed by the topic-based classifier to create the semantic space on which to operate. The sentiment classifier relies on the semantic space only for domain expansion, and domain documents are necessary only for that case.

Depending on the type of the classifier, a classifier object is defined as follows.

A classifier object for the voting classifier:

  • url (string): unique URL for the classifier

  • name (string): a unique name for classifier

  • type (string): a type of the classifier (“voting”)

  • status (string): can be “active” or “inactive”. Only “active” classifiers can classify incoming messages.

  • building_state (string): either "building" (the classifier is under construction) or "ready" (the classifier is ready to classify incoming messages)

  • topics (list): a list of topic objects (aka topics aka classes) for the classifier

  • info (dictionary):

    • num_topics (integer): maximum number of topics a message can be classified into, or the special value "ALL", which will return a classification into each topic.

    • threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1.

  • dataset (string): a name of the dataset containing documents used to build the classifier

  • created (timestamp): when classifier was first created

  • last_update (timestamp): when classifier was last updated

A classifier object for the topic-based classifier:

  • url (string): unique URL for the classifier

  • name (string): a unique name for classifier

  • type (string): a type of the classifier (“topic_based”)

  • status (string): can be “active” or “inactive”. Only “active” classifiers can classify incoming messages.

  • building_state (string): either "building" (the classifier is under construction) or "ready" (the classifier is ready to classify incoming messages)

  • topics (list): a list of topic objects (aka topics aka classes) and all their info for the classifier

  • info (dictionary):

    • num_topics (integer): maximum number of topics a message can be classified into, or the special value "ALL", which will return a classification into each topic.

    • threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1.

    • topics (list): a list of topics (title and definition) for the classifier

  • dataset (string): a name of the dataset containing documents used to build the classifier

  • created (timestamp): when classifier was first created

  • last_update (timestamp): when classifier was last updated

A classifier object for the sentiment classifier:

  • url (string): unique URL for the classifier

  • name (string): a unique name for classifier

  • type (string): the type of the classifier ("sentiment_combined", "sentiment_split", or "sentiment_custom")

  • status (string): can be “active” or “inactive”. Only “active” classifiers can classify incoming messages.

  • building_state (string): either "building" (the classifier is under construction) or "ready" (the classifier is ready to classify incoming messages)

  • topics (list): a list of topic objects (aka classes) for the classifier

  • info (dictionary):

    • combined_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1. Applies to the classifier of type "sentiment_combined"

    • negative_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and -1. Applies to the classifier of type "sentiment_split"

    • positive_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. A number between 0 and 1. Applies to the classifier of type "sentiment_split"

    • topics (list): a list of names and corresponding bands for the custom-defined sentiment topics. Applies to the sentiment classifier of type "sentiment_custom"

    • wordlist (list): a list of words and corresponding sentiment scores to be used by the classifier. Applies only to the sentiment classifier of type "sentiment_custom"

    • domain_expansion (boolean): whether domain expansion is turned on or off

    • max_expansion_terms (integer): specifies a maximum desired number of domain-expanded sentiment terms. Default is 200 (for each polarity if applicable)

  • dataset (string): a name of the dataset containing documents used to build the classifier

  • created (timestamp): when classifier was first created

  • last_update (timestamp): when classifier was last updated

GET /projects/<project_id>/p/classifiers/

Get the list of classifiers in the project.

  • Permission required: read

  • Required parameters: none

  • Optional query parameters:

    • name (string): you can filter by name to get a specific classifier

Response: paginated list of classifier objects.

POST /projects/<project_id>/p/classifiers/

Create a new classifier.

Permission required: write permission

Required parameters:

  • name (string): a string that uniquely identifies the classifier within the project

  • type (string): either "voting", "topic_based", "sentiment_combined", "sentiment_split", or "sentiment_custom"

  • status (string): either "active" or "inactive"

Optional parameters vary slightly by the type of the classifier.

Optional body parameters for the voting classifier:

  • info (dictionary): one or more of the following:

    • num_topics (integer): maximum number of topics a message can be classified into, or the special value "ALL", which will attempt to classify the message into each topic. Default value is 1. If a number greater than 1 is provided, the system will return the “top X” labels above the threshold.

    • threshold (float): a minimum cut-off value for confidence, for a message to be classified into a topic. The number must be between 0 and 1. Default value is 0.4.

  • dataset (string): if dataset name is specified, classifier will be built on documents belonging to that dataset (rather than all labeled documents)

Sample parameters for voting classifier

{ "name": "my first voting classifier", "status": "active", "type": "voting", "dataset": "restaurant-reviews-training-set", "info": { "num_topics": 2, "threshold": 0.5 } }

Optional body parameters for the topic-based classifier:

  • info (dictionary):

    • topics (list): a list of title/info dictionaries that define the topics to assign (you may omit this section and create the topics individually after the classifier has been created; see Classifier Topics for details) 

Sample parameters for topic-based classifier

{ "name": "food and drink", "status": "active", "type": "topic_based", "info": { "num_topics": 2, "threshold": 0.76, "dataset": "restaurant reviews", "topics": [ {"title": "breakfast", "info": "bagel AND (schmear OR cream cheese)"}, {"title": "coffee", "info": "drink AND coffee AND NOT latte"}, {"title": "lunch", "info": "falafel OR chickpea fritter sandwich"}, {"title": "dessert", "info": "cake OR pie OR ice cream"} ] } }

Optional body parameters for the sentiment classifier:

  • info (dictionary): one or more of the following:

    • combined_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a positive or a negative topic. The number must be between 0 and 1 (default value is 0.4). Applies to the sentiment classifier of type "sentiment_combined".

    • negative_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a negative topic. The number must be between -1 and 0 (default value is -0.4). Applies to the sentiment classifier of type "sentiment_split".

    • positive_threshold (float): a minimum cut-off value for confidence, for a message to be classified into a positive topic. The number must be between 0 and 1 (default value is 0.4). Applies to the sentiment classifier of type "sentiment_split".

    • topics (list): a list of names and corresponding bands for the custom-defined sentiment topics. The bands must fall within [-1, 1] range and can overlap. Applies to the sentiment classifier of type "sentiment_custom"

    • wordlist (list): a list of words and corresponding sentiment scores to be used by the classifier.  The sentiment scores must be within [-5, 5] range. Applies to the sentiment classifier of type "sentiment_custom"

    • domain_expansion (boolean): whether domain expansion is turned on or off, defaulted to off

    • max_expansion_terms (integer): specifies a maximum desired number of domain-expanded sentiment terms. Default is 200 (for each polarity if applicable)

Sample parameters for sentiment classifier of type "sentiment_combined"

{ "name": "Sentiment Classifier - combined", "type": "sentiment_combined", "status": "active", "info": { "combined_threshold": 0.5, "domain_expansion": true } }

Sample parameters for sentiment classifier of type "sentiment_split"

{ "name": "Sentiment Classifier - split", "type": "sentiment_split", "status": "active", "info": { "negative_threshold": -0.3, "positive_threshold": 0.5, "domain_expansion": true } }

Sample parameters for sentiment classifier of type "sentiment_custom"

{ "name": "My Custom Sentiment List", "type": "sentiment_custom", "status": "active", "info": { "topics": { "bad": {"min": -1, "max": 0}, "good": {"min": 0, "max": 1} }, "wordlist": { "superb": 5, "excellent": 4, "mighty fine": 3, "mediocre": 2, "meh": 1, "poor": -1, "sucky": -2, "deplorable": -3, "miserable": -4, "execrable": -5 } } }
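Before submitting a "sentiment_custom" classifier, the stated ranges (bands within [-1, 1], word scores within [-5, 5]) can be checked locally. The sketch below assumes the dictionary layout used in the sample above; the check that min does not exceed max is an extra assumption of this example, not a documented rule.

```python
def custom_info_problems(info):
    """Return a list of range problems found in a sentiment_custom info dict."""
    problems = []
    for name, band in info.get("topics", {}).items():
        # Bands must fall within [-1, 1]; min <= max is assumed here.
        if not (-1 <= band["min"] <= band["max"] <= 1):
            problems.append(f"topic '{name}' has a band outside [-1, 1]")
    for word, score in info.get("wordlist", {}).items():
        # Sentiment scores must fall within [-5, 5].
        if not (-5 <= score <= 5):
            problems.append(f"word '{word}' has a score outside [-5, 5]")
    return problems


good = custom_info_problems({
    "topics": {"bad": {"min": -1, "max": 0}, "good": {"min": 0, "max": 1}},
    "wordlist": {"superb": 5, "meh": 1, "execrable": -5},
})
bad = custom_info_problems({
    "topics": {"bad": {"min": -2, "max": 0}},
    "wordlist": {"superb": 6},
})
```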

Response: a classifier object. While the classifier is being built or rebuilt, its building_state is set to "building" and its status to "inactive". Poll the classifier's URL periodically until building_state changes to "ready" and status to "active". If something goes wrong while the classifier is being built, building_state is set to "error" and the info field contains information on the error.
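The periodic check described above might look like the following sketch. fetch_classifier is a hypothetical caller-supplied function that performs the GET on the classifier's URL and returns the decoded classifier object; the interval and the limit on the number of checks are arbitrary choices, not API requirements.

```python
import time


def wait_until_built(fetch_classifier, interval=5.0, max_checks=60,
                     sleep=time.sleep):
    """Poll until building_state is "ready"; raise on "error" or timeout."""
    for _ in range(max_checks):
        classifier = fetch_classifier()
        state = classifier["building_state"]
        if state == "ready":
            return classifier
        if state == "error":
            # The info field carries details about what went wrong.
            raise RuntimeError(f"build failed: {classifier.get('info')}")
        sleep(interval)
    raise TimeoutError("classifier was still building after the last check")


# Canned responses stand in for successive GET requests.
states = iter([
    {"building_state": "building", "status": "inactive"},
    {"building_state": "ready", "status": "active"},
])
result = wait_until_built(lambda: next(states), sleep=lambda seconds: None)
```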

GET /projects/<project_id>/p/classifiers/<classifier_id>/

Retrieve the specific classifier’s information.

  • Permission required: read permission

  • Required parameters: none

  • Optional query parameters: none

Response: a classifier object.

PUT /projects/<project_id>/p/classifiers/<classifier_id>/

Once the classifier has been created, some of its information can be updated.

  • Permission required: write permission

  • Required parameters: none

  • Optional body parameters:

    • name (string): a string that uniquely identifies the classifier for the project

    • status (string): specify the activity status

    • info (dictionary with one or more of the following keys) - see description in the POST /projects/<project_id>/p/classifiers/ section 

Response: an updated classifier object.

DELETE /projects/<project_id>/p/classifiers/<classifier_id>/

Delete a classifier.

  • Permission required: write permission

  • Required parameters: none

  • Optional query parameters: none

Response: none. 

POST /projects/<project_id>/p/classifiers/<classifier_id>/rebuild/

Rebuild a classifier so that it picks up a new or updated set of domain documents and/or training documents (if applicable), a new or changed set of labels (for the voting classifier), or changed configuration parameters (for Sentiment classifiers). The following classifiers can be rebuilt:

  1. Voting Classifier

  2. Topic-based Classifier

  3. Sentiment-Combined Classifier with Domain Expansion flag on

  4. Sentiment-Split Classifier with Domain Expansion flag on

  5. Sentiment-Custom Classifier with Domain Expansion flag on

While the classifier is being rebuilt, its building_state is set to 'building'; the current version of the classifier still accepts messages for classification until rebuild is complete. 

  • Permission required: write permission

  • Required parameters: none

  • Optional parameters:

    • dataset (string): a string that uniquely identifies the dataset that contains domain or training documents. A classifier can be rebuilt with a new dataset.

    • domain_expansion (boolean): applies to Sentiment classifiers only - as part of the rebuild, this flag can be turned on for an existing Sentiment classifier so that its sentiment is informed by the project's domain-specific words.

    • max_expansion_terms (integer): applies to Sentiment classifiers only with domain_expansion flag on - as part of the rebuild, a maximum desired number of domain-expanded sentiment terms can be changed.

Response: an updated classifier object.

POST /projects/<project_id>/p/classifiers/<classifier_id>/test/ 

Once created, a classifier can be tested for accuracy. The input is a set of text-label pairs, where the label values represent the "truth". The output of this endpoint is a summary accuracy number and a list of classification values for each of the classifier's topics. Accuracy is defined as the ratio of text items classified into the correct topic to the total number of text items submitted, represented as a percentage.

It is a best practice to submit to-be-classified text elements as a list of around 1000.

Permission required: write permission

Required parameters:

  • text (string): a to-be-classified text

  • label (string): the label into which the text should be classified if classification is done correctly

  • language (string): mandatory language value

Optional parameters:

  • source_id (string): user can optionally supply an external id to cross-reference with a separate system of record

Response: a dictionary with two elements:

  • accuracy (float): overall accuracy number (between 0 and 1)

  • messages (JSON): a list of all the messages submitted. Each message in the list has its original attributes (text, label, source_id, language) plus an added topics list, with {name, id, source, score} elements for each of the classifier's known labels, sorted in descending order by score
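To follow the suggestion of submitting roughly 1000 text items at a time, a larger test set can be split into batches before posting. This is a sketch only; the helper name labeled_batches and the sample data are made up.

```python
def labeled_batches(pairs, language, size=1000):
    """Yield lists of at most `size` test items built from (text, label) pairs."""
    items = [
        {"text": text, "label": label, "language": language}
        for text, label in pairs
    ]
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 2500 made-up items split into batches of 1000, 1000, and 500.
pairs = [(f"review {n}", "positive") for n in range(2500)]
batch_sizes = [len(batch) for batch in labeled_batches(pairs, "en")]
first_item = next(labeled_batches(pairs, "en"))[0]
```

Each batch would then be sent as the JSON body of a separate request to the test endpoint.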

Domain Expansion for Sentiment Classifiers

This section applies to Sentiment classifiers only. With Sentiment classifiers, users have the option of turning on a domain_expansion flag. Domain expansion is a powerful option: it extends the standard (generic) sentiment with sentiment that is informed by the project's domain-specific words.

All three sentiment classifiers - Sentiment_combined, Sentiment_split, Sentiment_custom - have the option of turning the domain_expansion flag on.  By default the domain_expansion flag is off.

Domain expansion leverages the domain documents in the project to find additional sentiment-bearing terms, so the project must contain documents before domain expansion can be used. If 'dataset' is specified, the system uses that dataset to build domain-specific sentiment; if 'dataset' is not specified, it uses all documents in the project. If there are no documents, the user receives a warning that domain_expansion cannot be turned on.

Domain expansion can be turned on at the classifier creation, or it can be turned on for a classifier that had domain expansion turned off.

Sample code to create a sentiment_combined with domain expansion on

{ "name": "Sentiment Combined, expansion on", "type": "sentiment_combined", "info": { "domain_expansion": true, "dataset": "restaurant reviews" } }

Sample output that shows terms in the domain-expanded output

{ "info": { "domain_expansion": true, "domain_terms": { "challenge": -1.74, "more": 1.87, "takeout": -2.35, "tip": -2.11, "happy hour": 3.2, "oysters": -1.43, ... } } }

If desired, the domain_terms can be extracted by using a GET endpoint, and once edited the list can be updated by issuing a PUT.
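One possible way to prepare the edited list for the PUT is sketched below. It assumes the PUT body takes the shape { "info": { "domain_terms": ... } }, mirroring the sample output above; that shape, the helper name, and whether other info keys must be resent are assumptions rather than documented behavior.

```python
import json


def edited_domain_terms_body(classifier, updates=None, removals=()):
    """Build a JSON body carrying an edited copy of the domain terms."""
    # Copy so the classifier object returned by the GET is left untouched.
    terms = dict(classifier["info"].get("domain_terms", {}))
    terms.update(updates or {})
    for term in removals:
        terms.pop(term, None)
    # Assumed body shape; not confirmed by the documentation.
    return json.dumps({"info": {"domain_terms": terms}})


# A trimmed-down classifier object as it might come back from a GET.
classifier = {
    "info": {
        "domain_expansion": True,
        "domain_terms": {"more": 1.87, "takeout": -2.35},
    }
}
body = edited_domain_terms_body(
    classifier,
    updates={"takeout": -1.0, "happy hour": 3.2},
    removals=["more"],
)
```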

It is possible for the system to generate words that carry both negative and positive sentiment. Those are put in the term_conflicts element, and are not considered by the classifier. Conflicted terms may be reviewed and moved into the domain_terms element with an appropriate sentiment score.

Domain expansion can be turned off, by setting domain_expansion to 'false'. 
Note: once domain_expansion is turned off, the list of domain terms is deleted. 

Classifier Topics for topic-based classifier

This section applies to topic-based classifiers only. Topics for voting classifiers are derived directly from the labels on the training documents and cannot be modified once the classifier has been created. 

POST /projects/<project_id>/p/classifiers/<classifier_id>/topics/

Create a new topic for an existing topic-based classifier.

  • Permission required: write permission

  • Required parameters:

    • title (string): a string that uniquely identifies the topic

    • info (string): a string with the topic specification, enclosed in double quotation marks (see Topic Syntax)

  • Optional parameters:

    • status (string): either active or inactive (default: active)

    • blocking (string): either True or False (default: False)

Response: the newly created topic object.

PUT /projects/<project_id>/p/topics/<topic_id>/

Modify an existing topic associated with a topic-based classifier.

  • Permission required: write permission

  • Required parameters: one or more of the following

    • title (string): a string that uniquely identifies the topic

    • info (string): a string with the topic specification (see Topic Syntax)

    • status (string): either active or inactive (default: active)

Response: the modified topic object. 

DELETE /projects/<project_id>/p/topics/<topic_id>/

Delete an existing topic associated with a topic-based classifier.

  • Permission required: write permission

  • Required parameters: none

Response: none

Topic Syntax

This section describes how to construct topics for the topic-based classifier. A topic specification is a string composed of terms (operands) linked by Boolean operators. There are three operators: AND, OR, and NOT, with their usual meaning; they must appear in all-uppercase. Operands may be single words or multi-word phrases. The string must not contain any punctuation (including quotation marks). 

Specifications may range from the very simple (two terms linked by a single operator) to the arbitrarily complex (with parentheses demarcating embedded clauses). Here are some examples:

  • coffee OR tea

  • chips AND dip

  • card AND (debit OR credit)

  • (bagel AND (schmear OR cream cheese)) OR oatmeal

A simple rule of thumb is: use parentheses whenever you change operators. These two expressions are perfectly unambiguous and do not require parentheses:

  • coffee OR tea OR water OR soda

  • drink coffee AND NOT latte

The following expression, on the other hand, is incorrect because it is ambiguous: 

  • ale OR beer AND NOT porter
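The rule of thumb above can be approximated with a small local check before a topic is submitted. This sketch is not the Compass parser: it flags punctuation, unbalanced parentheses, and AND mixed with OR at the same parenthesis level (treating AND NOT as AND), and the function name is hypothetical.

```python
import re


def topic_spec_problems(spec):
    """Return a list of likely problems in a topic specification string."""
    problems = []
    # Anything other than word characters, spaces, and parentheses.
    if re.search(r"[^\w\s()]", spec):
        problems.append("punctuation")
    levels = [set()]  # operators seen at each open parenthesis level
    mixed = False
    balanced = True
    for token in re.findall(r"[()]|[^\s()]+", spec):
        if token == "(":
            levels.append(set())
        elif token == ")":
            if len(levels) == 1:
                balanced = False
                break
            levels.pop()
        elif token in ("AND", "OR"):
            levels[-1].add(token)
            # Both AND and OR at one level means parentheses are missing.
            mixed = mixed or len(levels[-1]) == 2
    if not balanced or len(levels) != 1:
        problems.append("unbalanced parentheses")
    if mixed:
        problems.append("ambiguous operators")
    return problems


ok = topic_spec_problems("(bagel AND (schmear OR cream cheese)) OR oatmeal")
ambiguous = topic_spec_problems("ale OR beer AND NOT porter")
```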

© 2020 Luminoso Technologies. All rights reserved.