Introduction

Use these instructions to prepare your data, build a project with Luminoso’s Daylight REST API, and extract your documents with their weighted keyword expansions so that you can load them into your search engine.

We provide general cURL examples for maximum compatibility across platforms. This guide assumes that you have some experience using REST services.

Table of Contents

1 Set up your data
2 Get an API token
3 Create a project
4 Upload your documents
5 Build the project
6 Check build status
7 Extract documents
8 Upload to third-party search engine

Steps

Set up your data

First, structure your data as a collection of documents. In Luminoso’s system, each unit of data is a document, which must have a UTF-8 encoded “text” field. All documents in your dataset must be in the same language.

If you want to perform keyword expansion on only some of your documents, you must also include metadata fields that indicate which documents to expand. For more information on metadata options in Luminoso Daylight, read our API documentation on metadata filters.

Once you organize it, your data may look something like:

Data.json:

In the example above, the “third example” document doesn’t match the keyword expansion filter criteria “primary” and “yes”. Because it doesn’t match the filter, Luminoso does not perform keyword expansion on it, but the document is still included in the project to deepen Luminoso Daylight’s natural language analysis.
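If you generate your data with a script, the following Python sketch writes a small data.json in this shape. The document contents, the “title” field, and the string metadata field named “primary” are hypothetical stand-ins for your own data, and the metadata layout (a list of objects with “type”, “name”, and “value”) is an assumption to check against the API documentation on metadata.

```python
import json

# Hypothetical example data. Only "text" is required by the guide; "title"
# and the "primary" metadata field are illustrative assumptions.
documents = [
    {
        "title": "first example",
        "text": "The new phone's battery lasts all day.",
        "metadata": [{"type": "string", "name": "primary", "value": "yes"}],
    },
    {
        "title": "second example",
        "text": "Charging the battery takes about an hour.",
        "metadata": [{"type": "string", "name": "primary", "value": "yes"}],
    },
    {
        # Does not match primary == "yes", so it would not be expanded,
        # but it still contributes to the project's language analysis.
        "title": "third example",
        "text": "The screen is bright enough to read outdoors.",
        "metadata": [{"type": "string", "name": "primary", "value": "no"}],
    },
]

# Every document must have a non-empty "text" field.
assert all(doc.get("text") for doc in documents)

# Write UTF-8 JSON; ensure_ascii=False keeps non-ASCII characters as-is.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(documents, f, ensure_ascii=False, indent=2)
```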

Get an API token

You need an API token to authenticate the API calls you make while building an enhanced search project. You can create a token in the Luminoso Daylight user interface. For step-by-step instructions, read the “To create an API token” section of our Settings Page Guide.

After you create and copy a token, enter it in the authorization header of each command. Our cURL examples refer to this value as “token”.
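If you script these calls instead of using cURL, every request needs the same header. This Python sketch builds, but does not send, an authorized request. It assumes the header uses the form “Authorization: Token <token>” and the base URL https://daylight.luminoso.com/api/v5; confirm both in your API documentation.

```python
import urllib.request

# Assumed base URL for the Daylight API; confirm it in the API documentation.
BASE_URL = "https://daylight.luminoso.com/api/v5"


def authorized_request(token, path, method="GET", data=None):
    """Build (but do not send) a request that carries the API token.

    Assumption: equivalent to passing -H "Authorization: Token <token>" to cURL.
    """
    headers = {"Authorization": f"Token {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = authorized_request("YOUR_TOKEN", "/projects/")
# To send it: urllib.request.urlopen(req)
```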

Create a project

Create a Luminoso Daylight project. Add a name for your project, select the language of your documents, and use a command like the following.
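A Python equivalent of the create-project call might look like the following sketch. The POST path /projects/ and the base URL are assumptions, and your account may require additional fields (check the API documentation on creating projects); only the name and language come from this guide.

```python
import json
import urllib.request

BASE_URL = "https://daylight.luminoso.com/api/v5"  # assumed base URL


def create_project_request(token, name, language):
    """Build a POST request (assumed path: /projects/) for a new project."""
    body = json.dumps({"name": name, "language": language}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/projects/",
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = create_project_request("YOUR_TOKEN", "Search expansion demo", "en")
# Send it and decode the JSON project details:
# with urllib.request.urlopen(req) as resp:
#     project = json.load(resp)
```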

Depending on your project and dataset, you receive an output like:

This output describes your project details. Read more about creating projects in the Luminoso API documentation.

Upload your documents

Upload the data that you prepared to your newly created project. With cURL, that step would look like this:
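In Python, the upload step could be sketched as follows. The path /projects/<project_id>/upload/, the “docs” key in the request body, and the batch size of 1000 are all assumptions; splitting the upload into batches is a design choice that keeps each request small, so check the API documentation for actual limits.

```python
import json
import urllib.request

BASE_URL = "https://daylight.luminoso.com/api/v5"  # assumed base URL
BATCH_SIZE = 1000  # hypothetical batch size; check the API docs for limits


def batches(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_requests(token, project_id, docs, batch_size=BATCH_SIZE):
    """Build one POST request per batch (assumed path: .../upload/)."""
    url = f"{BASE_URL}/projects/{project_id}/upload/"
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    return [
        urllib.request.Request(
            url,
            data=json.dumps({"docs": batch}).encode("utf-8"),
            headers=headers,
            method="POST",
        )
        for batch in batches(docs, batch_size)
    ]


# Usage with a prepared data.json:
# with open("data.json", encoding="utf-8") as f:
#     docs = json.load(f)
# for req in upload_requests("YOUR_TOKEN", "YOUR_PROJECT_ID", docs):
#     urllib.request.urlopen(req).close()
```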

Build the project

When you make the build call, Luminoso processes your uploaded data and stores the resulting analysis. Read more about building a project in the API documentation. Build time grows with the size of your document set, so larger document sets take longer to build.

When you make the call, set the filter for the documents on which you want to perform search expansion, and include a limit that sets the maximum number of expansions for each distinct term in a document.

Initiate the build with a call that looks like this using cURL:

You must include the “keyword_expansion” component for Luminoso to perform search expansion.

If you don’t include a filter in the “keyword_expansion” specification, Luminoso expands every uploaded document, so the build may run for a very long time, depending on your document set.

You may also encounter a very long build time if you set the “limit” component too high. The “limit” component defaults to 20, but you can adjust it depending on how many keywords you would like to receive.
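The build request body can be assembled in Python as in the sketch below. The path /projects/<project_id>/build/ and the filter format (a list of objects with “name” and “values”) are assumptions to verify against the API documentation on metadata filters; the “primary” field is the hypothetical one used for the example data. The sketch leaves out “filter” when none is given, which, as described above, expands every document.

```python
import json
import urllib.request

BASE_URL = "https://daylight.luminoso.com/api/v5"  # assumed base URL


def build_request(token, project_id, expansion_filter=None, limit=20):
    """Build a POST request (assumed path: .../build/) with keyword expansion."""
    expansion = {"limit": limit}  # the guide says "limit" defaults to 20
    if expansion_filter is not None:
        # Without a filter, every uploaded document is expanded.
        expansion["filter"] = expansion_filter
    body = json.dumps({"keyword_expansion": expansion}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/projects/{project_id}/build/",
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical filter: expand only documents whose "primary" field is "yes".
req = build_request(
    "YOUR_TOKEN",
    "YOUR_PROJECT_ID",
    expansion_filter=[{"name": "primary", "values": ["yes"]}],
)
```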

Check build status

Check whether the project build is complete by repeating a command like the following every thirty seconds or so. Alternatively, check back after about an hour.

After checking the project’s build status, you receive output like the following:

When the “success” field under “last_build_info” displays “true”, your project build is complete. For more information about projects in Luminoso, refer to the Projects section in the API documentation.
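Polling can also be automated. This Python sketch checks every 30 seconds for up to an hour, matching the timing suggested above. The GET path /projects/<project_id>/ and the base URL are assumptions; the check itself only relies on the “last_build_info” and “success” fields described in this guide. Because this guide does not say how a failed build is reported, the loop simply keeps polling until the timeout.

```python
import json
import time
import urllib.request

BASE_URL = "https://daylight.luminoso.com/api/v5"  # assumed base URL


def build_succeeded(project):
    """True once last_build_info.success is true in the project details."""
    info = project.get("last_build_info") or {}
    return info.get("success") is True


def fetch_project(token, project_id):
    """GET the project details (assumed path: /projects/<project_id>/)."""
    req = urllib.request.Request(
        f"{BASE_URL}/projects/{project_id}/",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_build(token, project_id, interval=30, timeout=3600):
    """Poll every `interval` seconds until the build succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if build_succeeded(fetch_project(token, project_id)):
            return True
        time.sleep(interval)
    return False
```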

Extract documents

Use the documents endpoint to retrieve the documents that Luminoso analyzed, with a command like:
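For large projects you will probably want to page through the documents. This Python sketch assumes the path /projects/<project_id>/docs/, “offset” and “limit” query parameters, and a response that lists documents under a “result” key; all three are assumptions to confirm in the Documents section of the API documentation. The paging loop itself is generic and stops as soon as a page comes back shorter than requested.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://daylight.luminoso.com/api/v5"  # assumed base URL


def docs_url(project_id, offset, limit):
    """URL for one page of documents (assumed path and query parameters)."""
    query = urllib.parse.urlencode({"offset": offset, "limit": limit})
    return f"{BASE_URL}/projects/{project_id}/docs/?{query}"


def make_fetch_page(token, project_id):
    """Return a function that downloads one page of documents."""
    def fetch_page(offset, limit):
        req = urllib.request.Request(
            docs_url(project_id, offset, limit),
            headers={"Authorization": f"Token {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            # Assumption: documents are listed under a "result" key.
            return json.load(resp)["result"]
    return fetch_page


def iter_documents(fetch_page, page_size=1000):
    """Yield every document, page by page, until a short page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Usage:
# for doc in iter_documents(make_fetch_page("YOUR_TOKEN", "YOUR_PROJECT_ID")):
#     ...
```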

When you call the documents endpoint, the output for a single document looks like the following. For more information on documents in Luminoso Daylight, refer to the Documents section in the API documentation, especially the subsection on limiting which fields are included in the output.

In this example, note the “weighted_keyword_expansion” field. This is the keyword output that Luminoso created for your search index: it lists each keyword found along with its associated weight. Use these weights to choose a threshold that separates terms related closely enough to include in your search engine from those that are not.
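Applying a threshold is a simple filter over the weights. This Python sketch assumes the field is either an object mapping each keyword to its weight or a list of [keyword, weight] pairs; adapt it to the field’s actual shape. The example weights and the 0.5 threshold are made up for illustration.

```python
def keywords_above(expansion, threshold):
    """Return (keyword, weight) pairs with weight >= threshold, highest first.

    Assumption: `expansion` is a dict of keyword -> weight or a list of
    [keyword, weight] pairs; adjust to the real field shape.
    """
    pairs = expansion.items() if isinstance(expansion, dict) else expansion
    kept = [(kw, float(w)) for kw, w in pairs if float(w) >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up weights for illustration only.
example = {"charge": 0.9, "screen": 0.2, "power": 0.6}
print(keywords_above(example, 0.5))  # [('charge', 0.9), ('power', 0.6)]
```

Trying a few thresholds on a sample of your own documents is a quick way to see where unrelated terms start to appear.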

Upload to third-party search engine

To finalize and apply the expanded search, load the JSON output you received from the documents endpoint into your third-party search engine, or convert it to a format that’s compatible with your search engine.

Make sure to retain the information from the “weighted_keyword_expansion” field. Use the weight value to determine which keywords are relevant enough to include in your search engine.
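As one possible conversion, this Python sketch turns analyzed documents into JSON Lines records that keep each keyword’s weight. The output field names (“id”, “text”, “expansion”) are hypothetical and should be renamed to match your search engine’s schema; the input is assumed to have “doc_id”, “text”, and a “weighted_keyword_expansion” object mapping keywords to weights.

```python
import json


def to_search_records(documents, threshold=0.0):
    """Yield one JSON Lines record per document for a search-engine import.

    Assumptions: each input document has "doc_id", "text", and a dict-shaped
    "weighted_keyword_expansion" (keyword -> weight). Output field names are
    hypothetical placeholders.
    """
    for doc in documents:
        expansion = doc.get("weighted_keyword_expansion") or {}
        # Keep the weights so the search engine can use them for ranking.
        kept = {kw: w for kw, w in expansion.items() if w >= threshold}
        record = {"id": doc["doc_id"], "text": doc["text"], "expansion": kept}
        yield json.dumps(record, ensure_ascii=False)


# Usage:
# with open("search_import.jsonl", "w", encoding="utf-8") as out:
#     for line in to_search_records(analyzed_docs, threshold=0.5):
#         out.write(line + "\n")
```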

Search Enhancement API Cookbook