Preparing a dataset for upload

Once you choose a natural language source, include all the information you’ll need in your analysis so you can get the best results from Daylight. Invest time now, since Daylight learns directly from the data you upload and data can’t be modified after uploading.

Daylight’s Create a Project page accepts comma-separated value (CSV) format files with appropriately formatted columns. At minimum, every uploaded data file must be a CSV file and have a column titled text.
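As a minimal sketch of the smallest file that meets this requirement, the following Python snippet builds a CSV with a single text column. The function name build_minimal_csv is hypothetical, and the snippet assumes standard CSV quoting is acceptable for verbatims that contain commas.

```python
import csv
import io


def build_minimal_csv(verbatims):
    """Return CSV content with a single required 'text' column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text"])  # the one column every upload must have
    for verbatim in verbatims:
        # csv.writer quotes any field that contains a comma or a quote.
        writer.writerow([verbatim])
    return buffer.getvalue()


print(build_minimal_csv(["The room was very clean, but it smelled of smoke."]))
```

Writing the file through a CSV library rather than joining strings by hand keeps commas inside a verbatim from being read as column separators.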

...

Key vocabulary

A verbatim is the conversational text component of the sample you have collected.
A document is a row of your source data, including the conversational text and any associated metadata.
Metadata is structured data that creates context for text responses. Metadata may include demographics, dates, scores, or product details.
A CSV file, or comma-separated value file, is a plain-text file format that is used to organize data. Unlike Excel XLS or XLSX files, CSV files contain no styling information. You can create a CSV file with most spreadsheet editors.

Supported languages and multilingual datasets

Daylight includes 15 natural language processing pipelines that analyze unstructured text in one language at a time. For best results with a multilingual dataset, split your data into one language per CSV file. Then, select the appropriate language when uploading each file.
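One way to split a multilingual dataset is sketched below in Python. It assumes your source data already records the language of each row in a column, here given the hypothetical name string_Language with hypothetical values such as en and fr. If your data has no such column, you would need another way to identify the language first.

```python
import csv
import io
from collections import defaultdict


def split_by_language(csv_file, language_column="string_Language"):
    """Group rows by language; return {language: CSV content for that language}."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    rows_by_language = defaultdict(list)
    for row in reader:
        rows_by_language[row[language_column]].append(row)

    outputs = {}
    for language, rows in rows_by_language.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()  # every output file keeps the original headers
        writer.writerows(rows)
        outputs[language] = buffer.getvalue()
    return outputs


source = io.StringIO("text,string_Language\nhello,en\nbonjour,fr\n")
for language, content in split_by_language(source).items():
    print(language, repr(content))
```

Each value in the returned dictionary can be saved as its own CSV file and uploaded with the matching language selected.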

Metadata

To add metadata, designate specific headers as you create your CSV file. These headers tell Daylight how to treat the contents of a column. Columns without a Daylight-compatible header will be ignored.
When you name a metadata column, Daylight uses the first data type prefix in the header to decide how to treat the column.

Each column can use only one data type. Daylight drops all data type prefixes from the field name after upload. For instance, string_Name1, number_Name2, and string_number_Name3 become Name1, Name2, and Name3.
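The naming rule above can be sketched in Python as follows. This is one reading of the rule, not Daylight's actual implementation: the function name parse_header is hypothetical, and the list of prefixes is taken from the data types described on this page.

```python
DATA_TYPES = ("string", "number", "score", "date")


def parse_header(header):
    """Split a header such as 'string_number_Name3' into (data_type, field_name).

    The first recognized prefix decides the data type, and every leading
    data type prefix is removed from the field name. Headers without a
    recognized prefix are returned as (None, header).
    """
    parts = header.split("_")
    data_type = None
    while len(parts) > 1 and parts[0] in DATA_TYPES:
        if data_type is None:
            data_type = parts[0]  # only the first prefix counts
        parts = parts[1:]
    return data_type, "_".join(parts)


for header in ("string_Name1", "number_Name2", "string_number_Name3"):
    print(header, "->", parse_header(header))
```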

...

Data type

...

Examples

...

Text (Required)

...

Column header: text

...

The natural language samples (verbatims) for Daylight to analyze
Only one text column is permitted per file
Each piece of text may not exceed 500,000 characters in length
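Because data can't be modified after uploading, it can help to check these limits locally first. The Python sketch below is one way to do so. The function name check_text_column is hypothetical, and the sketch assumes that the 500,000-character limit counts characters the way Python's len() does.

```python
import csv
import io

MAX_TEXT_CHARS = 500_000  # per-text limit listed above

# The csv module rejects very long fields by default, so raise its limit
# enough to read an over-long verbatim and report it.
csv.field_size_limit(MAX_TEXT_CHARS * 2)


def check_text_column(csv_file):
    """Return a list of problems found with the text column of a CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    text_columns = [i for i, name in enumerate(header) if name == "text"]
    if len(text_columns) != 1:
        return [f"expected exactly one text column, found {len(text_columns)}"]
    index = text_columns[0]
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if index < len(row) and len(row[index]) > MAX_TEXT_CHARS:
            problems.append(f"row {row_number}: text exceeds {MAX_TEXT_CHARS} characters")
    return problems


print(check_text_column(io.StringIO("text,title\nGreat stay,Recent stay\n")))
```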

...

Example: text

I loved the free coffee and the room was very clean, but it smelled strongly of cigarette smoke.
I booked this room last-minute when my travel plans changed. The price was ok considering it was last-minute but it was way out of the way.
We come to this hotel every year, and we appreciate the consistently top-notch experience!

...

Title

...

Column header: title

...

Any identifier that is associated with text
Only one column permitted per data file
Isn’t analyzed as part of the language sample, but can help organize text

...

Example: title

Recent stay
Hotel visit
We’ll definitely be back!

...

String

...

Column header: string_[FieldName]

...

Information that helps categorize text
Include as many string columns as needed
Can be used as a filter in Daylight only if the field has up to 10,000 values
Helps filter your data by category
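To see in advance which string fields might be too large to filter on, you could count their values before uploading. The Python sketch below assumes the limit refers to distinct values per field. The function name unfilterable_string_columns is hypothetical.

```python
import csv
import io
from collections import defaultdict

MAX_FILTER_VALUES = 10_000  # filtering limit listed above


def unfilterable_string_columns(csv_file, limit=MAX_FILTER_VALUES):
    """Return the string_ headers that have more than `limit` distinct values."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    values = defaultdict(set)
    for row in reader:
        for name, value in zip(header, row):
            if name.startswith("string_") and value:
                values[name].add(value)
    return sorted(name for name, seen in values.items() if len(seen) > limit)


sample = io.StringIO("text,string_MemberLevel\nGreat stay,Business\nToo loud,Loyalty\n")
print(unfilterable_string_columns(sample))
```

A field such as a unique customer ID is a typical candidate for exceeding the limit, while a category such as member level is not.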

...

Example: string_MemberLevel

None
Business
Loyalty

...

Number

...

Column header: number_[FieldName]

...

Any numeric-only data associated with text
Include as many number columns as needed
Can optionally be used in the Drivers feature

...

Example: number_MemberSince

2015
2018
1998

...

Score

...

Column header: score_[FieldName]

...

Any score or rating data associated with text
Include as many score columns as needed
Recommended for use with the Drivers feature

...

Example: score_OverallExperience

7
4
10

...

Date

...

Column header: date_[FieldName]

...

Any date or time associated with text
Include as many date columns as needed
Daylight assumes that all dates are in UTC unless you provide an ISO 8601 date that specifies a timezone
Accepts ISO 8601 strings, Unix timestamps, or US-style formats
Helps you filter your project, especially if you upload data more than once

...

Example: date_CheckoutDate

ISO 8601 formatted dates:

2018-04-10
2018-04-10T13:45
2018-04-10T13:45:00Z

US-style dates:

04/10/2018
04/10/2018 13:45:15
4/10/18 1:45 PM
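To check date values locally before uploading, a sketch like the following Python function can parse the formats shown above. It is not Daylight's parser. The function name parse_date is hypothetical, digit-only values are assumed to be Unix timestamps in seconds, and the US-style patterns cover only the three example layouts listed here.

```python
from datetime import datetime, timezone

US_FORMATS = ("%m/%d/%Y", "%m/%d/%Y %H:%M:%S", "%m/%d/%y %I:%M %p")


def parse_date(value):
    """Parse an ISO 8601, Unix-timestamp, or US-style date into a UTC datetime."""
    value = value.strip()
    if value.isdigit():
        # Assumption: digit-only values are Unix timestamps in seconds.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    parsed = None
    try:
        # fromisoformat() rejects a trailing "Z" before Python 3.11.
        iso_value = value[:-1] + "+00:00" if value.endswith("Z") else value
        parsed = datetime.fromisoformat(iso_value)
    except ValueError:
        for fmt in US_FORMATS:
            try:
                parsed = datetime.strptime(value, fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        raise ValueError(f"unrecognized date: {value!r}")
    if parsed.tzinfo is None:
        # No timezone given: treat the value as UTC, as described above.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


for example in ("2018-04-10", "2018-04-10T13:45:00Z", "4/10/18 1:45 PM"):
    print(example, "->", parse_date(example).isoformat())
```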

...

String (where more than one value can be selected)

...

Column header: string_[FieldName], (string_[FieldName], …)

...

A string-based metadata field can hold multiple values, for cases where more than one response can be selected for a single question. There are two ways to format such data:

Record the multiple values across multiple columns with the same string_[FieldName] column header.
Record the multiple values within a single column with the values separated by the pipe character (“|”).

Both options produce a metadata field named [FieldName], with all of the supplied values applied to the corresponding document.

...

Example: Which products have you tried? [Choose all that apply]

Use string_ProductsUsed as the column header for as many columns as needed
Use one column with the header string_ProductsUsed and format the value in this column as ProductA | ProductB | …
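The two layouts can be compared with a short Python sketch. It reads the same document in each layout and collects the values recorded for string_ProductsUsed. The function name multi_values and the sample rows are hypothetical, and trimming the spaces around the pipe character is an assumption about how the values are meant to be read.

```python
import csv
import io


def multi_values(csv_file, header="string_ProductsUsed"):
    """Collect the values of a multi-value string field for each row."""
    reader = csv.reader(csv_file)
    names = next(reader, [])
    result = []
    for row in reader:
        values = []
        for name, cell in zip(names, row):
            if name != header:
                continue
            # A cell may hold one value or several separated by "|".
            values.extend(part.strip() for part in cell.split("|") if part.strip())
        result.append(values)
    return result


# Option 1: the same header repeated across several columns.
repeated_columns = io.StringIO(
    "text,string_ProductsUsed,string_ProductsUsed\n"
    "Great products,ProductA,ProductB\n"
)
# Option 2: one column with pipe-separated values.
piped_column = io.StringIO(
    "text,string_ProductsUsed\n"
    "Great products,ProductA | ProductB\n"
)
print(multi_values(repeated_columns) == multi_values(piped_column))
```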
