Formatting your data
Now that you have an account on Daylight and access to a Workspace, the next step is to upload some text data to analyze. The Using different data sources with Daylight page offers pointers on the types of text data to consider. You will also want to include metadata that provides context for the text, such as demographic information, dates, or scores. You can use metadata to filter the dataset and analyze a subset of it, and numerical metadata can be used in Drivers analysis, which you will learn about later in this document.
Key vocabulary
Before we get started, here are some terms that we will be using throughout this section of the document.
A verbatim is the conversational text component of the sample you have collected.
A document is a row of your source data, including the conversational text and any associated metadata.
Metadata is structured data that creates context for text responses. Metadata may include demographics, dates, scores, or product details.
A CSV file, or comma-separated values file, is a plain-text file format used to organize data. CSV files exclude the styling information that is included in Excel XLS or XLSX files. You can export a CSV file from most spreadsheet editors.
Data Fields
The following table describes the types of data that can be included in the data to be uploaded. You will need to designate the data type for each column of the uploaded data during the uploading process. There are two ways to do this:
In your dataset file, make sure each column header consists of the data type, then an underscore, then the column name, for example score_Rating or string_Location.
As you upload your dataset file, you can change individual data types in Daylight.
Data type | Column header | Example |
Text | text or text_[FieldName] | |
Title | title or title_[FieldName] | |
String | string_[FieldName] | string_MemberLevel |
Number | number_[FieldName] | number_Age |
Score | score_[FieldName] | score_OverallExperience |
Date | date_[FieldName] | date_CheckoutDate (values as ISO 8601 formatted dates or US-style dates) |
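The header convention described above (data type, underscore, column name) can be applied programmatically before upload. The following is a minimal sketch, not part of Daylight itself: the helper names `typed_header` and `add_type_prefixes`, the sample rows, and the `Comment` column are hypothetical and used only for illustration.

```python
import csv
import io

# Data type prefixes taken from the data types table in this section.
VALID_TYPES = {"text", "title", "string", "number", "score", "date"}


def typed_header(data_type: str, field_name: str) -> str:
    """Build a column header of the form <type>_<FieldName>."""
    if data_type not in VALID_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return f"{data_type}_{field_name}"


def add_type_prefixes(rows, column_types):
    """Return a copy of rows whose header row carries type prefixes.

    rows is a list of lists whose first row is the header;
    column_types maps each original column name to one of VALID_TYPES.
    """
    header, *body = rows
    new_header = [typed_header(column_types[name], name) for name in header]
    return [new_header, *body]


# Hypothetical sample data, for illustration only.
rows = [
    ["Comment", "MemberLevel", "Age", "OverallExperience", "CheckoutDate"],
    ["Great stay, friendly staff.", "Gold", "42", "9", "2020-03-15"],
]
column_types = {
    "Comment": "text",
    "MemberLevel": "string",
    "Age": "number",
    "OverallExperience": "score",
    "CheckoutDate": "date",
}

# Write to an in-memory buffer; a real script would write to a .csv file.
buffer = io.StringIO()
csv.writer(buffer).writerows(add_type_prefixes(rows, column_types))
print(buffer.getvalue())
```

If you prefer not to rename headers in the file, you can still change individual data types in Daylight during upload, as noted above.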
Sometimes a metadata field can have multiple values within a single document. For example, a survey may ask the respondent, “Which of these products have you tried?” and the respondent may select more than one product. There are two ways that you can format the data in such cases:
Have one column for this metadata field and enter all of the values separated with the “|” (pipe) character. In the example above, the column header can be “Products Used” and the value in a given cell could be “ProductA | ProductB | ProductC”.
Have multiple columns with the same column header name with a single value in each cell. In the example above, you would have as many columns as you need with the header name “Products Used” and populate each cell with a single Product. Some of the cells can be left blank.
In both cases, a single metadata field will be created with multiple values.
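The first option, a single column with pipe-separated values, might be produced with a short script like the one below. This is a sketch under stated assumptions: the responses and the `text_Comment` column are hypothetical, and the spacing around the pipe simply mirrors the “ProductA | ProductB | ProductC” example above.

```python
import csv
import io

# Hypothetical survey responses: each respondent may have tried several products.
responses = [
    {"text_Comment": "Loved both.", "Products Used": ["ProductA", "ProductB"]},
    {"text_Comment": "Only tried one.", "Products Used": ["ProductC"]},
    {"text_Comment": "Have not tried any yet.", "Products Used": []},
]


def join_values(values):
    """Combine several values into one pipe-separated cell."""
    return " | ".join(values)


def split_values(cell):
    """Recover the individual values from a pipe-separated cell."""
    return [part.strip() for part in cell.split("|") if part.strip()]


# Write to an in-memory buffer; a real script would write to a .csv file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text_Comment", "Products Used"])
for response in responses:
    writer.writerow(
        [response["text_Comment"], join_values(response["Products Used"])]
    )

print(buffer.getvalue())
```

The `split_values` helper is included only so you can check your own file by reading the cells back; it does not describe how Daylight parses them.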
Supported languages and multilingual datasets
Daylight is capable of performing analysis natively in 15 languages. For best results with a multilingual dataset, split your data into one language per upload file. Each language will be uploaded and analyzed as its own Project.
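If your source data already records the language of each row, splitting it into one file per language can be scripted. The sketch below assumes a hypothetical `string_Language` column holding a language code; your data may identify language differently, or not at all, in which case you would need another way to label the rows first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input with an assumed language column.
SOURCE = """text_Comment,string_Language
The room was spotless.,en
La chambre était très propre.,fr
Check-in took too long.,en
"""


def split_by_language(csv_text, language_column="string_Language"):
    """Return a dict mapping each language code to its own CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[language_column]].append(row)

    outputs = {}
    for language, rows in groups.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=reader.fieldnames, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        outputs[language] = buffer.getvalue()
    return outputs


per_language = split_by_language(SOURCE)
for language, text in per_language.items():
    # In practice, each value would be saved as its own upload file.
    print(f"--- {language} ---")
    print(text)
```

Each resulting file can then be uploaded as its own Project.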
Save as a CSV file
To upload your data to Daylight, save the file in CSV format and make sure that the file extension is .csv. Daylight also accepts similarly formatted files, such as tab-separated values (TSV) files. In that case, make sure that the file extension is .tsv.
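If you generate your upload file with a script rather than a spreadsheet editor, the delimiter and the file extension need to agree. The following sketch uses Python's standard `csv` module; the `save_table` helper, the sample rows, and the choice of UTF-8 encoding are assumptions for illustration, not requirements stated by Daylight.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows, for illustration only.
rows = [
    ["text_Comment", "score_Rating"],
    ["Fast shipping, will order again.", "5"],
]


def save_table(rows, path):
    """Write rows as CSV or TSV, choosing the delimiter from the extension."""
    path = Path(path)
    delimiters = {".csv": ",", ".tsv": "\t"}
    if path.suffix not in delimiters:
        raise ValueError(
            f"expected a .csv or .tsv extension, got {path.suffix!r}"
        )
    # newline="" lets the csv module control line endings itself.
    # UTF-8 is an assumed encoding, not one specified in this document.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiters[path.suffix])
        writer.writerows(rows)
    return path


# Write into a temporary directory so the example leaves no stray files behind.
output_dir = Path(tempfile.mkdtemp())
csv_path = save_table(rows, output_dir / "reviews.csv")
tsv_path = save_table(rows, output_dir / "reviews.tsv")
print(csv_path.read_text(encoding="utf-8"))
```

Note that the `csv` module quotes any value containing the delimiter, so a comma inside a verbatim does not break the column layout.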
© 2020 Luminoso Technologies. All rights reserved.