Using different data sources with Daylight

Luminoso Daylight is an AI-powered text analytics application that automatically analyzes conversational text like product reviews, open-ended survey responses, and support tickets. Daylight learns from the text data you upload and adapts to your specific data. Because Daylight relies on compatible data to produce accurate and valuable results, consider which data sources best meet your organization’s needs before you upload.

If your organization uses external data sources, you’re responsible for ensuring that the acquisition and use of data abides by your license agreement with Luminoso. 

Key vocabulary

  • A verbatim is the conversational text component of a sample you have collected.

  • A document is a row of your source data, including its verbatim and any associated metadata.

  • Metadata is structured data that creates context for text responses. Metadata may include demographics, dates, scores, or product details. 

  • Concepts are words and phrases from the project dataset that Daylight automatically identifies as having relationships to other concepts within the dataset.

  • Key concepts are concepts that Daylight identifies as significant to your dataset. 

What data do I have?

First, consider all of the ways your customers or employees might generate natural language text. This data could take the form of:

  • Reviews

  • Open-ended surveys

  • Support tickets

  • Forum posts

  • Social media posts

This article discusses the data sources most common among Daylight users, but don’t limit yourself: use these examples to think through the challenges and opportunities of any appropriate data source available to your organization. Depending on your needs, any of these data sources could produce valuable results, but some require additional work to prepare for Daylight, and they offer varying levels of opportunity.

Every example here is generalized. Use these suggestions to guide your analytical strategy and create a dataset that gets optimal results from Daylight.

Which data should I choose?

Consider these essential factors when you select natural language text data to process with Luminoso Daylight. 

  1. Use natural language text: Not all text is natural language. For instance, responses to structured fields or forms vary very little from one verbatim to the next, so they give Daylight little to analyze.

  2. Consider how much data you have and how often you’ll use it: Think about how many rows of text data you have to analyze and how often you’ll need to act on them. Also consider whether you’ll need to upload more documents to your project later.

  3. Prioritize unique unstructured text: Daylight identifies relationships between terms, so if your verbatims include recurring information that comes from within your organization, like ticket IDs or greetings, Daylight will probably identify that information as a top concept. Remove this information to focus on text that’s unique.

  4. Include structured data: Adding structured data, or metadata, related to your verbatims allows you to use all features in Daylight. When you add metadata, you can filter your documents within a project and create new projects based on matching filter criteria. If you upload satisfaction scores alongside verbatims, you can use the Drivers feature in Daylight to analyze which concepts impact those scores. If you include dates, you can filter your dataset to display specific periods of time.

  5. Provide context: Select verbatims where the writer had a focused goal when providing feedback. Avoid analyzing scattered data that isn’t relevant to your brand. This is especially important for data sources like forums or tweets, where the prompt may not be brand-generated and the forum may be less moderated.

  6. Plan your extraction process: When planning your approach to data and analysis, consider the effort it will take to format data into a comma-separated values (CSV) file for Daylight. Some sources require more effort to make Daylight-compatible than others.
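
As a minimal sketch of items 4 and 6, the Python below writes each verbatim and its metadata to one CSV row using the standard csv module, which quotes any field that contains commas, quotation marks, or line breaks. The write_documents helper and the column names (text, Star Rating, Review Date) are hypothetical placeholders; the headers Daylight actually expects are described in the Preparing a dataset for upload section of the Getting started with Daylight guide.

```python
import csv
import io


def write_documents(documents, fieldnames, out):
    """Write one CSV row per document (hypothetical helper).

    Each document is a dict holding the verbatim under "text" plus any
    metadata fields. Fields a document lacks are written as blank cells.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for doc in documents:
        writer.writerow(doc)


# Placeholder column names; check Daylight's upload guide for the real ones.
docs = [
    {"text": "Great fit, but the zipper broke after a week.", "Star Rating": 3},
    {"text": 'Arrived early. "Exactly as pictured," my kids said.',
     "Star Rating": 5, "Review Date": "2018-10-05"},
]

buffer = io.StringIO()
write_documents(docs, ["text", "Star Rating", "Review Date"], buffer)
print(buffer.getvalue())
```

Reading the output back with csv.DictReader is a quick way to confirm that verbatims containing commas or quotation marks survive the round trip intact.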


Reviews

Reviews capture customer opinions about a product or service. Customers tend to write about the parts of a product or service that were most emotionally impactful to them, which makes reviews one of the best data sources for Daylight. Before Luminoso, creating value from reviews at scale could be difficult, since it’s impossible to predict the words a customer will use to describe a product or service. Daylight automatically identifies the words and phrases in a dataset, even ones it has never seen before, so you don’t have to guess the different ways customers might describe their experience.

Find insights like

  • What do customers like about the product or service?

  • What friction areas do customers experience?

  • Did a recent change fix an issue, or did it have unintended effects on customer experience?

  • What are current weaknesses that the brand should improve?

Best practices for reviews

Keep and label data associated with your text to use as metadata


User: dsrice
Text: My 9 year old gets lots of ear infections, usually in the middle of the night. This warm fox and Tylenol are what soothes the pain until we can see a doctor and get medicine. It's a must have at our house.
Star Rating: 5
Title: Great for ear infection pain!
Review Date: October 5, 2018
Product: Warmies Microwavable French Lavender Scented Plush
Style: Fox
Verified Purchase: Yes

Rationale: Keeping your metadata clean and labeled allows you to dive deeper by creating new projects from a master project. The more metadata you include, the more questions you can address. For example, Marketing may want to know what’s popular with two different generations of consumers, while Manufacturing may need to differentiate problems between models.
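
If your reviews arrive as labeled records like the example above, a sketch like the following keeps every label as a named metadata field instead of discarding it. It assumes each line has the form "Label: value" and that the verbatim sits under the Text label; parse_review is a hypothetical name, and you would adapt the parsing to your own export format.

```python
def parse_review(record: str, text_label: str = "Text"):
    """Split a 'Label: value' review record into (verbatim, metadata).

    Hypothetical helper: assumes one field per line, with the label and
    value separated by the first ': ' on that line.
    """
    fields = {}
    for line in record.strip().splitlines():
        label, sep, value = line.partition(": ")
        if sep:  # skip lines that have no 'Label: value' separator
            fields[label.strip()] = value.strip()
    verbatim = fields.pop(text_label, "")  # the text Daylight analyzes
    return verbatim, fields                # everything else is metadata


record = """User: dsrice
Text: My 9 year old gets lots of ear infections, usually in the middle of the night. This warm fox and Tylenol are what soothes the pain until we can see a doctor and get medicine. It's a must have at our house.
Star Rating: 5
Title: Great for ear infection pain!
Review Date: October 5, 2018
Product: Warmies Microwavable French Lavender Scented Plush
Style: Fox
Verified Purchase: Yes"""

verbatim, metadata = parse_review(record)
print(verbatim)
print(metadata)
```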


Surveys

A survey is a request for feedback that may be administered once, periodically, or on a rolling basis. To work in Daylight, a survey must include open-ended questions that produce natural language text. Surveys often include rich structured data, like numeric satisfaction ratings. Uploading this structured data alongside the natural language text it relates to enables deeper research in Daylight through filters. The more open-ended a survey prompt, the stronger the analysis in Daylight will be. Targeted questions can skew the way respondents answer, making results more difficult to interpret.

Find insights like

  • What are employees most passionate about?

  • How do new employees feel about the company?

  • What do management-level employees want to improve?

  • Are there differences in service between two locations?

Best practices for surveys

Focus on responses to open-ended prompts or elaboration on an initial answer


Text: I loved the free coffee and the room was very clean, but it smelled strongly of cigarette smoke.
Gold Member Since: 2015
Recent Stay: January 5, 2019
Overall Experience Score: 7
Check-In: 3
Room Cleanliness: 8
Room Service: 5
Check-Out: 4

Rationale: Responses like this offer unique insight into what the respondent was thinking or feeling. Including score data lets Daylight’s Drivers feature relate your natural language text to those scores, helping you make informed decisions about your data.
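
To keep survey scores usable alongside the text, a minimal sketch like this converts score fields to numbers and drops responses that contain no real open-ended answer. The field names mirror the hotel example above; the placeholder set of non-answers, the sample "N/A" row, and the prepare_responses helper are illustrative assumptions to adjust for your own survey.

```python
# Assumed non-answers; extend this set to match your survey's data.
PLACEHOLDERS = {"", "n/a", "na", "none", "-", "no comment"}
SCORE_FIELDS = ["Overall Experience Score", "Check-In", "Room Cleanliness",
                "Room Service", "Check-Out"]


def prepare_responses(rows):
    """Keep rows with real open-ended text and convert scores to ints."""
    prepared = []
    for row in rows:
        text = (row.get("Text") or "").strip()
        if text.lower() in PLACEHOLDERS:
            continue  # nothing here for Daylight to analyze
        clean = dict(row, Text=text)
        for field in SCORE_FIELDS:
            value = row.get(field)
            # Missing or blank scores become None instead of raising.
            clean[field] = int(value) if value not in (None, "") else None
        prepared.append(clean)
    return prepared


rows = [
    {"Text": "I loved the free coffee and the room was very clean, but it "
             "smelled strongly of cigarette smoke.",
     "Overall Experience Score": "7", "Check-In": "3",
     "Room Cleanliness": "8", "Room Service": "5", "Check-Out": "4"},
    {"Text": "N/A", "Overall Experience Score": "9"},  # illustrative non-answer
]
print(prepare_responses(rows))
```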

Support tickets

Support tickets give insight into the parts of a product or service where users encounter difficulty. Daylight is an excellent match for this data source, since support tickets contain natural language descriptions of problems. Analyzing support tickets helps you identify patterns and quantify frequent issues. Some successful Daylight users analyze tickets around events like marketing campaigns or changes in service, or after a problem is reported.

Find insights like

  • What are the most common issues being reported?

  • What type of issue causes the most difficulty for our clients?

  • Are there unexpected issues being reported?

  • Did recent changes impact the volume of support tickets for a given topic? 

Best practices for support tickets

Focus on the customer's side of the conversation and remove canned responses from data

  • Example: Thank you for contacting us, I’m sorry to hear of the inconvenience. 

  • Rationale: Greetings and goodbyes are canned responses, so they don’t reveal any new information. Including the agent side of the conversation may obscure what the customer was asking or whether the customer was satisfied with the outcome.

Remove ticket IDs from your data

  • Example: 158475383947394 Requesting a refund for damaged couch.

  • Rationale: Internal ticket IDs aren’t necessary for analyzing the contents of support tickets. If ticket IDs are included in your verbatims, Daylight may process them as key concepts, making your data analysis less focused. 
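
Both support ticket practices can be sketched as a small cleaning step that runs before export. This sketch assumes ticket IDs are runs of ten or more digits at the start of a verbatim and that you maintain your own list of canned agent phrases; the pattern, the phrase list, and the clean_ticket name are all hypothetical.

```python
import re

# Assumption: internal ticket IDs are 10+ digit runs at the start of the text.
TICKET_ID = re.compile(r"^\s*\d{10,}\s*")

# Assumed canned agent phrases; replace with your team's actual macros.
# Matching is exact, so straight vs. curly apostrophes must match your data.
CANNED_PHRASES = [
    "Thank you for contacting us,",
    "I'm sorry to hear of the inconvenience.",
]


def clean_ticket(text: str) -> str:
    """Strip a leading ticket ID and known canned phrases from a verbatim."""
    text = TICKET_ID.sub("", text)
    for phrase in CANNED_PHRASES:
        text = text.replace(phrase, "")
    return " ".join(text.split())  # collapse leftover whitespace


print(clean_ticket("158475383947394 Requesting a refund for damaged couch."))
```

If your ticketing system labels who wrote each message, filtering out agent messages by that label is usually more reliable than matching phrases.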

Forum posts

Forum posts are an information source typically generated by users, not organizations. Users create and respond to threads, which usually start with a question or observation. Responses to the initial post are unique and open-ended, and they vary based on the initial post and the forum’s moderation rules. Forum posts might not always address your business questions, but they could provide major and unexpected insight. Forum data can also require work to clean and to isolate individual posts from a running thread. Be careful when combining multiple threads, since context can drift between different conversations.

Find insights like

  • How should we engage with our user community? 

  • How should we communicate our community policies?

  • How should we engage with players who are violating terms of service? 

Best practices for forum posts

Choose high-volume threads with a clear focus

  • Do include: How do I troubleshoot syncing my older bluetooth device with this new product version?

  • Rationale: This thread offers focused suggestions for resolving a specific user issue on a brand-sponsored forum. A sponsored forum is usually moderated and adheres to strict rules, so content is focused and relates strongly to the brand.

  • Don’t include: Did you see @leroyjenkins epic playthrough?

  • Rationale: This post reacts to a player’s video on an open forum; it isn’t about the game, its content, or its services. This kind of open forum thread might include irrelevant information like user chatter or viral memes.
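
Choosing threads can be partly automated once posts are exported. The sketch below groups posts by thread, keeps threads whose titles mention a product keyword and that reach a minimum number of posts, and returns each post as its own document so replies are not merged into one running verbatim. The post structure (thread_id, title, body), the min_posts threshold, the keywords, and the select_threads name are all assumptions.

```python
from collections import defaultdict


def select_threads(posts, keywords, min_posts=20):
    """Return individual posts from focused, high-volume threads.

    posts: list of dicts with hypothetical keys thread_id, title, body.
    keywords: lowercase terms that mark a thread title as on-topic.
    """
    threads = defaultdict(list)
    for post in posts:
        threads[post["thread_id"]].append(post)

    selected = []
    for thread_posts in threads.values():
        title = thread_posts[0]["title"].lower()
        on_topic = any(word in title for word in keywords)
        if on_topic and len(thread_posts) >= min_posts:
            # One document per post keeps each reply's context separate,
            # while thread_id survives as metadata for filtering.
            selected.extend({"text": p["body"], "thread_id": p["thread_id"]}
                            for p in thread_posts)
    return selected


posts = (
    [{"thread_id": 1,
      "title": "How do I troubleshoot syncing my older Bluetooth device?",
      "body": f"Reply {i}"} for i in range(3)]
    + [{"thread_id": 2, "title": "Did you see that playthrough?",
        "body": "lol"} for _ in range(5)]
)
print(len(select_threads(posts, ["bluetooth", "sync"], min_posts=3)))  # 3
```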

Social media posts

Social media posts, like those on Twitter or Instagram, are a unique source of natural language data: individual posts are often brief, but volume can be high. Because posts are brief and creative, you need many samples to create a helpful Daylight analysis. Before starting, consider your methodology for de-duplicating reposts and capturing posts that are relevant to your research topic. Some companies measure the success of marketing campaigns through social media. Daylight can improve that process by summarizing the main topics in the audience’s response.

Find insights like

  • Does the market discuss a topic as much as it did in a prior period?

  • What concepts are most associated with a specific topic?

  • What was the general reaction to our system outage? 

  • What part of our marketing campaign are people talking about the most?

Best practices for social media posts

Focus on direct mentions of company support

  • Example: @companysupport or @companyfeedback

  • Rationale: The most valuable posts to analyze usually directly mention the support or customer satisfaction handles of an organization.

Study data associated with specific hashtags

  • Example: #BOGOweekend

  • Rationale: Isolating a specific, brand-relevant hashtag helps you find posts that are relevant to your organization. Daylight has no way to identify irrelevant data, so it analyzes your entire dataset and gives all uploaded natural language text equal weight.
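
Both practices amount to filtering collected posts before upload. A minimal sketch, assuming posts are plain strings and using the example handles and hashtag above as stand-ins for your own: matching is case-insensitive, and each handle or tag must end at a word boundary, so a different tag such as #BOGOweekend2 does not match #BOGOweekend. The build_filter and relevant_posts names and the sample posts are hypothetical.

```python
import re


def build_filter(terms):
    """Compile a case-insensitive pattern matching any handle or hashtag."""
    alternatives = "|".join(re.escape(t) for t in terms)
    # (?<!\w) stops matches inside longer tokens; \b stops prefix matches.
    return re.compile(rf"(?<!\w)(?:{alternatives})\b", re.IGNORECASE)


def relevant_posts(posts, terms):
    """Keep only posts that mention at least one of the given terms."""
    pattern = build_filter(terms)
    return [post for post in posts if pattern.search(post)]


terms = ["@companysupport", "@companyfeedback", "#BOGOweekend"]
posts = [
    "@companysupport my order never shipped",
    "Loving the #bogoweekend deals!",
    "Great weather this weekend",
    "#BOGOweekend2 is a different campaign",
]
print(relevant_posts(posts, terms))
```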

Ask open-ended questions to spark conversational, natural responses

  • Example: What did you think of the products in this month’s subscription box?

  • Rationale: This phrasing encourages textual responses to a contextually rich prompt, rather than multiple-choice answers.

Remove redundancies before you upload documents

  • Example: REPOST if @brand should bring back a limited-time product!!!

  • Do: Include organic feedback from the audience.

  • Don’t: Allow recycled content from reposts to contaminate your dataset.

  • Rationale: Recycled verbatim quotes might reinforce associations in Daylight that don’t accurately represent the relationships between terms.
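
One way to sketch this step is to normalize each post (remove URLs and leading repost markers, lowercase, collapse whitespace) and keep only the first post with each normalized form. The normalization rules, the RT and REPOST markers, and the dedupe name are assumptions about your data; exact-match deduplication like this will not catch reposts that were lightly edited.

```python
import re

URL = re.compile(r"https?://\S+")
# Assumed repost markers, possibly stacked (e.g. "RT: REPOST ...").
REPOST_PREFIX = re.compile(r"^(?:(?:rt|repost)\b[:\s]*)+", re.IGNORECASE)


def normalize(post: str) -> str:
    """Reduce a post to a comparison key for duplicate detection."""
    text = URL.sub("", post)
    text = REPOST_PREFIX.sub("", text.strip())
    return " ".join(text.lower().split())


def dedupe(posts):
    """Keep the first post for each normalized form, preserving order."""
    seen = set()
    unique = []
    for post in posts:
        key = normalize(post)
        if key and key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


posts = [
    "REPOST if @brand should bring back a limited-time product!!!",
    "RT: REPOST if @brand should bring back a limited-time product!!! https://t.co/abc",
    "The new flavor tastes great, please keep it",
]
print(dedupe(posts))
```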

Know the limitations of sarcasm and AI

  • Example: Well I ran out of printer paper during finals week.. So that's cool.. #studentprobs

  • Rationale: Social media posts sometimes contain sarcasm, which is impossible for Daylight to detect.

How do I prepare my data?

Use the Preparing a dataset for upload section in our Getting started with Daylight guide to help you format and upload data into Daylight. 

© 2020 Luminoso Technologies. All rights reserved.