Tables exported from Luminoso

Table: doc_subset

Field	Description
doc_id	Unique identifier for each document/verbatim uploaded in a given file - surrogate key of all tables
subset	Special field for Tableau Export containing the subset ID. This field is generated by the export tool, not by Daylight. Example: subset 0, subset 1, subset 2
subset_name	The name of the subset category, not of the subset itself. Example: Location, Product ID, Category, Age
value	The value of the subset. Example: New York, or 25 years old

Table: doc_terms

Field	Description
doc_id	Unique identifier for each document/verbatim uploaded in a given file - surrogate key of all tables
term	The smallest units used to create concepts. Example: “Best Buy” is one concept with two terms. Example 2: “Leaky pen battery” is one concept with three terms.
exact_match	Boolean value (0 or 1) indicating whether the document contains an exact match for a specific term. Example: “The battery leaked” either has (1) or does not have (0) an exact match to “Battery”.
association	The scored relationship between document and term as generated in Daylight (the dot product of their vectors)
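
The association score above is described as a dot product between vectors. As a minimal sketch of that operation only: the vectors below are hypothetical three-dimensional values, since the export contains the resulting score and not the Daylight vectors themselves.

```python
def dot(u, v):
    """Return the dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

# Hypothetical vectors for illustration; real Daylight vectors are not exported.
doc_vector = [0.5, 0.25, 0.0]
term_vector = [1.0, 2.0, 4.0]

association = dot(doc_vector, term_vector)
print(association)  # 1.0
```

A larger dot product means the document and term vectors point in more similar directions, which is why the score can be read as strength of association.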

Table: doc_topic

Field	Description
association	The scored relationship between document and topic as generated in Daylight (the dot product of their vectors)
doc_id	Unique identifier for each document/verbatim uploaded in a given file
topic	Called a “Saved Concept” in the current version of Daylight; it can be a single concept or a combination of concepts
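
Because doc_id appears in every doc_* table, it can be used to combine them. The following is a rough sketch of an inner join between doc_subset and doc_topic in plain Python. The rows and the helper name `join_on_doc_id` are made up for illustration; in Tableau the same relationship would normally be defined in the data source rather than in code.

```python
from collections import defaultdict

# Hypothetical rows standing in for the exported doc_subset and doc_topic tables.
doc_subset = [
    {"doc_id": "d1", "subset_name": "Location", "value": "New York"},
    {"doc_id": "d2", "subset_name": "Location", "value": "Boston"},
]
doc_topic = [
    {"doc_id": "d1", "topic": "leaky", "association": 0.8},
    {"doc_id": "d1", "topic": "great smell", "association": 0.1},
    {"doc_id": "d2", "topic": "leaky", "association": 0.3},
]

def join_on_doc_id(left_rows, right_rows):
    """Inner-join two lists of row dicts on their shared doc_id field."""
    by_id = defaultdict(list)
    for row in right_rows:
        by_id[row["doc_id"]].append(row)
    joined = []
    for left in left_rows:
        # .get avoids creating empty entries for unmatched doc_ids.
        for right in by_id.get(left["doc_id"], []):
            joined.append({**left, **right})
    return joined

rows = join_on_doc_id(doc_subset, doc_topic)
for row in rows:
    print(row["doc_id"], row["value"], row["topic"], row["association"])
```

Each joined row carries both the subset value and the topic association for one document, which is the shape needed to compare topics across subsets.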

Table: drivers

Field	Description
doc_count	Number of documents in a project
driver	The concept that is correlated with a difference in score from the average. Example: “great smell” is associated with a higher score, while “leaky” is associated with a lower score.
example_doc	First sample document that contains a concept that is driving the score
example_doc2	Second sample document that contains a concept that is driving the score
example_doc3	Third sample document that contains a concept that is driving the score
impact	The numeric value of the driver’s score difference from the average. Example: documents that include “great smell” have an impact of 1.2 higher than the average, while documents that include “leaky” have an impact of 2 lower than the average.
related_terms	Terms related to the driver. Example: “not stinky” is related to “great smell”; “leaky” is related to “bad packaging”.
subset	A special field for Tableau Export containing the subset ID. This field is generated by the export tool, not by Daylight. Example: subset 0, subset 1, subset 2
type	user_defined: same as a saved concept (Example: “leaky”). auto_found: drivers identified by the system.
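
To make the impact field concrete, here is a small sketch that follows the definition in the table: the average score of documents mentioning a driver minus the average score of all documents. The scores and texts are invented, and plain substring matching stands in for Daylight's concept matching, so this illustrates the idea rather than Daylight's actual calculation.

```python
from statistics import mean

# Hypothetical (score, text) pairs; not taken from any real project.
docs = [
    (9, "great smell, love it"),
    (8, "great smell"),
    (3, "leaky bottle"),
    (4, "arrived leaky"),
    (6, "works fine"),
]

def impact(docs, concept):
    """Average score of documents mentioning the concept minus the overall average."""
    overall = mean(score for score, _ in docs)
    # Assumption: simple substring matching replaces Daylight's concept matching.
    matching = [score for score, text in docs if concept in text]
    if not matching:
        raise ValueError(f"no documents mention {concept!r}")
    return mean(matching) - overall

print(impact(docs, "great smell"))  # 2.5
print(impact(docs, "leaky"))        # -2.5
```

A positive value marks a driver associated with higher-than-average scores, and a negative value marks one associated with lower-than-average scores.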

Table: skt

Field	Description
subset	A special field for Tableau Export containing the subset ID. This field is generated by the export tool, not by Daylight. Example: subset 0, subset 1, subset 2
term	The smallest units used to create concepts. Example: “Best Buy” is one concept with two terms. Example 2: “Leaky pen battery” is one concept with three terms.
text_1	First sample document that contains a concept belonging to the subset of the documents
text_2	Second sample document that contains a concept belonging to the subset of the documents
text_3	Third sample document that contains a concept belonging to the subset of the documents
impact	The numeric value of the driver’s score difference from the average. Example: documents that include “great smell” have an impact of 1.2 higher than the average, while documents that include “leaky” have an impact of 2 lower than the average.
conceptual_matches	Number of documents with the highest association scores to a specific concept
exact_matches	Number of documents with the specific concept
odds_ratio	An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently (due to symmetry), the ratio of the odds of B in the presence of A and the odds of B in the absence of A.
p_value	In statistical hypothesis testing, the p-value or probability value or significance is, for a given statistical model, the probability that, when the null hypothesis is true, the statistical summary (such as the absolute value of the sample mean difference between two compared groups) would be greater than or equal to the actual observed results.
total_matches	Sum of conceptual and exact matches
value	The value of the subset. Example: New York, or 25 years old
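
The odds ratio definition above can be sketched from a 2x2 table of document counts. This example assumes event A is "the document belongs to the subset" and event B is "the document matches the term". That mapping and the counts are illustrative assumptions; the export does not state exactly how odds_ratio is computed.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of document counts.

    a: in subset, matches term      b: in subset, no match
    c: outside subset, matches term d: outside subset, no match
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    # (a / b) is the odds of a match inside the subset,
    # (c / d) is the odds of a match outside it.
    return (a * d) / (b * c)

# Hypothetical counts: 20 of 100 subset documents match the term,
# compared with 10 of 100 documents outside the subset.
print(odds_ratio(20, 80, 10, 90))  # 2.25
```

A value above 1 means the term is more likely to appear inside the subset than outside it; a value below 1 means the opposite.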

Table: doc_topic

Field	Description
term	The smallest units used to create concepts. Example: “Best Buy” is one concept with two terms. Example 2: “Leaky pen battery” is one concept with three terms.
exact_matches	Number of documents with the specific concept
related_matches	Number of documents with the specific concept that are not exact matches. Related Matches = Total Matches - Exact Matches
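
The relationship between the match counts is simple arithmetic. A brief sketch with made-up counts:

```python
def related_matches(total_matches, exact_matches):
    """Related matches are the total matches that are not exact matches."""
    if exact_matches > total_matches:
        raise ValueError("exact_matches cannot exceed total_matches")
    return total_matches - exact_matches

# Hypothetical counts for a single term.
print(related_matches(120, 45))  # 75
```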

Table: themes

Field	Description
cluster_label	List of terms that represent a theme. Example: If “Leaky Battery” is a dominant theme, then the cluster label would represent that cluster of concepts. Example format (term|language): Leaky|en Battery|en
docs	Number of documents within the theme cluster
id	Theme cluster identifier (like the subset ID) Example: Theme 0, Theme 1, Theme 2...
name	List of concepts that describe the details of a theme. Example: Cluster label “Leaky battery” could also contain “pen battery”, “bad packaging”, or “leaky pen”.

Visualizations available in Luminoso’s Tableau template

Data Source

This table allows you to view the fields and tables exported from Luminoso.

Auto Themes

Themes detected in the project, displayed as clearly labeled bubbles along with text examples of each theme.

Themes (2)

The same themes as in Auto Themes, displayed as horizontal bars instead of bubbles so the viewer can easily compare the number of qualifying documents.

Drivers

Similar to the Drivers tool in Daylight, this tab is a scatter plot of terms and their average impact on the scores.

Example: The term "scent" is located at +4.8 average impact and 6k sum of doc count. This can be interpreted as "Documents that reference scent have a score 4.8 higher than those that do not reference scent."

Example 2: The term "berry" is located at -14.6 average impact with a volume of about 11k sum of doc count. This can be interpreted as "Documents that reference berry have a score 14.6 lower than those that do not reference berry."

Driver terms on the left sort the drivers by the calculated impact, which factors in both the intensity of the driver and the volume of the documents referencing that term.

Subset Key-Terms

A horizontal bar chart allows viewers to identify and compare key terms by subset.

Example: In the 65-year-old age bracket, we see "haven't smoked a cigarette" as a key term, while in the 23-year-old age bracket, we see "pods" and "refreshing" as key terms.

Clicking on key terms populates examples below.

SKT (Subset Key Term) Bubbles

A horizontal bar chart shows the volume of each subset.

Example: Spot check to see which age demographics are most represented in the dataset.

Drivers (2)

Functional duplicate of Drivers above, with variable visualization options.

Example: Use the 'Marks' option on the left to adjust the size of the terms based on impact.

Low VOCTerms

The trend of average score and document count is shown at the top. Selecting a month lets viewers see the individual terms associated with low scores, which can be reviewed by volume and average score.

High VOCTerms

The trend of average score and document count is shown at the top. Selecting a month lets viewers see the individual terms associated with high scores, which can be reviewed by volume and average score.

DocCount over Time (2)

Functional duplicate of the VOC Terms tabs that drills into the volume of data over time, along with the scores.

Tags w/verbatim

Tags with the highest average scores can be broken down by age, score, and votes. Examples of verbatims including those topics are listed below.

Topics over Time

View topic scores over time, and select individual topics to see examples.