...

  • Importance: On a scale from 0 to 1, where 0 means the concept has no correlation with customer scores and 1 is the most important driver in the project, this driver’s importance is 0.45.

A deeper dive into the science

What’s the scale of the "Impact" score?

The impact score is on the same scale as the customer score, which might be measured in "stars", "percent", or something else. For example, you might see an impact of -0.4 on a scale from 1 to 5 stars, meaning that documents matching the concept get reviews that are, on average, 0.4 stars lower. If the same reviews were on a scale from 0 to 100 instead of 1 to 5, you would see an impact of -10, because the 100-point range is 25 times the 4-point range from 1 to 5.

The possible range of impact therefore always depends on the range of the customer’s scores for that set of documents.
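Because impact is a difference between average scores, moving it to another score scale only multiplies it by the ratio of the two ranges; the offset between the scales cancels out. The sketch below assumes the two scales are related linearly, and `rescale_impact` is a hypothetical helper for illustration, not part of any Luminoso API.

```python
def rescale_impact(impact: float, old_min: float, old_max: float,
                   new_min: float, new_max: float) -> float:
    """Convert an impact (a difference of average scores) between two linear score scales.

    Assumption: the scales are related linearly. A difference between two scores
    is stretched by the ratio of the ranges; the offset between the scales cancels.
    """
    return impact * (new_max - new_min) / (old_max - old_min)


# -0.4 stars on a 1-5 scale corresponds to -10 points on a 0-100 scale.
print(rescale_impact(-0.4, 1, 5, 0, 100))  # -10.0
```

The same helper converts in the other direction, for example from -10 on a 0 to 100 scale back to -0.4 on a 1 to 5 scale.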

What’s the scale of the "Confidence" value?

The confidence value is the t value, which comes from Student's t-test. This is a two-sided t-test. The possible situations we're distinguishing are:

...

So a score driver with t = 0.7 could be considered spurious, but some interesting drivers start to appear around that level.
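To make the confidence value concrete, here is a minimal sketch of a two-sample Student's t statistic with pooled variance. It assumes the two groups being compared are the scores of documents that match a concept and the scores of documents that don't; that grouping, the function names, and the sample scores are illustrative assumptions, not Luminoso's implementation. The two-sided p-value uses a standard normal approximation, which is only reasonable for large samples; an exact value needs the t distribution (for example `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def student_t(matching: list[float], non_matching: list[float]) -> float:
    """Two-sample Student's t statistic with pooled variance.

    Positive t means documents matching the concept score higher on average.
    """
    n1, n2 = len(matching), len(non_matching)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # Pooled estimate of the common variance (statistics.variance is the sample variance).
    pooled = ((n1 - 1) * variance(matching) + (n2 - 1) * variance(non_matching)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("all scores are identical; t is undefined")
    return (mean(matching) - mean(non_matching)) / standard_error


def two_sided_p_normal_approx(t: float) -> float:
    """Two-sided p-value for |t|, approximating the t distribution with a standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


print(round(student_t([1, 2, 3], [4, 5, 6]), 3))   # -3.674
print(round(two_sided_p_normal_approx(0.7), 2))    # 0.48
```

Under this approximation, t = 0.7 corresponds to a two-sided p-value of about 0.48, meaning a difference at least that large would show up roughly half the time by chance alone, which is why such a driver could easily be spurious.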

What’s the scale of "Importance"?

The importance scale is arbitrary and specific to a given set of documents, so it is rescaled from 0 to 1 each time the drivers are calculated. The numbers that go into calculating importance vary according to the customer scores, the total number of documents, the number of conceptual matches for each concept, and the Luminoso relevance scale (which is also arbitrary).

The importance value is for internal use only: it cannot be used to compare drivers from one run to another, and it is used only to sort the drivers within a single run.
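A small sketch can show why importance values from different runs can't be compared. It assumes, purely for illustration, that each run divides a raw importance score by the largest raw score in that run, so the top driver gets 1 and a raw score of 0 stays 0. The raw scores and `normalize_importance` are hypothetical; the real calculation combines the inputs listed above in ways not described here.

```python
def normalize_importance(raw: dict[str, float]) -> dict[str, float]:
    """Scale raw importance scores to 0..1 within a single run of drivers.

    Hypothetical: assumes raw scores are non-negative and that the most
    important driver in the run is mapped to 1.
    """
    top = max(raw.values())
    if top <= 0:
        return {concept: 0.0 for concept in raw}
    return {concept: score / top for concept, score in raw.items()}


# The same raw score for "free wi-fi" in two runs over different documents.
run_a = normalize_importance({"fast check-in": 8.0, "free wi-fi": 4.0})
run_b = normalize_importance({"free wi-fi": 4.0, "noisy room": 5.0})
print(run_a["free wi-fi"], run_b["free wi-fi"])  # 0.5 0.8

# Within one run, the values are only used to sort the drivers.
print(sorted(run_a, key=run_a.get, reverse=True))  # ['fast check-in', 'free wi-fi']
```

The concept's raw score is identical in both runs, yet its importance differs because the top driver differs.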

FAQ

What happens when you change the set of documents?

When a user filters a set of documents and creates a new project (effectively removing concepts and documents), or adds new documents to an existing set of documents, the project's list of concepts and documents changes. When the project is rebuilt and the drivers are recalculated, these changes will cause:

  • Different conceptual matches to be found in the project

  • The “Importance” scale to change, because a different concept may now be the most important driver

Can we calculate score drivers separately for each subset?

Yes. Driver subset analysis can be done directly in the UI by using the metadata filters in the filter panel on the left. Driver impact scores are automatically recalculated for the selected subset.
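Outside the UI, the same idea can be sketched by grouping documents on a metadata field and recomputing impact inside each group. This assumes impact is the average score of documents that match a concept minus the average score of those that don't; the document structure, the `region` field, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def impact(docs: list[dict], concept: str) -> Optional[float]:
    """Average score of docs matching `concept` minus average score of the rest.

    Returns None when either group is empty, since no comparison is possible.
    """
    matching = [d["score"] for d in docs if concept in d["concepts"]]
    others = [d["score"] for d in docs if concept not in d["concepts"]]
    if not matching or not others:
        return None
    return mean(matching) - mean(others)


def impact_by_subset(docs: list[dict], field: str, concept: str) -> dict:
    """Recompute the concept's impact separately for each value of a metadata field."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for doc in docs:
        groups[doc["metadata"][field]].append(doc)
    return {value: impact(group, concept) for value, group in groups.items()}


docs = [
    {"score": 5.0, "concepts": {"free wi-fi"}, "metadata": {"region": "north"}},
    {"score": 3.0, "concepts": set(), "metadata": {"region": "north"}},
    {"score": 4.0, "concepts": {"free wi-fi"}, "metadata": {"region": "south"}},
    {"score": 4.0, "concepts": set(), "metadata": {"region": "south"}},
]
print(impact(docs, "free wi-fi"))                      # 1.0
print(impact_by_subset(docs, "region", "free wi-fi"))  # {'north': 2.0, 'south': 0.0}
```

The impact across all documents (1.0) matches neither subset, which is why recalculating per subset can change the picture.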

Are Drivers a causal model?

No. We can't say that the 0.3-star increase is due to the document mentioning "free wi-fi". If we had a causal model, we would have to subtract out the values of all other drivers in that document so we could assign credit to the terms that were really the "cause". But causation doesn't work that way anyway. The cause of a high review score isn't that the reviewer mentioned "fast check-in"; causality flows in the opposite direction. The cause is that the customer liked the establishment they're reviewing, and providing fast check-in may have contributed to that. The customer's positive experience caused them to write a positive review, and the need to describe what was positive about their experience caused them to mention "fast check-in".
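The argument can be illustrated with a toy simulation in which mentioning "fast check-in" has no effect on the score at all. A hidden variable (whether the customer liked their stay) sets the score and also makes the mention more likely. All probabilities and scores below are invented for illustration. The mention still shows a large positive impact across all reviews, but among customers who liked their stay it shows none.

```python
import random
from statistics import mean


def simulate_reviews(n: int, seed: int = 0) -> list[tuple[bool, bool, float]]:
    """Return (liked_stay, mentions_fast_checkin, score) for n invented reviews."""
    rng = random.Random(seed)
    reviews = []
    for _ in range(n):
        liked = rng.random() < 0.5
        # Liking the stay makes a mention of fast check-in more likely...
        mentions = rng.random() < (0.6 if liked else 0.1)
        # ...but the score depends only on whether the customer liked the stay.
        score = 5.0 if liked else 2.0
        reviews.append((liked, mentions, score))
    return reviews


def mention_impact(reviews: list[tuple[bool, bool, float]]) -> float:
    """Average score of reviews with the mention minus those without it."""
    with_mention = [s for _, m, s in reviews if m]
    without = [s for _, m, s in reviews if not m]
    return mean(with_mention) - mean(without)


reviews = simulate_reviews(2000)
print(round(mention_impact(reviews), 2))             # large and positive
print(mention_impact([r for r in reviews if r[0]]))  # 0.0 among customers who liked their stay
```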

...