QuickLearn and ontologies

Any Machine Learning system, including those for Natural Language Understanding (NLU), requires some way of training the system to recognize its inputs. There are two traditional approaches: deep learning and ontologies.

A deep learning approach sets up a machine-learning process that takes input and learns to produce output that serves as the input to another step of machine learning. That step might produce output that feeds yet another step, and so on, until the last step answers the question at hand. These intermediate steps can represent solving the pieces of a bigger problem.

A problem with deep learning for NLU is that the approach needs hundreds of millions of data points to learn a new domain, and can’t pick up emerging terminology without seeing sufficient examples of its usage. An alternative traditional approach is to use ontologies.

Ontologies, such as SNOMED and MedDRA, represent knowledge created through the effort of experts instead of by machine learning. They can provide a lot of accurate, specific information in a domain area, but they require enormous amounts of work to create and update, because the list of rules and specialized terms for each domain is maintained by hand. As doctors, patients, and vendors introduce new products and terminologies, experts must add these to ontologies manually before tuning and retraining the system.

Luminoso’s approach follows a different path that is known as Common Sense AI. The Common Sense AI approach, used by Luminoso’s QuickLearn technology, requires neither massive amounts of data nor manual updates by experts.

QuickLearn technology uses word embeddings, which represent each word as a vector, that is, a list of numbers. Vectors that point in similar directions represent words with similar meanings. Unlike vectors in 3-D space, each word embedding may have hundreds of dimensions to capture the word's meaning and nuances. Vectors are also efficient for computers to work with mathematically, which is what lets the system build its understanding of human language from them.
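To make "vectors in similar directions" concrete, here is a minimal sketch using cosine similarity, a common way to compare word embeddings. The four-dimensional vectors and the words chosen are invented for illustration; real embeddings have hundreds of dimensions, and nothing here is taken from QuickLearn itself.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (made-up numbers, assumed for this example only).
embeddings = {
    "doctor":    [0.90, 0.80, 0.10, 0.00],
    "physician": [0.85, 0.75, 0.20, 0.05],
    "banana":    [0.00, 0.10, 0.90, 0.80],
}

similar = cosine_similarity(embeddings["doctor"], embeddings["physician"])
unrelated = cosine_similarity(embeddings["doctor"], embeddings["banana"])

# Words with related meanings point in nearly the same direction,
# so their cosine similarity is close to 1.
print(f"doctor vs physician: {similar:.3f}")
print(f"doctor vs banana:    {unrelated:.3f}")
```

A value near 1 means the two vectors point in almost the same direction. A value near 0 means they are close to perpendicular, which is read as the words being unrelated.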

QuickLearn creates a semantic space, which is a table of word embeddings representing its understanding of words. Luminoso then uses a machine learning technique known as transfer learning. Transfer learning uses what an existing Machine Learning system has learned to solve a new task, with new data. Effective transfer learning allows you to get started solving a problem with much less data, because the system doesn’t have to learn everything from scratch. In this case, the problem is understanding your domain text, such as clinical trial questionnaires, medical inquiries, or CRM notes from MSLs. Luminoso’s QuickLearn provides transfer learning that learns about your domain-specific terminology by starting from a database of general knowledge known as a background space.

A background space is a space of word embeddings representing what words mean in general, not in specific domains. Background spaces capture both “common sense knowledge” and the inherent definitions of words. Luminoso uses a background space called ConceptNet. ConceptNet represents a common sense knowledge of how the world works: over 35 million relationships between concepts mathematically represented in a general domain model. When presented with domain-specific text, such as medical terms, the system can apply this general domain knowledge to learn specific terms immediately from context, without manual intervention.
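The idea of learning an unfamiliar term from its context can be sketched as follows. This is a deliberately simplified illustration and not Luminoso's actual algorithm: it assumes a tiny two-dimensional background space and estimates a vector for each unknown word by averaging the background vectors of nearby words. The function name `infer_domain_vectors`, the drug name "zorvex", and all numbers are hypothetical.

```python
from collections import defaultdict

def infer_domain_vectors(background, documents, window=2):
    """Return a copy of `background` extended with vectors for unknown words.

    Each unknown word gets the average of the background vectors of the
    words found within `window` positions of it (a simplifying assumption).
    """
    dims = len(next(iter(background.values())))
    sums = defaultdict(lambda: [0.0] * dims)
    counts = defaultdict(int)

    for doc in documents:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            if word in background:
                continue  # general meaning is already known
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for neighbor in context:
                vec = background.get(neighbor)
                if vec is None:
                    continue  # neighbor is unknown too, so it carries no signal
                sums[word] = [s + v for s, v in zip(sums[word], vec)]
                counts[word] += 1

    domain = dict(background)
    for word, total in sums.items():
        domain[word] = [s / counts[word] for s in total]
    return domain

# Toy background space: dimension 0 ~ "medical", dimension 1 ~ "financial".
background = {
    "headache": [1.0, 0.0],
    "nausea":   [0.9, 0.1],
    "invoice":  [0.0, 1.0],
    "payment":  [0.1, 0.9],
}

# "zorvex" is an invented product name that the background space has never seen.
documents = [
    "patients reported headache zorvex nausea",
    "zorvex nausea headache",
]

domain_space = infer_domain_vectors(background, documents)
print(domain_space["zorvex"])
```

Because "zorvex" only appears next to medical words, its inferred vector lands near them, while the vectors of words already in the background space are left unchanged. A production system would weight context more carefully and would also adjust known words whose in-domain meaning differs from their general meaning.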

In summary, QuickLearn technology takes in a background space (trained on ConceptNet and a large amount of freely available text), and applies transfer learning to the domain text you want to understand to create a new space of word embeddings tuned for the domain. It uses context to understand specific meanings of in-domain words and derives general meanings of common words from the background space. Then, analysts can use Luminoso products to explore the new semantic space, see the relevance of and relationships between concepts, spot emerging issues and uncover insights.