Apply User EQ Data within Conversation Logic

 

Overview

The Soul Machines Human OS Platform features a patented Digital Brain that makes human and machine collaboration possible. Soul Machines' Human OS autonomously animates Digital People, combining the quality of hyper-realistic CGI with a fully autonomous, fully animated digital character.

Our Digital People use their Digital Brain to analyze the end user’s video feed and extract data about the user’s emotions (EQ data) and behavior in real time. The data extracted includes:

  • User emotions

    • Positivity

    • Negativity

      • Neutrality lies at the mid-point of these two emotional states. It is not given explicitly as a signal, but the absence of an EQ signal indicating positivity or negativity can be used to imply neutrality (see the sketch following this list).

    • Confusion

      • This signal is based largely on frowns and displays of facial asymmetry.

  • Face detection signal

    • Indicates if the end user's face is in the frame of their web camera.

  • “Is Attentive” signal

    • Indicates whether the end user is visibly paying attention, i.e. whether they are facing the camera

  • User talking signal

    • Indicates if the end user is speaking

      • Based on both the mouth movement and audio

      • Very accurate if the user is visible
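These per-turn values can be reduced to a single coarse label for routing decisions. The sketch below is illustrative only: the classifyUserState helper and the 0.3 threshold are our own assumptions, not platform defaults. It treats the strongest signal as the user's dominant state and, following the note on neutrality above, falls back to "neutral" when no signal is strong enough.

// Illustrative helper: derive a coarse user state from the per-turn EQ values.
// The 0.3 threshold is an arbitrary example value, not a platform default.
function classifyUserState({ positivity, negativity, confusion }, threshold = 0.3) {
  const signals = { positive: positivity, negative: negativity, confused: confusion };
  // Pick the strongest signal for this turn.
  const [state, value] = Object.entries(signals)
    .reduce((best, entry) => (entry[1] > best[1] ? entry : best));
  // If nothing is strong enough, treat the absence of a signal as neutrality.
  return value >= threshold ? state : 'neutral';
}

// Example usage, with values similar to those in the ConversationResult below:
console.log(classifyUserState({ positivity: 0.01, negativity: 0.20, confusion: 0.27 })); // 'neutral'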

Data Messaging

The extracted emotional and behavioral data is available in both the “State” events and “ConversationResult” messages, via either the Soul Machines Web SDK or Orchestration Server messages. “State” event messages are generated any time there is a change in state, e.g. a change in speech status, persona, or user emotion. State data is not useful for conversation routing and is not covered in this document.

The “ConversationResult” message is generated whenever a response is received from an NLP service that is directly integrated into the Soul Machines Persona Servers (e.g. using Watson Assistant or Google Dialogflow credentials), rather than via an Orchestration Server. Below is an example ConversationResult message generated from a Dialogflow response:

{ "body": { "input": { "text": "hello" }, "output": { "context": {}, "text": "Good day! What can I do for you today?" }, "personaId": "1", "provider": { "kind": "dialogflow", "meta": { "dialogflow": { "queryResult": { "action": "input.welcome", "allRequiredParamsPresent": true, "fulfillmentMessages": [ { "text": { "text": [ "Good day! What can I do for you today?" ] } } ], "fulfillmentText": "Good day! What can I do for you today?", "intent": { "displayName": "Default Welcome Intent", "name": "projects/xxxxxxxx/agent/intents/9xxxxxxx" }, "intentDetectionConfidence": 1, "languageCode": "en", "outputContexts": [ { "lifespanCount": 9999, "name": "projects/sentimenttestbcg-gpvy/agent/sessions/xxxxxx/contexts/soulmachines", "parameters": { "Current_Time": "8 04 in the morning", "PersonaTurn_IsAttentive": "", "PersonaTurn_IsTalking": "0.72388756275177002", "Persona_Turn_Confusion": "0.22530610859394073", "Persona_Turn_Negativity": "0.022835826501250267", "Persona_Turn_Positivity": "0.12887850403785706", "UserTurn_IsAttentive": "", "UserTurn_IsTalking": "0.27065098285675049", "User_Turn_Confusion": "0.27124810218811035", "User_Turn_Negativity": "0.1954876035451889", "User_Turn_Positivity": "0.010345778428018093" } } ], "parameters": {}, "queryText": "hello", "sentimentAnalysisResult": { "queryTextSentiment": { "magnitude": 0.2, "score": 0.2 } } }, "responseId": "xxxxxxxxxx" }, "metadata": { "displayName": "Default Welcome Intent", "name": "projects/xxxxx/agent/intents/9xxxxxxxx" } } }, "status": 0 }, "category": "scene", "kind": "event", "name": "conversationResult" }

In real time, our custom machine learning solution produces this data per conversation turn (i.e. User_Turn_Positivity, User_Turn_Negativity, and User_Turn_Confusion) and publishes it in the ConversationResult as values between 0 and 1, depending on the strength of the signal evaluated by the system within the last ~1 second of the turn.
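As a concrete illustration of where these values sit in the message, the sketch below pulls the user-turn parameters out of a ConversationResult body shaped like the example above. The extractUserTurnEQ helper is our own naming, and because the values arrive as strings they are parsed into numbers.

// Illustrative sketch: read the user-turn EQ values from a ConversationResult
// message shaped like the Dialogflow example above.
function extractUserTurnEQ(message) {
  const contexts =
    message.body.provider.meta.dialogflow.queryResult.outputContexts || [];
  // The EQ values are carried on the "soulmachines" output context.
  const sm = contexts.find((ctx) => ctx.name.endsWith('/contexts/soulmachines'));
  if (!sm) return null;
  const p = sm.parameters;
  // Values are published as strings between 0 and 1, so convert them to numbers.
  return {
    positivity: parseFloat(p.User_Turn_Positivity) || 0,
    negativity: parseFloat(p.User_Turn_Negativity) || 0,
    confusion: parseFloat(p.User_Turn_Confusion) || 0,
    isTalking: parseFloat(p.UserTurn_IsTalking) || 0,
  };
}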

Implementation

IBM Watson

The following user data is sent to Watson Assistant (WA) as context variables along with other data, such as user utterance:

  • UserTurn_IsAttentive

  • UserTurn_IsTalking

  • User_Turn_Confusion

  • User_Turn_Negativity

  • User_Turn_Positivity

When building dialogue nodes, these variables can be accessed using the standard IBM WA context variable notation (e.g. $User_Turn_Positivity).

 
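For instance, a dialogue node can branch on these variables directly in its condition. In the illustrative sketch below, #account_help is a hypothetical intent, 0.5 is an arbitrary threshold, and the responses are example wording only:

If assistant recognizes:
    #account_help && $User_Turn_Confusion > 0.5
Assistant responds:
    "Sorry, I may have made that more complicated than it needed to be. Let's take it one step at a time."

If assistant recognizes:
    #account_help
Assistant responds:
    "Sure, here is how your account works."

Because Watson Assistant evaluates dialogue nodes from top to bottom, the more specific EQ condition should sit above the general node it varies.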

Google Dialogflow

The following user data is sent to Dialogflow inside the “soulmachines” output context:

  • UserTurn_IsAttentive

  • UserTurn_IsTalking

  • User_Turn_Confusion

  • User_Turn_Negativity

  • User_Turn_Positivity

The data can be viewed via the “Raw interaction log”, accessed from the conversation “History” tab in Dialogflow; the soulmachines output context and its parameters appear there in the same form as in the ConversationResult example above.

 

A custom fulfillment script must be written to integrate this data into the conversation logic. Scripts can be created via the “Inline Editor” on the Dialogflow Fulfillment screen (in Node.js) or via a custom-built webhook. Below is an example script that can run inside the “Inline Editor”:

const functions = require('firebase-functions');
const {WebhookClient} = require('dialogflow-fulfillment');
const {Card, Suggestion} = require('dialogflow-fulfillment');

process.env.DEBUG = 'dialogflow:debug'; // enables lib debugging statements

exports.dialogflowFirebaseFulfillment = functions.https.onRequest((request, response) => {
  const agent = new WebhookClient({ request, response });
  console.log('Dialogflow Request headers: ' + JSON.stringify(request.headers));
  console.log('Dialogflow Request body: ' + JSON.stringify(request.body));

  function userSmiling(agent) {
    // First get the Soul Machines context from the outputContexts object
    const smcontext = agent.getContext('soulmachines');
    // Now get the individual user turn EQ data points.
    // The values arrive as strings, so parse them into numbers before comparing.
    const turnPositivity = parseFloat(smcontext.parameters.User_Turn_Positivity) || 0;
    const turnNegativity = parseFloat(smcontext.parameters.User_Turn_Negativity) || 0;
    const turnConfusion = parseFloat(smcontext.parameters.User_Turn_Confusion) || 0;
    // Do something with this data
    if (turnPositivity > turnNegativity && turnPositivity > turnConfusion) {
      agent.add(`Wow! You have a fantastic smile`);
    }
  }

  let intentMap = new Map();
  intentMap.set('Smiling Test', userSmiling);
  agent.handleRequest(intentMap);
});
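Note that for the handler above to run, fulfillment must also be enabled on the intent itself (“Enable webhook call for this intent” in Dialogflow ES), and the key passed to intentMap.set must match the intent’s display name exactly (“Smiling Test” in this example).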

Data Usage Guidelines

For lowest effort and highest impact, it is recommended to consider the following key places for initial implementation:

  • Conversation Bookends: the greeting and goodbye of a conversation are a great place to take the measure of the user’s emotional state, so that their initial and final impressions are ones of emotional engagement.

  • Error Nodes: when the user has been misheard or has asked an out-of-scope question, this is likely to be frustrating, thus emotional engagement at these points in a conversation can help mitigate that frustration.

  • Critical Content Nodes: at nodes where important, complex, or otherwise particularly significant content is being delivered, checking for confusion can help confirm understanding.
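As a concrete illustration of the Critical Content Nodes point, a fulfillment handler can check the confusion signal before moving on and offer a simpler follow-up when it fires. In the sketch below, the intent name, the 0.4 threshold, and the wording are illustrative assumptions; agent.getContext is used in the same way as in the fulfillment example above.

// Illustrative handler for a critical content node, following the pattern of the
// fulfillment script above. Intent name, threshold, and wording are examples only.
function explainFeesHandler(agent) {
  const smcontext = agent.getContext('soulmachines');
  const confusion = parseFloat(smcontext.parameters.User_Turn_Confusion) || 0;
  if (confusion > 0.4) {
    // Confirm the signal rather than assuming it was accurate.
    agent.add(`That was a lot at once. Should I go through the fees one at a time?`);
  } else {
    agent.add(`Those are all the fees that apply. Shall we move on to the next step?`);
  }
}
// Registered like the other handlers, e.g. intentMap.set('Explain Fees', explainFeesHandler);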

Guiding Principles

  1. Preserve conversational structure across different recognised emotions

    Simplify the implementation of dialogue variations according to the user’s emotional state by preserving the structure of the conversation. Rather than driving the conversation in different directions, focus on varying the emotional tone of the language, and the informational content of the utterance. Exceptions can include the addition of a dialogue sub-routine triggered by an expression of confusion, but such digressions should be self-contained.

  2. Manage the risk of false positives in dialogue design

    As much as possible, dialogue variations triggered by EQ data should aim to be robust to misclassifications. Confirmation questions can be used to check that the detected signal was accurate, for example, “Oh, should I explain that differently?” to confirm confusion. Hedging strategies can be used to create an ambiguous dialogue routine that can be understood as a response to a detected emotional state, but also as an initiative in its own right. For example, if negativity is detected, the Digital Person could say “I know, this can be a little frustrating.” If the user has really shown frustration, they will probably interpret this as an acknowledgment of that fact; if not, they will simply hear it as an aside that creates some camaraderie (see the sketch after this list).

  3. Think strategically about where certain signals are used in conversation

    A few, well-placed EQ responses can significantly improve the quality of conversation. Assess the content of the conversation to identify instances where emotional engagement will be most impactful, e.g. delivering information that is disappointing/frustrating for the user, explaining complex concepts, using industry jargon, etc. If the content delivered by the Digital Person is unlikely to elicit a particular response, or if there is no meaningful way to act on a detected signal, consider prioritising other areas of conversation.
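To make principle 2 concrete, the sketch below only reacts to negativity when the signal is both reasonably strong and clearly ahead of positivity, and it prefixes a hedged line that still reads naturally if the classification was wrong. The thresholds (0.4 and 0.2) and the wording are illustrative, not recommended values; eq holds the parsed user-turn values, as in the extraction sketch earlier.

// Illustrative false-positive management: react to negativity only when the signal
// is strong AND clearly ahead of positivity. Both margins are example values.
function shouldAcknowledgeFrustration(eq) {
  return eq.negativity > 0.4 && (eq.negativity - eq.positivity) > 0.2;
}

// Hedged wording: reads as empathy if the user really is frustrated, and as a
// harmless aside that builds camaraderie if the signal was a misclassification.
function frustrationPrefix(eq) {
  return shouldAcknowledgeFrustration(eq)
    ? 'I know, this can be a little frustrating. '
    : '';
}

// e.g. agent.add(frustrationPrefix(eq) + 'Here is the quickest way to sort that out.');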

Examples

User Positivity

When the user is positive, this can be an opportunity to show more personality or humor.

Recommendations:

  • More liberal use of humor or tone

  • More casual, friendly, even excited

User Negativity

If the user is negative towards the Digital Person, or the conversation in general, it is best to focus on what the user is trying to achieve, and adopt a more business-like tone. Negativity can currently encompass both being upset (sad) and being frustrated (angry), so for now, any dialogue choice should aim to be applicable to both those emotional states.

Recommendations:

  • Language should aim to be succinct, conciliatory, and forthright

  • Design utterances to deliver information with efficiency

  • Do not disengage from this mode until the user does (i.e. until a neutral or positive state is detected)

  • Attempt to guide the user back to a positive state

User Confusion

If confusion has been confirmed, it is best for the Digital Person to admit fault, and perhaps apologize for the confusing delivery. Context plays a large role in how exactly we should respond to confusion, but in general, language should be simple and direct, and utterances should aim to focus on one thing at a time. Depending on the content of the utterance that elicited confusion, you can consider:

  • Breaking tasks into smaller steps

  • Explaining content in a different way (different words, using an example)

  • Offering definitions for terms or concepts the user is likely to be unfamiliar with

  • Asking what it was the user found confusing

Recommendations:

  • Admit fault, never assign blame to the user

  • Always begin with a calm, relaxed, understanding tone

  • Try not to condescend; instead, act as though the Digital Person and the user are in this together
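Pulling these recommendations together, one low-risk pattern, shown here as an illustration only and building on the classifyUserState sketch from earlier, is to keep a single dialogue node but choose between pre-written tone variants of the same message, in line with principle 1 (preserve structure, vary tone). All wording below is example copy.

// Illustrative tone selection: same information, different delivery.
const deliveryVariants = {
  positive: 'Happy to help with that! Good news: your application has been approved.',
  neutral: 'Your application has been approved.',
  negative: 'Your application has been approved. I will keep the next steps brief.',
  confused: 'Good news: the application went through. I can walk you through what happens next, one step at a time.',
};

function respondWithTone(agent, eq) {
  // classifyUserState is the illustrative helper sketched near the top of this page.
  const state = classifyUserState(eq);
  agent.add(deliveryVariants[state]);
}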