Every Digital Person can see you, hear you, and respond to your emotions, and each has an inner emotional state. This is their base behavior. With Real-Time Gesturing enabled, your Digital Person analyzes what it is saying via Natural Language Processing and adds emotionally appropriate gesturing and behavior to its speech in real time.
Real-Time Gesturing is a powerful tool for conversation writers and designers. The words they write bring the digital person’s behavior to life. For example, if a Digital Person’s conversation is sad in tone, their behavior will autonomously express sadness and concern. If they are talking about something surprising and delightful, their behavior will express surprise and happiness.
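Conceptually, Real-Time Gesturing maps the emotional tone of a line of dialogue to matching behavior. The following is a minimal illustrative sketch of that idea only; the keyword lists, gesture names, and matching rule are assumptions for illustration, not the platform's actual model.

```python
# Hypothetical sketch: map the emotional tone of dialogue to a behavior.
# Keyword lists and gesture descriptions are illustrative assumptions.

EMOTION_KEYWORDS = {
    "sadness": {"sorry", "unfortunately", "sad", "loss"},
    "joy": {"great", "wonderful", "congratulations", "delighted"},
    "surprise": {"wow", "unexpected", "amazing", "suddenly"},
}

GESTURES = {
    "sadness": "concerned expression, slight head tilt",
    "joy": "smile, open posture",
    "surprise": "raised eyebrows, widened eyes",
    "neutral": "relaxed baseline behavior",
}

def gesture_for(text: str) -> str:
    """Pick a behavior matching the dominant emotional tone of the text."""
    words = set(text.lower().replace("!", "").replace(".", "").split())
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:
            return GESTURES[emotion]
    return GESTURES["neutral"]
```

For example, a sad line such as "I am so sorry for your loss." would select the concerned expression, while emotionally neutral text falls back to the base behavior.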
Real-Time Gesturing is a prerequisite for the Behavior Style feature.
Components of Real-time Gesturing
Mood and inner state
The Digital Person will always return to their base mood, but with Real-Time Gesturing enabled, their mood is affected from moment to moment by the content of what they themselves are saying and by the signals they pick up from the user and the environment.
Emotional gestures
Emotional gestures are the facial expressions the Digital Person performs in accordance with the emotional tone of the text they’re speaking. These may include smiles, frowns, concerned expressions, head tilts, and so on.
Symbolic gestures
Symbolic gestures are gestures that relate closely in meaning to some word or phrase being spoken by the Digital Person. For example, the Digital Person might open their arms out wide while speaking a phrase such as “everybody” or “all of them.” These vary in how tightly the gesture and the phrase are semantically linked: some are very tightly bound (think a ‘thumbs up’ gesture in English with the phrase ‘nice work!’ or ‘good job!’), while others are much more loosely associated (e.g. a ‘smile shrug’ gesture with words like ‘wonderful’ or ‘warmth’).
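One way to picture the tight/loose distinction is as a lexicon that links trigger phrases to gestures with an association score. This is a hypothetical sketch; the phrases, gesture names, and looseness values are illustrative assumptions, not platform data.

```python
# Hypothetical symbolic-gesture lexicon. Each entry links a spoken phrase
# to a gesture and a looseness score (0.0 = tightly bound, 1.0 = loosely
# associated), mirroring the tight/loose distinction described above.

SYMBOLIC_GESTURES = [
    # (trigger phrase, gesture, association looseness)
    ("everybody", "open arms wide", 0.1),
    ("all of them", "open arms wide", 0.1),
    ("nice work", "thumbs up", 0.0),
    ("good job", "thumbs up", 0.0),
    ("wonderful", "smile shrug", 0.8),
    ("warmth", "smile shrug", 0.9),
]

def symbolic_gestures_in(utterance: str) -> list:
    """Return (phrase, gesture) pairs whose trigger appears in the utterance."""
    lowered = utterance.lower()
    return [(phrase, gesture)
            for phrase, gesture, _looseness in SYMBOLIC_GESTURES
            if phrase in lowered]
```

An utterance like "Nice work, everybody!" would match both the tightly bound ‘thumbs up’ and the ‘open arms wide’ gesture.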
Beat gestures
Beat gestures are gestures that convey neither meaning nor emotion, but rather are the natural gestures produced in alignment with the rhythm or cadence of speech. Think of someone delivering a short speech and holding their arm slightly out, “beating” on each emphasized word. Or in more informal, day-to-day communication, the way one’s arms or shoulders might just move rhythmically in sync with what we’re saying, without having any real tie to the meaning or emotion of what we’re communicating.
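Because beat gestures carry rhythm rather than meaning, a scheduler only needs to decide which words get a "beat." The sketch below is a hypothetical illustration; the emphasis heuristic (content words of four or more letters) is an assumption, not how the platform detects emphasis.

```python
# Hypothetical sketch: place a small beat gesture on each emphasized word.
# The emphasis heuristic here is an illustrative assumption.

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def beat_schedule(words: list) -> list:
    """Mark which words receive a beat: content words of 4+ letters."""
    return [len(w) >= 4 and w.lower() not in STOPWORDS for w in words]
```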
Head & Lip Sync
Lip sync refers to the animations that drive the movement of the Digital Person’s mouth, lips, and related facial muscles when speaking. This is aligned with the text-to-speech audio and phonemes as realistically as possible. Head sync refers to the movement of the Digital Person’s head in alignment with the audio to create realistic motion through the neck and head.
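The phoneme-to-mouth-shape alignment described above can be pictured as a mapping from phonemes to visemes (visual mouth shapes). This is a simplified hypothetical sketch; the phoneme labels and viseme names are assumptions, not the platform's actual animation set.

```python
# Hypothetical sketch of lip sync: text-to-speech phonemes are mapped to
# visemes (mouth shapes) timed against the audio. The phoneme set and
# viseme names below are simplified illustrative assumptions.

PHONEME_TO_VISEME = {
    "AA": "open",       # as in "father"
    "IY": "smile",      # as in "see"
    "UW": "rounded",    # as in "blue"
    "M":  "closed",     # lips together
    "B":  "closed",
    "F":  "lip-teeth",  # lower lip against upper teeth
}

def visemes_for(phonemes: list) -> list:
    """Convert a phoneme sequence into the mouth shapes to animate."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]
```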
Empathetic Reaction to the User
The Digital Person also analyzes the emotional content of the words the user is saying, along with the expressions on their face, and produces a facial expression in reaction. Each behavior style has its own behavior-appropriate reactive style. Please note: Reacting to User Speech is only available in Human OS 2.3+.
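The empathetic reaction combines two signals: the emotion in the user's words and the expression on their face. The sketch below is a hypothetical illustration of that combination; the emotion labels and the tie-break rule (trusting the face over the words) are assumptions for illustration only.

```python
# Hypothetical sketch: combine user speech emotion with facial emotion to
# choose a reactive expression. Labels and tie-break rule are assumptions.

def reactive_expression(speech_emotion: str, facial_emotion: str) -> str:
    """Mirror the user's emotion; when the signals disagree, trust the face."""
    emotion = facial_emotion if facial_emotion != "neutral" else speech_emotion
    return {
        "sadness": "concerned, sympathetic expression",
        "joy": "warm answering smile",
        "surprise": "raised eyebrows",
    }.get(emotion, "attentive neutral expression")
```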
Reacting to On Screen Content
When on-screen content is used with Digital People running Human OS 2.2+, the Digital Person will direct their gaze towards it, draw the user’s attention to it, and gesture towards it with their hand and arm if the frame and screen layout allow.
Back-channelling behavior
When listening to an end user speak, the Digital Person displays a range of nodding behaviors to acknowledge that they have heard. This is available in Human OS 2.2+.
While a reply is being retrieved, the Digital Person displays a range of thinking behaviors before responding. This is available in Human OS 2.3+.
This back-channelling behavior is designed to give the end user confidence that they have been heard and will be answered. If the person speaking is not close to the microphone, however, neither nodding nor thinking will be triggered; the Digital Person will still hear and respond to their speech.
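The back-channelling rules above can be sketched as a small decision function: nodding while the user speaks, thinking while a reply is pending, and neither cue when the speaker is too far from the microphone. The state names and the input-level threshold are illustrative assumptions, not platform parameters.

```python
# Hypothetical sketch of the back-channelling rule: nodding while the user
# speaks, "thinking" while a reply is pending, both gated on microphone
# proximity. The threshold value is an illustrative assumption.

MIN_MIC_LEVEL = 0.3  # assumed normalized input level needed to trigger cues

def backchannel_cue(state: str, mic_level: float) -> str:
    """Return the visible cue for the current conversation state."""
    if mic_level < MIN_MIC_LEVEL:
        return "none"  # speech is still heard and answered, just no cue
    if state == "user_speaking":
        return "nodding"
    if state == "awaiting_reply":
        return "thinking"
    return "none"
```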