Real-time Gesturing

Every Digital Person can see you, hear you, respond to your emotions, and has an inner emotional state. This is their base behavior. With Real-Time Gesturing, your Digital Person analyzes what they're saying via Natural Language Processing and adds emotionally appropriate gesturing and behavior to their speech in real time.

Real-Time Gesturing is a powerful tool for conversation writers and designers: the words they write bring the Digital Person's behavior to life. For example, if a Digital Person's conversation is sad in tone, their behavior will autonomously express sadness and concern. If they are talking about something surprising and delightful, their behavior will express surprise and happiness.

Components of Real-time Gesturing

Mood and inner state

The Digital Person will always return to their base mood, but their inner state is affected from moment to moment by the content of what they are saying and by the signals they pick up from the user and the environment.

Emotional gestures

Emotional gestures are the facial expressions the Digital Person performs in accordance with the emotional tone of the text they're speaking. They may include smiles, frowns, concerned expressions, head tilts, and so on.
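The idea can be illustrated with a toy sketch: classify the tone of a sentence, then pick a matching expression. All names, word lists, and expression labels below are invented for illustration; the platform's actual analysis is an internal NLP pipeline, not a keyword lookup.

```python
# Hypothetical sketch: mapping the emotional tone of a sentence to a
# facial-expression cue. A real system would use a trained model, not
# this tiny hand-written lexicon.

TONE_WORDS = {
    "sad": "sadness", "sorry": "sadness", "unfortunately": "sadness",
    "amazing": "surprise", "wow": "surprise",
    "great": "happiness", "happy": "happiness", "welcome": "happiness",
}

EXPRESSIONS = {
    "sadness": "concerned frown with slight head tilt",
    "surprise": "raised brows and widened eyes",
    "happiness": "smile",
    "neutral": "relaxed face",
}

def emotional_gesture(sentence: str) -> str:
    """Pick an expression for the first recognized tone in the sentence."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    tones = [TONE_WORDS[w] for w in words if w in TONE_WORDS]
    tone = tones[0] if tones else "neutral"
    return EXPRESSIONS[tone]

print(emotional_gesture("I'm sorry to hear that"))   # concerned frown with slight head tilt
print(emotional_gesture("Wow, that is great news"))  # raised brows and widened eyes
```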

Symbolic gestures

Symbolic gestures relate closely in meaning to a specific word or expression being spoken by the Digital Person, for example, opening the arms out wide while speaking a phrase such as "everybody" or "all of them." They vary in how tightly the gesture and the phrase are semantically linked: some are very tight (think of pointing toward yourself when speaking the words "me" or "I"), while others are much looser (e.g. a 'smile shrug' with words like 'wonderful' or 'warmth').
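Because symbolic gestures fire on a particular word, a scheduler needs to know which word in the utterance triggers which gesture. The sketch below is purely illustrative, using the example associations from the text above; the gesture names and the word-to-gesture table are hypothetical, not a real configuration.

```python
# Hypothetical sketch of symbolic-gesture scheduling: scan the words the
# Digital Person is about to speak and note where each gesture should fire,
# so the animation layer can play it as that word is spoken.

SYMBOLIC_GESTURES = {
    "everybody": "open arms wide",
    "all": "open arms wide",
    "me": "point toward self",    # tightly linked
    "i": "point toward self",
    "wonderful": "smile shrug",   # loosely associated
    "warmth": "smile shrug",
}

def schedule_gestures(utterance: str) -> list:
    """Return (word_index, word, gesture) triples for the utterance."""
    out = []
    for i, raw in enumerate(utterance.lower().split()):
        word = raw.strip(".,!?")
        if word in SYMBOLIC_GESTURES:
            out.append((i, word, SYMBOLIC_GESTURES[word]))
    return out

print(schedule_gestures("I think everybody will find it wonderful"))
# [(0, 'i', 'point toward self'), (2, 'everybody', 'open arms wide'),
#  (6, 'wonderful', 'smile shrug')]
```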

Beat gestures

Beat gestures convey neither meaning nor emotion; they are the natural gestures produced in alignment with the rhythm or cadence of speech. Think of someone delivering a short speech, holding an arm slightly out and "beating" on each emphasized word. In more informal, day-to-day communication, it is the way the arms or shoulders move rhythmically in sync with speech, without any real tie to the meaning or emotion being communicated.

Head & Lip Sync

Lip sync refers to the animations that drive the movement of the Digital Person’s mouth, lips, and related facial muscles when speaking. This is aligned with the text-to-speech audio and phonemes as realistically as possible. Head sync refers to the movement of the Digital Person’s head in alignment with the audio to create realistic motion through the neck and head.
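The underlying idea of lip sync is commonly described as a phoneme-to-viseme mapping: each phoneme in the synthesized audio corresponds to a mouth shape (viseme), played back in time with the audio. The tiny table below is a hypothetical subset for illustration only, not the platform's actual mapping.

```python
# Hypothetical sketch of phoneme-to-viseme lip sync. A TTS engine emits
# timed phonemes; each maps to a mouth shape the face animation blends to.

PHONEME_TO_VISEME = {
    "p": "lips pressed", "b": "lips pressed", "m": "lips pressed",
    "f": "lip-teeth",    "v": "lip-teeth",
    "aa": "open jaw",    "iy": "spread lips",
    "uw": "rounded lips",
}

def visemes_for(phonemes):
    """Map (time, phoneme) pairs to (time, mouth shape) pairs."""
    return [(t, PHONEME_TO_VISEME.get(p, "neutral")) for t, p in phonemes]

# (time in seconds, phoneme) pairs roughly as a TTS engine might emit for "my"
print(visemes_for([(0.00, "m"), (0.08, "aa"), (0.16, "iy")]))
# [(0.0, 'lips pressed'), (0.08, 'open jaw'), (0.16, 'spread lips')]
```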

Empathetic Reaction to the User

The Digital Person also analyzes the emotional content of the words the user is saying, and the expressions on their face, and produces a facial expression in reaction. Each behavior style has its own behavior-appropriate reactive style. Please note: Reacting to User Speech is only available in Human OS 2.3+.

Reacting to On Screen Content

When onscreen content is used with Digital People running Human OS 2.2+, the Digital Person will direct their gaze towards it, draw the user's attention, and gesture towards it with their hand and arm if the frame and screen layout allow.

For more information about this feature visit the Content Awareness section.

Back-channelling behavior

  • When listening to an end user speak, the Digital Person displays a range of nodding behaviors to acknowledge they have heard.

  • While preparing a reply, the Digital Person displays a range of thinking behaviors before responding.

  • Both are available in Human OS 2.6+.

This back-channelling behavior is designed to give the end user confidence that they have been heard and will be answered. If the person speaking is not close to the microphone, however, neither nodding nor thinking will be triggered; the Digital Person will still hear and respond to speech.
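The gating described above can be sketched as a simple state check: nod while the user speaks and show "thinking" while a reply is prepared, but only when the voice is loud enough at the microphone. The threshold, state names, and behavior lists below are invented for this example.

```python
# Hypothetical sketch of back-channelling gating. Not the platform's API;
# the mic_level signal and the 0.3 threshold are illustrative assumptions.
import random

NOD_BEHAVIORS = ["small nod", "double nod", "slow nod"]
THINK_BEHAVIORS = ["glance aside", "brief pause", "thoughtful tilt"]

def backchannel(state, mic_level, threshold=0.3):
    """Return a back-channel behavior, or None when the voice is too faint.
    Speech is still heard and answered either way."""
    if mic_level < threshold:
        return None
    if state == "listening":
        return random.choice(NOD_BEHAVIORS)
    if state == "preparing_reply":
        return random.choice(THINK_BEHAVIORS)
    return None

print(backchannel("listening", mic_level=0.8))  # one of the nod behaviors
print(backchannel("listening", mic_level=0.1))  # None: speaker too far from mic
```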

Nodding and thinking example

Iconic gestures

Iconic gestures closely relate to the semantic content of segments of speech. In general they are not linked to a specific word or expression, but to the meaning conveyed by the context of the speech. They visually represent the concept or object being referred to, such as a Heart Sign representing passion or dedication, or a Peace Sign to illustrate taking a selfie during a vacation. This option allows the Digital Person to automatically perform these more expressive gestures based on the language and intention of the Digital Person. The feature is currently in beta and can be enabled using the toggle under Other behavioral settings.


You can temporarily override Real-Time Gesturing for specific sentences or words spoken by your Digital Person via the Behavior Tags feature.

  • You can use behavior tags to override the autonomous behavior with something more appropriate to your use case. For example, you might want to add brand- and product-related words to Smiling Gesture (Word-Based).

  • You can also use Neutral Long (Sentence-Based) and Neutral Short (Word-Based) to define sentences and words that will temporarily deactivate Real-Time Gesturing entirely.