
Every Digital Person can see you, hear you, respond to your emotions, and has an inner emotional state. This is their base behavior. With Real-Time Gesturing enabled, your Digital Person analyzes what they are saying via Natural Language Processing and adds emotionally appropriate gesturing and behavior to their speech in real time.

Real-Time Gesturing is a powerful tool for conversation writers and designers: the words they write bring the Digital Person's behavior to life. For example, if a Digital Person's conversation is sad in tone, their behavior will autonomously express sadness and concern. If they are talking about something surprising and delightful, their behavior will express surprise and happiness.
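To make the idea concrete, here is a toy sketch of mapping the emotional tone of an utterance to gesture cues. This is purely illustrative: the actual Soul Machines analysis uses a far richer NLP pipeline, and every name and keyword list below is an assumption, not the product's implementation.

```python
# Illustrative sketch only -- the Soul Machines runtime performs this analysis
# internally; the function name, keyword list, and cue labels are hypothetical.

# Minimal keyword-based sentiment lookup standing in for a real NLP model.
SENTIMENT_KEYWORDS = {
    "sad": "sadness", "sorry": "sadness", "unfortunately": "sadness",
    "amazing": "surprise", "wow": "surprise",
    "great": "happiness", "wonderful": "happiness",
}

def infer_gesture_cues(utterance: str) -> list[str]:
    """Return the emotional cues detected in the text the Digital Person will speak."""
    cues = []
    cleaned = utterance.lower().replace("!", "").replace(".", "").replace(",", "")
    for word in cleaned.split():
        tone = SENTIMENT_KEYWORDS.get(word)
        if tone and tone not in cues:
            cues.append(tone)
    return cues or ["neutral"]

print(infer_gesture_cues("Unfortunately the order was lost, I'm so sorry."))
# -> ['sadness']
```

In the real system the detected tone would then drive the autonomous facial expressions and gestures described in the sections below, rather than being returned as a list of strings.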

Real-Time Gesturing is a prerequisite for the Behavior Style feature.

Components of Real-Time Gesturing

Mood and inner state

With Real-Time Gesturing enabled, the Digital Person's mood is affected from moment to moment by the content of what they are saying and by the signals they pick up from the user and the environment, but they will always return to their base mood.

Emotional gestures

Emotional gestures are the facial expressions the Digital Person performs in accordance with the emotional tone of the text they're speaking. These may include smiles, frowns, concerned expressions, head tilts, and so on.

Symbolic gestures

Symbolic gestures are gestures that relate closely in meaning to some word or phrase being spoken by the Digital Person. For example, a gesture in which the Person opens their arms out wide when they’re speaking a phrase such as “everybody” or “all of them.” These may vary in the degree to which the gesture and the phrase are semantically tied: some might be very tightly linked (think of a ‘thumbs up’ gesture in English with the phrase ‘nice work!’ or ‘good job!’), while others may be much more loosely associated (e.g. a ‘smile shrug’ gesture with words like ‘wonderful’ or ‘warmth’).
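A minimal sketch of such a phrase-to-gesture association might look like the following. The gesture names and the longest-match lookup are assumptions for illustration; only the example phrases come from the description above.

```python
# Hypothetical phrase-to-gesture table; gesture identifiers are invented.
SYMBOLIC_GESTURES = {
    "everybody": "arms_open_wide",
    "all of them": "arms_open_wide",
    "nice work": "thumbs_up",      # tightly linked to the phrase
    "good job": "thumbs_up",
    "wonderful": "smile_shrug",    # only loosely associated
}

def symbolic_gesture_for(phrase: str):
    """Return the gesture matching the spoken phrase, or None if no match."""
    text = phrase.lower()
    # Match the longest phrase first so "all of them" wins over shorter keys.
    for key in sorted(SYMBOLIC_GESTURES, key=len, reverse=True):
        if key in text:
            return SYMBOLIC_GESTURES[key]
    return None

print(symbolic_gesture_for("Nice work, team!"))
# -> thumbs_up
```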

Beat gestures

Beat gestures are gestures that convey neither meaning nor emotion, but rather are the natural gestures produced in alignment with the rhythm or cadence of speech. Think of someone delivering a short speech and holding their arm slightly out, “beating” on each emphasized word. Or in more informal, day-to-day communication, the way our arms or shoulders might move rhythmically in sync with what we’re saying, without any real tie to the meaning or emotion of what we’re communicating.

Head & Lip Sync

Lip sync refers to the animations that drive the movement of the Digital Person’s mouth, lips, and related facial muscles when speaking. This is aligned with the text-to-speech audio and phonemes as realistically as possible. Head sync refers to the movement of the Digital Person’s head in alignment with the audio to create realistic motion through the neck and head.
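Lip sync of this kind is commonly built on a phoneme-to-viseme mapping, where each speech sound is assigned a mouth shape that the animation system blends between. The table below is a heavily simplified illustration using ARPAbet-style phoneme labels; the real system handles many more phonemes and blends shapes continuously over time, and none of the identifiers here are Soul Machines' own.

```python
# Simplified phoneme-to-viseme lookup; real lip sync covers far more phonemes
# and interpolates mouth shapes smoothly. All names here are illustrative.
PHONEME_TO_VISEME = {
    "AA": "open",  "IY": "wide",  "UW": "round",
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
}

def viseme_track(phonemes):
    """Map a phoneme sequence to the mouth shapes the animation would blend."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

print(viseme_track(["M", "AA", "M"]))  # e.g. the word "mom"
# -> ['closed', 'open', 'closed']
```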

Empathetic Reaction to the User

The Digital Person also analyzes the emotional content of the words the user is saying, and the expressions on the user's face, and produces a facial expression in reaction. Each behavior style has its own behavior-appropriate reactive style. Please note: Reacting to User Speech is only available in Human OS 2.3+.

Reacting to On Screen Content

When onscreen content is used with Digital People running Human OS 2.2+, the Digital Person will direct their gaze towards it, draw the user's attention to it, and gesture towards it with their hand and arm if the frame and screen layout allow.

  • The Real-Time Gesturing feature is available in a Soul Machines Digital Person running Human OS 2.0+ for English, Human OS 2.3+ for Japanese, and Human OS 2.4+ for Korean.

  • Human OS 2.0 to 2.2 includes shoulder gesturing, and Human OS 2.2+ includes full hand and arm gestures.

  • You can temporarily override Real-Time Gesturing for specific sentences or words spoken by your Digital Person via the Behavior Tags feature.

    • You can use Behavior Tags to override the autonomous behavior with something more appropriate to your use case. For example, you might want to add brand- and product-related words to Smiling Gesture (Word-Based).

    • You can also use Neutral Long (Sentence-Based) and Neutral Short (Word-Based) to define sentences and words that will temporarily deactivate the real-time gesturing entirely.
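As a sketch of how such Behavior Tag word lists could drive an override decision, consider the lookup below. Only the tag names come from this page; the data structure, the brand words, and the substring matching are assumptions for illustration, not how the product stores or matches its tags.

```python
# Illustrative sketch; tag names are from the documentation, everything else
# (word lists, matching logic) is hypothetical.
BEHAVIOR_TAGS = {
    "Smiling Gesture (Word-Based)": ["acme", "acme rewards"],  # hypothetical brand words
    "Neutral Short (Word-Based)": ["password"],
    "Neutral Long (Sentence-Based)": ["please read the following terms carefully"],
}

def override_for(text: str):
    """Return the first Behavior Tag whose word list matches the text, if any."""
    lowered = text.lower()
    for tag, terms in BEHAVIOR_TAGS.items():
        if any(term in lowered for term in terms):
            return tag
    return None  # no override: Real-Time Gesturing proceeds autonomously

print(override_for("Earn points with Acme Rewards"))
# -> Smiling Gesture (Word-Based)
```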
