- Created by Jon Borromeo (Deactivated), last modified by Tim Wu on Sept 15, 2024
This guide explains how to connect to a Natural Language Processing (NLP) platform via an Orchestration Server. NLP is a type of Artificial Intelligence (AI) that extracts meaning from human language in order to make decisions or generate responses based on available information.
Soul Machines Digital People leverage NLP platforms to:
Understand the user’s intention or inquiry by analyzing their utterance.
Select the proper pre-written response from a finite set of responses.
If you are using any of the supported NLP platforms, you can connect these services directly to your Digital Person from the Digital DNA Studio. However, if you are using another NLP platform or need a business logic layer between your Digital Person and your NLP, then Building an NLP Skill is recommended.
Alternatively, you can use an Orchestration Server. The Soul Machines Orchestration Layer is an additional layer of logic that is hosted on a separate server from the Soul Machines Cloud Platform, operated by the customer.
Soul Machines can provide you with a basic starting code template for the Orchestration Server, written in Node.js. This template is fully customizable to suit any integration needs.
Note that you are required to host the Orchestration Server on your own infrastructure, and its endpoints must be publicly accessible.
Audience
This document is intended for Technical personnel, e.g. Solution Architects, Developers, Testers, etc. who want to utilize and configure a separate Orchestration Server to meet their NLP requirements. Information in this document may also be useful to Business users who would like to gain some insight on how other types of NLP platforms can be supported with the use of an Orchestration Server.
Please contact your Soul Machines representative if you require further assistance in using this guide.
Digital DNA Studio Configuration
To use an Orchestration Server with a Digital DNA Studio deployed Digital Person, add the Orchestration Skill to the Knowledge section in the Digital DNA Studio configuration screen.
Follow the instructions displayed under the Orchestration section to fill in the fields as required to establish the connection to your Orchestration Server.
When developing locally, you can run an Orchestration Server on your own machine for testing or debugging purposes. This requires the server URL (typically 'localhost') and your public IP to be specified. You can specify a whitelist with multiple IP address blocks in IPv4 or IPv6 CIDR format, for example 192.0.2.1/24 or 2001:db8:3333:4444:5555:6666:7777:8888. This makes it easier to share a project between multiple developers in different locations.
In Production mode, local development is switched off and only a server URL is required (it must use the HTTPS or WSS protocol for security).
Orchestration Server Messages
Overview
Soul Machines uses a bi-directional WebSocket connection to send messages back and forth between our servers and your Orchestration Server. Each message is JSON-encoded and sent as a WebSocket 'text' message over HTTPS or WSS. Each JSON-encoded message includes some standard fields depending on the kind of message communicated.
There are three kinds of messages: event, request, and response. In this guide, we only cover the messages relevant to connecting to the NLP platform.
Event
Each event includes the category (‘scene’), kind (‘event’), name, and body. Events are sent by the Digital Person to all connected servers and the connected client web browser (if any).
Request
Each request includes the category (‘scene’), kind (‘request’), name, transaction, and body. Requests are sent from the Server to the Scene. If the transaction is not null, then a response is sent for the request. The transaction is an integer that is unique across all requests and is included in the response. If no transaction is included, then no response is sent—the request is one way.
Response
Each response includes the category (‘scene’), kind (‘response’), name, transaction (from the matching request), status (the C SDK status integer where >= 0 is success, < 0 is failure), and body. Responses are sent from the Scene to the Server.
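The event/request/response pattern above can be sketched as a small message router. This is a minimal sketch, not part of the Soul Machines template: the `MessageRouter` class and its method names are our own, and plain objects stand in for parsed WebSocket frames (a real server would `JSON.parse()` each incoming 'text' message before dispatching it).

```javascript
// Minimal sketch of dispatching incoming messages by 'kind' and matching
// responses to requests via the transaction integer.
class MessageRouter {
  constructor() {
    this.nextTransaction = 1;        // transactions must be unique across all requests
    this.pending = new Map();        // transaction -> response callback
    this.eventHandlers = new Map();  // event name -> handler
  }

  // Register a handler for a named scene event (e.g. 'conversationRequest').
  onEvent(name, handler) {
    this.eventHandlers.set(name, handler);
  }

  // Build a request message. A non-null transaction asks the Scene to reply;
  // omitting the callback produces a one-way request (transaction: null).
  buildRequest(name, body, onResponse) {
    const transaction = onResponse ? this.nextTransaction++ : null;
    if (transaction !== null) this.pending.set(transaction, onResponse);
    return { category: 'scene', kind: 'request', name, transaction, body };
  }

  // Dispatch a decoded incoming message by its 'kind' field.
  dispatch(message) {
    if (message.kind === 'event') {
      const handler = this.eventHandlers.get(message.name);
      if (handler) handler(message.body);
    } else if (message.kind === 'response' && message.transaction !== null) {
      const callback = this.pending.get(message.transaction);
      this.pending.delete(message.transaction);
      // Status follows the C SDK convention: >= 0 is success, < 0 is failure.
      if (callback) callback(message.status, message.body);
    }
  }
}
```

The key design point is that the transaction counter lets one-way requests (transaction null, no reply expected) coexist with request/response pairs over the same socket.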
The following section describes the conversation messages that must be used by any Orchestration Server implementation.
Conversation Messages
The Speech-To-Text (STT) results are sent via the conversationRequest message (see the input.text field).
To instruct the Digital Person to speak, you must send a conversationResponse message (see the output.text field).
The Orchestration Server can also send conversationResponse messages without a prior request. This type of "spontaneous" conversation message can be used when the Orchestration Server wants to command the Digital Person to speak unprompted.
The Orchestration Server implementation must ensure that every conversationRequest message is matched with a corresponding conversationResponse message; the output text can be empty if necessary.
We do not queue messages from Orchestration Servers. Any message sent to the Digital Person from the Orchestration Server while it is already speaking will not be spoken. To prevent this, implement code that checks that the Digital Person's speechState is 'idle' before sending the next message.
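Because the platform does not queue Orchestration Server messages, a common pattern is to hold messages locally until the Digital Person is quiet. The sketch below assumes names of our own invention: `updateSpeechState()` would be fed from whatever incoming event reports the speechState, and `send()` is whatever function writes a JSON message to the open WebSocket.

```javascript
// Sketch of a send gate that respects the Digital Person's speaking state.
class SpeechGate {
  constructor(send) {
    this.send = send;          // writes a JSON message to the WebSocket
    this.speechState = 'idle';
    this.queue = [];           // messages we hold back ourselves
  }

  // Call this whenever an incoming event reports a new speechState.
  updateSpeechState(state) {
    this.speechState = state;
    if (state === 'idle') {
      // Flush held messages now that the Digital Person is quiet.
      while (this.queue.length > 0 && this.speechState === 'idle') {
        this.send(this.queue.shift());
      }
    }
  }

  // Queue locally instead of sending while speaking, because the platform
  // will silently drop messages sent mid-speech.
  say(message) {
    if (this.speechState === 'idle') {
      this.send(message);
    } else {
      this.queue.push(message);
    }
  }
}
```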
Sample Conversation Messages:
// Example conversationRequest message (sent by the SDK):
{
  "category": "scene",
  "kind": "event",
  "name": "conversationRequest",
  "body": {
    "personaId": 1,
    "input": { "text": "Can I apply for a credit card?" },
    "variables": {
      // conversation variables
    }
  }
}

// Example conversationResponse message (sent by the Orchestration Server):
{
  "category": "scene",
  "kind": "request",
  "name": "conversationResponse",
  "transaction": null,
  "body": {
    "personaId": 1,
    "output": {
      "text": "Yes, I can help you apply for your credit card. Please tell me which type of card you are after, Visa or Mastercard?"
    },
    "variables": {
      // conversation variables
    }
  }
}
The output.text field is required in the conversationResponse message, while the input.text, variables, metadata, and fallback properties are optional.
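The required/optional split above can be enforced in a small builder helper. This is a sketch of our own, not part of the Soul Machines template; it only shows one way to guarantee that output.text is always present (possibly empty) while the optional properties are attached only when supplied.

```javascript
// Sketch: build a conversationResponse, enforcing the one required
// property (output.text); the other fields shown are optional.
function buildConversationResponse({ personaId = 1, text, variables, metadata, fallback } = {}) {
  if (typeof text !== 'string') {
    throw new Error('conversationResponse requires output.text (it may be an empty string)');
  }
  const body = { personaId, output: { text } };
  if (variables) body.variables = variables;
  if (metadata) body.metadata = metadata;
  if (fallback !== undefined) body.fallback = fallback;
  return {
    category: 'scene',
    kind: 'request',
    name: 'conversationResponse',
    transaction: null, // no reply expected for a conversationResponse
    body,
  };
}
```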
Sample Conversation Messages for Soul Machines’ Digital DNA Studio Content Blocks:
This is an example of a conversationResponse message with an options-type Content Block. In this case, conversationOptions is an arbitrary ID that the author can define; it needs to match between the "@showcards" command and the variable definition. The variable name must be prefixed with "public-". Examples of other Content Block types can be found in the section /wiki/spaces/~5a4d740dfed274297effe5c2/pages/1526273789.
{
  "category": "scene",
  "kind": "request",
  "name": "conversationResponse",
  "body": {
    "personaId": 1,
    "output": {
      "text": "You can choose from one of the following options @showcards(conversationOptions)"
    },
    "variables": {
      "public-conversationOptions": {
        "type": "options",
        "data": {
          "options": [
            { "label": "option A" },
            { "label": "option B" },
            { "label": "option C" }
          ]
        }
      }
    }
  }
}
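Since the block ID must appear in two places, a helper that derives both from a single argument avoids mismatches between the @showcards() command and the variable name. This is a sketch of our own (the function name and parameters are not part of any Soul Machines API):

```javascript
// Sketch: build a conversationResponse carrying an options-type Content Block.
// 'blockId' is the arbitrary ID that must match the @showcards() command;
// the variable name receives the required "public-" prefix automatically.
function buildOptionsResponse(blockId, prompt, labels, personaId = 1) {
  return {
    category: 'scene',
    kind: 'request',
    name: 'conversationResponse',
    body: {
      personaId,
      output: { text: `${prompt} @showcards(${blockId})` },
      variables: {
        [`public-${blockId}`]: {
          type: 'options',
          data: { options: labels.map((label) => ({ label })) },
        },
      },
    },
  };
}
```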
Conversation History
You can access the recent conversation history as part of the conversation variables sent by the SDK’s conversationRequest. This feature is currently in beta and is available upon request.
When available, the conversation history includes up to the 10 most recent turns, with a character limit of 16,000 characters. If the 16,000-character limit is reached, the oldest entries in the history will be removed. Therefore, the Conversation_History variable is not intended for tracking the entire session transcript, and a separate record of the transcript is recommended for memory or other purposes.
The Conversation_History variable is designed to track the speaking state of the most recent conversation turns and provide insights into the conversation's processing within those turns.
Conversation turns are listed in an array, with the most recent turn appearing as the first entry. The following information is available from the Conversation_History variable:
'isCompleted': A Boolean flag specifying whether the turn has been completed.
'status': An integer flag specifying the turn status. If 0, it indicates no errors; otherwise, an error message will be included in the turn variables.
'turnId': The Universally Unique Identifier (UUID) assigned to the turn. This is the same Turn_Id variable received as part of the conversation variables for the respective turn.
'input': The user's speech that initiated the turn.
'output': The Digital Person's response to the user's input. The response is divided into utterances, where each utterance typically represents one sentence or phrase. Each utterance is separately tracked for spoken text, interruptions, and speech finalization:
'text': The text string the Digital Person intends to speak.
'spokenText': The text string that was actually spoken by the time the conversationRequest message was prepared.
'isFinal': A Boolean flag indicating whether the Digital Person has finished the current utterance or is still speaking.
'isInterrupted': A Boolean flag that appears if the utterance was interrupted. In this case, 'isFinal' will also be marked as True.
Here is an example of the Conversation_History sent by the SDK, where the Digital Person is interrupted in the second utterance of the most recent turn.
'Conversation_History': [
  {
    'input': { 'text': 'Hello. How are you?' },
    'isCompleted': True,
    'output': [
      { 'isFinal': True, 'spokenText': 'Hello!', 'text': 'Hello!' },
      { 'isFinal': True, 'isInterrupted': True, 'spokenText': 'I\'m feeling "great," thank', 'text': 'I\'m feeling "great," thank you!' },
      { 'isFinal': True, 'isInterrupted': True, 'spokenText': '', 'text': 'How about you?' }
    ],
    'status': 0,
    'turnId': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
  },
  {
    'input': { 'text': '' },
    'isCompleted': True,
    'output': [
      { 'isFinal': True, 'isInterrupted': True, 'spokenText': 'This is the start of the conversation. How can', 'text': 'This is the start of the conversation. How can I help you today?' }
    ],
    'status': 0,
    'turnId': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
  }
]
Note: If an utterance is interrupted, all subsequent utterances for that turn will also be marked as interrupted, with their spokenText fields left empty.
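An Orchestration Server might use this structure to detect interruptions or to reconstruct what the user actually heard. The helper below is a sketch of our own (the function name and summary fields are not part of any Soul Machines API); it only relies on the documented fields: the most recent turn is first in the array, status 0 means no error, and spokenText holds the text actually spoken.

```javascript
// Sketch: summarize the most recent turn from a Conversation_History array.
function summarizeLatestTurn(history) {
  if (!Array.isArray(history) || history.length === 0) return null;
  const turn = history[0];             // most recent turn appears first
  const utterances = turn.output || [];
  return {
    turnId: turn.turnId,
    hadError: turn.status !== 0,       // non-zero status indicates an error
    wasInterrupted: utterances.some((u) => u.isInterrupted === true),
    // Text actually heard by the user: the spokenText of each utterance.
    heardText: utterances.map((u) => u.spokenText).filter(Boolean).join(' '),
  };
}
```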
Token Configuration
Note: Token configuration is not required for Orchestration Servers configured within DDNA Studio.
The Soul Machines servers require the public URL of your Orchestration Server. This must be supplied inside the JSON Web Token (JWT) from your token server by adding the sm-control field, e.g.:
sm-control = wss://example.com:8080 (where example.com:8080 is the address of your Orchestration Server)
If you are using the Soul Machines Token Server code, you can do this by setting the following environment variable (in .env file):
ORCHESTRATION_SERVER=projectname-orch.mycompany.com
Code Changes to Basic Orchestration Server
The basic Orchestration Code template provided by Soul Machines has limited functionality. It can:
Listen for WebSocket connection requests.
Open a new WebSocket connection to the Soul Machines servers.
Receive and write to console all JSON messages from Soul Machines.
To complete the integration with your chosen NLP platform, as a minimum, you need to make the following enhancements to this codebase:
Process each message received from Soul Machines.
Extract text from final recognizeResults/conversationRequest messages.
Send a request to the NLP API.
Extract text from the NLP response.
Send a startSpeaking/conversationResponse command to Soul Machines.
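The enhancement steps above can be sketched as one async handler. This is a sketch under stated assumptions, not the Soul Machines template: callNlp() stands in for whatever client your NLP platform's API provides, and sendToSoulMachines() is whatever function writes a JSON message to the open WebSocket.

```javascript
// Sketch of the five enhancement steps as one message handler.
async function handleMessage(raw, callNlp, sendToSoulMachines) {
  const message = JSON.parse(raw);                     // 1. process the message
  if (message.name !== 'conversationRequest') return;  //    (ignore other messages here)
  const userText = message.body?.input?.text;          // 2. extract the user's text
  if (!userText) return;
  const nlpReply = await callNlp(userText);            // 3. send a request to the NLP API
  const replyText = nlpReply.text ?? '';               // 4. extract text from the NLP response
  sendToSoulMachines({                                 // 5. send a conversationResponse
    category: 'scene',
    kind: 'request',
    name: 'conversationResponse',
    transaction: null,
    body: { personaId: message.body.personaId, output: { text: replyText } },
  });
}
```

In a real server this handler would be wired to the WebSocket's message event, with the speechState check from the Conversation Messages section applied before step 5.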
Linking a Server to your Project
Use an /wiki/spaces/~5a4d740dfed274297effe5c2/pages/1526276314 if you're using unsupported NLP services, building a custom backend, implementing rich authentication, or requiring flexible data analytics.
To view the configuration fields, navigate to Configure Deployment section within Digital DNA Studio and expand the Orchestration Server option. Fill in the fields as required to establish the connection to your Orchestration Server.
Using a Custom UI with an Orchestration Server
If you are using a Custom UI specifically to support an Orchestration Server, you can create a duplicate of your project within Studio, set it to use the Default UI, and specify your Orchestration Server details. Once deployed, you can utilize the new Digital Person and deprecate the old one.
If you are using a Custom UI for other purposes, you should continue in the same manner. If you are doing any local development, you need to specify only your Public IP, while leaving the server URL blank.
Notes:
There is no limitation on using the same Orchestration Server for different projects; e.g., an Orchestration Server can be referenced from multiple Digital DNA Studio projects.
You need to use a subnet mask when developing locally. Typically a /32 subnet mask can be appended to your IP to satisfy this requirement.
Further information about implementing an Orchestration Server to work with Digital DNA Studio can be found in the /wiki/spaces/~5a4d740dfed274297effe5c2/pages/1526276314.