# @simfinity/constellation-client

v1.2.1
## Installation

```bash
npm install @simfinity/constellation-client
# or
yarn add @simfinity/constellation-client
```

## Purpose & Usage
This package is a client wrapper for the Simfinity Constellation server. The Constellation server is a proxy that manages streaming sessions with third-party LLMs. This package provides the programmatic functions covering the complete lifecycle of a streaming session:
- Open/start session
- Callbacks to continuously send and receive streamed data over a persistent connection
- Close/end session
## Server implementation insight
The Constellation server is a chat-room and session manager: upon receiving a session-start request, it creates a persistent chat-room, initiates the persistent connection with the LLM, and configures it accordingly (e.g. system instruction, temperature, audio, transcript subscription...). Clients may lose their connection with Constellation, but the chat-room remains open on the server side, which allows the client to reconnect and resume the session. Clients MUST notify the server when a session has ended, so that Constellation can release the allocated resources.
## Example

Key steps in pseudo-code:

```ts
const client = new WebClient({
  sessionEndpoint: "https://simfinity.constellation.com",
  streamingEndpoint: "wss://simfinity.constellation.com:30003",
  key: "my-key",
});

try {
  /* ... */
  // Start a chat session
  const params: SessionStartParameters = {
    llmProvider: "openai",
    voiceEnabled: true,
    voiceName: "ballad",
    behaviour: {
      temperature: 0.9,
      instructions: "Just have a nice and casual conversation.",
    },
  };
  await client.startSession(params);

  // Join the streaming session and register event hooks
  await client.joinSession(true, {
    onStreamClosed: (reason: string) => {
      console.log("Stream connection lost:", reason);
    },
    onAudioResponseChunk: (audioChunk: string) => {
      audioPlayer.enqueue(audioChunk);
    },
    onAudioResponseEnd: () => {
      console.log("The model is done talking");
    },
  });

  /* ... */
  // Stream audio input, then commit to signal the end of the utterance
  client.sendAudioChunk("{PCM16 Base64-encoded data}");
  client.commitAudioChunks();
  /* ... */
} catch (error) {
  console.error("Session error:", error);
} finally {
  // Always notify the server so it can release allocated resources
  await client.endSession();
}
```

## Types
### Client Configuration

Configuration required to initiate a connection with the server. In a client application, these values would typically be stored in secret stores and environment variables.

```ts
export interface WebClientConfig {
  sessionEndpoint: string;
  streamingEndpoint: string;
  key: string;
}
```

### Model configuration
Immutable options defined at session creation time: which LLM provider to use, audio settings, and so on. The behavioural settings may be mutable depending on the chosen LLM provider, but it is advised to provide them at least once at session creation time.
```ts
export interface SessionStartParameters {
  llmProvider: LlmName;
  voiceEnabled: boolean;
  interruptions: boolean;
  voiceName?: string;
  behaviour?: SessionConfig;
  agents?: AgentConfig[];
  actions?: ClientAction[];
  tools?: string[];
}
```

### Model behaviour configuration
These settings alter how the model reacts. Omitted properties remain unchanged in the model. It is theoretically possible to change these settings both at session start and mid-session; however, some LLMs may not support mid-session updates, so it is advised to define them at session start.
```ts
export interface SessionConfig {
  temperature?: number;
  instructions?: string;
  maxResponseToken?: number;
}
```

### Event hooks
Callback functions that catch all the propagated server events. Except for the onStreamClosed event, assigning hooks is optional: non-observed events are silently ignored and lost. For more details about the list of events, when they are fired, and how to integrate them, please refer to the in-code comments.
```ts
// See in-code interface definition
export interface EventHandlers {
  // ...
}
```

## Audio
- The server exclusively expects base64-encoded PCM audio data at 16 kHz, and sends responses as base64-encoded PCM at 24 kHz.
- The server DOES NOT implement VAD (voice activity detection).
- WARNING: the client must implement voice detection and explicit commits.
- Suggested high-level approach:
  - Keep a 500 ms ring buffer continuously filled with audio input.
  - Detect noise above a minimum threshold.
  - Confirm voice is detected by consistent sound for ~250 ms.
  - Start streaming audio data, beginning 250 ms in the past in the ring buffer.
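The suggested approach above can be sketched as follows. This is a minimal illustration, not part of the package: the frame size, the RMS-based noise detection, and all numeric thresholds are assumptions to be tuned per application.

```ts
// Illustrative sketch of client-side voice detection over a ring buffer.
// Assumes 16 kHz mono PCM16 frames; all thresholds are placeholder values.
const SAMPLE_RATE = 16000;
const RING_MS = 500;        // ring buffer length
const CONFIRM_MS = 250;     // sustained sound needed to confirm voice
const RMS_THRESHOLD = 500;  // minimum loudness; tune for your microphone

class VoiceGate {
  private ring: Int16Array[] = [];
  private voicedMs = 0;
  streaming = false;

  // Feed one audio frame; returns the frames that should be streamed now.
  push(frame: Int16Array): Int16Array[] {
    // Keep the last RING_MS of audio.
    this.ring.push(frame);
    const frameMs = (frame.length / SAMPLE_RATE) * 1000;
    const maxFrames = Math.ceil(RING_MS / frameMs);
    while (this.ring.length > maxFrames) this.ring.shift();

    // Simple noise detection: RMS loudness above a minimum threshold.
    const rms = Math.sqrt(frame.reduce((s, v) => s + v * v, 0) / frame.length);
    this.voicedMs = rms >= RMS_THRESHOLD ? this.voicedMs + frameMs : 0;

    if (!this.streaming && this.voicedMs >= CONFIRM_MS) {
      this.streaming = true;
      // Start from ~250 ms in the past: replay the buffered tail first.
      const backlog = Math.ceil(CONFIRM_MS / frameMs);
      return this.ring.slice(-backlog);
    }
    return this.streaming ? [frame] : [];
  }
}
```

Each returned frame would then be base64-encoded and passed to `sendAudioChunk`, with `commitAudioChunks` called once the speaker goes silent again.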
Interruptions:

- Audio responses can be interrupted when the session is configured to allow it.
- An interruption occurs the moment a new audio input commit is received by the server.
- An interruption stops the model's current response generation, if one is ongoing.
- Interruptions do not take network latency into account; the client implementation should consider:
  - A response may have already been fully generated and transmitted by the time the interrupting commit occurs.
  - Yet the response audio may still be buffering and/or playing on the client side: audio playback is typically much slower.
- When interruptions are enabled, the client audio buffer should be cleared on every new audio commit.
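A small sketch of that last point, using the `commitAudioChunks` call from the example above; the `audioPlayer` object and its `clear()` method are hypothetical stand-ins for whatever playback queue the client uses:

```ts
// Hypothetical commit helper: when interruptions are enabled, drop any
// locally buffered response audio before signalling the commit, so stale
// playback does not continue after the model has been interrupted.
function commitUtterance(
  client: { commitAudioChunks(): void },
  audioPlayer: { clear(): void },
  interruptionsEnabled: boolean,
): void {
  if (interruptionsEnabled) {
    audioPlayer.clear(); // discard queued response audio from the previous turn
  }
  client.commitAudioChunks();
}
```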
## Text & Transcript

- The transcript input/response callbacks carry both the text exchanges (in a text-only session) and the transcription of audio exchanges.
- In an audio session, text and audio inputs will trigger:
  - a mirrored transcript text through onTranscriptInput
  - an audio response through onAudioResponseChunk
  - a text transcript of the audio response through onTranscriptResponse
- onTranscriptInputPart and onTranscriptResponsePart are fired for each new piece of partial text available.
- onTranscriptInputPart and onTranscriptResponsePart are typically NOT fired for the last piece: rely on onTranscriptInput and onTranscriptResponse for the final, complete transcript.
- In a text-only session, a text input will trigger:
  - a mirrored transcript text through onTranscriptInput
  - a text response through the onTranscriptResponse callback
- onTranscriptInputPart is expected to fire only once, as the input is immediately received and echoed.
- onTranscriptResponsePart is fired as soon as a new piece of partial text from the response is available.
- onTranscriptResponsePart is not fired for the last piece; rely on onTranscriptResponse for the final, complete response transcript.
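Because the last piece does not arrive as a part, a client displaying live text should accumulate partials and then replace them with the final transcript. A sketch, using the hook names above (the `render` function is a hypothetical UI callback):

```ts
// Accumulate partial response text for live display, then swap in the
// complete transcript delivered by the final callback.
function makeTranscriptHandlers(render: (text: string, final: boolean) => void) {
  let partial = "";
  return {
    onTranscriptResponsePart: (piece: string) => {
      partial += piece;   // partial pieces arrive incrementally
      render(partial, false);
    },
    onTranscriptResponse: (full: string) => {
      partial = "";       // the final callback carries the complete text
      render(full, true);
    },
  };
}
```

These handlers would be spread into the hooks object passed to `joinSession`, alongside the audio callbacks.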
## Agents

The Constellation server allows the use of background agents within a session. An agent is a parallel LLM session/connection that observes the main session conversation and provides feedback according to its instructions. Typically, given the scenario context (or a summary of the main conversation model's instructions), a goal, and a formatted response structure, an agent reads the main conversation in real time and produces feedback regularly (once every X exchanges - input + response - according to its "batching" configuration). The agent's feedback can be aimed at the client, to create dynamic feedback UIs, or at the main conversation model: in that case the agent "whispers" prompts to the model, influencing and driving its behaviour in the background.
```ts
// See in-code interface definition for detailed agent configuration options
export interface AgentConfig {
  // ...
}
```

## Client actions
Client actions are client-defined custom functions; they allow the client implementation to turn prompts into named events fired back at the client. Typically, they are used to enable immediate UI effects via model prompts. Client actions must be carefully named and described (see the in-code comments of the ClientAction type definition): the model must be able to interpret and understand them in order to know how and when to use them appropriately.
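As an illustration only: the exact ClientAction shape is defined in-code, and every field below (name, description, handler) is an assumption. The point is that the name and description are what the model reads to decide when to fire the action:

```ts
// Hypothetical client action declaration. The description is written for the
// model, so it should state precisely when the action applies and what to pass.
const highlightAction = {
  name: "highlight_answer",
  description:
    "Call this when the user asks to emphasise part of the answer. " +
    "Pass the exact text span to highlight.",
  // Client-side handler invoked when the model triggers the action.
  handler: (args: { span: string }): string => `highlighting: ${args.span}`,
};
```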
## Tools

At session creation time, the client can provide a list of tool names. Query the REST endpoint `{http constellation url}/supported_tools` to retrieve the list of supported tools. The tools included are exposed to the model as available functions for it to call when appropriate.
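A hypothetical helper for querying that endpoint. The bearer-token auth scheme and the response shape (a JSON array of tool names) are assumptions; the fetch function is injected so the sketch stays self-contained:

```ts
// Minimal shape of a fetch-like function, so the helper can be tested
// without a network; pass the global fetch in real use.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Retrieve the tool names Constellation supports, assuming the endpoint
// returns a JSON array of strings.
async function fetchSupportedTools(
  baseUrl: string,
  key: string,
  fetchFn: FetchLike,
): Promise<string[]> {
  const res = await fetchFn(`${baseUrl}/supported_tools`, {
    headers: { Authorization: `Bearer ${key}` }, // auth scheme is an assumption
  });
  if (!res.ok) throw new Error(`supported_tools request failed: ${res.status}`);
  return (await res.json()) as string[];
}
```

The resulting names could then be passed as the `tools` array of SessionStartParameters.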
