ordi-ai-agent-sdk v0.1.8
# 🤖 ORDI AI SDK
An SDK for building AI agents that can interact with users in real time, providing both text and audio responses. This SDK connects to the ORDI AI Gateway, which serves as a bridge to powerful language models and tools.
## Dependencies

| Dependency | Version |
| ---------- | ------- |
| ws         | ^8.0.0  |
## Install

npm

```bash
npm install ordi-ai-agent-sdk@latest
```

yarn

```bash
yarn add ordi-ai-agent-sdk@latest
```
## Example
```js
import { AgentClient } from "ordi-ai-agent-sdk";
import { GoogleGenAI } from "@google/genai";
import wav from "wav";

// Initialize Gemini TTS
const ai = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
});

async function saveWaveFile(
  filename,
  pcmData,
  channels = 1,
  rate = 24000,
  sampleWidth = 2
) {
  return new Promise((resolve, reject) => {
    const writer = new wav.FileWriter(filename, {
      channels,
      sampleRate: rate,
      bitDepth: sampleWidth * 8,
    });
    writer.on("finish", resolve);
    writer.on("error", reject);
    writer.write(pcmData);
    writer.end();
  });
}

// Convert text to speech using Gemini
async function generateSpeech(text) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-tts",
    contents: [{ parts: [{ text }] }],
    config: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: {
          prebuiltVoiceConfig: { voiceName: "Kore" },
        },
      },
    },
  });

  const data =
    response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
  if (!data) {
    throw new Error("No audio data returned from Gemini");
  }
  const audioBuffer = Buffer.from(data, "base64");
  const filename = `tts-${Date.now()}.wav`;
  await saveWaveFile(filename, audioBuffer);
  console.log("🔊 Audio saved:", filename);
}

// Initialize ORDI Agent
const agent = new AgentClient(
  {
    gatewayUrl: "ws://localhost:8000/ws/chat",
    sessionId: "session_tts_demo",
  },
  {
    name: "Jack Daniels",
    storeName: "ORDI Restaurant",
    tableNumber: "12",
  }
);

// Listen for spoken output from the agent
agent.onAudio = async (text) => {
  console.log("🔊 Agent audio text:", text);
  // Convert to speech
  await generateSpeech(text);
};

async function main() {
  await agent.connect();

  const stream = agent.streamMessage("Hello! Can you recommend a dish?");
  for await (const chunk of stream) {
    if (chunk?.delta) {
      process.stdout.write(chunk.delta);
    }
  }
  console.log("\n");
}

main();
```

## Configuration
| Field | Type | Description |
| ------------ | ------ | ----------------------------------- |
| gatewayUrl | string | WebSocket endpoint for the AI Agent |
| sessionId | string | Optional session identifier |
```js
{
  gatewayUrl: "ws://localhost:8000/ws/chat",
  sessionId: "session_123"
}
```

## User configuration
User metadata helps the AI personalize responses.
| Field | Type | Description |
| --------------- | ------ | ------------------------ |
| name | string | User's name |
| storeName | string | Restaurant or store name |
| tableNumber | string | Current table |
| email | string | User email |
| contactNumber | string | Phone number |
```js
{
  name: "Jack Daniels",
  storeName: "ORDI Restaurant",
  tableNumber: "12",
  email: "[email protected]",
  contactNumber: "0123456789"
}
```

## Sending a message
Messages are streamed using an async iterator.
```js
const stream = agent.streamMessage("Show me the menu");

for await (const chunk of stream) {
  if (chunk?.delta) {
    process.stdout.write(chunk.delta);
  }
}
```

Output (Markdown content is printed as it arrives, while audio responses are handled separately):
> *I have received your message and am checking on that for you now.*
>
> Hello, **Jack Daniels**! I'm sorry to hear you're hungry, but you've come to the right place.
>
> Welcome to **ORDI Restaurant**. Since you are at **Table 12**, I can help you find something delicious right away. Would you like to see our full menu, or are you in the mood for something specific like Vietnamese food or perhaps a pizza?

## Handling Audio Responses
Audio responses are delivered via the `onAudio` callback.
```js
agent.onAudio = (audioText) => {
  console.log("Audio:", audioText);
};
```

You can forward this text to any Text-To-Speech (TTS) engine.
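If the TTS engine is slower than the agent, `onAudio` callbacks can overlap and clips may play out of order. One way to handle this is to serialize the work through a promise chain. This is an illustrative sketch, not part of the SDK; `createAudioQueue` and `speak` are hypothetical names:

```js
// Illustrative helper (not part of the SDK): serialize TTS work so
// audio clips are generated in the order the agent produced them,
// even if onAudio fires again before the previous clip finished.
function createAudioQueue(speak) {
  let chain = Promise.resolve();
  return (text) => {
    // Append each clip to the chain so only one speak() runs at a time
    chain = chain.then(() => speak(text));
    return chain;
  };
}

// Usage: agent.onAudio = createAudioQueue(generateSpeech);
```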
## Implementing an Audio Model (Gemini)
npm

```bash
npm install @google/genai
```

or yarn

```bash
yarn add @google/genai
```

Sample:
```js
async function generateSpeech(text) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-tts",
    contents: [{ parts: [{ text }] }],
    config: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: {
          prebuiltVoiceConfig: { voiceName: "Kore" },
        },
      },
    },
  });

  const data =
    response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
  if (!data) {
    throw new Error("No audio data returned from Gemini");
  }
  const audioBuffer = Buffer.from(data, "base64");
  const filename = `tts-${Date.now()}.wav`;
  await saveWaveFile(filename, audioBuffer);
  console.log("🔊 Audio saved:", filename);
}

// Listen for spoken output from the agent
agent.onAudio = async (text) => {
  // Convert to speech
  await generateSpeech(text);
};
```

## Streaming Architecture
The ORDI AI SDK streams events from the server in real time.
### Server events
| Status | Description |
| -------------- | ----------------------------- |
| text_output | Visual UI text stream |
| audio_output | Spoken response stream |
| tool_ack | Tool execution acknowledgment |
| done | End of response |
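As a sketch of how a client might consume these events, the table above maps onto a simple dispatch function. The event shape (`{ status, delta, text, message }`) is an assumption for illustration; the SDK performs this routing internally:

```js
// Hypothetical dispatcher mirroring the server-event table above.
// The event field names (delta, text, message) are illustrative assumptions.
function createDispatcher({ onText, onAudio, onToolAck, onDone }) {
  return (event) => {
    switch (event.status) {
      case "text_output": // visual UI text stream
        onText?.(event.delta);
        break;
      case "audio_output": // spoken response stream
        onAudio?.(event.text);
        break;
      case "tool_ack": // tool execution acknowledgment
        onToolAck?.(event.message);
        break;
      case "done": // end of response
        onDone?.();
        break;
    }
  };
}
```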
- `text_output` → UI stream
- `audio_output` → `onAudio` callback
- `tool_ack` → status message (a quick acknowledgment of user input)
- `done` → stream ends

## SDK Architecture
```text
Application
    │
    ▼
AgentClient (SDK)
    │
    ▼
WebSocketTransport
    │
    ▼
ORDI AI Gateway
    │
    ▼
LangChain AI Agent
```
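The WebSocketTransport layer in this diagram typically has to buffer messages sent before the socket finishes connecting. A minimal sketch of that idea, under the assumption that a queue-and-flush strategy is used (the class name and API here are hypothetical, not the SDK's internals):

```js
// Illustrative transport sketch: queue outgoing messages until the
// underlying socket is open, then flush them in order. The real
// WebSocketTransport inside the SDK may differ.
class QueuedTransport {
  constructor(socket) {
    this.socket = socket; // anything with a send(msg) method
    this.queue = [];
    this.open = false;
  }

  // Called once the WebSocket "open" event fires
  markOpen() {
    this.open = true;
    for (const msg of this.queue) this.socket.send(msg);
    this.queue = [];
  }

  send(msg) {
    if (this.open) this.socket.send(msg);
    else this.queue.push(msg); // buffer until the gateway connects
  }
}
```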