storyframe v1.1.0
🤖 StoryFrame: Real-time AI Chat Framework
StoryFrame is a modern, TypeScript-based framework for building real-time AI chat applications with streaming responses, tool usage, and audio capabilities.
🌟 Features
Core Features
- ✨ Real-time WebSocket communication
- 🔄 Streaming responses from LLMs
- 🛠️ Extensible tool system
- 💾 Persistent memory with Supabase
- 🎯 Session management
- 🔊 Text-to-Speech support
- 🤖 Model-agnostic design (currently using OpenAI)
Advanced Capabilities
- 🔄 Multi-turn conversations
- 🧠 Conversation memory and context management
- 🎭 Customizable system prompts
- 🔧 Tool-based actions
- 🎵 Audio streaming with Fish Audio TTS
- 🔍 Auto-cleanup of inactive sessions
🏗️ Project Structure
demo-streaming/
├── storyframe/ # Core Framework
│ ├── audio/ # TTS and audio handling
│ ├── chains/ # Conversation management
│ ├── core/ # Core agent implementation
│ ├── memory/ # Memory store implementations
│ ├── tools/ # Tool definitions and router
│ ├── types/ # TypeScript type definitions
│ └── utils/ # Utility functions
│
├── demo-backend/ # Example Backend Implementation
│ └── src/
│ └── server.ts # WebSocket server setup
│
└── demo-frontend/ # Example Frontend Implementation
└── src/
        └── components/   # React components
🚀 Getting Started
Prerequisites
- Node.js 18+
- pnpm (recommended) or npm
- OpenAI API key
- Fish Audio API key (optional, for TTS)
- Supabase account (optional, for persistent storage)
Setup
- Clone the repository:
git clone <repository-url>
cd demo-streaming
- Install dependencies:
pnpm install
- Create a .env file:
OPENAI_API_KEY=your_openai_api_key
FISH_AUDIO_API_KEY=your_fish_audio_api_key # Optional
SUPABASE_URL=your_supabase_url # Optional
SUPABASE_KEY=your_supabase_key # Optional
- Start the development server:
# Start the backend
cd demo-backend
pnpm dev
# Start the frontend (in another terminal)
cd demo-frontend
pnpm dev
💡 Usage
Using the Agent Directly
If you don't need WebSocket communication or session management, you can use the agent directly:
import { runAgent, runAgentWithOptions } from './core/agent';
import { ToolRouter } from './tools/toolRouter';
import { MemoryStore } from './memory/memoryStore';
// Setup tools
const toolRouter = new ToolRouter();
toolRouter.register({
name: 'customTool',
description: 'Tool description',
execute: async (args) => {
// Tool implementation
return result;
}
});
// Option 1: Simple usage with runAgent
const input = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
];
const { fullTextStream, plainTextStream } = await runAgent(
input,
toolRouter,
[{ name: 'custom', content: 'Additional system prompt' }]
);
// Handle the streams
const reader = fullTextStream.getReader();
while (true) {
const { value, done } = await reader.read();
if (done) break;
console.log(value); // Process each chunk
}
// Option 2: Advanced usage with runAgentWithOptions
const memoryStore = new MemoryStore();
const response = await runAgentWithOptions({
sessionId: 'unique-id',
responseId: 'response-id',
prompt: 'Hello!',
memoryStore,
toolRouter,
saveToMemory: true,
systemPrompts: [
{
name: 'custom',
content: 'Your custom system prompt'
}
]
});
// Handle streams similarly
for await (const chunk of response.fullTextStream) {
console.log(chunk);
}
The agent provides two main functions:
- runAgent: Basic function for one-off interactions
- runAgentWithOptions: Advanced function with memory and configuration options
Key differences from using ConversationManager:
- No WebSocket handling
- Manual stream processing
- Direct control over conversation flow
- No automatic session management
- Manual memory handling
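When using the agent directly, the manual stream-reading loop shown above can be wrapped in a small helper that collects a text stream into one string. This is a sketch, not part of the framework's API; the stand-in ReadableStream below replaces a real plainTextStream so the snippet runs without an API key:

```typescript
// Collect a ReadableStream<string> (such as plainTextStream) into one string.
async function collectStream(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}

// Stand-in stream so the helper can run without calling runAgent.
const demo = new ReadableStream<string>({
  start(controller) {
    controller.enqueue('Hello, ');
    controller.enqueue('world!');
    controller.close();
  }
});

collectStream(demo).then((text) => console.log(text)); // "Hello, world!"
```

With a real agent call, the same helper would consume `fullTextStream` or `plainTextStream` returned by `runAgent`.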
Using ConversationManager (WebSocket Support)
import { ConversationManager } from './chains/conversationManager';
const manager = new ConversationManager();
// Start a conversation
await manager.startConversation({
sessionId: 'unique-id',
memoryStore,
toolRouter,
saveToMemory: true
});
// Process messages
await manager.processMessage(sessionId, userMessage, {
systemPrompts: [
{
name: 'custom',
content: 'Your custom system prompt'
}
]
});
Adding Custom Tools
import { ToolRouter } from './tools/toolRouter';
const toolRouter = new ToolRouter();
toolRouter.register({
name: 'customTool',
description: 'Tool description',
execute: async (args) => {
// Tool implementation
return result;
}
});
🔧 Configuration
Memory Store Options
- In-memory store (default)
- Supabase store (persistent)
// Persistent store backed by Supabase (requires SUPABASE_URL and SUPABASE_KEY)
const memoryStore = new SupabaseMemoryStore(
process.env.SUPABASE_URL,
process.env.SUPABASE_KEY
);
🔌 WebSocket Events Documentation
Multimodal WebSocket Server (Port 8081)
The multimodal WebSocket server handles various types of content including text, images, audio, video, and files. Here's a detailed breakdown of the events and message formats:
Client -> Server Messages
- Text-only Message
// Message Format
{
sessionId?: string;
contents: [{
type: 'text',
content: string
}],
metadata?: Record<string, any>
}
// Example: Simple text message
{
"contents": [{
"type": "text",
"content": "What is the capital of France?"
}]
}
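As a sketch, a client-side helper for building messages in this shape might look like the following. The `makeTextMessage` helper is hypothetical, not part of the framework; the field names come from the message format above:

```typescript
// Types mirroring the text-only message format above.
type TextContent = { type: 'text'; content: string };
type ClientMessage = {
  sessionId?: string;
  contents: TextContent[];
  metadata?: Record<string, any>;
};

// Hypothetical helper: build a text-only message, optionally with a session ID.
function makeTextMessage(text: string, sessionId?: string): ClientMessage {
  return {
    ...(sessionId ? { sessionId } : {}),
    contents: [{ type: 'text', content: text }]
  };
}

const msg = makeTextMessage('What is the capital of France?');
console.log(JSON.stringify(msg));
// Over a live connection this would be sent with ws.send(JSON.stringify(msg)).
```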
// Example: Text message with metadata
{
"sessionId": "user123",
"contents": [{
"type": "text",
"content": "Analyze this conversation"
}],
"metadata": {
"language": "en",
"timezone": "UTC-5"
}
}
- File Upload Initialization
// Message Format
{
sessionId?: string,
contents: [{
type: 'image' | 'audio' | 'video' | 'file',
content: string,
mimeType?: string,
filename?: string,
metadata?: {
totalChunks: number,
[key: string]: any
}
}],
metadata?: Record<string, any>
}
// Example: Image upload initialization
{
"sessionId": "user123",
"contents": [{
"type": "image",
"filename": "sunset.jpg",
"mimeType": "image/jpeg",
"metadata": {
"totalChunks": 3,
"imageSize": "2048x1536",
"fileSize": 1024000
}
}]
}
// Example: Audio upload initialization
{
"contents": [{
"type": "audio",
"filename": "recording.mp3",
"mimeType": "audio/mpeg",
"metadata": {
"totalChunks": 2,
"duration": "00:01:30",
"bitrate": "128kbps"
}
}]
}
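The totalChunks value in these examples can be derived from the file size and whatever chunk size the client uses. As a sketch (the 64 KiB chunk size is an arbitrary assumption, not something the protocol mandates):

```typescript
// Hypothetical helper: compute totalChunks for an upload.
const CHUNK_SIZE = 64 * 1024; // 64 KiB per chunk (an arbitrary choice)

function totalChunks(fileSizeBytes: number): number {
  return Math.max(1, Math.ceil(fileSizeBytes / CHUNK_SIZE));
}

console.log(totalChunks(1024000)); // 16 chunks for the 1,024,000-byte image above
```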
// Example: Mixed content message
{
"sessionId": "user123",
"contents": [
{
"type": "text",
"content": "Please analyze this image:"
},
{
"type": "image",
"filename": "chart.png",
"mimeType": "image/png",
"metadata": {
"totalChunks": 1
}
}
]
}
- File Chunk Upload
// Format: Binary message
// First 36 bytes: Upload ID (UUID)
// Remaining bytes: Chunk data
// Example (Node.js pseudo-code showing the structure):
const uploadId = "550e8400-e29b-41d4-a716-446655440000";
const chunk = new Uint8Array([/* chunk data */]);
const message = Buffer.concat([
Buffer.from(uploadId),
chunk
]);
// Receiving side: the first 36 bytes are the upload ID, the rest is chunk data
const receivedId = message.subarray(0, 36).toString('utf8');
const payload = message.subarray(36);
Server -> Client Messages
All server responses follow this format:
// Message Format
{
type: 'success' | 'error' | 'progress',
sessionId: string,
messageId: string,
data: {
content: string,
mimeType?: string,
progress?: number,
error?: string,
metadata?: Record<string, any>
}
}
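The format above can be expressed as a TypeScript discriminated type with a small runtime guard for validating incoming frames before use. This is a sketch based only on the fields listed in the spec above; `isServerMessage` is a hypothetical helper, not part of the framework:

```typescript
// Server response schema from the format above.
type ServerMessage = {
  type: 'success' | 'error' | 'progress';
  sessionId: string;
  messageId: string;
  data: {
    content: string;
    mimeType?: string;
    progress?: number;
    error?: string;
    metadata?: Record<string, any>;
  };
};

// Minimal runtime check before trusting a parsed frame.
function isServerMessage(value: any): value is ServerMessage {
  return (
    value !== null &&
    typeof value === 'object' &&
    ['success', 'error', 'progress'].includes(value.type) &&
    typeof value.sessionId === 'string' &&
    typeof value.messageId === 'string' &&
    typeof value.data?.content === 'string'
  );
}

const frame = JSON.parse(
  '{"type":"progress","sessionId":"s1","messageId":"m1","data":{"content":"Upload progress: 50%","progress":50}}'
);
console.log(isServerMessage(frame)); // true
```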
// Example: Connection Success
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-001",
"data": {
"content": "Connected to multimodal server",
"metadata": {
"sessionId": "550e8400-e29b-41d4-a716-446655440000"
}
}
}
// Example: Upload Ready Response
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-002",
"data": {
"content": "Ready for upload",
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}
}
}
// Example: Upload Progress
{
"type": "progress",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-003",
"data": {
"content": "Upload progress: 67%",
"progress": 67,
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}
}
}
// Example: Upload Complete
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-004",
"data": {
"content": "Upload complete",
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"fileId": "file-001",
"permanentId": "perm-001",
"urls": {
"get": "https://storage.example.com/files/image.jpg",
"put": "https://storage.example.com/upload/image.jpg"
},
"mimeType": "image/jpeg",
"filename": "image.jpg",
"size": 1024000
}
}
}
// Example: Message Processing Complete
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-005",
"data": {
"content": "Message processed",
"metadata": {
"message": {
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"contents": [
{
"type": "text",
"content": "Analysis complete. The image shows..."
}
],
"metadata": {
"processingTime": "1.2s"
}
}
}
}
}
// Example: Error Response
{
"type": "error",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-006",
"data": {
"content": "Failed to process file",
"error": "Invalid file format: Only JPEG and PNG are supported",
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}
}
}
Complete Flow Example
Here's a complete example showing a typical interaction flow:
// 1. Connect to WebSocket
const ws = new WebSocket('ws://localhost:8081');
// 2. Receive connection success
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-001",
"data": {
"content": "Connected to multimodal server",
"metadata": { "sessionId": "user-session-001" }
}
}
// 3. Send text message with image upload
ws.send(JSON.stringify({
"sessionId": "user-session-001",
"contents": [
{
"type": "text",
"content": "What's in this image?"
},
{
"type": "image",
"filename": "scene.jpg",
"mimeType": "image/jpeg",
"metadata": {
"totalChunks": 2
}
}
]
}));
// 4. Receive upload ready confirmation
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-002",
"data": {
"content": "Ready for upload",
"metadata": { "uploadId": "upload-001" }
}
}
// 5. Send file chunks (in practice the upload ID prefix must be the full
// 36-character UUID returned by the server; "upload-001" is shortened here)
const chunk1 = new Uint8Array([/* first half of image */]);
const chunk2 = new Uint8Array([/* second half of image */]);
ws.send(Buffer.concat([Buffer.from("upload-001"), chunk1]));
ws.send(Buffer.concat([Buffer.from("upload-001"), chunk2]));
// 6. Receive progress updates
// Server sends after first chunk:
{
"type": "progress",
"sessionId": "user-session-001",
"messageId": "msg-003",
"data": {
"content": "Upload progress: 50%",
"progress": 50,
"metadata": { "uploadId": "upload-001" }
}
}
// 7. Receive upload complete
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-004",
"data": {
"content": "Upload complete",
"metadata": {
"uploadId": "upload-001",
"fileId": "file-001",
"urls": {
"get": "https://storage.example.com/files/scene.jpg"
},
"mimeType": "image/jpeg",
"filename": "scene.jpg",
"size": 1048576
}
}
}
// 8. Receive message processing result
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-005",
"data": {
"content": "Message processed",
"metadata": {
"message": {
"sessionId": "user-session-001",
"contents": [
{
"type": "text",
"content": "The image shows a sunny beach scene with palm trees..."
}
]
}
}
}
}
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
