storyframe v1.1.0
🤖 StoryFrame: Real-time AI Chat Framework
StoryFrame is a modern, TypeScript-based framework for building real-time AI chat applications with streaming responses, tool usage, and audio capabilities.
🌟 Features
Core Features
- ✨ Real-time WebSocket communication
- 🔄 Streaming responses from LLMs
- 🛠️ Extensible tool system
- 💾 Persistent memory with Supabase
- 🎯 Session management
- 🔊 Text-to-Speech support
- 🤖 Model-agnostic design (currently using OpenAI)
Advanced Capabilities
- 🔄 Multi-turn conversations
- 🧠 Conversation memory and context management
- 🎭 Customizable system prompts
- 🔧 Tool-based actions
- 🎵 Audio streaming with Fish Audio TTS
- 🔍 Auto-cleanup of inactive sessions
🏗️ Project Structure
demo-streaming/
├── storyframe/ # Core Framework
│ ├── audio/ # TTS and audio handling
│ ├── chains/ # Conversation management
│ ├── core/ # Core agent implementation
│ ├── memory/ # Memory store implementations
│ ├── tools/ # Tool definitions and router
│ ├── types/ # TypeScript type definitions
│ └── utils/ # Utility functions
│
├── demo-backend/ # Example Backend Implementation
│ └── src/
│ └── server.ts # WebSocket server setup
│
└── demo-frontend/ # Example Frontend Implementation
└── src/
        └── components/   # React components
🚀 Getting Started
Prerequisites
- Node.js 18+
- pnpm (recommended) or npm
- OpenAI API key
- Fish Audio API key (optional, for TTS)
- Supabase account (optional, for persistent storage)
Setup
- Clone the repository:
git clone <repository-url>
cd demo-streaming
- Install dependencies:
pnpm install
- Create a .env file:
OPENAI_API_KEY=your_openai_api_key
FISH_AUDIO_API_KEY=your_fish_audio_api_key # Optional
SUPABASE_URL=your_supabase_url # Optional
SUPABASE_KEY=your_supabase_key # Optional
- Start the development server:
# Start the backend
cd demo-backend
pnpm dev
# Start the frontend (in another terminal)
cd demo-frontend
pnpm dev
💡 Usage
Using the Agent Directly
If you don't need WebSocket communication or session management, you can use the agent directly:
import { runAgent, runAgentWithOptions } from './core/agent';
import { ToolRouter } from './tools/toolRouter';
import { MemoryStore } from './memory/memoryStore';
// Setup tools
const toolRouter = new ToolRouter();
toolRouter.register({
name: 'customTool',
description: 'Tool description',
execute: async (args) => {
// Tool implementation
return result;
}
});
// Option 1: Simple usage with runAgent
const input = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
];
const { fullTextStream, plainTextStream } = await runAgent(
input,
toolRouter,
[{ name: 'custom', content: 'Additional system prompt' }]
);
// Handle the streams
const reader = fullTextStream.getReader();
while (true) {
const { value, done } = await reader.read();
if (done) break;
console.log(value); // Process each chunk
}
// Option 2: Advanced usage with runAgentWithOptions
const memoryStore = new MemoryStore();
const response = await runAgentWithOptions({
sessionId: 'unique-id',
responseId: 'response-id',
prompt: 'Hello!',
memoryStore,
toolRouter,
saveToMemory: true,
systemPrompts: [
{
name: 'custom',
content: 'Your custom system prompt'
}
]
});
// Handle streams similarly
for await (const chunk of response.fullTextStream) {
console.log(chunk);
}
The agent provides two main functions:
- runAgent: Basic function for one-off interactions
- runAgentWithOptions: Advanced function with memory and configuration options
Key differences from using ConversationManager:
- No WebSocket handling
- Manual stream processing
- Direct control over conversation flow
- No automatic session management
- Manual memory handling
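When using the agent directly, the manual stream-reading loop shown above can be wrapped in a small helper that collects a text stream into one string. This is a sketch, not part of the framework's API; the stand-in ReadableStream below replaces a real plainTextStream so the snippet runs without an API key:

```typescript
// Collect a ReadableStream<string> (such as plainTextStream) into one string.
async function collectStream(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}

// Stand-in stream so the helper can run without calling runAgent.
const demo = new ReadableStream<string>({
  start(controller) {
    controller.enqueue('Hello, ');
    controller.enqueue('world!');
    controller.close();
  }
});

collectStream(demo).then((text) => console.log(text)); // "Hello, world!"
```

With a real agent call, the same helper would consume `fullTextStream` or `plainTextStream` returned by `runAgent`.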
Using ConversationManager (WebSocket Support)
import { ConversationManager } from './chains/conversationManager';
const manager = new ConversationManager();
// Start a conversation
await manager.startConversation({
sessionId: 'unique-id',
memoryStore,
toolRouter,
saveToMemory: true
});
// Process messages
await manager.processMessage(sessionId, userMessage, {
systemPrompts: [
{
name: 'custom',
content: 'Your custom system prompt'
}
]
});
Adding Custom Tools
import { ToolRouter } from './tools/toolRouter';
const toolRouter = new ToolRouter();
toolRouter.register({
name: 'customTool',
description: 'Tool description',
execute: async (args) => {
// Tool implementation
return result;
}
});
🔧 Configuration
Memory Store Options
- In-memory store (default)
- Supabase store (persistent)
// Persistent store backed by Supabase (requires SUPABASE_URL and SUPABASE_KEY)
const memoryStore = new SupabaseMemoryStore(
process.env.SUPABASE_URL,
process.env.SUPABASE_KEY
);
🔌 WebSocket Events Documentation
Multimodal WebSocket Server (Port 8081)
The multimodal WebSocket server handles various types of content including text, images, audio, video, and files. Here's a detailed breakdown of the events and message formats:
Client -> Server Messages
- Text-only Message
// Message Format
{
sessionId?: string;
contents: [{
type: 'text',
content: string
}],
metadata?: Record<string, any>
}
// Example: Simple text message
{
"contents": [{
"type": "text",
"content": "What is the capital of France?"
}]
}
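As a sketch, a client-side helper for building messages in this shape might look like the following. The `makeTextMessage` helper is hypothetical, not part of the framework; the field names come from the message format above:

```typescript
// Types mirroring the text-only message format above.
type TextContent = { type: 'text'; content: string };
type ClientMessage = {
  sessionId?: string;
  contents: TextContent[];
  metadata?: Record<string, any>;
};

// Hypothetical helper: build a text-only message, optionally with a session ID.
function makeTextMessage(text: string, sessionId?: string): ClientMessage {
  return {
    ...(sessionId ? { sessionId } : {}),
    contents: [{ type: 'text', content: text }]
  };
}

const msg = makeTextMessage('What is the capital of France?');
console.log(JSON.stringify(msg));
// Over a live connection this would be sent with ws.send(JSON.stringify(msg)).
```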
// Example: Text message with metadata
{
"sessionId": "user123",
"contents": [{
"type": "text",
"content": "Analyze this conversation"
}],
"metadata": {
"language": "en",
"timezone": "UTC-5"
}
}
- File Upload Initialization
// Message Format
{
sessionId?: string,
contents: [{
type: 'image' | 'audio' | 'video' | 'file',
content: string,
mimeType?: string,
filename?: string,
metadata?: {
totalChunks: number,
[key: string]: any
}
}],
metadata?: Record<string, any>
}
// Example: Image upload initialization
{
"sessionId": "user123",
"contents": [{
"type": "image",
"filename": "sunset.jpg",
"mimeType": "image/jpeg",
"metadata": {
"totalChunks": 3,
"imageSize": "2048x1536",
"fileSize": 1024000
}
}]
}
// Example: Audio upload initialization
{
"contents": [{
"type": "audio",
"filename": "recording.mp3",
"mimeType": "audio/mpeg",
"metadata": {
"totalChunks": 2,
"duration": "00:01:30",
"bitrate": "128kbps"
}
}]
}
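The totalChunks value in these examples can be derived from the file size and whatever chunk size the client uses. As a sketch (the 64 KiB chunk size is an arbitrary assumption, not something the protocol mandates):

```typescript
// Hypothetical helper: compute totalChunks for an upload.
const CHUNK_SIZE = 64 * 1024; // 64 KiB per chunk (an arbitrary choice)

function totalChunks(fileSizeBytes: number): number {
  return Math.max(1, Math.ceil(fileSizeBytes / CHUNK_SIZE));
}

console.log(totalChunks(1024000)); // 16 chunks for the 1,024,000-byte image above
```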
// Example: Mixed content message
{
"sessionId": "user123",
"contents": [
{
"type": "text",
"content": "Please analyze this image:"
},
{
"type": "image",
"filename": "chart.png",
"mimeType": "image/png",
"metadata": {
"totalChunks": 1
}
}
]
}
- File Chunk Upload
// Format: Binary message
// First 36 bytes: Upload ID (UUID)
// Remaining bytes: Chunk data
// Example (Node.js pseudo-code showing the structure):
const uploadId = "550e8400-e29b-41d4-a716-446655440000";
const chunk = new Uint8Array([/* chunk data */]);
const message = Buffer.concat([
Buffer.from(uploadId),
chunk
]);
// Receiving side: the first 36 bytes are the upload ID, the rest is chunk data
const receivedId = message.subarray(0, 36).toString('utf8');
const payload = message.subarray(36);
Server -> Client Messages
All server responses follow this format:
// Message Format
{
type: 'success' | 'error' | 'progress',
sessionId: string,
messageId: string,
data: {
content: string,
mimeType?: string,
progress?: number,
error?: string,
metadata?: Record<string, any>
}
}
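The format above can be expressed as a TypeScript discriminated type with a small runtime guard for validating incoming frames before use. This is a sketch based only on the fields listed in the spec above; `isServerMessage` is a hypothetical helper, not part of the framework:

```typescript
// Server response schema from the format above.
type ServerMessage = {
  type: 'success' | 'error' | 'progress';
  sessionId: string;
  messageId: string;
  data: {
    content: string;
    mimeType?: string;
    progress?: number;
    error?: string;
    metadata?: Record<string, any>;
  };
};

// Minimal runtime check before trusting a parsed frame.
function isServerMessage(value: any): value is ServerMessage {
  return (
    value !== null &&
    typeof value === 'object' &&
    ['success', 'error', 'progress'].includes(value.type) &&
    typeof value.sessionId === 'string' &&
    typeof value.messageId === 'string' &&
    typeof value.data?.content === 'string'
  );
}

const frame = JSON.parse(
  '{"type":"progress","sessionId":"s1","messageId":"m1","data":{"content":"Upload progress: 50%","progress":50}}'
);
console.log(isServerMessage(frame)); // true
```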
// Example: Connection Success
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-001",
"data": {
"content": "Connected to multimodal server",
"metadata": {
"sessionId": "550e8400-e29b-41d4-a716-446655440000"
}
}
}
// Example: Upload Ready Response
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-002",
"data": {
"content": "Ready for upload",
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}
}
}
// Example: Upload Progress
{
"type": "progress",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-003",
"data": {
"content": "Upload progress: 67%",
"progress": 67,
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}
}
}
// Example: Upload Complete
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-004",
"data": {
"content": "Upload complete",
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"fileId": "file-001",
"permanentId": "perm-001",
"urls": {
"get": "https://storage.example.com/files/image.jpg",
"put": "https://storage.example.com/upload/image.jpg"
},
"mimeType": "image/jpeg",
"filename": "image.jpg",
"size": 1024000
}
}
}
// Example: Message Processing Complete
{
"type": "success",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-005",
"data": {
"content": "Message processed",
"metadata": {
"message": {
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"contents": [
{
"type": "text",
"content": "Analysis complete. The image shows..."
}
],
"metadata": {
"processingTime": "1.2s"
}
}
}
}
}
// Example: Error Response
{
"type": "error",
"sessionId": "550e8400-e29b-41d4-a716-446655440000",
"messageId": "msg-006",
"data": {
"content": "Failed to process file",
"error": "Invalid file format: Only JPEG and PNG are supported",
"metadata": {
"uploadId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
}
}
}
Complete Flow Example
Here's a complete example showing a typical interaction flow:
// 1. Connect to WebSocket
const ws = new WebSocket('ws://localhost:8081');
// 2. Receive connection success
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-001",
"data": {
"content": "Connected to multimodal server",
"metadata": { "sessionId": "user-session-001" }
}
}
// 3. Send text message with image upload
ws.send(JSON.stringify({
"sessionId": "user-session-001",
"contents": [
{
"type": "text",
"content": "What's in this image?"
},
{
"type": "image",
"filename": "scene.jpg",
"mimeType": "image/jpeg",
"metadata": {
"totalChunks": 2
}
}
]
}));
// 4. Receive upload ready confirmation
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-002",
"data": {
"content": "Ready for upload",
"metadata": { "uploadId": "upload-001" }
}
}
// 5. Send file chunks (in practice the upload ID prefix must be the full
// 36-character UUID returned by the server; "upload-001" is shortened here)
const chunk1 = new Uint8Array([/* first half of image */]);
const chunk2 = new Uint8Array([/* second half of image */]);
ws.send(Buffer.concat([Buffer.from("upload-001"), chunk1]));
ws.send(Buffer.concat([Buffer.from("upload-001"), chunk2]));
// 6. Receive progress updates
// Server sends after first chunk:
{
"type": "progress",
"sessionId": "user-session-001",
"messageId": "msg-003",
"data": {
"content": "Upload progress: 50%",
"progress": 50,
"metadata": { "uploadId": "upload-001" }
}
}
// 7. Receive upload complete
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-004",
"data": {
"content": "Upload complete",
"metadata": {
"uploadId": "upload-001",
"fileId": "file-001",
"urls": {
"get": "https://storage.example.com/files/scene.jpg"
},
"mimeType": "image/jpeg",
"filename": "scene.jpg",
"size": 1048576
}
}
}
// 8. Receive message processing result
// Server sends:
{
"type": "success",
"sessionId": "user-session-001",
"messageId": "msg-005",
"data": {
"content": "Message processed",
"metadata": {
"message": {
"sessionId": "user-session-001",
"contents": [
{
"type": "text",
"content": "The image shows a sunny beach scene with palm trees..."
}
]
}
}
}
}
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
