broth-rl-sdk

v0.5.1

Published

3 days ago

BrothRL - Reinforcement Learning SDK for Voice Agents

Downloads

380

lorenzobalderrama-codingsoup

reinforcement-learning voice-agents conversational-ai contextual-bandits vapi twilio conversationrelay websocket real-time speech-recognition text-to-speech rl ai

BrothRL SDK

The Broth to your AI kitchen 🍜
Make your voice agents intelligent - A TypeScript SDK for adding Reinforcement Learning to voice applications.

🎯 What Problem Does It Solve?

Voice apps today make dumb decisions. They follow hard-coded conversation flows that can't adapt or optimize. BrothRL changes that by bringing Reinforcement Learning to voice applications.

The Pain Points

Hard-coded flows: Developers manually program "If user says X, then do Y"
No adaptation: Can't learn from what actually works
Missed optimization: No way to improve conversions, satisfaction, or efficiency
Wasted historical data: Thousands of call recordings with no way to extract strategies
Messy data: Voice transcripts are unstructured and hard to learn from directly

What This SDK Provides

✅ Makes voice apps intelligent - Apps learn what actions lead to success
✅ Abstracts RL complexity - Developers don't need to be RL experts
✅ Turns data into strategy - Historical calls → optimized policies
✅ Safe production deployment - Built-in guardrails and fallbacks
✅ Platform agnostic - Works with any voice platform (Vapi, Retell, Twilio, etc.)
✅ Continuous improvement - Policies get better as more data comes in
✅ Messy data solutions - Built-in tools to convert raw text to structured states

🚀 Quick Start

Installation

npm install broth-rl-sdk

Basic Usage

import {
  Action,
  ActionSpace,
  ActionType,
  ContextualBandit,
  State,
} from 'broth-rl-sdk';

// 1. Define your action space
const actionSpace = new ActionSpace();
actionSpace.addAction(
  Action.create(ActionType.ASK_QUESTION, 'Ask for details', 'Get more info')
);
actionSpace.addAction(
  Action.create(ActionType.PROVIDE_INFO, 'Give solution', 'Solve problem')
);
actionSpace.addAction(
  Action.create(ActionType.TRANSFER_CALL, 'Transfer', 'Escalate to human')
);

// 2. Create a policy
const policy = new ContextualBandit({
  actionSpace,
  explorationRate: 0.1,
});

// 3. Use it in your conversation flow
const state = new State({
  conversationId: 'call_123',
  turnNumber: 1,
  history: [],
  features: {
    userIntent: 'billing_issue',
    sentiment: 'frustrated',
  },
});

// Select the best action
const action = policy.selectAction(state);
console.log('Agent should:', action.name);

// Update with reward when you know the outcome
policy.update(state, action, 1.0); // 1.0 for success, -1.0 for failure
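
For intuition about what a select/update loop like the one above does, here is a minimal, self-contained sketch of an epsilon-greedy contextual bandit. It is not BrothRL's implementation: `TinyBandit`, `contextKey`, and the injectable `rng` parameter are hypothetical names, and the real `ContextualBandit` may work differently.

```typescript
// Illustrative sketch only: NOT BrothRL's internals. All names are hypothetical.
type Features = Record<string, string | number | boolean>;

// Build a stable key so identical feature sets share the same statistics.
function contextKey(features: Features): string {
  return Object.keys(features)
    .sort()
    .map((k) => `${k}=${String(features[k])}`)
    .join('|');
}

class TinyBandit {
  private actions: string[];
  private explorationRate: number;
  private rng: () => number;
  // Per (context, action) pair: update count and running mean reward.
  private stats = new Map<string, { count: number; mean: number }>();

  constructor(actions: string[], explorationRate: number, rng: () => number = Math.random) {
    this.actions = actions;
    this.explorationRate = explorationRate;
    this.rng = rng;
  }

  selectAction(features: Features): string {
    // Explore: with probability explorationRate, pick a uniformly random action.
    if (this.rng() < this.explorationRate) {
      return this.actions[Math.floor(this.rng() * this.actions.length)];
    }
    // Exploit: pick the action with the highest mean reward in this context.
    const ctx = contextKey(features);
    let best = this.actions[0];
    let bestMean = -Infinity;
    for (const action of this.actions) {
      const mean = this.stats.get(`${ctx}::${action}`)?.mean ?? 0;
      if (mean > bestMean) {
        best = action;
        bestMean = mean;
      }
    }
    return best;
  }

  update(features: Features, action: string, reward: number): void {
    const key = `${contextKey(features)}::${action}`;
    const entry = this.stats.get(key) ?? { count: 0, mean: 0 };
    entry.count += 1;
    entry.mean += (reward - entry.mean) / entry.count; // incremental mean
    this.stats.set(key, entry);
  }
}
```

With `explorationRate: 0.1`, roughly one decision in ten is random. The rest go to whichever action has the best average reward so far for that exact combination of features.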

📚 Core Concepts

State

Represents the conversation context at any point in time:

const state = new State({
  conversationId: 'call_123',
  turnNumber: 3,
  history: [
    { speaker: 'user', text: 'I need help', timestamp: '...' },
    { speaker: 'agent', text: 'How can I help?', timestamp: '...' },
  ],
  intent: 'support',
  features: {
    sentiment: 'neutral',
    accountAge: 'new',
  },
});
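
As a rough mental model, the state can be treated as an immutable snapshot that is replaced on every turn. The sketch below uses plain interfaces rather than the SDK's `State` class. `ConversationSnapshot` and `withTurn` are hypothetical names, and the field list is inferred only from the example above.

```typescript
// Hypothetical shapes for illustration; BrothRL's own State class may differ.
interface Turn {
  speaker: 'user' | 'agent';
  text: string;
  timestamp: string;
}

type FeatureValue = string | number | boolean;

interface ConversationSnapshot {
  conversationId: string;
  turnNumber: number;
  history: readonly Turn[];
  intent?: string;
  features: Readonly<Record<string, FeatureValue>>;
}

// Return a new snapshot with the turn appended; the input is left untouched.
function withTurn(
  prev: ConversationSnapshot,
  turn: Turn,
  featureUpdates: Record<string, FeatureValue> = {},
): ConversationSnapshot {
  return {
    ...prev,
    turnNumber: prev.turnNumber + 1,
    history: [...prev.history, turn],
    features: { ...prev.features, ...featureUpdates },
  };
}

const start: ConversationSnapshot = {
  conversationId: 'call_123',
  turnNumber: 1,
  history: [],
  features: { accountAge: 'new' },
};

const next = withTurn(
  start,
  { speaker: 'user', text: 'I need help', timestamp: new Date(0).toISOString() },
  { sentiment: 'neutral' },
);
```

Keeping snapshots immutable means the state used to select an action is still available unchanged when the reward arrives later.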

Handling Messy Data (StateSchema)

New in v0.4.0: Voice data is messy. The StateSchema utility helps you convert raw text into structured features that the Bandit can actually learn from.

import { StateSchema, Feature } from 'broth-rl-sdk';

// 1. Define your Schema (Zod-style)
const SalesState = StateSchema.define({
  // Enum: Buckets text into specific categories
  intent: Feature.enum(['price_objection', 'info_request', 'closing'])
    .matches({
      price_objection: ['too expensive', 'cost', 'price', 'budget'],
      info_request: ['tell me more', 'how does it work', 'details'],
      closing: ['buy', 'sign up', 'ready']
    })
    .default('info_request'),

  // Boolean: Detects presence of concepts
  is_angry: Feature.boolean()
    .matches({ 
      true: ['stupid', 'hate', 'annoying', 'stop'] 
    }),

  // Custom Extractor: For complex logic
  deal_size: Feature.number()
    .extract((text) => {
      const match = text.match(/\$(\d+)/);
      return match ? parseInt(match[1], 10) : 0;
    })
});

// 2. Auto-convert Text -> BrothRL State
const state = SalesState.toState("I think it's too expensive for my $500 budget.");

console.log(state.toJSON().features);
// Output: { intent: 'price_objection', is_angry: false, deal_size: 500 }
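
To make the keyword bucketing concrete, here is a standalone sketch of how text could be mapped to the same three features with plain substring matching. This is an assumption about the general technique, not a description of how `StateSchema` is implemented. `matchEnum`, `matchBoolean`, and `extractDollars` are hypothetical helpers.

```typescript
// Illustrative sketch only; these helpers are hypothetical, not BrothRL APIs.

// Return the first category whose keyword list has a phrase found in the text.
function matchEnum(
  text: string,
  keywords: Record<string, string[]>,
  fallback: string,
): string {
  const lower = text.toLowerCase();
  for (const category of Object.keys(keywords)) {
    if (keywords[category].some((phrase) => lower.includes(phrase))) {
      return category;
    }
  }
  return fallback;
}

// True if any of the phrases appears in the text.
function matchBoolean(text: string, phrases: string[]): boolean {
  const lower = text.toLowerCase();
  return phrases.some((phrase) => lower.includes(phrase));
}

// Pull the first whole-dollar amount out of the text, or 0 if there is none.
function extractDollars(text: string): number {
  const match = text.match(/\$(\d+)/);
  return match ? parseInt(match[1], 10) : 0;
}

const utterance = "I think it's too expensive for my $500 budget.";

const features = {
  intent: matchEnum(
    utterance,
    {
      price_objection: ['too expensive', 'cost', 'price', 'budget'],
      info_request: ['tell me more', 'how does it work', 'details'],
      closing: ['buy', 'sign up', 'ready'],
    },
    'info_request',
  ),
  is_angry: matchBoolean(utterance, ['stupid', 'hate', 'annoying', 'stop']),
  deal_size: extractDollars(utterance),
};
// features: { intent: 'price_objection', is_angry: false, deal_size: 500 }
```

Substring matching is deliberately simple and has known blind spots: `'stop'` would also fire on "nonstop", and category order decides ties. Word-boundary regular expressions or a classifier are reasonable upgrades when that matters.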

Action

What the agent can do:

const action = Action.create(
  ActionType.ASK_QUESTION,
  'Ask for order number',
  'Request the customer order number',
  { message: 'What is your order number?' }
);

Policy

The "brain" that decides which action to take:

const policy = new ContextualBandit({
  actionSpace,
  explorationRate: 0.1, // 10% random exploration
  useUCB: true, // Use upper confidence bound
});
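
The `useUCB` flag refers to upper-confidence-bound selection. As a sketch of the general idea, the classic UCB1 rule scores each action by its mean reward plus a bonus that shrinks as the action is tried more often. Whether BrothRL uses this exact formula is an assumption. `ArmStats`, `ucbScore`, and `selectByUcb` are hypothetical names.

```typescript
// Illustrative UCB1-style scoring; not taken from BrothRL's source.
interface ArmStats {
  count: number; // how many times this action was tried in the context
  mean: number; // average reward observed so far
}

// score = mean + c * sqrt(ln(totalPulls) / count); c = sqrt(2) gives UCB1.
function ucbScore(arm: ArmStats, totalPulls: number, c: number = Math.SQRT2): number {
  if (arm.count === 0) return Infinity; // try every action at least once
  return arm.mean + c * Math.sqrt(Math.log(totalPulls) / arm.count);
}

// Pick the action with the highest score. Assumes at least one arm is given.
function selectByUcb(arms: Record<string, ArmStats>): string {
  const names = Object.keys(arms);
  const total = names.reduce((sum, name) => sum + arms[name].count, 0);
  let best = names[0];
  let bestScore = -Infinity;
  for (const name of names) {
    const score = ucbScore(arms[name], total);
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}

// A rarely tried action can win despite a lower mean.
const choice = selectByUcb({
  ask_question: { count: 50, mean: 0.6 },
  transfer_call: { count: 2, mean: 0.5 },
});
// choice is 'transfer_call'
```

Compared with purely random exploration, this directs exploration toward actions whose value is still uncertain instead of spreading it evenly.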

Reward

How you tell the agent what's good:

const reward = new Reward();

// Immediate feedback
const immediate = reward.calculateImmediate(state, action, {
  sentiment: 'positive',
});

// Delayed feedback (at end of call)
const delayed = reward.calculateDelayed({
  success: true,
  metrics: { userSatisfaction: 0.9 },
});
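
One common way to combine the two signals is to spread the end-of-call reward back over earlier turns with a discount factor, so actions closer to the outcome receive more credit. The sketch below shows that idea only. `creditTurns` and `gamma` are hypothetical and are not part of the `Reward` class shown above.

```typescript
// Hypothetical credit-assignment helper; not part of BrothRL's Reward API.
// Each turn gets its immediate reward plus the final reward discounted by
// how many turns before the end of the call it happened.
function creditTurns(
  immediate: number[],
  finalReward: number,
  gamma: number = 0.9,
): number[] {
  const lastIndex = immediate.length - 1;
  return immediate.map(
    (reward, turn) => reward + Math.pow(gamma, lastIndex - turn) * finalReward,
  );
}

// Three turns, a successful call (final reward 1), discount 0.5.
const perTurn = creditTurns([0, 0.25, 0], 1, 0.5);
// perTurn is [0.25, 0.75, 1]
```

A `gamma` near 1 treats every turn as almost equally responsible for the outcome, while a small `gamma` credits mostly the last few actions.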

🛡️ Safety & Guardrails

Production voice apps need safety constraints:

import { Action, ActionType, Guardrails, CommonGuardrails } from 'broth-rl-sdk';

const guardrails = new Guardrails({
  rules: [
    CommonGuardrails.maxTurns(20), // Don't let conversations go too long
    CommonGuardrails.noRepeat(3), // Don't repeat the same action
    CommonGuardrails.rateLimit('transfer_call', 1), // Max 1 transfer per call
  ],
  defaultFallback: Action.create(ActionType.END_CALL, 'End gracefully', '...'),
});

// Validate actions before taking them
const safeAction = guardrails.validate(state, action);
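
Conceptually, each guardrail is a predicate over the call so far and the proposed action, and validation returns the fallback when any predicate fails. The sketch below illustrates that pattern with plain functions. The `Rule` type, the `CallContext` shape, and the exact semantics chosen for `noRepeat` (block after `limit` identical actions in a row) are assumptions, not BrothRL's definitions.

```typescript
// Hypothetical rule shapes for illustration; not BrothRL's Guardrails API.
interface CallContext {
  turnNumber: number;
  pastActions: string[]; // action types taken so far, oldest first
}

// A rule returns true when the proposed action is allowed.
type Rule = (ctx: CallContext, proposed: string) => boolean;

// Allow actions only while the conversation is within the turn limit.
const maxTurns = (limit: number): Rule => (ctx) => ctx.turnNumber <= limit;

// Block an action once it has already been taken `limit` times in a row.
const noRepeat = (limit: number): Rule => (ctx, proposed) => {
  const recent = ctx.pastActions.slice(-limit);
  return !(recent.length === limit && recent.every((a) => a === proposed));
};

// Allow at most `max` uses of one specific action per call.
const rateLimit = (action: string, max: number): Rule => (ctx, proposed) =>
  proposed !== action ||
  ctx.pastActions.filter((a) => a === action).length < max;

// Return the proposed action if every rule passes, otherwise the fallback.
function validateAction(
  rules: Rule[],
  ctx: CallContext,
  proposed: string,
  fallback: string,
): string {
  return rules.every((rule) => rule(ctx, proposed)) ? proposed : fallback;
}

const rules = [maxTurns(20), noRepeat(3), rateLimit('transfer_call', 1)];
const ctx: CallContext = {
  turnNumber: 5,
  pastActions: ['ask_question', 'transfer_call'],
};

const second = validateAction(rules, ctx, 'transfer_call', 'end_call');
// second is 'end_call': one transfer has already happened on this call
```

Running the rules after the policy has chosen keeps the learning code free of business constraints, at the cost of occasionally discarding the policy's choice.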

📊 Monitoring

Track what your agent is doing:

import { Monitor } from 'broth-rl-sdk';

const monitor = new Monitor();

// Log every action
monitor.log(state, action, reward);

// Get statistics
const stats = monitor.getOverallStats();
console.log('Action distribution:', stats.actionDistribution);
console.log('Average turns per conversation:', stats.averageTurnsPerConversation);

// Generate report
console.log(monitor.createReport());
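
The statistics named above can be derived from a flat log of decisions. As a sketch, assuming each log entry records the conversation, the action taken, and the reward, an aggregation could look like this. `LogEntry` and `summarize` are hypothetical and may not match how `Monitor` stores or computes its data.

```typescript
// Hypothetical aggregation for illustration; not BrothRL's Monitor internals.
interface LogEntry {
  conversationId: string;
  action: string;
  reward: number;
}

interface Summary {
  actionDistribution: Record<string, number>; // share of decisions per action
  averageTurnsPerConversation: number;
  averageReward: number;
}

function summarize(entries: LogEntry[]): Summary {
  const actionCounts: Record<string, number> = {};
  const turnsPerConversation: Record<string, number> = {};
  let rewardSum = 0;

  for (const entry of entries) {
    actionCounts[entry.action] = (actionCounts[entry.action] ?? 0) + 1;
    turnsPerConversation[entry.conversationId] =
      (turnsPerConversation[entry.conversationId] ?? 0) + 1;
    rewardSum += entry.reward;
  }

  const total = entries.length;
  const conversations = Object.keys(turnsPerConversation).length;

  // Convert raw counts into fractions of all logged decisions.
  const actionDistribution: Record<string, number> = {};
  for (const action of Object.keys(actionCounts)) {
    actionDistribution[action] = actionCounts[action] / total;
  }

  return {
    actionDistribution,
    averageTurnsPerConversation: conversations === 0 ? 0 : total / conversations,
    averageReward: total === 0 ? 0 : rewardSum / total,
  };
}

const summary = summarize([
  { conversationId: 'call_1', action: 'ask_question', reward: 0 },
  { conversationId: 'call_1', action: 'provide_info', reward: 1 },
  { conversationId: 'call_2', action: 'ask_question', reward: 0 },
  { conversationId: 'call_2', action: 'transfer_call', reward: -1 },
]);
// summary.actionDistribution: { ask_question: 0.5, provide_info: 0.25, transfer_call: 0.25 }
```

A distribution that collapses onto one action is worth investigating: it can mean the policy has converged, or that exploration is set too low to notice better options.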

🔌 Platform Adapters

Vapi

import { VapiAdapter } from 'broth-rl-sdk';

const adapter = new VapiAdapter();
adapter.setPolicy(policy);

// In your webhook handler (assumes `app` is an Express app with express.json() enabled)
app.post('/vapi-webhook', async (req, res) => {
  const response = await adapter.handleRequest(req.body);
  res.json(response.raw);
});

Twilio Voice (TwiML)

import { TwilioAdapter } from 'broth-rl-sdk';

const adapter = new TwilioAdapter({ actionUrl: '/voice/action' });
adapter.setPolicy(policy);

// In your webhook handler (assumes an Express `app`; Twilio posts form-encoded
// bodies, so enable express.urlencoded())
app.post('/voice', async (req, res) => {
  const response = await adapter.handleRequest(req.body);
  res.type('text/xml').send(response.raw);
});
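
For orientation, a TwiML response that speaks a prompt and then listens for speech generally uses the `<Say>` and `<Gather>` verbs. The sketch below builds such a document by hand. What `TwilioAdapter` actually emits in `response.raw` is not documented here, so treat `sayAndGather` and `escapeXml` as hypothetical helpers showing the kind of XML involved.

```typescript
// Illustrative TwiML builder; not the output format guaranteed by TwilioAdapter.

// Escape characters that are not allowed in XML text or attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Speak `message`, then collect the caller's speech and POST it to `actionUrl`.
function sayAndGather(message: string, actionUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    `  <Gather input="speech" action="${escapeXml(actionUrl)}" method="POST">`,
    `    <Say>${escapeXml(message)}</Say>`,
    '  </Gather>',
    '</Response>',
  ].join('\n');
}

const twimlDocument = sayAndGather("What's your order number?", '/voice/action');
```

Escaping matters because the spoken text may include user-provided or model-generated content, and a stray `&` or `<` would make the response invalid XML.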

Twilio ConversationRelay (Real-time WebSocket)

import { TwilioConversationRelayAdapter } from 'broth-rl-sdk';

const adapter = new TwilioConversationRelayAdapter({
  policy,
  defaultVoice: 'Polly.Joanna',
  welcomeGreeting: 'Hi! How can I help you today?',
});

// HTTP endpoint for incoming calls
app.post('/voice', (req, res) => {
  const twiml = TwilioConversationRelayAdapter.createConnectTwiML({
    websocketUrl: 'wss://your-server.com/websocket',
    welcomeGreeting: 'Hi! How can I help you today?',
  });
  res.type('text/xml').send(twiml);
});

// WebSocket handler for real-time conversation
// (assumes `wss` is a WebSocketServer from the 'ws' package)
wss.on('connection', (ws) => {
  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString());
    const response = await adapter.handleRequest(message);
    ws.send(JSON.stringify(response.raw));
  });
});
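
To show what travels over that WebSocket, here is a rough sketch of routing ConversationRelay messages by type. The message shapes (`setup`, a `prompt` carrying `voicePrompt`, and an outgoing `text` token) reflect my understanding of Twilio's protocol and should be checked against Twilio's current documentation. `routeMessage` and the `reply` callback are hypothetical and stand in for the adapter and policy.

```typescript
// Illustrative message routing; field names should be verified against Twilio's docs.
interface InboundMessage {
  type: string; // e.g. 'setup', 'prompt', 'interrupt'
  voicePrompt?: string; // transcribed caller speech on 'prompt' messages
  callSid?: string; // call identifier on 'setup' messages
}

interface TextToken {
  type: 'text';
  token: string; // text for Twilio to speak
  last: boolean; // true when this is the final chunk of the reply
}

// Turn one raw WebSocket frame into a reply, or null when nothing should be said.
function routeMessage(
  raw: string,
  reply: (userText: string) => string,
): TextToken | null {
  const message = JSON.parse(raw) as InboundMessage;
  switch (message.type) {
    case 'prompt':
      return { type: 'text', token: reply(message.voicePrompt ?? ''), last: true };
    default:
      // setup, interrupt, and other events carry no user utterance to answer.
      return null;
  }
}

const outgoing = routeMessage(
  JSON.stringify({ type: 'prompt', voicePrompt: 'I need help', last: true }),
  (userText) => `You said: ${userText}`,
);
// outgoing is { type: 'text', token: 'You said: I need help', last: true }
```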

Generic Webhook

import { WebhookAdapter } from 'broth-rl-sdk';

const adapter = new WebhookAdapter();
adapter.setPolicy(policy);

const response = await adapter.handleRequest({
  conversationId: 'call_123',
  userInput: 'I need help',
  event: 'user_message',
});

📈 Training from Historical Data

Turn your existing call logs into an optimized policy:

import * as fs from 'node:fs';
import { FlexibleParser, State } from 'broth-rl-sdk';

// Reuses `actionSpace` and `policy` from the Quick Start

// 1. Load your conversation data
const conversations = [
  {
    id: 'conv_001',
    turns: [
      { speaker: 'user', text: 'Help with billing', intent: 'billing' },
      { 
        speaker: 'agent', 
        text: 'Let me help',
        action: { type: 'ask_question', name: 'Ask for details' }
      },
    ],
    outcome: { success: true, userSatisfaction: 0.9 },
  },
  // ... more conversations
];

// 2. Parse into training format
const parser = new FlexibleParser();
const dataset = parser.toTrainingDataset(conversations);

// 3. Train policy
for (const example of dataset.examples) {
  const state = new State({
    conversationId: 'training',
    turnNumber: 0,
    history: [],
    features: example.state,
  });
  
  const action = actionSpace.getAction(example.action.type);
  if (action) {
    policy.update(state, action, example.reward);
  }
}

// 4. Save trained policy
const policyData = policy.toJSON();
fs.writeFileSync('trained_policy.json', JSON.stringify(policyData));
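
To illustrate what "parsing into training format" can mean, the sketch below converts one logged conversation into (state, action, reward) examples by pairing each agent action with the most recent user intent and the call's final outcome. This is an assumed, simplified scheme, not a description of `FlexibleParser`. `toExamples` and the interfaces are hypothetical.

```typescript
// Hypothetical conversion for illustration; FlexibleParser may work differently.
interface LoggedAction {
  type: string;
  name: string;
}

interface LoggedTurn {
  speaker: 'user' | 'agent';
  text: string;
  intent?: string;
  action?: LoggedAction;
}

interface LoggedConversation {
  id: string;
  turns: LoggedTurn[];
  outcome: { success: boolean; userSatisfaction?: number };
}

interface TrainingExample {
  state: Record<string, string>;
  action: LoggedAction;
  reward: number;
}

function toExamples(conversation: LoggedConversation): TrainingExample[] {
  // Every agent action in the call shares the call-level outcome as its reward.
  const reward = conversation.outcome.success ? 1 : -1;
  const examples: TrainingExample[] = [];
  let lastIntent = 'unknown';

  for (const turn of conversation.turns) {
    if (turn.speaker === 'user' && turn.intent) {
      lastIntent = turn.intent; // remember what the user most recently wanted
    }
    if (turn.speaker === 'agent' && turn.action) {
      examples.push({ state: { intent: lastIntent }, action: turn.action, reward });
    }
  }
  return examples;
}

const examples = toExamples({
  id: 'conv_001',
  turns: [
    { speaker: 'user', text: 'Help with billing', intent: 'billing' },
    {
      speaker: 'agent',
      text: 'Let me help',
      action: { type: 'ask_question', name: 'Ask for details' },
    },
  ],
  outcome: { success: true, userSatisfaction: 0.9 },
});
// examples: one entry with state { intent: 'billing' }, action type 'ask_question', reward 1
```

Giving every action the same call-level reward is the simplest option. It is noisy on long calls, where good and bad actions end up sharing one label.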

🏗️ Architecture

┌─────────────────────────────────────────────────┐
│                  Voice Platform                 │
│            (Vapi, Twilio, Custom)              │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│              Platform Adapter                   │
│         (Translates platform ↔ SDK)            │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                 Guardrails                      │
│            (Safety Constraints)                 │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                   Policy                        │
│         (RL Algorithm - Selects Action)        │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                  Monitor                        │
│           (Tracks Performance)                  │
└─────────────────────────────────────────────────┘
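
The layers in the diagram can be read as one function call per conversation turn. The sketch below wires stub versions of each layer together to show the data flow. The stage signatures and the name `makeTurnHandler` are assumptions. It also applies guardrails to the action the policy proposes, matching the `guardrails.validate(state, action)` snippet in the Safety & Guardrails section rather than the top-to-bottom order of the boxes.

```typescript
// Hypothetical wiring for illustration; stage signatures are assumptions.
type Features = Record<string, string>;

interface Stages {
  toFeatures: (userInput: string) => Features; // adapter: platform -> SDK
  select: (features: Features) => string; // policy: choose an action
  validate: (features: Features, proposed: string) => string; // guardrails
  log: (features: Features, action: string) => void; // monitor
}

// Build a per-turn handler that runs the stages in order.
function makeTurnHandler(stages: Stages): (userInput: string) => string {
  return (userInput) => {
    const features = stages.toFeatures(userInput);
    const proposed = stages.select(features);
    const safe = stages.validate(features, proposed);
    stages.log(features, safe);
    return safe;
  };
}

// Stub stages that make the flow visible.
const logged: string[] = [];
const handleTurn = makeTurnHandler({
  toFeatures: (text) => ({
    intent: text.toLowerCase().includes('bill') ? 'billing' : 'other',
  }),
  select: (features) =>
    features.intent === 'billing' ? 'ask_question' : 'transfer_call',
  validate: (_features, proposed) =>
    proposed === 'transfer_call' ? 'end_call' : proposed,
  log: (_features, action) => {
    logged.push(action);
  },
});

const firstAction = handleTurn('Help with billing');
// firstAction is 'ask_question'
```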

📦 What's Included

Core Components

State: Conversation state representation
Action: Action definitions and space
Policy: RL policy interface
Reward: Reward calculation
Environment: Simulation environment

Algorithms

ContextualBandit: Simple, effective RL for action selection
EpsilonGreedy: Exploration strategy

Data

Schema: Standard conversation data format
Parsers: Convert various formats to training data
FlexibleParser: Handles different conversation log formats
StateSchema (New): Zod-like tool to convert text to structured features

Adapters

VapiAdapter: Vapi platform integration
TwilioAdapter: Twilio Voice integration
TwilioConversationRelayAdapter: Twilio ConversationRelay (real-time WebSocket) integration
WebhookAdapter: Generic webhook integration

Safety

Guardrails: Safety rules and constraints
Monitor: Logging and performance tracking

🎓 Examples

The snippets above show basic usage. Complete example applications are coming soon.

🧪 Testing

npm test

🛣️ Roadmap

[ ] More RL algorithms (Q-Learning, Policy Gradients)
[ ] Advanced reward shaping
[ ] Multi-objective optimization
[ ] A/B testing framework
[ ] Cloud-based policy training
[ ] Pre-trained policies for common use cases

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details

💬 Support

Issues: GitHub Issues
Discussions: GitHub Discussions

🙏 Acknowledgments

Built with inspiration from:

OpenAI's work on RLHF
DeepMind's RL research
The voice AI community

Made with ❤️ for the voice AI community