broth-rl-sdk

v0.5.1

BrothRL - Reinforcement Learning SDK for Voice Agents

Downloads

380

Readme

BrothRL SDK

The Broth to your AI kitchen 🍜

Make your voice agents intelligent - A TypeScript SDK for adding Reinforcement Learning to voice applications.

License: MIT TypeScript

🎯 What Problem Does It Solve?

Voice apps today make dumb decisions. They follow hard-coded conversation flows that can't adapt or optimize. BrothRL changes that by bringing Reinforcement Learning to voice applications.

The Pain Points

  • Hard-coded flows: Developers manually program "If user says X, then do Y"
  • No adaptation: Can't learn from what actually works
  • Missed optimization: No way to improve conversions, satisfaction, or efficiency
  • Wasted historical data: Thousands of call recordings with no way to extract strategies
  • Messy data: Voice transcripts are unstructured and hard to learn from directly

What This SDK Provides

  • Makes voice apps intelligent - Apps learn what actions lead to success
  • Abstracts RL complexity - Developers don't need to be RL experts
  • Turns data into strategy - Historical calls → optimized policies
  • Safe production deployment - Built-in guardrails and fallbacks
  • Platform agnostic - Works with any voice platform (Vapi, Retell, Twilio, etc.)
  • Continuous improvement - Policies get better as more data comes in
  • Messy Data Solutions - Built-in tools to convert raw text to structured states

🚀 Quick Start

Installation

npm install broth-rl-sdk

Basic Usage

import {
  Action,
  ActionSpace,
  ActionType,
  ContextualBandit,
  State,
} from 'broth-rl-sdk';

// 1. Define your action space
const actionSpace = new ActionSpace();
actionSpace.addAction(
  Action.create(ActionType.ASK_QUESTION, 'Ask for details', 'Get more info')
);
actionSpace.addAction(
  Action.create(ActionType.PROVIDE_INFO, 'Give solution', 'Solve problem')
);
actionSpace.addAction(
  Action.create(ActionType.TRANSFER_CALL, 'Transfer', 'Escalate to human')
);

// 2. Create a policy
const policy = new ContextualBandit({
  actionSpace,
  explorationRate: 0.1,
});

// 3. Use it in your conversation flow
const state = new State({
  conversationId: 'call_123',
  turnNumber: 1,
  history: [],
  features: {
    userIntent: 'billing_issue',
    sentiment: 'frustrated',
  },
});

// Select the best action
const action = policy.selectAction(state);
console.log('Agent should:', action.name);

// Update with reward when you know the outcome
policy.update(state, action, 1.0); // 1.0 for success, -1.0 for failure

📚 Core Concepts

State

Represents the conversation context at any point in time:

const state = new State({
  conversationId: 'call_123',
  turnNumber: 3,
  history: [
    { speaker: 'user', text: 'I need help', timestamp: '...' },
    { speaker: 'agent', text: 'How can I help?', timestamp: '...' },
  ],
  intent: 'support',
  features: {
    sentiment: 'neutral',
    accountAge: 'new',
  },
});
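
To make the shape of a state more concrete, here is a minimal sketch of how conversation context could be typed and reduced to a stable lookup key for a tabular policy. The `ConversationState` interface and the `contextKey` helper are hypothetical names used for illustration only; the SDK's real `State` class may store and compare context differently.

```typescript
// Hypothetical shapes for illustration; the SDK's real State class may differ.
type FeatureValue = string | number | boolean;

interface Turn {
  speaker: 'user' | 'agent';
  text: string;
  timestamp: string;
}

interface ConversationState {
  conversationId: string;
  turnNumber: number;
  history: Turn[];
  intent?: string;
  features: Record<string, FeatureValue>;
}

// Build a key from the features only, with sorted names, so that two states
// with the same features map to the same context regardless of property order.
function contextKey(state: ConversationState): string {
  return Object.keys(state.features)
    .sort()
    .map((name) => `${name}=${String(state.features[name])}`)
    .join('|');
}

const firstCall: ConversationState = {
  conversationId: 'call_123',
  turnNumber: 3,
  history: [],
  features: { sentiment: 'neutral', accountAge: 'new' },
};

const secondCall: ConversationState = {
  conversationId: 'call_456',
  turnNumber: 7,
  history: [],
  features: { accountAge: 'new', sentiment: 'neutral' },
};

console.log(contextKey(firstCall)); // accountAge=new|sentiment=neutral
console.log(contextKey(firstCall) === contextKey(secondCall)); // true
```

Keying on features rather than on the conversation ID is what would let experience from one call inform decisions in another.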

Handling Messy Data (StateSchema)

New in v0.4.0: Voice data is messy. The StateSchema utility helps you convert raw text into structured features that the Bandit can actually learn from.

import { StateSchema, Feature } from 'broth-rl-sdk';

// 1. Define your Schema (Zod-style)
const SalesState = StateSchema.define({
  // Enum: Buckets text into specific categories
  intent: Feature.enum(['price_objection', 'info_request', 'closing'])
    .matches({
      price_objection: ['too expensive', 'cost', 'price', 'budget'],
      info_request: ['tell me more', 'how does it work', 'details'],
      closing: ['buy', 'sign up', 'ready']
    })
    .default('info_request'),

  // Boolean: Detects presence of concepts
  is_angry: Feature.boolean()
    .matches({ 
      true: ['stupid', 'hate', 'annoying', 'stop'] 
    }),

  // Custom Extractor: For complex logic
  deal_size: Feature.number()
    .extract((text) => {
      const match = text.match(/\$(\d+)/);
      return match ? parseInt(match[1], 10) : 0;
    })
});

// 2. Auto-convert Text -> BrothRL State
const state = SalesState.toState("I think it's too expensive for my $500 budget.");

console.log(state.toJSON().features);
// Output: { intent: 'price_objection', is_angry: false, deal_size: 500 }
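
For intuition, the keyword matching above can be approximated with a few plain functions. This is a rough sketch under the assumption that matching is a case-insensitive substring search; `matchCategory`, `containsAny`, `extractDollarAmount`, and `extractFeatures` are hypothetical helpers, not the SDK's StateSchema implementation.

```typescript
// Hypothetical helpers for illustration; not the SDK's StateSchema internals.
type KeywordMap = Record<string, string[]>;

// Return the first category whose keyword list has a match in the text.
function matchCategory(text: string, keywords: KeywordMap, fallback: string): string {
  const lower = text.toLowerCase();
  for (const [category, phrases] of Object.entries(keywords)) {
    if (phrases.some((phrase) => lower.includes(phrase))) {
      return category;
    }
  }
  return fallback;
}

// True if any of the phrases appears in the text.
function containsAny(text: string, phrases: string[]): boolean {
  const lower = text.toLowerCase();
  return phrases.some((phrase) => lower.includes(phrase));
}

// Extract the first whole-dollar amount such as "$500", or 0 if none is found.
function extractDollarAmount(text: string): number {
  const digits = text.match(/\$(\d+)/)?.[1];
  return digits === undefined ? 0 : parseInt(digits, 10);
}

function extractFeatures(text: string) {
  return {
    intent: matchCategory(
      text,
      {
        price_objection: ['too expensive', 'cost', 'price', 'budget'],
        info_request: ['tell me more', 'how does it work', 'details'],
        closing: ['buy', 'sign up', 'ready'],
      },
      'info_request',
    ),
    is_angry: containsAny(text, ['stupid', 'hate', 'annoying', 'stop']),
    deal_size: extractDollarAmount(text),
  };
}

const extracted = extractFeatures("I think it's too expensive for my $500 budget.");
console.log(extracted);
// { intent: 'price_objection', is_angry: false, deal_size: 500 }
```

Plain substring matching is easy to reason about but can misfire: "whatever" contains "hate", for example. Word-boundary matching would be a reasonable refinement if false positives matter.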

Action

What the agent can do:

const action = Action.create(
  ActionType.ASK_QUESTION,
  'Ask for order number',
  'Request the customer order number',
  { message: 'What is your order number?' }
);

Policy

The "brain" that decides which action to take:

const policy = new ContextualBandit({
  actionSpace,
  explorationRate: 0.1, // 10% random exploration
  useUCB: true, // Use upper confidence bound
});
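
To show what `explorationRate` and `useUCB` could mean in practice, here is a small tabular bandit that combines epsilon-greedy exploration with an optional UCB1 bonus. `SketchBandit` is a hypothetical class written for this README; it assumes string context keys and action names, and it is not the SDK's `ContextualBandit` implementation.

```typescript
// Hypothetical tabular bandit for illustration; not the SDK's ContextualBandit.
interface ArmStats {
  count: number;
  meanReward: number;
}

class SketchBandit {
  private readonly stats = new Map<string, ArmStats>();
  private readonly actions: string[];
  private readonly explorationRate: number;
  private readonly useUCB: boolean;
  private readonly random: () => number;

  constructor(
    actions: string[],
    explorationRate: number,
    useUCB: boolean,
    random: () => number = Math.random,
  ) {
    if (actions.length === 0) throw new Error('at least one action is required');
    this.actions = actions;
    this.explorationRate = explorationRate;
    this.useUCB = useUCB;
    this.random = random;
  }

  // Look up (or lazily create) the statistics for one context/action pair.
  private arm(context: string, action: string): ArmStats {
    const key = `${context}::${action}`;
    let arm = this.stats.get(key);
    if (!arm) {
      arm = { count: 0, meanReward: 0 };
      this.stats.set(key, arm);
    }
    return arm;
  }

  selectAction(context: string): string {
    // Explore: with probability explorationRate, pick a uniformly random action.
    if (this.random() < this.explorationRate) {
      return this.actions[Math.floor(this.random() * this.actions.length)];
    }
    // Exploit: pick the best mean reward, plus a UCB1 bonus when enabled.
    const totalPulls = this.actions.reduce((sum, a) => sum + this.arm(context, a).count, 0);
    let best = this.actions[0];
    let bestScore = -Infinity;
    for (const action of this.actions) {
      const { count, meanReward } = this.arm(context, action);
      let score = meanReward;
      if (this.useUCB) {
        // Untried actions get an infinite score so each is tried at least once.
        score = count === 0 ? Infinity : meanReward + Math.sqrt((2 * Math.log(totalPulls)) / count);
      }
      if (score > bestScore) {
        best = action;
        bestScore = score;
      }
    }
    return best;
  }

  // Incremental mean update: new mean = old mean + (reward - old mean) / n.
  update(context: string, action: string, reward: number): void {
    const arm = this.arm(context, action);
    arm.count += 1;
    arm.meanReward += (reward - arm.meanReward) / arm.count;
  }
}

const actionNames = ['ask_question', 'provide_info', 'transfer_call'];
const billingContext = 'intent=billing_issue|sentiment=frustrated';

const greedy = new SketchBandit(actionNames, 0, false);
greedy.update(billingContext, 'ask_question', -1.0);
greedy.update(billingContext, 'provide_info', 1.0);
greedy.update(billingContext, 'provide_info', 0.5);
console.log(greedy.selectAction(billingContext)); // provide_info

const optimistic = new SketchBandit(actionNames, 0, true);
optimistic.update(billingContext, 'ask_question', -1.0);
optimistic.update(billingContext, 'provide_info', 1.0);
console.log(optimistic.selectAction(billingContext)); // transfer_call (never tried yet)
```

The two instances differ only in the UCB flag: the greedy one exploits the best known mean, while the UCB one first tries the action it knows nothing about.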

Reward

How you tell the agent what's good:

const reward = new Reward();

// Immediate feedback
const immediate = reward.calculateImmediate(state, action, {
  sentiment: 'positive',
});

// Delayed feedback (at end of call)
const delayed = reward.calculateDelayed({
  success: true,
  metrics: { userSatisfaction: 0.9 },
});
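
A delayed reward only arrives once the call ends, so it has to be attributed to the earlier turns somehow. One common approach, shown here purely as an assumption about how this could be done, is to discount the final outcome back through the call and add it to each turn's immediate reward. `creditTurns` is a hypothetical helper; the SDK's `Reward` class may combine the two signals differently.

```typescript
// Hypothetical helper for illustration; not part of the SDK's Reward class.
// immediateRewards[i] is the immediate reward observed at turn i.
function creditTurns(
  immediateRewards: number[],
  delayedReward: number,
  discount: number,
): number[] {
  const lastTurn = immediateRewards.length - 1;
  return immediateRewards.map((immediate, turn) => {
    // Turns closer to the end of the call receive more of the final outcome.
    const share = delayedReward * Math.pow(discount, lastTurn - turn);
    return immediate + share;
  });
}

const perTurnRewards = creditTurns([0, 0.5, 0], 1, 0.5);
console.log(perTurnRewards); // [ 0.25, 1, 1 ]
```

Each per-turn value could then be passed to `policy.update` together with the state and action recorded at that turn.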

🛡️ Safety & Guardrails

Production voice apps need safety constraints:

import { Guardrails, CommonGuardrails } from 'broth-rl-sdk';

const guardrails = new Guardrails({
  rules: [
    CommonGuardrails.maxTurns(20), // Don't let conversations go too long
    CommonGuardrails.noRepeat(3), // Don't repeat the same action
    CommonGuardrails.rateLimit('transfer_call', 1), // Max 1 transfer per call
  ],
  defaultFallback: Action.create(ActionType.END_CALL, 'End gracefully', '...'),
});

// Validate actions before taking them
const safeAction = guardrails.validate(state, action);
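
Conceptually, each guardrail is a predicate over the call so far and the proposed action, and validation falls back to a safe default when any predicate fails. The sketch below illustrates that idea with hypothetical `Rule` functions. The rule semantics are assumptions (for example, `noRepeat(3)` is read here as "block an action that was just taken three times in a row"), and none of this is the SDK's actual `Guardrails` code.

```typescript
// Hypothetical rule shapes for illustration; not the SDK's Guardrails API.
interface CallContext {
  turnNumber: number;
  previousActions: string[]; // action types taken so far in this call
}

// A rule returns true when the proposed action is allowed.
type Rule = (context: CallContext, proposed: string) => boolean;

const maxTurns = (limit: number): Rule => (context) => context.turnNumber < limit;

// Block the proposed action if the last `limit` actions were all the same as it.
const noRepeat = (limit: number): Rule => (context, proposed) => {
  const recent = context.previousActions.slice(-limit);
  return recent.length < limit || recent.some((taken) => taken !== proposed);
};

// Allow a given action type at most `max` times per call.
const rateLimit = (actionType: string, max: number): Rule => (context, proposed) => {
  if (proposed !== actionType) return true;
  const used = context.previousActions.filter((taken) => taken === actionType).length;
  return used < max;
};

function validateAction(
  rules: Rule[],
  context: CallContext,
  proposed: string,
  fallback: string,
): string {
  return rules.every((rule) => rule(context, proposed)) ? proposed : fallback;
}

const sketchRules = [maxTurns(20), noRepeat(3), rateLimit('transfer_call', 1)];
const callSoFar: CallContext = {
  turnNumber: 5,
  previousActions: ['ask_question', 'transfer_call'],
};

console.log(validateAction(sketchRules, callSoFar, 'transfer_call', 'end_call')); // end_call
console.log(validateAction(sketchRules, callSoFar, 'provide_info', 'end_call')); // provide_info
```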

📊 Monitoring

Track what your agent is doing:

import { Monitor } from 'broth-rl-sdk';

const monitor = new Monitor();

// Log every action
monitor.log(state, action, reward);

// Get statistics
const stats = monitor.getOverallStats();
console.log('Action distribution:', stats.actionDistribution);
console.log('Average turns per conversation:', stats.averageTurnsPerConversation);

// Generate report
console.log(monitor.createReport());
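
The statistics above boil down to simple aggregation over logged decisions. As a rough sketch, assuming each log entry records one agent turn, they could be computed like this. `LogEntry` and `summarize` are hypothetical names, and the real `Monitor` may track more than this.

```typescript
// Hypothetical log entry for illustration; the SDK's Monitor may store more.
interface LogEntry {
  conversationId: string;
  actionType: string;
  reward: number;
}

function summarize(entries: LogEntry[]) {
  const actionDistribution: Record<string, number> = {};
  const turnsPerConversation = new Map<string, number>();
  let totalReward = 0;

  for (const entry of entries) {
    // Count how often each action type was taken.
    actionDistribution[entry.actionType] = (actionDistribution[entry.actionType] ?? 0) + 1;
    // Count logged turns per conversation (one entry is assumed to be one turn).
    turnsPerConversation.set(
      entry.conversationId,
      (turnsPerConversation.get(entry.conversationId) ?? 0) + 1,
    );
    totalReward += entry.reward;
  }

  const conversations = turnsPerConversation.size;
  return {
    actionDistribution,
    averageTurnsPerConversation: conversations === 0 ? 0 : entries.length / conversations,
    averageReward: entries.length === 0 ? 0 : totalReward / entries.length,
  };
}

const summary = summarize([
  { conversationId: 'call_1', actionType: 'ask_question', reward: 0 },
  { conversationId: 'call_1', actionType: 'provide_info', reward: 1 },
  { conversationId: 'call_2', actionType: 'ask_question', reward: 0 },
  { conversationId: 'call_2', actionType: 'transfer_call', reward: -1 },
]);

console.log(summary.actionDistribution);
// { ask_question: 2, provide_info: 1, transfer_call: 1 }
console.log(summary.averageTurnsPerConversation); // 2
```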

🔌 Platform Adapters

Vapi

import { VapiAdapter } from 'broth-rl-sdk';

const adapter = new VapiAdapter();
adapter.setPolicy(policy);

// In your webhook handler (`app` is assumed to be an Express application)
app.post('/vapi-webhook', async (req, res) => {
  const response = await adapter.handleRequest(req.body);
  res.json(response.raw);
});

Twilio Voice (TwiML)

import { TwilioAdapter } from 'broth-rl-sdk';

const adapter = new TwilioAdapter({ actionUrl: '/voice/action' });
adapter.setPolicy(policy);

// In your webhook handler
app.post('/voice', async (req, res) => {
  const response = await adapter.handleRequest(req.body);
  res.type('text/xml').send(response.raw);
});

Twilio ConversationRelay (Real-time WebSocket)

import { TwilioConversationRelayAdapter } from 'broth-rl-sdk';

const adapter = new TwilioConversationRelayAdapter({
  policy,
  defaultVoice: 'Polly.Joanna',
  welcomeGreeting: 'Hi! How can I help you today?',
});

// HTTP endpoint for incoming calls
app.post('/voice', (req, res) => {
  const twiml = TwilioConversationRelayAdapter.createConnectTwiML({
    websocketUrl: 'wss://your-server.com/websocket',
    welcomeGreeting: 'Hi! How can I help you today?',
  });
  res.type('text/xml').send(twiml);
});

// WebSocket handler for real-time conversation
// (`wss` is assumed to be a WebSocket server, e.g. from the `ws` package)
wss.on('connection', (ws) => {
  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString());
    const response = await adapter.handleRequest(message);
    ws.send(JSON.stringify(response.raw));
  });
});

Generic Webhook

import { WebhookAdapter } from 'broth-rl-sdk';

const adapter = new WebhookAdapter();
adapter.setPolicy(policy);

const response = await adapter.handleRequest({
  conversationId: 'call_123',
  userInput: 'I need help',
  event: 'user_message',
});

📈 Training from Historical Data

Turn your existing call logs into an optimized policy:

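import * as fs from 'node:fs';
import { FlexibleParser, State } from 'broth-rl-sdk';

// Assumes `actionSpace` and `policy` are defined as in the Quick Start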

// 1. Load your conversation data
const conversations = [
  {
    id: 'conv_001',
    turns: [
      { speaker: 'user', text: 'Help with billing', intent: 'billing' },
      { 
        speaker: 'agent', 
        text: 'Let me help',
        action: { type: 'ask_question', name: 'Ask for details' }
      },
    ],
    outcome: { success: true, userSatisfaction: 0.9 },
  },
  // ... more conversations
];

// 2. Parse into training format
const parser = new FlexibleParser();
const dataset = parser.toTrainingDataset(conversations);

// 3. Train policy
for (const example of dataset.examples) {
  const state = new State({
    conversationId: 'training',
    turnNumber: 0,
    history: [],
    features: example.state,
  });
  
  const action = actionSpace.getAction(example.action.type);
  if (action) {
    policy.update(state, action, example.reward);
  }
}

// 4. Save trained policy
const policyData = policy.toJSON();
fs.writeFileSync('trained_policy.json', JSON.stringify(policyData));
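
To illustrate what the parsing step has to do, here is a simplified sketch that turns logged conversations into (state, action, reward) examples. It assumes that the most recent user intent is the only state feature and that every agent action in a call receives +1 for a successful outcome and -1 otherwise. `toExamples` and the interfaces below are hypothetical and much simpler than whatever `FlexibleParser` actually does.

```typescript
// Hypothetical shapes, loosely based on the conversation format shown above.
interface LoggedAction {
  type: string;
  name: string;
}

interface LoggedTurn {
  speaker: 'user' | 'agent';
  text: string;
  intent?: string;
  action?: LoggedAction;
}

interface LoggedConversation {
  id: string;
  turns: LoggedTurn[];
  outcome: { success: boolean; userSatisfaction?: number };
}

interface TrainingExample {
  state: Record<string, string>;
  action: LoggedAction;
  reward: number;
}

function toExamples(conversations: LoggedConversation[]): TrainingExample[] {
  const examples: TrainingExample[] = [];
  for (const conversation of conversations) {
    // Assumed reward scheme: +1 for a successful call, -1 otherwise.
    const reward = conversation.outcome.success ? 1 : -1;
    let lastIntent = 'unknown';
    for (const turn of conversation.turns) {
      if (turn.speaker === 'user' && turn.intent) {
        // Remember the latest user intent as the context for the next action.
        lastIntent = turn.intent;
      } else if (turn.speaker === 'agent' && turn.action) {
        examples.push({ state: { intent: lastIntent }, action: turn.action, reward });
      }
    }
  }
  return examples;
}

const sketchExamples = toExamples([
  {
    id: 'conv_001',
    turns: [
      { speaker: 'user', text: 'Help with billing', intent: 'billing' },
      {
        speaker: 'agent',
        text: 'Let me help',
        action: { type: 'ask_question', name: 'Ask for details' },
      },
    ],
    outcome: { success: true, userSatisfaction: 0.9 },
  },
]);

console.log(sketchExamples);
// [ { state: { intent: 'billing' }, action: { type: 'ask_question', name: 'Ask for details' }, reward: 1 } ]
```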

🏗️ Architecture

┌─────────────────────────────────────────────────┐
│                  Voice Platform                 │
│            (Vapi, Twilio, Custom)              │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│              Platform Adapter                   │
│         (Translates platform ↔ SDK)            │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                 Guardrails                      │
│            (Safety Constraints)                 │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                   Policy                        │
│         (RL Algorithm - Selects Action)        │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                  Monitor                        │
│           (Tracks Performance)                  │
└─────────────────────────────────────────────────┘
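
The layers in the diagram can be read as one function call per conversation turn. The sketch below wires hypothetical stand-ins for each layer together. It follows the order used in the Safety & Guardrails snippet, where the policy proposes an action and the guardrails then validate it. The `Layers` interface and `handleTurn` are illustrative names, not SDK exports.

```typescript
// Hypothetical interfaces; names mirror the diagram, not the SDK's API.
interface IncomingEvent {
  conversationId: string;
  userInput: string;
}

interface Layers {
  parse: (raw: unknown) => IncomingEvent; // Platform Adapter (inbound)
  select: (event: IncomingEvent) => string; // Policy
  validate: (event: IncomingEvent, action: string) => string; // Guardrails
  record: (event: IncomingEvent, action: string) => void; // Monitor
  format: (action: string) => unknown; // Platform Adapter (outbound)
}

function handleTurn(layers: Layers, raw: unknown): unknown {
  const event = layers.parse(raw);
  const proposed = layers.select(event);
  const safe = layers.validate(event, proposed);
  layers.record(event, safe);
  return layers.format(safe);
}

// Stub layers: the policy always proposes a transfer, and the guardrails
// replace it with a question.
const recorded: string[] = [];
const stubLayers: Layers = {
  parse: (raw) => raw as IncomingEvent,
  select: () => 'transfer_call',
  validate: (_event, action) => (action === 'transfer_call' ? 'ask_question' : action),
  record: (_event, action) => {
    recorded.push(action);
  },
  format: (action) => ({ action }),
};

const reply = handleTurn(stubLayers, { conversationId: 'call_123', userInput: 'I need help' });
console.log(reply); // { action: 'ask_question' }
console.log(recorded); // [ 'ask_question' ]
```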

📦 What's Included

Core Components

  • State: Conversation state representation
  • Action: Action definitions and space
  • Policy: RL policy interface
  • Reward: Reward calculation
  • Environment: Simulation environment

Algorithms

  • ContextualBandit: Simple, effective RL for action selection
  • EpsilonGreedy: Exploration strategy

Data

  • Schema: Standard conversation data format
  • Parsers: Convert various formats to training data
  • FlexibleParser: Handles different conversation log formats
  • StateSchema (New): Zod-like tool to convert text to structured features

Adapters

  • VapiAdapter: Vapi platform integration
  • TwilioAdapter: Twilio Voice integration
  • TwilioConversationRelayAdapter: Twilio ConversationRelay (real-time WebSocket) integration
  • WebhookAdapter: Generic webhook integration

Safety

  • Guardrails: Safety rules and constraints
  • Monitor: Logging and performance tracking

🎓 Examples

The snippets above show basic usage. Complete example applications are coming soon.

🧪 Testing

npm test

🛣️ Roadmap

  • [ ] More RL algorithms (Q-Learning, Policy Gradients)
  • [ ] Advanced reward shaping
  • [ ] Multi-objective optimization
  • [ ] A/B testing framework
  • [ ] Cloud-based policy training
  • [ ] Pre-trained policies for common use cases

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details

💬 Support

🙏 Acknowledgments

Built with inspiration from:

  • OpenAI's work on RLHF
  • DeepMind's RL research
  • The voice AI community

Made with ❤️ for the voice AI community