# @rankfor/dice-roller
AI Response Stability Analyzer - Measure how consistently LLMs recommend your brand.
## What is AI Response Stability?
When you ask an AI "What are the best CRM tools?", the answer varies each time. This variability matters for brand visibility:
- Stable mentions = Your brand is consistently recommended
- Variable mentions = Your brand appears sometimes, but not reliably
- Missing mentions = Opportunity gap to address
This library helps you understand how AI models perceive your brand by running the same prompt multiple times and analyzing patterns.
## Features
- Multi-Model Support: Analyze responses from Gemini, OpenAI GPT, and Grok
- Search Mode: Compare "memory" (training data) vs "search" (live web) responses
- Experiment Mode: Run cross-model comparisons with statistical analysis
- Shannon Entropy: Measure diversity of brand recommendations
- Gini Coefficient: Detect concentration bias toward specific brands
- Jaccard Similarity: Calculate cross-model overlap in recommendations
- Brand Mention Tracking: Count and analyze sentiment around your brand
- Message Classification: Identify core (stable), variable, and outlier messages
- Citations Support: Get source URLs when using search mode
- Zero Dependencies: Only peer dependencies for the LLM SDKs you use
## Installation

```bash
npm install @rankfor/dice-roller
```

Install the LLM SDK(s) you plan to use:

```bash
# For Gemini
npm install @google/generative-ai

# For OpenAI or Grok
npm install openai
```

## Quick Start
### Single Model Analysis
```typescript
import { analyzeStability } from '@rankfor/dice-roller';

const result = await analyzeStability({
  prompt: 'What are the best CRM tools for small businesses?',
  iterations: 5,
  model: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  brandName: 'Salesforce', // Optional: track specific brand mentions
});

console.log(`Consistency Score: ${result.consistencyScore}%`);
console.log('Core Messages:', result.analysis.coreStableMessages);
console.log('Brand Mentions:', result.brandMentions.average);
```

### Cross-Model Experiment
```typescript
import { runExperiment } from '@rankfor/dice-roller';

const result = await runExperiment({
  prompt: 'What project management tools do you recommend?',
  apiKeys: {
    geminiApiKey: process.env.GEMINI_API_KEY,
    openaiApiKey: process.env.OPENAI_API_KEY,
    grokApiKey: process.env.GROK_API_KEY,
  },
  iterations: 5,
  brandName: 'Asana',
});

// Messages ALL models agree on
console.log('Universal Messages:', result.universalCoreMessages);

// Brands mentioned by ALL models
console.log('Universal Brands:', result.universalBrands);

// Statistical analysis
console.log('Shannon Entropy:', result.statistics.shannonEntropy);
console.log('Gini Coefficients:', result.statistics.giniCoefficients);
console.log('Cross-Model Overlap:', result.crossModelOverlap);
```

## API Reference
### analyzeStability(options)
Analyze response stability for a single model.
Options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| prompt | string | required | The prompt to analyze |
| iterations | number | 5 | Number of iterations (1-10) |
| model | 'gemini' \| 'openai' \| 'grok' | 'gemini' | LLM to use |
| apiKey | string | required | API key for the chosen model |
| temperature | number | 0.7 | Model temperature |
| maxTokens | number | 2048 | Max tokens per response |
| brandName | string | optional | Brand to track mentions for |
| iterationDelayMs | number | 1000 | Delay between iterations |
| searchMode | 'memory' \| 'search' | 'memory' | Use training data or live web search |
| onProgress | function | optional | Progress callback |
| onError | function | optional | Error callback for failed iterations |
| analysisConfig | AnalysisConfig | optional | Custom stop words, synonyms, indicators |
Returns: `Promise<StabilityResult>`
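The full `StabilityResult` type is exported (see TypeScript Support below). As a rough orientation, here is a partial sketch of its shape; field names are taken from the usage examples in this README, while nesting and types are guesses, so treat the exported type as authoritative.

```typescript
// Partial sketch of StabilityResult, inferred from the examples in this
// README (types are guesses); consult the exported StabilityResult type.
interface StabilityResultSketch {
  consistencyScore: number; // 0-100 weighted consistency score
  analysis: {
    coreStableMessages: string[]; // messages appearing in 80%+ of responses
  };
  brandMentions: {
    average: number; // mean mentions of brandName per response
  };
  responses: Array<{
    citations?: Array<{ url: string }>; // populated in search mode
  }>;
}
```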
### runExperiment(options)
Run cross-model analysis with statistical comparisons.
Options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| prompt | string | required | The prompt to analyze |
| apiKeys | ExperimentApiKeys | required | API keys for each provider |
| iterations | number | 5 | Iterations per model |
| temperature | number | 0.7 | Model temperature |
| maxTokens | number | 2048 | Max tokens per response |
| brandName | string | optional | Brand for focused analysis |
| iterationDelayMs | number | 1000 | Delay between iterations |
| searchMode | 'memory' \| 'search' | 'memory' | Use training data or live web search |
| onProgress | function | optional | Progress callback (model, iteration, total) |
| onError | function | optional | Error callback with model name |
| analysisConfig | AnalysisConfig | optional | Custom stop words, synonyms, indicators |
Returns: `Promise<ExperimentResult>`
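The full `ExperimentResult` type is likewise exported. A partial sketch, again inferred from the usage examples in this README rather than from the type definition:

```typescript
// Partial sketch of ExperimentResult, inferred from the examples in this
// README (types are guesses); consult the exported ExperimentResult type.
interface ExperimentResultSketch {
  universalCoreMessages: string[]; // messages ALL models agree on
  universalBrands: string[];       // brands mentioned by ALL models
  statistics: {
    shannonEntropy: unknown;   // diversity of brand recommendations
    giniCoefficients: unknown; // concentration bias per model
  };
  crossModelOverlap: unknown;  // pairwise overlap between models
  modelResults: Array<{
    model: string;          // e.g. 'gemini'
    uniqueBrands: string[]; // brands this model surfaced
  }>;
  recommendations: {
    differentiationOpportunities: string[];
  };
}
```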
## Understanding the Results
### Consistency Score (0-100)
A weighted score combining:
- 40%: Semantic overlap between responses
- 30%: Brand mention consistency (low variance = higher score)
- 30%: Presence of core stable messages
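As a back-of-the-envelope illustration of how the stated weights combine; the three component scores themselves come from the library's internal analysis, so the 0-100 inputs here are hypothetical:

```typescript
// Illustrative weighting only; the library computes each component itself.
function weightedConsistency(
  semanticOverlap: number,     // 0-100
  mentionConsistency: number,  // 0-100
  coreMessagePresence: number, // 0-100
): number {
  return 0.4 * semanticOverlap + 0.3 * mentionConsistency + 0.3 * coreMessagePresence;
}

weightedConsistency(80, 90, 70); // 0.4*80 + 0.3*90 + 0.3*70 = 80
```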
### Message Classification
| Category | Threshold | Meaning |
|----------|-----------|---------|
| Core Stable | 80%+ appearances | Always mentioned - reliable AI perception |
| Variable | 30-80% appearances | Sometimes mentioned - needs reinforcement |
| Outliers | <30% appearances | Rarely mentioned - potential opportunities |
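A minimal sketch of the threshold logic, assuming classification is driven by each message's appearance rate across iterations (the library's actual classifier may also group semantically similar messages first):

```typescript
// Illustrative thresholds only, mirroring the table above.
type MessageClass = 'core-stable' | 'variable' | 'outlier';

function classifyMessage(appearances: number, iterations: number): MessageClass {
  const rate = appearances / iterations;
  if (rate >= 0.8) return 'core-stable';
  if (rate >= 0.3) return 'variable';
  return 'outlier';
}

classifyMessage(5, 5);  // 'core-stable' - appeared in every response
classifyMessage(2, 5);  // 'variable'    - 40% appearance rate
classifyMessage(1, 10); // 'outlier'     - rarely surfaced
```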
### Statistical Metrics
#### Shannon Entropy
Measures diversity of brand recommendations:
- Low (<0.5): Stereotyped - few dominant brands
- Moderate (0.5-0.8): Balanced recommendations
- High (>0.8): Diverse - many brands recommended equally
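To make the scale concrete, here is a standalone sketch of normalized Shannon entropy over brand mention counts. It assumes normalization by the maximum possible entropy (log2 of the number of brands) so values land in 0-1, matching the thresholds above; the exported `calculateShannonEntropy` may differ in its details.

```typescript
// Normalized Shannon entropy over brand mention counts (illustrative).
function shannonEntropy(mentionCounts: number[]): number {
  const total = mentionCounts.reduce((sum, c) => sum + c, 0);
  if (total === 0 || mentionCounts.length < 2) return 0;
  const entropy = mentionCounts
    .filter((c) => c > 0)
    .map((c) => c / total)
    .reduce((h, p) => h - p * Math.log2(p), 0);
  return entropy / Math.log2(mentionCounts.length); // normalize to 0-1
}

shannonEntropy([10, 9, 11]); // ~1.0  -> diverse, brands mentioned equally
shannonEntropy([20, 1, 1]);  // ~0.33 -> stereotyped, one brand dominates
```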
#### Gini Coefficient
Measures concentration inequality:
- 0: Perfect equality (all brands mentioned equally)
- 1: Perfect inequality (one brand dominates)
- >0.6: High concentration - market dominated by few players
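A standalone sketch of the Gini computation over the same kind of mention counts, using the standard rank-weighted formula (the exported `calculateGiniCoefficient` may differ in its details):

```typescript
// Gini coefficient over brand mention counts (illustrative).
function giniCoefficient(mentionCounts: number[]): number {
  const sorted = [...mentionCounts].sort((a, b) => a - b);
  const n = sorted.length;
  const total = sorted.reduce((sum, c) => sum + c, 0);
  if (n === 0 || total === 0) return 0;
  // G = (2 * sum(rank_i * x_i)) / (n * sum(x)) - (n + 1) / n
  const rankWeighted = sorted.reduce((acc, x, i) => acc + (i + 1) * x, 0);
  return (2 * rankWeighted) / (n * total) - (n + 1) / n;
}

giniCoefficient([5, 5, 5, 5]);  // 0    -> all brands mentioned equally
giniCoefficient([20, 0, 0, 0]); // 0.75 -> one brand dominates
```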
#### Jaccard Index
Measures cross-model overlap:
- 1: Perfect overlap (models recommend same brands)
- 0: No overlap (completely different recommendations)
- >0.5: High agreement between models
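The same measure as a sketch over two models' recommended-brand sets (the exported `calculateJaccardSimilarity` may differ in signature):

```typescript
// Jaccard similarity between two brand sets (illustrative).
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((brand) => b.has(brand)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : intersection / union;
}

jaccard(
  new Set(['Asana', 'Trello', 'Jira']),
  new Set(['Asana', 'Jira', 'Notion']),
); // 2 shared / 4 total = 0.5 -> moderate agreement
```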
## Search Mode: Memory vs Live Web
A key feature of this library is comparing how an AI responds when drawing on its training data versus performing a live web search.
### Why This Matters
- Memory mode (`searchMode: 'memory'`): Shows what the AI "believes" based on its training
- Search mode (`searchMode: 'search'`): Shows what the AI finds via real-time web search
Comparing both reveals:
- Whether your brand appears in AI's "memory" (training data)
- Whether your brand is discoverable via web search
- Gaps between perceived vs actual brand presence
### Using Search Mode
```typescript
// Memory mode (default) - uses training data only
const memoryResult = await analyzeStability({
  prompt: 'Best CRM tools for startups',
  model: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  searchMode: 'memory', // or omit - it's the default
});

// Search mode - uses live web search with citations
const searchResult = await analyzeStability({
  prompt: 'Best CRM tools for startups',
  model: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  searchMode: 'search',
});

// Compare results
console.log('Memory mentions:', memoryResult.brandMentions.average);
console.log('Search mentions:', searchResult.brandMentions.average);

// Search mode includes citations
for (const response of searchResult.responses) {
  if (response.citations) {
    console.log('Sources:', response.citations.map(c => c.url));
  }
}
```

### Cross-Model Experiment with Search
```typescript
const experiment = await runExperiment({
  prompt: 'What analytics tools do enterprises use?',
  apiKeys: {
    geminiApiKey: process.env.GEMINI_API_KEY,
    openaiApiKey: process.env.OPENAI_API_KEY,
    grokApiKey: process.env.GROK_API_KEY,
  },
  searchMode: 'search', // All models will use web search
  iterations: 5,
});

// See what each model finds via web search
for (const modelResult of experiment.modelResults) {
  console.log(`${modelResult.model} found:`, modelResult.uniqueBrands);
}
```

### Search Mode by Provider
| Provider | Memory Mode | Search Mode |
|----------|-------------|-------------|
| Gemini | Standard generation | Google Search Grounding |
| OpenAI | Chat completions | Web Search Tool |
| Grok/xAI | Standard generation | Agentic Search (web + X) |
### Pricing Considerations
Search mode may incur additional costs:
- Gemini: First 1,500 grounding queries/day free, then $35/1,000 queries
- OpenAI: Uses search-enabled models (check current pricing)
- Grok/xAI: Free during beta, check current pricing
## Models Supported
| Provider | Model | Model ID |
|----------|-------|----------|
| Google | Gemini 3 Pro | gemini-3-pro-preview |
| OpenAI | GPT-5.2 | gpt-5.2 |
| xAI | Grok 4 | grok-4 |
## Configuration
Customize the analysis for different industries or languages:
```typescript
const result = await analyzeStability({
  prompt: 'What CRM tools do you recommend?',
  model: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  analysisConfig: {
    // Replace default English stop words (for non-English content)
    customStopWords: ['der', 'die', 'das', 'und', 'oder', ...],
    // Add industry-specific synonym groups
    customSynonyms: [
      ['CRM', 'customer relationship management', 'client management'],
      ['SaaS', 'cloud software', 'software as a service'],
    ],
    // Extend sentiment detection
    customPositiveIndicators: ['enterprise-grade', 'scalable'],
    customNegativeIndicators: ['legacy', 'clunky'],
  },
  // Error handling
  onError: (error) => {
    console.error(`Iteration ${error.iteration} failed: ${error.message}`);
  },
});
```

## Helper Functions
The library exports several utility functions for custom analysis:
```typescript
import {
  cleanResponse,              // Remove markdown formatting
  analyzeSentiment,           // Detect positive/neutral/negative
  extractBrandMentions,       // Find brand mentions with context
  extractKeyPoints,           // Extract key points from text
  calculateSemanticOverlap,   // Word-based similarity (0-100)
  calculateJaccardSimilarity, // Set-based similarity (0-1)
  calculateShannonEntropy,    // Brand diversity metric
  calculateGiniCoefficient,   // Concentration inequality
  estimateTokenCount,         // Approximate token count (~4 chars/token)
} from '@rankfor/dice-roller';
```

## Use Cases
### Brand Monitoring
Track how consistently AI recommends your brand vs competitors:
```typescript
const result = await analyzeStability({
  prompt: 'What email marketing tools do you recommend?',
  brandName: 'Mailchimp',
  iterations: 10,
  model: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
});

if (result.brandMentions.average < 1) {
  console.log('Low visibility - brand rarely mentioned');
}
```

### Content Strategy
Identify messages to reinforce in your content:
```typescript
const result = await runExperiment({
  prompt: 'What are the benefits of cloud storage?',
  apiKeys: { geminiApiKey, openaiApiKey },
  iterations: 5,
});

// Use these in your content
console.log('Messages to reinforce:', result.universalCoreMessages);

// Gaps to address
console.log('Opportunities:', result.recommendations.differentiationOpportunities);
```

### Competitive Intelligence
Compare how AI perceives you vs competitors:
```typescript
const yourBrand = await analyzeStability({
  prompt: 'Best project management tools',
  brandName: 'Asana',
  // ...
});

const competitor = await analyzeStability({
  prompt: 'Best project management tools',
  brandName: 'Monday.com',
  // ...
});

console.log(`Your visibility: ${yourBrand.brandMentions.average}`);
console.log(`Competitor visibility: ${competitor.brandMentions.average}`);
```

## Browser Usage
For browser environments (e.g., React apps), you can use this library client-side. The OpenAI SDK requires a flag:
```typescript
import OpenAI from 'openai';

// The library handles this internally for browser environments
const client = new OpenAI({
  apiKey: userProvidedKey,
  dangerouslyAllowBrowser: true, // Required for client-side use
});
```

Note: Never expose API keys in production client-side code. This is suitable for tools where users provide their own keys.
## TypeScript Support
Full TypeScript support with exported types:
```typescript
import type {
  // Core types
  StabilityOptions,
  StabilityResult,
  ExperimentOptions,
  ExperimentResult,
  ResponseData,
  // Search mode
  SearchMode,
  SearchCitation,
  // Brand analysis
  BrandMentionContext,
  BrandMentionStats,
  // Statistical analysis
  EntropyResult,
  GiniResult,
  CrossModelOverlap,
} from '@rankfor/dice-roller';
```

## Contributing
We welcome contributions. Please see CONTRIBUTING.md for guidelines.
## License
MIT License - see LICENSE for details.
## About Rankfor.AI
Rankfor.AI helps brands understand and improve their visibility in AI-powered recommendations. This library is part of our open-source toolkit for AI visibility analysis.
## Related
- Live Demo - Try the Dice Roller tool online
- Rankfor.AI - AI Brand Visibility Platform
- Research Papers - Our published research on AI recommendation patterns
