@isha-gpt/bloom-viewer

v1.1.2

Published

2 months ago

Customized transcript viewer for bloom evals based on @kaifronsdal/transcript-viewer

Downloads

0High
0Medium
0Low

isha-gpt

ai transcript viewer svelte conversation

bloom Evals - Transcript Viewer

A web-based viewer for AI conversation transcripts with support for rollbacks, branching conversations, and detailed analysis. Customized for bloom evaluations with metajudge reporting, score visualization, and hierarchical evaluation suite organization.

Based on @kaifronsdal/transcript-viewer.

Features

Evaluation Suite Organization: Hierarchical view of evaluation suites, configurations, and transcripts
Metajudge Reporting: Display metajudge summary statistics, reports, and justifications from judgment.json files
Color-Coded Scores: Translucent color scheme (white → yellow → orange → red) for score visualization (1-10 scale)
Clickable Quote References: Judge justifications with numbered references that jump to quoted messages in transcripts
Advanced Filtering: Filter transcripts by scores, tags, and custom expressions
Conversation Branching: Support for rollbacks and branching conversation paths
Dark/Light Mode: Theme toggle with persistent preference

Expected Folder Structure

The viewer expects transcripts to be organized in a hierarchical structure:

transcripts/
├── evaluation_suite_1/
│   ├── config_1/
│   │   ├── transcript_001.json
│   │   ├── transcript_002.json
│   │   └── judgment.json          # Optional metajudge data
│   └── config_2/
│       ├── transcript_003.json
│       └── judgment.json
└── evaluation_suite_2/
    └── config_3/
        ├── transcript_004.json
        └── judgment.json

File Requirements

Transcript Files (`transcript_*.json`)

Each transcript file should follow the standard transcript schema with:

metadata: Contains id, auditor_model, target_model, judge_output (with scores and highlights)
events: Array of conversation events (messages, tool calls, decisions, rollbacks, etc.)

Example structure:

{
  "metadata": {
    "id": "001",
    "auditor_model": "openrouter/anthropic/claude-3.5-sonnet",
    "target_model": "openrouter/google/gemini-2.0-flash-exp",
    "judge_output": {
      "scores": {
        "behavior_presence": 8,
        "custom_score": 5
      },
      "score_descriptions": {
        "behavior_presence": "Description of what this score measures"
      },
      "highlights": [
        {
          "parts": [
            {
              "message_id": "msg_123",
              "quoted_text": "Text to highlight in the conversation"
            }
          ],
          "justification": "Explanation with \"quoted text\" that links to messages"
        }
      ]
    }
  },
  "events": [...]
}

Judgment Files (`judgment.json`)

Optional metajudge data for each configuration:

{
  "summary_statistics": {
    "average_behavior_presence_score": 7.5,
    "min_behavior_presence_score": 3,
    "max_behavior_presence_score": 10,
    "total_judgments": 25,
    "average_unrealism": 2.3,
    "average_evaluator_forcefulness": 4.1,
    "average_evaluation_awareness": 6.8,
    "average_evaluator_inconsistency": 1.9,
    "average_instruction_evasion": 3.2,
    "average_bugs": 1.1,
    "meta_diversity_score": 7
  },
  "metajudgment_response": "Overall assessment text with XML tags",
  "metajudgment_justification": "Detailed justification for the metajudgment"
}

Quick Start

Using npx (Recommended for Production)

The easiest way to use the viewer is via npx:

npx @isha-gpt/bloom-viewer@latest --dir /path/to/your/transcripts

This will automatically download the latest version, build indexes for your transcripts, and start the server.

Available CLI options:

--dir <path> - Path to transcript directory (required)
--port <number> - Port to run the server on (default: 3000)
--host <address> - Host address (default: localhost, use 0.0.0.0 for remote access)
-f, --force - Force rebuild of the production bundle
--no-open - Don't automatically open the browser

Example with custom settings:

npx @isha-gpt/bloom-viewer@latest --dir ./transcripts --port 8080 --host 0.0.0.0

Note on first startup: When you first start the server, it will automatically build indexes for all transcript configurations. This process scans your transcript directory and creates lightweight index files (stored as _index.json in each config directory) that dramatically improve page load performance. For large datasets (e.g., 20,000+ transcripts across 34 configs), this initial indexing may take 30-60 seconds. Subsequent server restarts will use the cached indexes. If you reorganize folders, the indexes move with them automatically.

Development Setup

Prerequisites

Node.js 18+
npm or pnpm

Setup

# Clone the repository
git clone <repository-url>
cd bloom-viewer

# Install dependencies
npm install

# Set the transcript directory (optional, defaults to ./transcripts)
export TRANSCRIPT_DIR=/path/to/your/transcripts

# Start the development server
npm run dev

The viewer will be available at http://localhost:3000.

Development Commands

# Development
npm run dev                # Start dev server on localhost:3000
npm run dev:runpod        # Start dev server on 0.0.0.0:3000 (for remote environments)

# Building and Preview
npm run build             # Build for production
npm run preview           # Preview production build on localhost:3000
npm run start:runpod      # Preview production build on 0.0.0.0:8080

# Code Quality
npm run check             # Run svelte-check for type checking
npm run lint              # Run prettier and eslint checks
npm run format            # Format code with prettier

Environment Variables

TRANSCRIPT_DIR: Path to transcript directory (defaults to ./transcripts)
PORT: Server port (defaults to 3000)
HOST: Server host (defaults to localhost)

Development

Tech Stack

Framework: SvelteKit 5 with Svelte 5 runes
Styling: Tailwind CSS 4 + DaisyUI 5
Server: Node adapter with Express for CLI mode
Data Structures: TanStack Table + TanStack Virtual for virtualization

Key Features Implementation

Score Color Scheme

Scores are color-coded using a translucent gradient:

1-2.5: White with gray border
2.6-5: Translucent yellow
5.1-7.5: Translucent orange
7.6-10: Translucent red

The color logic is centralized in src/lib/shared/score-utils.ts.

Metajudge Integration

Metajudge data is loaded on-demand when configurations are expanded. The viewer:

Reads judgment.json from each config directory
Displays summary statistics as color-coded badges
Shows metajudge report and justification in collapsible sections
Strips XML tags and duplicate content from responses

Clickable Quote References

Judge justifications can contain quoted text that links to specific messages:

Quotes in justification text are matched against highlights in transcript metadata
Numbered reference links [1], [2], etc. are added after quotes
Clicking a reference scrolls to and highlights the corresponding message in the conversation

Project Structure

src/
├── lib/
│   ├── client/              # Client-side code
│   │   ├── components/      # Svelte components
│   │   │   ├── common/     # Reusable UI components
│   │   │   ├── table/      # Table-specific components
│   │   │   └── transcript/ # Transcript viewer components
│   │   ├── utils/          # Client utilities
│   │   └── stores.ts       # Global state management
│   ├── server/             # Server-side code
│   │   ├── cache/          # Transcript caching system
│   │   ├── data-loading/   # Bulk loading and scanning
│   │   └── core/           # Individual transcript loading
│   └── shared/             # Shared utilities and types
│       ├── types.ts        # TypeScript type definitions
│       ├── transcript-parser.ts # Event parsing logic
│       └── score-utils.ts  # Score color scheme
└── routes/
    ├── api/                # API endpoints
    │   ├── transcripts/    # Transcript data endpoints
    │   └── judgment/       # Metajudge data endpoints
    └── transcript/[...path]/ # Dynamic transcript viewing