@isha-gpt/bloom-viewer
v1.1.2
Customized transcript viewer for bloom evals based on @kaifronsdal/transcript-viewer
bloom Evals - Transcript Viewer
A web-based viewer for AI conversation transcripts with support for rollbacks, branching conversations, and detailed analysis. Customized for bloom evaluations with metajudge reporting, score visualization, and hierarchical evaluation suite organization.
Based on @kaifronsdal/transcript-viewer.
Features
- Evaluation Suite Organization: Hierarchical view of evaluation suites, configurations, and transcripts
- Metajudge Reporting: Display metajudge summary statistics, reports, and justifications from judgment.json files
- Color-Coded Scores: Translucent color scheme (white → yellow → orange → red) for score visualization (1-10 scale)
- Clickable Quote References: Judge justifications with numbered references that jump to quoted messages in transcripts
- Advanced Filtering: Filter transcripts by scores, tags, and custom expressions
- Conversation Branching: Support for rollbacks and branching conversation paths
- Dark/Light Mode: Theme toggle with persistent preference
Expected Folder Structure
The viewer expects transcripts to be organized in a hierarchical structure:
transcripts/
├── evaluation_suite_1/
│   ├── config_1/
│   │   ├── transcript_001.json
│   │   ├── transcript_002.json
│   │   └── judgment.json    # Optional metajudge data
│   └── config_2/
│       ├── transcript_003.json
│       └── judgment.json
└── evaluation_suite_2/
    └── config_3/
        ├── transcript_004.json
        └── judgment.json
File Requirements
Transcript Files (transcript_*.json)
Each transcript file should follow the standard transcript schema with:
- metadata: Contains id, auditor_model, target_model, judge_output (with scores and highlights)
- events: Array of conversation events (messages, tool calls, decisions, rollbacks, etc.)
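The fields listed above can be sketched as TypeScript types. This is a minimal sketch, not the package's actual definitions in types.ts: the type names and the `looksLikeTranscript` helper are hypothetical, and event shapes are left as `unknown` because they are not specified here.

```typescript
// Hypothetical type names; field names follow the README's field list.
interface HighlightPart {
  message_id: string;
  quoted_text: string;
}

interface Highlight {
  parts: HighlightPart[];
  justification: string;
}

interface JudgeOutput {
  scores: Record<string, number>;
  score_descriptions: Record<string, string>;
  highlights: Highlight[];
}

interface TranscriptMetadata {
  id: string;
  auditor_model: string;
  target_model: string;
  judge_output: JudgeOutput;
}

interface Transcript {
  metadata: TranscriptMetadata;
  events: unknown[]; // event shapes are not described in this section
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Shallow check only: verifies the top-level string fields and the events array,
// and does not inspect judge_output or individual events.
function looksLikeTranscript(value: unknown): value is Transcript {
  if (!isRecord(value)) return false;
  const metadata = value["metadata"];
  if (!isRecord(metadata)) return false;
  return (
    typeof metadata["id"] === "string" &&
    typeof metadata["auditor_model"] === "string" &&
    typeof metadata["target_model"] === "string" &&
    Array.isArray(value["events"])
  );
}
```

A check like this could be run on parsed JSON before handing a file to the viewer, to catch files that are missing the basic fields.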
Example structure:
{
  "metadata": {
    "id": "001",
    "auditor_model": "openrouter/anthropic/claude-3.5-sonnet",
    "target_model": "openrouter/google/gemini-2.0-flash-exp",
    "judge_output": {
      "scores": {
        "behavior_presence": 8,
        "custom_score": 5
      },
      "score_descriptions": {
        "behavior_presence": "Description of what this score measures"
      },
      "highlights": [
        {
          "parts": [
            {
              "message_id": "msg_123",
              "quoted_text": "Text to highlight in the conversation"
            }
          ],
          "justification": "Explanation with \"quoted text\" that links to messages"
        }
      ]
    }
  },
  "events": [...]
}
Judgment Files (judgment.json)
Optional metajudge data for each configuration:
{
  "summary_statistics": {
    "average_behavior_presence_score": 7.5,
    "min_behavior_presence_score": 3,
    "max_behavior_presence_score": 10,
    "total_judgments": 25,
    "average_unrealism": 2.3,
    "average_evaluator_forcefulness": 4.1,
    "average_evaluation_awareness": 6.8,
    "average_evaluator_inconsistency": 1.9,
    "average_instruction_evasion": 3.2,
    "average_bugs": 1.1,
    "meta_diversity_score": 7
  },
  "metajudgment_response": "Overall assessment text with XML tags",
  "metajudgment_justification": "Detailed justification for the metajudgment"
}
Quick Start
Using npx (Recommended for Production)
The easiest way to use the viewer is via npx:
npx @isha-gpt/bloom-viewer@latest --dir /path/to/your/transcripts
This will automatically download the latest version, build indexes for your transcripts, and start the server.
Available CLI options:
- --dir <path> - Path to transcript directory (required)
- --port <number> - Port to run the server on (default: 3000)
- --host <address> - Host address (default: localhost, use 0.0.0.0 for remote access)
- -f, --force - Force rebuild of the production bundle
- --no-open - Don't automatically open the browser
Example with custom settings:
npx @isha-gpt/bloom-viewer@latest --dir ./transcripts --port 8080 --host 0.0.0.0
Note on first startup: When you first start the server, it will automatically build indexes for all transcript configurations. This process scans your transcript directory and creates lightweight index files (stored as _index.json in each config directory) that dramatically improve page load performance. For large datasets (e.g., 20,000+ transcripts across 34 configs), this initial indexing may take 30-60 seconds. Subsequent server restarts will use the cached indexes. If you reorganize folders, the indexes move with them automatically.
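The contents of _index.json are not documented here, so the following only illustrates the grouping step that a scan of the suite/config/transcript layout implies. It is a rough sketch under stated assumptions: `groupTranscripts` and `SuiteTree` are hypothetical names, paths are assumed to be relative to the transcript directory and to use `/` as the separator, and only files matching transcript_*.json are treated as transcripts.

```typescript
// Hypothetical helper: groups "suite/config/file.json" paths by suite, then config.
type SuiteTree = Map<string, Map<string, string[]>>;

function groupTranscripts(relativePaths: string[]): SuiteTree {
  const tree: SuiteTree = new Map();
  for (const path of relativePaths) {
    const parts = path.split("/");
    const [suite, config, file] = parts;
    // Expect exactly three segments: suite / config / file.
    if (parts.length !== 3 || !suite || !config || !file) continue;
    // Skip judgment.json, _index.json, and anything else that is not a transcript.
    if (!/^transcript_.*\.json$/.test(file)) continue;

    let configs = tree.get(suite);
    if (!configs) {
      configs = new Map<string, string[]>();
      tree.set(suite, configs);
    }
    const files = configs.get(config) ?? [];
    files.push(file);
    configs.set(config, files);
  }
  return tree;
}
```

Keeping one list per config directory mirrors the note above: each config gets its own index, so moving a config folder moves its index with it.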
Development Setup
Prerequisites
- Node.js 18+
- npm or pnpm
Setup
# Clone the repository
git clone <repository-url>
cd bloom-viewer
# Install dependencies
npm install
# Set the transcript directory (optional, defaults to ./transcripts)
export TRANSCRIPT_DIR=/path/to/your/transcripts
# Start the development server
npm run dev
The viewer will be available at http://localhost:3000.
Development Commands
# Development
npm run dev # Start dev server on localhost:3000
npm run dev:runpod # Start dev server on 0.0.0.0:3000 (for remote environments)
# Building and Preview
npm run build # Build for production
npm run preview # Preview production build on localhost:3000
npm run start:runpod # Preview production build on 0.0.0.0:8080
# Code Quality
npm run check # Run svelte-check for type checking
npm run lint # Run prettier and eslint checks
npm run format # Format code with prettier
Environment Variables
- TRANSCRIPT_DIR: Path to transcript directory (defaults to ./transcripts)
- PORT: Server port (defaults to 3000)
- HOST: Server host (defaults to localhost)
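The defaults above can be sketched as a small resolver. This is an illustrative sketch, not the viewer's actual startup code: `resolveSettings` and `ServerSettings` are hypothetical names, and falling back to the default when PORT is empty or not a positive integer is an assumption.

```typescript
interface ServerSettings {
  transcriptDir: string;
  port: number;
  host: string;
}

// Hypothetical helper: applies the documented defaults to an environment map.
function resolveSettings(env: Record<string, string | undefined>): ServerSettings {
  const parsedPort = Number.parseInt(env["PORT"] ?? "", 10);
  return {
    // Empty strings are treated like unset variables (assumption).
    transcriptDir: env["TRANSCRIPT_DIR"] || "./transcripts",
    port: Number.isInteger(parsedPort) && parsedPort > 0 ? parsedPort : 3000,
    host: env["HOST"] || "localhost",
  };
}
```

In a Node process this would be called as `resolveSettings(process.env)`.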
Development
Tech Stack
- Framework: SvelteKit with Svelte 5 runes
- Styling: Tailwind CSS 4 + DaisyUI 5
- Server: Node adapter with Express for CLI mode
- Data Structures: TanStack Table + TanStack Virtual for virtualization
Key Features Implementation
Score Color Scheme
Scores are color-coded using a translucent gradient:
- 1-2.5: White with gray border
- 2.6-5: Translucent yellow
- 5.1-7.5: Translucent orange
- 7.6-10: Translucent red
The color logic is centralized in src/lib/shared/score-utils.ts.
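The banding above can be sketched as a single function. This is a minimal sketch and may differ from what score-utils.ts actually does: `scoreBand` and `ScoreBand` are hypothetical names, and using upper bounds (2.5, 5, 7.5) is an assumption made so that values falling between the listed ranges, such as 2.55, still land in a band.

```typescript
type ScoreBand = "white" | "yellow" | "orange" | "red";

// Hypothetical helper: maps a score on the 1-10 scale to its color band.
function scoreBand(score: number): ScoreBand {
  if (score <= 2.5) return "white"; // 1-2.5: white with gray border
  if (score <= 5) return "yellow"; // 2.6-5: translucent yellow
  if (score <= 7.5) return "orange"; // 5.1-7.5: translucent orange
  return "red"; // 7.6-10: translucent red
}
```

Returning a band name rather than a CSS class keeps the thresholds in one place; the mapping from band to actual styles can then live in the components.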
Metajudge Integration
Metajudge data is loaded on-demand when configurations are expanded. The viewer:
- Reads judgment.json from each config directory
- Displays summary statistics as color-coded badges
- Shows metajudge report and justification in collapsible sections
- Strips XML tags and duplicate content from responses
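The last step, stripping XML tags and duplicate content, could look roughly like the following. This is a sketch under assumptions, not the viewer's actual cleanup logic: `cleanMetajudgeText` is a hypothetical name, and "duplicate content" is interpreted here as paragraphs that repeat an earlier paragraph verbatim.

```typescript
// Hypothetical helper: removes tag markup and verbatim-repeated paragraphs.
function cleanMetajudgeText(raw: string): string {
  // Remove anything shaped like an opening or closing XML tag, keeping the text between tags.
  const withoutTags = raw.replace(/<\/?[A-Za-z][^<>]*>/g, "");

  // Split on blank lines and keep only the first occurrence of each paragraph.
  const seen = new Set<string>();
  const paragraphs: string[] = [];
  for (const block of withoutTags.split(/\n\s*\n/)) {
    const text = block.trim();
    if (text === "" || seen.has(text)) continue;
    seen.add(text);
    paragraphs.push(text);
  }
  return paragraphs.join("\n\n");
}
```

For example, a response that wraps the same sentence in two different tags would be reduced to a single copy of that sentence.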
Clickable Quote References
Judge justifications can contain quoted text that links to specific messages:
- Quotes in justification text are matched against highlights in transcript metadata
- Numbered reference links [1], [2], etc. are added after quotes
- Clicking a reference scrolls to and highlights the corresponding message in the conversation
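The matching and numbering steps above can be sketched as follows. This is an illustration under assumptions rather than the viewer's implementation: `linkQuotes`, `QuotePart`, and `QuoteRef` are hypothetical names, quotes are assumed to be delimited by straight double quotes, and a quote is considered matched when it appears as a substring of a highlight's `quoted_text`.

```typescript
interface QuotePart {
  message_id: string;
  quoted_text: string;
}

interface QuoteRef {
  index: number; // the number shown in the [n] link
  message_id: string; // the message to scroll to when the link is clicked
}

// Hypothetical helper: appends [n] after each quote that matches a highlight part.
function linkQuotes(
  justification: string,
  parts: QuotePart[]
): { text: string; refs: QuoteRef[] } {
  const refs: QuoteRef[] = [];
  const text = justification.replace(/"([^"]+)"/g, (match: string, quoted: string) => {
    const part = parts.find((p) => p.quoted_text.includes(quoted));
    if (!part) return match; // no matching highlight: leave the quote untouched
    refs.push({ index: refs.length + 1, message_id: part.message_id });
    return `${match} [${refs.length}]`;
  });
  return { text, refs };
}
```

The returned `refs` array gives the UI what it needs for the click behavior: each reference number maps to the `message_id` that should be scrolled to and highlighted.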
Project Structure
src/
├── lib/
│   ├── client/                  # Client-side code
│   │   ├── components/          # Svelte components
│   │   │   ├── common/          # Reusable UI components
│   │   │   ├── table/           # Table-specific components
│   │   │   └── transcript/      # Transcript viewer components
│   │   ├── utils/               # Client utilities
│   │   └── stores.ts            # Global state management
│   ├── server/                  # Server-side code
│   │   ├── cache/               # Transcript caching system
│   │   ├── data-loading/        # Bulk loading and scanning
│   │   └── core/                # Individual transcript loading
│   └── shared/                  # Shared utilities and types
│       ├── types.ts             # TypeScript type definitions
│       ├── transcript-parser.ts # Event parsing logic
│       └── score-utils.ts       # Score color scheme
└── routes/
    ├── api/                     # API endpoints
    │   ├── transcripts/         # Transcript data endpoints
    │   └── judgment/            # Metajudge data endpoints
    └── transcript/[...path]/    # Dynamic transcript viewing