datahive
v1.0.0
Capture, store, and analyze coding sessions for AI model training. A production-ready CLI tool with cloud sync.
DataHive
Capture, store, and analyze your coding sessions for AI model training
🎯 Overview
DataHive is a production-ready CLI tool that captures terminal sessions with high fidelity and syncs them to the cloud. Built for developers, researchers, and teams training AI coding models on real-world development workflows.
Key Features
- 🎬 High-Fidelity Capture - Uses node-pty for accurate terminal recording
- ☁️ Cloud-Native - Real-time sync to Supabase with automatic fallback to local storage
- 🔧 Universal Compatibility - Works with any CLI tool (Claude Code, Cursor, Gemini CLI, Vim, etc.)
- 🚀 One-Command Installation - Global npm package, ready in seconds
- 📊 Structured Data - Outputs clean, queryable session metadata
- 🔒 Secure - Credentials stored locally with industry-standard encryption
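Raw PTY output is full of escape sequences for colors, cursor movement, and window titles, so producing "clean" session text requires a filtering pass, presumably the job of lib/cleaner.js in the project tree. The snippet below is a minimal sketch of that idea, not DataHive's actual cleaner: the function name `cleanTerminalOutput` and both regular expressions (ECMA-48 CSI sequences and OSC sequences) are assumptions chosen for illustration.

```javascript
// Hypothetical sketch: NOT DataHive's lib/cleaner.js.
// Strips common ANSI escape sequences from raw PTY output.

// CSI sequences: ESC [ <parameter bytes> <intermediate bytes> <final byte>,
// e.g. "\x1b[31m" (red text), "\x1b[2K" (erase line), "\x1b[?25l" (hide cursor)
const CSI_PATTERN = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

// OSC sequences: ESC ] ... terminated by BEL or ESC \, e.g. window-title updates
const OSC_PATTERN = /\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;

function cleanTerminalOutput(raw) {
  return raw
    .replace(OSC_PATTERN, '') // remove title/hyperlink sequences first
    .replace(CSI_PATTERN, '') // then colors, cursor moves, and erase commands
    .replace(/\r\n/g, '\n');  // normalize PTY line endings to plain newlines
}

// Usage: bold green "ok" followed by a PTY line ending
console.log(JSON.stringify(cleanTerminalOutput('\x1b[1;32mok\x1b[0m\r\n')));
// → "ok\n"
```

The sketch deliberately leaves lone carriage returns in place, because progress bars use them to redraw a line; collapsing those redraws into their final state is a separate design decision.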
📦 Installation
Prerequisites
- Node.js >= 14.0.0 (Download)
- Supabase account (Sign up free)
Quick Install
npm install -g datahive
Verify Installation
datahive --version
🚀 Quick Start
1. Set Up Supabase
First, create the required database table:
-- Run this in your Supabase SQL Editor
CREATE TABLE sessions (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
created_at TIMESTAMPTZ DEFAULT NOW(),
tool_name TEXT NOT NULL,
start_time TIMESTAMPTZ,
end_time TIMESTAMPTZ,
raw_log_path TEXT,
content TEXT
);
-- Enable Row Level Security
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
-- Allow inserts and reads from the anon role (DataHive connects with the anon key)
-- Note: the SELECT policy lets anyone who holds the anon key read every session
CREATE POLICY "Enable insert for anon users"
ON sessions FOR INSERT TO anon WITH CHECK (true);
CREATE POLICY "Enable read for anon users"
ON sessions FOR SELECT TO anon USING (true);
2. Configure DataHive
datahive config \
--url "https://your-project.supabase.co" \
--key "your-anon-key"
💡 Find your credentials at: Supabase Dashboard → Settings → API
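The project URL and anon key are enough to write rows through Supabase's auto-generated REST API (PostgREST). As a rough sketch of what those credentials authorize, the hypothetical helpers below build and send a single insert into the `sessions` table. They assume Node 18+ for the global `fetch` (the package itself only requires Node 14) and a legacy JWT-style anon key; DataHive's own lib/db.js may well use the official `@supabase/supabase-js` client instead.

```javascript
// Hypothetical sketch: NOT DataHive's lib/db.js.
// Inserts one row into the `sessions` table via Supabase's REST API (PostgREST).
// sendInsert() relies on the global fetch API, available from Node 18.

function buildInsertRequest(supabaseUrl, anonKey, row) {
  const base = supabaseUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  return {
    url: `${base}/rest/v1/sessions`,
    options: {
      method: 'POST',
      headers: {
        apikey: anonKey,                    // project API key
        Authorization: `Bearer ${anonKey}`, // legacy anon keys are JWTs, also sent as the bearer token
        'Content-Type': 'application/json',
        Prefer: 'return=minimal',           // ask PostgREST not to echo the row back
      },
      body: JSON.stringify(row),
    },
  };
}

async function sendInsert(supabaseUrl, anonKey, row) {
  const { url, options } = buildInsertRequest(supabaseUrl, anonKey, row);
  const res = await fetch(url, options);
  if (!res.ok) {
    // 401/403 typically means a wrong key or a missing RLS INSERT policy
    throw new Error(`Insert failed: ${res.status} ${await res.text()}`);
  }
}

// Usage: build (but don't send) a request for a minimal session row
const request = buildInsertRequest('https://your-project.supabase.co/', 'your-anon-key', {
  tool_name: 'claude',
  content: 'captured terminal output',
});
console.log(request.url); // → https://your-project.supabase.co/rest/v1/sessions
```

Separating request construction from sending keeps the header logic testable without a network connection.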
3. Start Capturing
# Capture a Claude Code session
datahive claude
# Capture any CLI tool
datahive run vim script.py
datahive run cursor .
datahive run npm test
📖 Usage
Commands
| Command | Description | Example |
|---------|-------------|---------|
| datahive config | Configure Supabase credentials | datahive config --url <url> --key <key> |
| datahive claude | Start a captured Claude Code session | datahive claude |
| datahive gemini | Start a captured Gemini CLI session | datahive gemini |
| datahive cursor | Start a captured Cursor session | datahive cursor |
| datahive run <cmd> | Capture any command | datahive run python app.py |
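One way to read the command table is as a small dispatcher: the named tools launch a fixed program, and `run` launches whatever follows it. The sketch below assumes exactly that. `resolveCommand` is a hypothetical name, the program names `claude`, `gemini`, and `cursor` are assumed to be the underlying CLIs, and deriving `tool_name` from the program's basename is a guess based on the `"vim"` example in the data schema. `datahive config` is left out.

```javascript
// Hypothetical sketch: NOT DataHive's bin/cli.js.
// Maps CLI arguments to the program to launch and the tool_name to record.
const path = require('path');

// Assumed: each named subcommand launches a CLI of the same name
const NAMED_TOOLS = new Set(['claude', 'gemini', 'cursor']);

function resolveCommand(argv) {
  const [sub, ...rest] = argv;
  if (NAMED_TOOLS.has(sub)) {
    return { toolName: sub, command: sub, args: rest };
  }
  if (sub === 'run') {
    if (rest.length === 0) {
      throw new Error('Usage: datahive run <cmd> [args...]');
    }
    const [command, ...args] = rest;
    // Assumed: tool_name is the program's basename, e.g. "/usr/bin/vim" -> "vim"
    return { toolName: path.basename(command), command, args };
  }
  throw new Error(`Unknown command: ${sub}`);
}

// Usage: the equivalent of `datahive run python app.py`
console.log(JSON.stringify(resolveCommand(['run', 'python', 'app.py'])));
// → {"toolName":"python","command":"python","args":["app.py"]}
```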
Examples
# Capture a debugging session
datahive run node --inspect app.js
# Capture a git workflow
datahive run git commit -m "feat: add feature"
# Capture an interactive session
datahive run python
📊 Data Schema
Each captured session creates a record with:
{
id: UUID,
created_at: Timestamp,
tool_name: string, // e.g., "claude", "vim", "generic"
start_time: Timestamp,
end_time: Timestamp,
raw_log_path: string, // Local backup path
content: string // Full terminal output
}
Querying Your Data
-- Get all Claude sessions from today
SELECT * FROM sessions
WHERE tool_name = 'claude'
AND created_at >= CURRENT_DATE;
-- Calculate average session duration
SELECT AVG(EXTRACT(EPOCH FROM (end_time - start_time))) / 60 AS avg_minutes
FROM sessions;
🔧 Configuration
Storage Locations
- Config: ~/.config/datahive/config.json (macOS/Linux)
- Logs: ./raw_data/session_*.log (local backup, relative to the current working directory)
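A minimal sketch of credential loading from the documented config path. It assumes that `config.json` stores the values under `url` and `key` keys (the file's layout is not documented here) and that the `SUPABASE_URL` / `SUPABASE_KEY` environment variables override the file when set (also an assumption about precedence). `loadConfig` is a hypothetical helper.

```javascript
// Hypothetical sketch: NOT DataHive's lib/config.js.
// Assumptions: config.json stores { "url": ..., "key": ... }, and the
// SUPABASE_URL / SUPABASE_KEY environment variables override it when set.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Documented location on macOS/Linux
const DEFAULT_CONFIG_PATH = path.join(os.homedir(), '.config', 'datahive', 'config.json');

function loadConfig(configPath = DEFAULT_CONFIG_PATH, env = process.env) {
  let fileConfig = {};
  try {
    fileConfig = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  } catch (err) {
    // A missing file is fine (env vars may be set); unreadable or invalid JSON is not
    if (err.code !== 'ENOENT') throw err;
  }
  const url = env.SUPABASE_URL || fileConfig.url;
  const key = env.SUPABASE_KEY || fileConfig.key;
  if (!url || !key) {
    throw new Error('Supabase credentials not found');
  }
  return { url, key };
}
```

Passing `env` as a parameter instead of reading `process.env` directly keeps the function easy to test with fake environments.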
Environment Variables
You can also configure via environment variables:
export SUPABASE_URL="https://your-project.supabase.co"
export SUPABASE_KEY="your-anon-key"
🐛 Troubleshooting
"Command not found: datahive"
Solution: Ensure npm's global bin directory is on your PATH:
npm config get prefix # On macOS/Linux, the bin/ directory under this prefix should be on your PATH
export PATH="$(npm config get prefix)/bin:$PATH"
"Sync failed: Supabase credentials not found"
Solution: Re-run the config command:
datahive config --url <url> --key <key>
Sessions not appearing in Supabase
Checklist:
- ✅ Verify credentials: datahive config --url ... --key ...
- ✅ Check that the table exists: run the schema SQL above
- ✅ Verify that the RLS policies are set
- ✅ Check that local logs exist: ls raw_data/
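The last checklist item relies on the "automatic fallback to local storage" behavior: even when an upload fails, the session should still be on disk. The sketch below shows one way to get that guarantee, writing the log first and only then attempting the upload. It is hypothetical, not DataHive's code: `saveSession` and the injected `upload` callback are illustrative names, and the timestamp-plus-random filename is an assumption layered on the README's `./raw_data/session_*.log` pattern.

```javascript
// Hypothetical sketch: NOT DataHive's implementation.
// Writes the session log locally first, then tries to upload it; a failed
// upload is reported but never loses the session.
const fs = require('fs');
const path = require('path');

async function saveSession(session, upload, logDir = path.join(process.cwd(), 'raw_data')) {
  fs.mkdirSync(logDir, { recursive: true });
  // Assumed naming: session_<timestamp>_<random>.log, matching session_*.log
  const suffix = Math.random().toString(36).slice(2, 8);
  const logPath = path.join(logDir, `session_${Date.now()}_${suffix}.log`);
  fs.writeFileSync(logPath, session.content, 'utf8'); // local backup first

  try {
    await upload({ ...session, raw_log_path: logPath });
    return { synced: true, logPath };
  } catch (err) {
    // Bad credentials, a missing table or RLS policy, or no network:
    // keep the local copy and report the failure instead of crashing.
    console.error(`Sync failed: ${err.message} (session kept at ${logPath})`);
    return { synced: false, logPath };
  }
}
```

Writing before uploading means a network error mid-sync cannot lose the only copy; the trade-off is that successful syncs also leave a local file behind, which fits the README's description of raw_data/ as a local backup.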
🤝 Contributing
We welcome contributions! See docs/DEVELOPER.md for:
- Architecture overview
- Development setup
- Testing guidelines
- Code standards
📁 Project Structure
DataHive/
├── bin/
│ └── cli.js # CLI entry point
├── lib/
│ ├── capture.js # PTY session capture
│ ├── cleaner.js # Terminal output cleaning
│ ├── config.js # Configuration management
│ ├── db.js # Supabase database operations
│ └── exporter.js # JSONL export functionality
├── docs/
│ ├── schema.sql # Database schema
│ ├── DEVELOPER.md # Developer guide
│ ├── CHANGELOG.md # Version history
│ └── ... # Other documentation
├── examples/ # Sample export files
├── raw_data/ # Local session logs (gitignored)
└── exports/ # Generated exports (gitignored)
📄 License
MIT © DataHive Contributors
