@elgap/edukaai
v0.2.1-beta.0
Published
Dataset Management for LLM Fine-Tuning with zero setup
Downloads
37
Maintainers
Readme
EdukaAI
Privacy-first, simple training data management for LLM fine-tuning
EdukaAI is a local, self-hosted web application designed to help you collect, organize, and manage training data for fine-tuning Large Language Models (LLMs). Built for privacy-conscious developers and AI enthusiasts who want full control over their data.

🎯 Why EdukaAI?
Privacy First: Your data never leaves your machine. Local SQLite database, no cloud dependencies, no data tracking.
Beginner Friendly: Clean, intuitive interface. No complex setup. Start collecting training samples in minutes.
Powerful for Experts: Bulk operations, import/export in multiple formats, fine-grained status tracking, goal management, and Live Capture integration.
Zero Configuration: Works out of the box. Just run and start building your dataset.
✨ Key Features
📊 Dataset Management
- Create multiple datasets for different fine-tuning projects
- Set custom goals and track progress with visual indicators
- Organize datasets by purpose (coding, creative writing, Q&A, etc.)
📝 Training Sample Management
- Core Fields: Instruction, Input, Output, System Prompt
- Rich Metadata: Category, Difficulty, Quality Rating (1-5 stars), Tags, Notes
- Status Tracking: Draft → In Review → Approved/Rejected workflow
- Bulk Operations: Select multiple samples and approve, categorize, or delete
📥 Import & Export
- Import: JSON files (Alpaca, ShareGPT formats), sample datasets
- Export: Multiple formats compatible with major training platforms
- Alpaca (JSON)
- ShareGPT (JSON)
- Raw JSON
- JSONL
- CSV
🎨 Workflow Features
- Keyboard Shortcuts: Ctrl+Enter to save, Esc to cancel
- Progress Tracking: Milestones (10%, 25%, 50%, 100%) with visual indicators
- Sample Navigation: Previous/Next buttons to quickly review samples
- Filtering: By status, category, source, quality rating
🔴 Live Capture (New in 0.2.1-beta.0)
Real-time data collection from coding agents and AI assistants. Perfect for capturing high-quality training examples as you work.
- Universal API: Simple REST endpoint for any integration
- Source Management: Register and manage multiple capture sources
- Default Configuration: Set default dataset, status, and quality for captures
- Enable/Disable: Toggle live capture on/off as needed
- Duplicate Detection: Automatic deduplication with similarity matching
- Metadata Enrichment: Auto-categorization and quality scoring
Example use cases:
- Capture conversations from coding assistants (OpenCode, Continue.dev, etc.)
- Collect AI pair programming sessions
- Build datasets from real-world problem-solving workflows
- Stream training data from automated agents
🔒 Privacy & Security
- 100% Local: SQLite database stored on your machine
- No Cloud: No internet connection required after installation
- No Tracking: Zero analytics, zero data collection
- Open Source: Full transparency
🚀 Quick Start
NPM Package Installation (Recommended)
The easiest way to use EdukaAI is via the npm package:
Option 1: npx (No Installation)
npx @elgap/edukaaiOption 2: Global Install
npm install -g @elgap/edukaai
edukaaiThen open http://localhost:3030 in your browser.
📡 Live Capture API
Integrate EdukaAI with your coding agents and AI assistants for seamless data collection.
Quick Integration Example
# Capture a conversation curl -X POST http://localhost:3030/api/capture \ -H "Content-Type: application/json" \ -d '{
"source": "my-coding-agent",
"apiVersion": "1.0",
"records": [
{
"instruction": "Explain recursion in Python",
"output": "Recursion is when a function calls itself...",
"context": {
"model": { "name": "claude-3-sonnet" },
"files": [{ "path": "example.py", "content": "def factorial(n):..." }]
}
}
]
}'Configuration
Configure Live Capture settings via the Import page:
- Default Dataset: Where captured samples are stored
- Default Status: Draft (for review) or Approved (ready for training)
- Default Quality: 1-5 star rating for captured samples
- Enable/Disable: Toggle live capture on/off
API Documentation
Full API documentation is available at http://localhost:3030/docs when running EdukaAI.
Endpoint: POST /api/capture
Request Format (Universal EdukaAI Record):
{
"source": "your-source-key",
"apiVersion": "1.0",
"records": [
{
"instruction": "The user's question or task",
"output": "The AI's response",
"input": "Optional additional context",
"systemPrompt": "Optional system instructions",
"category": "coding",
"difficulty": "intermediate",
"qualityRating": 4,
"tags": ["python", "algorithms"],
"context": {
"files": [...],
"model": { "name": "gpt-4" },
"tokens": { "input": 100, "output": 500 }
}
}
],
"options": {
"datasetId": 1,
"autoApprove": false,
"skipDuplicates": true
}
}💻 CLI Reference
EdukaAI provides a powerful CLI for managing your training data workflow:
Available Commands
| Command | Description |
| ----------------------- | -------------------------------- |
| edukaai | Start server |
| edukaai reset | Reset database with confirmation |
| edukaai reset --force | Force reset without confirmation |
| edukaai clean | Alias for reset |
| edukaai help | Show help and available commands |
More to come soon. Stay tuned!
Environment Variables Supported:
- EDUKAAI_HOST (default: localhost)
- EDUKAAI_PORT (default: 3030)
- EDUKAAI_DATA_DIR (default: ~/.edukaai)
- DATABASE_URL (default: ./data/edukaai.db)
📖 Usage Guide
Creating Training Samples
Each training sample represents one example for your model:
Instruction: "Explain the concept of machine learning in simple terms"
Input: "" (optional - leave empty for direct instruction)
Output: "Machine learning is like teaching a computer to recognize patterns..."
System Prompt: "You are a helpful AI assistant" (optional)
Category: "explanation"
Quality: ⭐⭐⭐⭐⭐Dataset Organization
Think of datasets as projects:
- 🎯 Coding Examples: Programming problems and solutions
- 🎯 Creative Writing: Story prompts and completions
- 🎯 Q&A Pairs: Question-answer training data
- 🎯 Roleplay: Character-based conversations
- 🎯 Agent Sessions: Real-time captures from AI assistants
Quality Workflow
Track your samples through the review process:
- 📝 Draft: Work in progress, not ready
- 👀 In Review: Needs review before approval
- ✅ Approved: Ready for training
- ❌ Rejected: Not suitable (won't be exported)
Importing Existing Data
Have training data in JSON format?
# Prepare your JSON file (Alpaca format)
[
{
"instruction": "Your instruction here",
"input": "Optional input",
"output": "Expected output",
"category": "coding"
}
]Then use the Import page to upload and automatically categorize.
Live Capture from Coding Agents
- Install your preferred coding agent (e.g., OpenCode, Continue.dev)
- Configure the agent to point to your EdukaAI instance
- Set defaults in EdukaAI (Import → Configure Live Capture)
- Work normally - conversations are automatically captured
- Review and approve captured samples in EdukaAI
The Live Capture endpoint supports:
- Automatic categorization based on content
- Code snippet context preservation
- Model and token usage tracking
- Duplicate detection to avoid storing similar conversations
💻 For Developers
Tech Stack
- Frontend: Vue 3 + Nuxt 4 + Tailwind CSS
- Backend: Nuxt 4 API routes (Server-side rendering)
- Database: SQLite (local file)
- ORM: Drizzle ORM
Project Structure
edukaai/
├── app/ # Nuxt 4 application
│ ├── components/ # Vue components
│ ├── layouts/ # Page layouts
│ ├── pages/ # Routes (index, samples, import, export, docs)
│ └── components/ # Reusable UI components
├── server/ # Backend API
│ ├── api/ # REST endpoints
│ ├── db/ # Database schema & migrations
│ └── utils/ # Server utilities
├── bin/ # CLI scripts
└── package.jsonBuilding from Source
# Clone the repository
git clone https://github.com/elgap/edukaai.git
cd edukaai
# Install dependencies
npm install
# Run in development mode
npm run dev
# Optionally, build for production
npm run build
npm run startCLI Commands
# Reset database (with migrations)
npm run db:reset
# Run tests
npm run test
# Type checking
npm run typecheck
# Linting
npm run lint🤝 Contributing
Contributions are welcome. We will publish contribution guidelines soon.
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Inspired by the need for simple, private LLM training tools
- Built with Nuxt, Vue, and Tailwind
- Icons by Lucide
Built with ❤️ for the AI community
