npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@elgap/edukaai

v0.2.1-beta.0

Published

Dataset Management for LLM Fine-Tuning with zero setup

Readme

EdukaAI

Privacy-first, simple training data management for LLM fine-tuning

npm version CI npm downloads License: MIT

EdukaAI is a local, self-hosted web application designed to help you collect, organize, and manage training data for fine-tuning Large Language Models (LLMs). Built for privacy-conscious developers and AI enthusiasts who want full control over their data.

EdukaAI Screenshot

🎯 Why EdukaAI?

Privacy First: Your data never leaves your machine. Local SQLite database, no cloud dependencies, no data tracking.

Beginner Friendly: Clean, intuitive interface. No complex setup. Start collecting training samples in minutes.

Powerful for Experts: Bulk operations, import/export in multiple formats, fine-grained status tracking, goal management, and Live Capture integration.

Zero Configuration: Works out of the box. Just run and start building your dataset.

✨ Key Features

📊 Dataset Management

  • Create multiple datasets for different fine-tuning projects
  • Set custom goals and track progress with visual indicators
  • Organize datasets by purpose (coding, creative writing, Q&A, etc.)

📝 Training Sample Management

  • Core Fields: Instruction, Input, Output, System Prompt
  • Rich Metadata: Category, Difficulty, Quality Rating (1-5 stars), Tags, Notes
  • Status Tracking: Draft → In Review → Approved/Rejected workflow
  • Bulk Operations: Select multiple samples and approve, categorize, or delete

📥 Import & Export

  • Import: JSON files (Alpaca, ShareGPT formats), sample datasets
  • Export: Multiple formats compatible with major training platforms
    • Alpaca (JSON)
    • ShareGPT (JSON)
    • Raw JSON
    • JSONL
    • CSV

🎨 Workflow Features

  • Keyboard Shortcuts: Ctrl+Enter to save, Esc to cancel
  • Progress Tracking: Milestones (10%, 25%, 50%, 100%) with visual indicators
  • Sample Navigation: Previous/Next buttons to quickly review samples
  • Filtering: By status, category, source, quality rating

🔴 Live Capture (New in 0.2.1-beta.0)

Real-time data collection from coding agents and AI assistants. Perfect for capturing high-quality training examples as you work.

  • Universal API: Simple REST endpoint for any integration
  • Source Management: Register and manage multiple capture sources
  • Default Configuration: Set default dataset, status, and quality for captures
  • Enable/Disable: Toggle live capture on/off as needed
  • Duplicate Detection: Automatic deduplication with similarity matching
  • Metadata Enrichment: Auto-categorization and quality scoring

Example use cases:

  • Capture conversations from coding assistants (OpenCode, Continue.dev, etc.)
  • Collect AI pair programming sessions
  • Build datasets from real-world problem-solving workflows
  • Stream training data from automated agents

🔒 Privacy & Security

  • 100% Local: SQLite database stored on your machine
  • No Cloud: No internet connection required after installation
  • No Tracking: Zero analytics, zero data collection
  • Open Source: Full transparency

🚀 Quick Start

NPM Package Installation (Recommended)

The easiest way to use EdukaAI is via the npm package:

Option 1: npx (No Installation)

npx @elgap/edukaai

Option 2: Global Install

npm install -g @elgap/edukaai
edukaai

Then open http://localhost:3030 in your browser.

📡 Live Capture API

Integrate EdukaAI with your coding agents and AI assistants for seamless data collection.

Quick Integration Example

# Capture a conversation curl -X POST http://localhost:3030/api/capture \ -H "Content-Type: application/json" \ -d '{
  "source": "my-coding-agent",
  "apiVersion": "1.0",
  "records": [
    {
      "instruction": "Explain recursion in Python",
      "output": "Recursion is when a function calls itself...",
      "context": {
        "model": { "name": "claude-3-sonnet" },
        "files": [{ "path": "example.py", "content": "def factorial(n):..." }]
      }
    }
  ]
}'

Configuration

Configure Live Capture settings via the Import page:

  • Default Dataset: Where captured samples are stored
  • Default Status: Draft (for review) or Approved (ready for training)
  • Default Quality: 1-5 star rating for captured samples
  • Enable/Disable: Toggle live capture on/off

API Documentation

Full API documentation is available at http://localhost:3030/docs when running EdukaAI.

Endpoint: POST /api/capture

Request Format (Universal EdukaAI Record):

{
  "source": "your-source-key",
  "apiVersion": "1.0",
  "records": [
    {
      "instruction": "The user's question or task",
      "output": "The AI's response",
      "input": "Optional additional context",
      "systemPrompt": "Optional system instructions",
      "category": "coding",
      "difficulty": "intermediate",
      "qualityRating": 4,
      "tags": ["python", "algorithms"],
      "context": {
        "files": [...],
        "model": { "name": "gpt-4" },
        "tokens": { "input": 100, "output": 500 }
      }
    }
  ],
  "options": {
    "datasetId": 1,
    "autoApprove": false,
    "skipDuplicates": true
  }
}

💻 CLI Reference

EdukaAI provides a powerful CLI for managing your training data workflow:

Available Commands

| Command | Description | | ----------------------- | -------------------------------- | | edukaai | Start server | | edukaai reset | Reset database with confirmation | | edukaai reset --force | Force reset without confirmation | | edukaai clean | Alias for reset | | edukaai help | Show help and available commands |

More to come soon. Stay tuned!

Environment Variables Supported:

  • EDUKAAI_HOST (default: localhost)
  • EDUKAAI_PORT (default: 3030)
  • EDUKAAI_DATA_DIR (default: ~/.edukaai)
  • DATABASE_URL (default: ./data/edukaai.db)

📖 Usage Guide

Creating Training Samples

Each training sample represents one example for your model:

Instruction: "Explain the concept of machine learning in simple terms"
Input: "" (optional - leave empty for direct instruction)
Output: "Machine learning is like teaching a computer to recognize patterns..."
System Prompt: "You are a helpful AI assistant" (optional)
Category: "explanation"
Quality: ⭐⭐⭐⭐⭐

Dataset Organization

Think of datasets as projects:

  • 🎯 Coding Examples: Programming problems and solutions
  • 🎯 Creative Writing: Story prompts and completions
  • 🎯 Q&A Pairs: Question-answer training data
  • 🎯 Roleplay: Character-based conversations
  • 🎯 Agent Sessions: Real-time captures from AI assistants

Quality Workflow

Track your samples through the review process:

  • 📝 Draft: Work in progress, not ready
  • 👀 In Review: Needs review before approval
  • Approved: Ready for training
  • Rejected: Not suitable (won't be exported)

Importing Existing Data

Have training data in JSON format?

# Prepare your JSON file (Alpaca format)
[
  {
    "instruction": "Your instruction here",
    "input": "Optional input",
    "output": "Expected output",
    "category": "coding"
  }
]

Then use the Import page to upload and automatically categorize.

Live Capture from Coding Agents

  1. Install your preferred coding agent (e.g., OpenCode, Continue.dev)
  2. Configure the agent to point to your EdukaAI instance
  3. Set defaults in EdukaAI (Import → Configure Live Capture)
  4. Work normally - conversations are automatically captured
  5. Review and approve captured samples in EdukaAI

The Live Capture endpoint supports:

  • Automatic categorization based on content
  • Code snippet context preservation
  • Model and token usage tracking
  • Duplicate detection to avoid storing similar conversations

💻 For Developers

Tech Stack

  • Frontend: Vue 3 + Nuxt 4 + Tailwind CSS
  • Backend: Nuxt 4 API routes (Server-side rendering)
  • Database: SQLite (local file)
  • ORM: Drizzle ORM

Project Structure

edukaai/
├── app/                 # Nuxt 4 application
│   ├── components/      # Vue components
│   ├── layouts/         # Page layouts
│   ├── pages/           # Routes (index, samples, import, export, docs)
│   └── components/      # Reusable UI components
├── server/             # Backend API
│   ├── api/            # REST endpoints
│   ├── db/             # Database schema & migrations
│   └── utils/          # Server utilities
├── bin/                # CLI scripts
└── package.json

Building from Source

# Clone the repository
git clone https://github.com/elgap/edukaai.git
cd edukaai

# Install dependencies
npm install

# Run in development mode
npm run dev

# Optionally, build for production
npm run build
npm run start

CLI Commands

# Reset database (with migrations)
npm run db:reset

# Run tests
npm run test

# Type checking
npm run typecheck

# Linting
npm run lint

🤝 Contributing

Contributions are welcome. We will publish contribution guidelines soon.

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • Inspired by the need for simple, private LLM training tools
  • Built with Nuxt, Vue, and Tailwind
  • Icons by Lucide

Built with ❤️ for the AI community

⬆ Back to Top