donut-corp

v1.0.0

🍩 Local LLM runtime optimized for minimal disk usage: run AI models locally without the bloat.

🍩 Donut Corp

Local LLM runtime designed for minimal disk usage: run open-source AI models without the bloat.

  ██████╗  ██████╗ ███╗   ██╗██╗   ██╗████████╗
  ██╔══██╗██╔═══██╗████╗  ██║██║   ██║╚══██╔══╝
  ██║  ██║██║   ██║██╔██╗ ██║██║   ██║   ██║
  ██║  ██║██║   ██║██║╚██╗██║██║   ██║   ██║
  ██████╔╝╚██████╔╝██║ ╚████║╚██████╔╝   ██║
  ╚═════╝  ╚═════╝ ╚═╝  ╚═══╝ ╚═════╝    ╚═╝
  CORP - local AI, minimal space

Why Donut?

| Feature | Ollama | Donut Corp |
|---|---|---|
| Smallest usable model | ~3.8GB | 320MB |
| Default quantization | Q4 | Q4 (configurable) |
| KV cache compression | ❌ | ✅ Q4 KV cache |
| Partial download resume | ❌ | ✅ |
| OpenAI-compatible API | ✅ | ✅ |

Donut defaults to the smallest viable quantization for each model and quantizes the KV cache to Q4 during inference, which cuts RAM usage by up to 40% compared to an FP16 cache.


Install

npm install -g donut-corp

Requirements:

  • Node.js ≥ 18
  • ~500MB–5GB free disk space (per model)
  • 4GB+ RAM (8GB recommended for 7B models)

Quick Start

# Download the smallest model (638MB!)
donut pull tinyllama

# Chat interactively
donut run tinyllama

# Or run a bigger model (auto-downloads)
donut run llama3.2-3b

# Start OpenAI-compatible server
donut serve

Commands

donut pull <model>

Download a model with optimal quantization.

donut pull tinyllama          # 638MB, default Q4
donut pull llama3.2-3b        # 2GB
donut pull mistral-7b --quant q3  # 3.1GB (smaller, lower quality)
donut pull mistral-7b --quant q5  # 4.8GB (larger, higher quality)

donut run <model>

Chat interactively with a model.

donut run tinyllama
donut run llama3.2-3b --ctx 4096   # bigger context window
donut run mistral-7b --threads 8

In-chat commands:

  • /clear: reset conversation context
  • /history: show message count
  • /exit: quit

donut list

Show downloaded models and disk usage.

donut list
donut list --sort size

donut info <model>

Show available quantizations and download status.

donut info llama3.2-3b

donut remove <model>

Remove a model and free disk space.

donut remove tinyllama
donut remove mistral-7b:q3   # specific quant variant
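
The `name:quant` form in the last command can be split into its parts with a few lines of JavaScript. This is an illustrative sketch only: `parseModelRef` is a hypothetical helper, not part of the donut-corp package, and the fallback to `q4` simply mirrors the default quantization described in this README.

```javascript
// Hypothetical helper (not part of donut-corp): split "name:quant" references.
function parseModelRef(ref, defaultQuant = 'q4') {
  const idx = ref.lastIndexOf(':');
  // No colon means no explicit variant, so fall back to the assumed default.
  if (idx === -1) return { name: ref, quant: defaultQuant };
  return { name: ref.slice(0, idx), quant: ref.slice(idx + 1).toLowerCase() };
}

console.log(parseModelRef('mistral-7b:q3')); // { name: 'mistral-7b', quant: 'q3' }
console.log(parseModelRef('tinyllama'));     // { name: 'tinyllama', quant: 'q4' }
```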

donut prune

Remove partial/incomplete downloads.

donut prune
donut prune --dry-run    # preview first

donut serve

Start the OpenAI-compatible API server.

donut serve                     # localhost:11434
donut serve --port 8080
donut serve --host 0.0.0.0     # expose to network

API endpoints:

  • GET /v1/models: list installed models
  • POST /v1/chat/completions: chat (streaming supported)
  • GET /health: server status
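
As a quick check that the server is reachable, the model list can be fetched without any client library. The sketch below assumes the response follows the usual OpenAI list shape (`{ data: [{ id: ... }] }`), which this README does not spell out. `listModelIds` is a hypothetical helper name, and the `fetchImpl` parameter exists only so the function can be exercised without a running server.

```javascript
// Sketch only: assumes GET /v1/models returns { data: [{ id: "..." }, ...] },
// as OpenAI-compatible servers usually do. Uses the global fetch from Node.js 18+.
async function listModelIds(baseUrl = 'http://localhost:11434', fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/v1/models`);
  if (!res.ok) throw new Error(`GET /v1/models failed with HTTP ${res.status}`);
  const body = await res.json();
  return (body.data ?? []).map((model) => model.id);
}

// Usage (requires `donut serve` to be running):
// listModelIds().then((ids) => console.log(ids));
```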

Available Models

| Model | Params | Q4 Size | Best For |
|-------|--------|---------|----------|
| tinyllama | 1.1B | 638MB | Testing, low-power devices |
| qwen2.5-1.5b | 1.5B | 986MB | Multilingual tasks |
| gemma2-2b | 2B | 1.6GB | Google's small powerhouse |
| phi3-mini | 3.8B | 2.2GB | Strong reasoning |
| llama3.2-3b | 3B | 2.0GB | General purpose |
| mistral-7b | 7B | 4.1GB | Best quality 7B |
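
When disk space is the limiting factor, the Q4 sizes from the table can drive the choice of model. The sketch below is one possible way to do that. `largestModelThatFits` is a hypothetical helper, the sizes are copied from the table with 1GB treated as 1000MB, and "largest file that fits" is only a rough stand-in for "best model".

```javascript
// Q4 sizes copied from the table above, in MB (1GB treated as 1000MB).
const Q4_SIZES_MB = {
  'tinyllama': 638,
  'qwen2.5-1.5b': 986,
  'gemma2-2b': 1600,
  'phi3-mini': 2200,
  'llama3.2-3b': 2000,
  'mistral-7b': 4100,
};

// Hypothetical helper: name of the largest listed model whose Q4 file fits the budget.
function largestModelThatFits(freeMb) {
  const fitting = Object.entries(Q4_SIZES_MB)
    .filter(([, size]) => size <= freeMb)
    .sort(([, a], [, b]) => b - a); // biggest first
  return fitting.length > 0 ? fitting[0][0] : null;
}

console.log(largestModelThatFits(2500)); // phi3-mini
console.log(largestModelThatFits(500));  // null
```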


Disk Savings Explained

Donut uses GGUF quantization (via llama.cpp) to dramatically shrink model files:

mistral-7b (original FP16):  ~14GB
mistral-7b Q8:               ~7.2GB
mistral-7b Q4 (default):     ~4.1GB  ← Donut default
mistral-7b Q3:               ~3.1GB
mistral-7b Q2:               ~2.5GB  (noticeable quality loss)

Q4_K_M is the sweet spot: ~70% smaller than the original with minimal quality loss.
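
The "~70% smaller" figure follows directly from the sizes listed above, as this short calculation shows. `percentSaved` is a name chosen for this example, not something the package provides.

```javascript
// Sizes in GB, copied from the mistral-7b listing above.
const FP16_GB = 14;
const QUANT_GB = { Q8: 7.2, Q4: 4.1, Q3: 3.1, Q2: 2.5 };

// Space saved relative to the FP16 original, rounded to a whole percent.
function percentSaved(quantGb, originalGb = FP16_GB) {
  return Math.round((1 - quantGb / originalGb) * 100);
}

for (const [quant, gb] of Object.entries(QUANT_GB)) {
  console.log(`${quant}: ${percentSaved(gb)}% smaller`);
}
// Q8: 49% smaller
// Q4: 71% smaller
// Q3: 78% smaller
// Q2: 82% smaller
```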

During inference, Donut also quantizes the KV attention cache to Q4 (instead of FP16), reducing RAM usage by ~30-40% during chat, so you can run larger models on smaller machines.
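
To get a feel for where the saving comes from, the size of the cache itself can be estimated from the model shape and context length. The sketch below rests on assumptions that this README does not state: the Mistral 7B shape (32 layers, 8 key/value heads, head dimension 128) and llama.cpp's Q4_0 block layout (32 values packed into 18 bytes). It covers only the cache, not the model weights, so it does not reproduce the overall RAM figure quoted above.

```javascript
// Assumed architecture values for Mistral 7B (not stated in this README).
const MISTRAL_7B = { layers: 32, kvHeads: 8, headDim: 128 };

// Bytes per cached element. FP16 uses 2 bytes. The Q4 value assumes
// llama.cpp's Q4_0 layout: 32 values packed into an 18-byte block.
const BYTES_PER_ELEMENT = { fp16: 2, q4_0: 18 / 32 };

// Estimated KV cache size in MiB for a given context length.
function kvCacheMiB(model, ctxTokens, cacheType) {
  // The factor 2 accounts for storing both keys and values.
  const elements = 2 * model.layers * model.kvHeads * model.headDim * ctxTokens;
  return (elements * BYTES_PER_ELEMENT[cacheType]) / (1024 * 1024);
}

console.log(kvCacheMiB(MISTRAL_7B, 4096, 'fp16')); // 512
console.log(kvCacheMiB(MISTRAL_7B, 4096, 'q4_0')); // 144
```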


API Usage Example

// Works with any OpenAI-compatible client
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'donut',  // not checked
});

const response = await client.chat.completions.create({
  model: 'llama3.2-3b:q4',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
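
If the full reply is needed as a single string rather than printed piece by piece, the same stream can be collected with a small helper. This is a sketch: `collectText` is a hypothetical name, and it assumes only the chunk shape already used in the loop above (`choices[0].delta.content`).

```javascript
// Concatenate the text deltas from a stream of chat completion chunks.
async function collectText(stream) {
  let text = '';
  for await (const chunk of stream) {
    // Some chunks carry no content (for example the final one), so default to ''.
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}

// Usage with the client from the example above:
// const reply = await collectText(response);
```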

License

MIT © Donut Corp