osmia-ai

v0.3.1

AI-powered data enrichment CLI tool

🐝 Osmia AI

A stateless, AI-powered CLI tool for data enrichment. Unix philosophy: File-In ➔ File-Out.

Overview

Osmia takes raw JSON/JSONL data, enriches it via web search + LLM, and outputs enhanced data without introducing a database or backend.

cat input.json | npx osmia-ai --config config.yaml > enriched.json

Features

  • Stateless: Pure data transformation, no persistent state
  • Unix Pipes: Native stdin/stdout support
  • Resilient: Retries with backoff and 429 handling for search and LLM calls
  • Concurrent: Configurable workers with separate throttles for search and LLM
  • Smart Skip: Skip already-enriched records (--skip-if-exists)
  • Configurable: YAML config with templated search queries
  • JSONL Support: Works with JSONL input and output formats
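The Resilient bullet does not document the exact retry policy. As an illustration only, a common pattern is exponential backoff that honors a server's Retry-After hint on HTTP 429. This sketch is not Osmia's code: HttpError, backoffDelayMs, and withRetry are hypothetical names, and the defaults (3 retries, 500 ms base delay, 30 s cap) are assumptions.

```typescript
// Hypothetical error type for a failed HTTP call (not Osmia's actual API).
class HttpError extends Error {
  readonly status: number;
  readonly retryAfterSeconds?: number;
  constructor(status: number, retryAfterSeconds?: number) {
    super(`HTTP ${status}`);
    this.status = status;
    this.retryAfterSeconds = retryAfterSeconds;
  }
}

// Exponential backoff: base * 2^attempt, capped; with jitter, a random delay in [0, cap).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000, jitter = true): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return jitter ? Math.floor(Math.random() * exp) : exp;
}

// Retry 429 and 5xx responses up to maxRetries times; rethrow everything else immediately.
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-HTTP errors or when the retry budget is spent.
      if (!(err instanceof HttpError) || attempt >= maxRetries) throw err;
      // Client errors other than 429 are not worth retrying.
      if (err.status !== 429 && err.status < 500) throw err;
      // On 429, prefer the server's Retry-After hint; otherwise back off exponentially.
      const delayMs =
        err.status === 429 && err.retryAfterSeconds !== undefined
          ? err.retryAfterSeconds * 1000
          : backoffDelayMs(attempt);
      await sleep(delayMs);
    }
  }
}
```

Taking sleep as a parameter keeps the helper testable without real delays.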

Installation

Requires Node.js 24 LTS or newer.

npm install -g osmia-ai
# or use directly
npx osmia-ai --config config.yaml --input data.json --output enriched.json

Quick Start

  1. Create config.yaml

    osmia-ai init

    The interactive wizard asks for your LLM settings, search template, extraction prompt, and schema fields, then writes a valid YAML config for you. Run it in an interactive terminal, not with piped stdin or in CI.

  2. Set API keys (the search key depends on your search provider; the default is Exa):

    export OLLAMA_API_KEY="your-ollama-cloud-api-key"
    export EXA_API_KEY="your-exa-api-key"

  3. Run:

    osmia-ai --config config.yaml --input data.json --output enriched.json

Try the bundled examples

Sample data and ready-made configs live in examples/:

| File | Purpose |
| --- | --- |
| catalog-config.yaml | Standard catalog enrichment (Exa search) |
| catalog-batch-config.yaml | Same schema, conservative rate limits for large batches |
| catalog-duckduckgo-config.yaml | Same schema, no search API key required |
| sample-input.json | Two sample products (JSON array) |
| sample-input.jsonl | Same records as JSONL |

export OLLAMA_API_KEY="your-ollama-cloud-api-key"
export EXA_API_KEY="your-exa-api-key"

osmia-ai \
  --config examples/catalog-config.yaml \
  --input examples/sample-input.json \
  --output enriched.json

To try it locally without an Exa key, use the DuckDuckGo example instead; only the LLM key is required:

export OLLAMA_API_KEY="your-ollama-cloud-api-key"

osmia-ai \
  --config examples/catalog-duckduckgo-config.yaml \
  --input examples/sample-input.json \
  --output enriched.json
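The two sample inputs hold the same records, once as a JSON array and once as JSONL (one JSON object per line). The README does not say how Osmia tells the formats apart; this sketch assumes that input whose first non-whitespace character is [ is a JSON array and treats anything else as JSONL. parseRecords and toJsonl are hypothetical helpers.

```typescript
type Rec = Record<string, unknown>;

// Assumption: input starting with "[" is a JSON array; anything else is JSONL.
function parseRecords(text: string): Rec[] {
  const trimmed = text.trim();
  if (trimmed.startsWith("[")) return JSON.parse(trimmed) as Rec[];
  return trimmed
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "") // tolerate blank lines between records
    .map((line) => JSON.parse(line) as Rec);
}

// JSONL output: one compact JSON object per line, with a trailing newline.
function toJsonl(records: Rec[]): string {
  return records.length === 0 ? "" : records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}
```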

Usage

Usage: osmia-ai [options]

Options:
  -c, --config <path>            YAML configuration file
  -i, --input <path>             Input JSON/JSONL file (reads stdin if not provided)
  -o, --output <path>            Output file (writes stdout if not provided)
  -s, --skip-if-exists <fields>  Comma-separated fields to skip if non-empty
  -w, --workers <n>              Concurrent workers (default: 1)
  --dry-run                      Simulate without LLM calls
  --wizard [path]                Launch an interactive wizard and create a YAML config file
  -v, --verbose                  Verbosity (use -v or -vv)

Create a config interactively:

osmia-ai init
# or
osmia-ai --wizard config.yaml

Examples

Basic Usage

osmia-ai --config config.yaml --input data.json --output enriched.json

Generate Config Interactively

osmia-ai init config.yaml

Unix Pipe

cat data.json | osmia-ai --config config.yaml > enriched.json

With Skip Logic

osmia-ai -c config.yaml -i data.json -o enriched.json -s category,description,specs
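The -s / --skip-if-exists flag skips records whose listed fields are already filled in. Two details are not documented: whether every listed field must be non-empty or just one, and what counts as empty. This hypothetical sketch assumes all listed fields must be non-empty, and treats missing, null, blank strings, empty arrays, and empty objects as empty.

```typescript
// Assumption: undefined/null, whitespace-only strings, [] and {} count as empty.
function isEmpty(value: unknown): boolean {
  if (value === null || value === undefined) return true;
  if (typeof value === "string") return value.trim() === "";
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "object") return Object.keys(value as object).length === 0;
  return false; // numbers and booleans count as present
}

// Parse the comma-separated flag value, e.g. "category,description,specs".
function parseSkipFields(flag: string): string[] {
  return flag.split(",").map((f) => f.trim()).filter((f) => f !== "");
}

// Assumption: skip only when every listed field is already non-empty.
function shouldSkip(record: Record<string, unknown>, fields: string[]): boolean {
  return fields.length > 0 && fields.every((f) => !isEmpty(record[f]));
}
```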

Concurrent Processing

osmia-ai --config config.yaml --input data.json --workers 5 --verbose
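The --workers option controls how many records are processed at once. A minimal worker-pool sketch, assuming (the README does not say) that output records keep their input order; mapWithWorkers is a hypothetical name.

```typescript
// Run task over items with at most `workers` calls in flight; results keep input order.
async function mapWithWorkers<T, R>(
  items: T[],
  workers: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // shared cursor: check-and-increment has no await in between, so no races

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  // Never start more workers than there are items (but at least one).
  const poolSize = Math.max(1, Math.min(workers, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

A single shared cursor is enough because JavaScript runs these async functions on one thread; if any task rejects, Promise.all rejects too, which fits the abort-on-failure behavior described under Development.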

Dry Run (Debug Prompts)

osmia-ai --config config.yaml --input data.json --dry-run -vv

Configuration

Templating: use {fieldName} placeholders in searchQuery; each one is replaced with the value of that field from the input record.
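For example, with a hypothetical searchQuery of "{brand} {model} specifications", the record {"brand": "Acme", "model": "X100"} yields the query "Acme X100 specifications". A sketch of the substitution, assuming (not documented) that placeholder names are word characters only and that missing fields become empty strings:

```typescript
// Replace each {fieldName} with the record's value for that field.
// Assumption: missing or null fields become "", and runs of whitespace are collapsed afterwards.
function renderTemplate(template: string, record: Record<string, unknown>): string {
  return template
    .replace(/\{(\w+)\}/g, (_match, field: string) => {
      const value = record[field];
      return value === undefined || value === null ? "" : String(value);
    })
    .replace(/\s+/g, " ")
    .trim();
}
```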

Use config.yaml.template for the canonical default structure. The examples/ directory adds catalog-focused configs and sample input data. osmia-ai init is the fastest way to generate a valid starting point interactively.

Search providers

Set research.provider in your YAML config. Supported values: exa (default), duckduckgo, google, ollama.

| Provider | Required environment variables |
| --- | --- |
| exa | EXA_API_KEY |
| duckduckgo | none |
| google | GOOGLE_API_KEY, GOOGLE_SEARCH_ENGINE_ID |
| ollama | OLLAMA_API_KEY |

The LLM key is always read from the environment variable named by llm.apiKeyEnv (default: OLLAMA_API_KEY), whichever search provider you choose.
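Together, the provider table and the llm.apiKeyEnv rule amount to a preflight check on environment variables. A sketch of one; the variable names come from the provider table, while missingEnv itself is hypothetical.

```typescript
type Provider = "exa" | "duckduckgo" | "google" | "ollama";

// Required search credentials per provider, as listed in the provider table.
const SEARCH_ENV: Record<Provider, string[]> = {
  exa: ["EXA_API_KEY"],
  duckduckgo: [],
  google: ["GOOGLE_API_KEY", "GOOGLE_SEARCH_ENGINE_ID"],
  ollama: ["OLLAMA_API_KEY"],
};

// Return the variables that are unset or empty for the given provider and LLM key name.
function missingEnv(
  env: Record<string, string | undefined>,
  provider: Provider = "exa",
  llmApiKeyEnv = "OLLAMA_API_KEY",
): string[] {
  // De-duplicate: with the ollama provider, search and LLM may share OLLAMA_API_KEY.
  const needed = Array.from(new Set([...SEARCH_ENV[provider], llmApiKeyEnv]));
  return needed.filter((name) => !env[name]); // unset and "" both count as missing
}
```

In Node you would call it as missingEnv(process.env, "google") before starting a run.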

Use Cases

  • E-commerce: Enrich product catalogs with specs and descriptions
  • Research: Augment datasets with web metadata
  • Content: Generate summaries, tags, categorizations
  • Contacts: Enrich contact lists with company info

Development

nvm use
npm install
npm run build
npm test

Both camelCase and legacy snake_case config keys are accepted when loading YAML files.
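Accepting both spellings usually means normalizing keys once, right after the YAML is parsed. A sketch of that step on the already-parsed object; it assumes (the README does not say) that when both spellings of a key are present, the camelCase one wins. For instance, a hypothetical legacy key requests_per_minute would become requestsPerMinute.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// "api_key_env" -> "apiKeyEnv"; keys without underscores are returned unchanged.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_m, c: string) => c.toUpperCase());
}

// Recursively rename object keys to camelCase; arrays and primitives pass through.
function normalizeKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map(normalizeKeys);
  if (value === null || typeof value !== "object") return value;
  const out: { [key: string]: Json } = {};
  // Visit snake_case keys first so explicit camelCase keys overwrite them (assumed precedence).
  const entries = Object.entries(value).sort(
    ([a], [b]) => Number(b.includes("_")) - Number(a.includes("_")),
  );
  for (const [k, v] of entries) out[snakeToCamel(k)] = normalizeKeys(v);
  return out;
}
```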

Runs abort before writing output if any record fails, so batch jobs do not silently leave behind partial result files.
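Put differently, every record is enriched first, and output is serialized only once none has failed. A hypothetical sketch of that final check; the Outcome shape and collectOrThrow are assumptions, not Osmia's internals.

```typescript
type Outcome =
  | { ok: true; record: Record<string, unknown> }
  | { ok: false; index: number; error: string };

type Success = Extract<Outcome, { ok: true }>;
type Failure = Extract<Outcome, { ok: false }>;

// Return all enriched records, or throw if any failed so the caller never writes partial output.
function collectOrThrow(outcomes: Outcome[]): Record<string, unknown>[] {
  const failures = outcomes.filter((o): o is Failure => !o.ok);
  if (failures.length > 0) {
    const summary = failures.map((f) => `#${f.index}: ${f.error}`).join("; ");
    throw new Error(`${failures.length} record(s) failed; no output written (${summary})`);
  }
  return outcomes.filter((o): o is Success => o.ok).map((o) => o.record);
}
```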

For large batches, start conservatively with --workers 2 or --workers 3 and increase requestsPerMinute only after confirming that both your search provider and LLM endpoint accept the traffic without returning 429 responses.
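One simple way to honor a requestsPerMinute limit is to space calls at least 60000 / requestsPerMinute milliseconds apart, with one independent limiter for search and one for the LLM, mirroring the separate throttles listed under Features. This sketch uses a minimum-interval limiter; Osmia may use a different scheme (a token bucket, for instance), and the Throttle class and the rates shown are only examples.

```typescript
// Minimum-interval limiter: calls start at least 60_000 / requestsPerMinute ms apart.
class Throttle {
  private readonly intervalMs: number;
  private readonly now: () => number;
  private nextFreeAt = 0; // earliest timestamp (ms) at which the next call may start

  constructor(requestsPerMinute: number, now: () => number = Date.now) {
    if (!(requestsPerMinute > 0)) throw new Error("requestsPerMinute must be > 0");
    this.intervalMs = 60_000 / requestsPerMinute;
    this.now = now;
  }

  // Reserve the next slot and return how many ms the caller must wait before calling.
  reserve(): number {
    const t = this.now();
    const start = Math.max(t, this.nextFreeAt);
    this.nextFreeAt = start + this.intervalMs;
    return start - t;
  }

  // Wait for a reserved slot, then run the call.
  async run<T>(call: () => Promise<T>): Promise<T> {
    const wait = this.reserve();
    if (wait > 0) await new Promise<void>((resolve) => setTimeout(resolve, wait));
    return call();
  }
}

// Separate limiters, so one provider's budget is not consumed by the other (example rates).
const searchThrottle = new Throttle(30);
const llmThrottle = new Throttle(60);
```

At 60 requests per minute, three back-to-back reserve() calls return waits of 0, 1000, and 2000 ms.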

License

MIT