omp-lilac-provider

v1.0.0

Lilac provider plugin for OMP — Access Kimi K2.6, GLM 5.1, Gemma 4, and MiniMax M2.7 models through Lilac's OpenAI-compatible API on idle GPUs

Downloads

130

💜 omp-lilac-provider

Kimi K2.6, GLM 5.1, Gemma 4 & more on idle GPUs via Lilac

An OMP provider plugin for cost-efficient GPU inference.


Access Kimi K2.6, GLM 5.1, MiniMax M2.7, and Gemma 4 models through Lilac's OpenAI-compatible API on idle GPUs.

Features

  • 4 AI Models — Kimi K2.6, GLM 5.1, Gemma 4, and MiniMax M2.7
  • OpenAI-Compatible API — Just change the base URL and API key
  • Cost Tracking — Per-model pricing with cache read discounts
  • Reasoning Models — Chain-of-thought via chat_template_kwargs (all models)
  • Vision Support — Image input on Kimi K2.6 and Gemma 4
  • Context Caching — Cache read pricing on Kimi K2.6 and GLM 5.1
  • Idle GPU Scheduling — Lilac leverages idle GPU capacity for cost-efficient inference
  • Live Model Sync — Stale-while-revalidate: serve cached models instantly, hot-swap from the API in the background
  • Discount Tracking — Fetches subscription discounts from the /status endpoint and applies them to model costs

Quickstart

# 1. Install
omp plugin install omp-lilac-provider

# 2. Add your API key
omp
/login lilac

# 3. Pick a model and go
/model lilac

That's it. Lilac models now appear in /model. No -e flag, no manual clone, no config files.

Models

| Model | Context | Vision | Reasoning | Input $/M | Cache Read $/M | Output $/M |
|-------|---------|--------|-----------|-----------|-----------------|------------|
| Gemma 4 | 262K | ✅ | ✅ | $0.11 | — | $0.35 |
| GLM 5.1 | 203K | ❌ | ✅ | $0.90 | $0.27 | $3.00 |
| Kimi K2.6 | 262K | ✅ | ✅ | $0.70 | $0.20 | $3.50 |
| MiniMax M2.7 | 205K | ❌ | ✅ | $0.30 | $0.06 | $1.20 |

Costs are per million tokens. Prices subject to change — check getlilac.com for current pricing.

Notes:

  • Gemma 4 has reasoning off by default — OMP enables it when you set a thinking level (Shift+Tab)
  • Kimi K2.6 and GLM 5.1 have reasoning on by default
  • Cache read pricing applies to repeated input tokens served from cache on supported models
  • Gemma 4 does not support cache read pricing
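To make the pricing arithmetic concrete, here is a minimal sketch that estimates the cost of a single request from the table above. The `estimateCost` helper and the `usage` field names are hypothetical, and billing cached tokens at the regular input price on models without a cache read price is an assumption, not confirmed Lilac behavior.

```javascript
// Prices in USD per million tokens, copied from the table above.
// cacheRead is null where the table lists no cache read price.
const PRICES = {
  "Gemma 4": { input: 0.11, cacheRead: null, output: 0.35 },
  "GLM 5.1": { input: 0.9, cacheRead: 0.27, output: 3.0 },
  "Kimi K2.6": { input: 0.7, cacheRead: 0.2, output: 3.5 },
  "MiniMax M2.7": { input: 0.3, cacheRead: 0.06, output: 1.2 },
};

// Hypothetical helper: estimate the cost of one request in USD.
// usage.cachedTokens is the portion of usage.inputTokens served from cache.
function estimateCost(model, usage) {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  // Assumption: without a cache read price, all input is billed at the input rate.
  const cached = price.cacheRead === null ? 0 : (usage.cachedTokens ?? 0);
  const fresh = usage.inputTokens - cached;
  const total =
    fresh * price.input +
    cached * (price.cacheRead ?? 0) +
    usage.outputTokens * price.output;
  return total / 1_000_000;
}

const cost = estimateCost("Kimi K2.6", {
  inputTokens: 100_000,
  cachedTokens: 80_000,
  outputTokens: 2_000,
});
console.log(cost.toFixed(4)); // prints "0.0370"
```

In this example, 20,000 fresh input tokens, 80,000 cached tokens, and 2,000 output tokens on Kimi K2.6 come to about $0.037.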

API key

/login lilac prompts for your Lilac API key, validates it against Lilac's authenticated chat-completions endpoint, and stores it. Or set it explicitly:

export LILAC_API_KEY=your-api-key

Get a key at getlilac.com.
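The lookup order can be sketched as follows, assuming the key stored by /login takes precedence and LILAC_API_KEY is only the fallback. The `resolveApiKey` helper is hypothetical, and how the plugin actually stores the key is not shown here.

```javascript
// Hypothetical helper: prefer the key stored by /login, fall back to the env var.
function resolveApiKey(storedKey, env = process.env) {
  const key = storedKey || env.LILAC_API_KEY;
  if (!key) {
    throw new Error("No Lilac API key: run /login lilac or set LILAC_API_KEY");
  }
  return key;
}
```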

Other install paths

# From GitHub
omp plugin install https://github.com/ryan-brosas/omp-lilac-provider

# Local development
git clone https://github.com/ryan-brosas/omp-lilac-provider.git
omp plugin link ./omp-lilac-provider

Usage

After loading the extension, use the /model command in OMP to select your preferred model:

/model lilac moonshotai/kimi-k2.6

Or start OMP directly with a Lilac model:

omp --provider lilac --model moonshotai/kimi-k2.6

Thinking Mode

All Lilac models support chain-of-thought reasoning via chat_template_kwargs. OMP uses the qwen-chat-template thinking format to send both thinking and enable_thinking keys, which works across all model families:

  • Kimi K2.6: Honors thinking key (Moonshot template)
  • GLM 5.1: Honors enable_thinking key (Z.ai template)
  • Gemma 4: Honors enable_thinking key (Google template)

In OMP, reasoning models automatically use the appropriate thinking format. Use Shift+Tab to control thinking level.
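As an illustration of what sending both keys might look like in a request body, here is a minimal sketch. The `thinkingKwargs` helper is hypothetical, and the boolean values are an assumption about what the chat templates expect.

```javascript
// Hypothetical helper: build the chat_template_kwargs fragment of a request body.
// Both keys are set so that each model family finds the one it honors.
function thinkingKwargs(enabled) {
  return { chat_template_kwargs: { thinking: enabled, enable_thinking: enabled } };
}

const body = {
  model: "moonshotai/kimi-k2.6",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
  ...thinkingKwargs(true),
};
console.log(JSON.stringify(body.chat_template_kwargs));
// prints {"thinking":true,"enable_thinking":true}
```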

Vision

Kimi K2.6 and Gemma 4 support image inputs. Pass images in messages and OMP will handle the formatting automatically.

Gemma 4 also supports video by accepting a sequence of frames as images.
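For readers calling the API without OMP, an image message in the usual OpenAI-compatible shape might be built as below. This is a sketch: the `imageMessage` helper is hypothetical, and whether OMP emits exactly this structure is an assumption.

```javascript
// Build an OpenAI-style user message with text plus one inline image.
// imageBytes is a Buffer; mimeType is a string such as "image/png".
function imageMessage(text, imageBytes, mimeType) {
  const dataUrl = `data:${mimeType};base64,${imageBytes.toString("base64")}`;
  return {
    role: "user",
    content: [
      { type: "text", text },
      { type: "image_url", image_url: { url: dataUrl } },
    ],
  };
}

// Four placeholder bytes stand in for real image data.
const msg = imageMessage(
  "Describe this image.",
  Buffer.from([0x89, 0x50, 0x4e, 0x47]),
  "image/png"
);
console.log(msg.content[1].image_url.url); // prints data:image/png;base64,iVBORw==
```

For video on Gemma 4, the same pattern would presumably repeat the `image_url` part once per frame.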

Model Resolution

Models are discovered from the Lilac /v1/models API and stored in models.json. Custom definitions and overrides are layered via patch.json and custom-models.json.

The extension uses a stale-while-revalidate strategy for zero-latency startup:

  1. Serve stale immediately: disk cache → embedded models.json (zero-latency)
  2. Revalidate in background: live API /models → merge with embedded → cache → hot-swap
  3. patch.json + custom-models.json applied on top of whichever source won
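The first two steps above can be sketched as a small in-memory store. This is an illustration of the stale-while-revalidate pattern, not the extension's code: `createModelStore`, its options, and the merge-by-id rule are all assumptions.

```javascript
// Hypothetical model store illustrating stale-while-revalidate.
// fetchLive is any async function that returns the latest model list.
function createModelStore({ diskCache, embedded, fetchLive, onUpdate }) {
  // 1. Serve stale immediately: prefer the disk cache, else the embedded list.
  let current = diskCache ?? embedded;

  // 2. Revalidate in the background; on failure the stale list stays in place.
  const revalidated = fetchLive()
    .then((live) => {
      // Merge by id: live entries override embedded ones.
      const byId = new Map(embedded.map((m) => [m.id, m]));
      for (const m of live) byId.set(m.id, { ...byId.get(m.id), ...m });
      current = [...byId.values()]; // hot-swap
      onUpdate?.(current); // e.g. write the disk cache
    })
    .catch(() => {});

  return { get: () => current, revalidated };
}

const store = createModelStore({
  diskCache: null,
  embedded: [{ id: "a", name: "A" }],
  fetchLive: async () => [{ id: "a", name: "A2" }, { id: "b", name: "B" }],
});
console.log(store.get().length); // 1: the stale list is available synchronously
store.revalidated.then(() => console.log(store.get().length)); // 2 after the hot-swap
```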

| File | Purpose |
|---|---|
| models.json | Auto-generated from Lilac API (model discovery). Regenerated by node scripts/update-models.js; do not edit manually |
| patch.json | Manual overrides (reasoning, compat, notes, limits, etc.) applied on top of models.json |
| custom-models.json | Models not available via the API (e.g. per-slug endpoint models) |

Models are loaded by merging models.json → apply patch.json → merge custom-models.json.
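A minimal sketch of that layering order follows. The `loadModels` function is hypothetical, and it assumes patch.json is an object keyed by model id with shallow overrides; the real file shapes may differ.

```javascript
// Hypothetical loader: models.json -> apply patch.json -> merge custom-models.json.
// Assumption: patch is an object keyed by model id; overrides are shallow.
function loadModels(models, patch, customModels) {
  const patched = models.map((m) => ({ ...m, ...(patch[m.id] ?? {}) }));
  const byId = new Map(patched.map((m) => [m.id, m]));
  // Custom models are added last, so they win on an id collision.
  for (const m of customModels) byId.set(m.id, { ...byId.get(m.id), ...m });
  return [...byId.values()];
}

const loaded = loadModels(
  [{ id: "moonshotai/kimi-k2.6", reasoning: false }],
  { "moonshotai/kimi-k2.6": { reasoning: true } },
  [{ id: "my-org/my-model", reasoning: false }]
);
console.log(loaded.map((m) => `${m.id}:${m.reasoning}`).join(", "));
// prints moonshotai/kimi-k2.6:true, my-org/my-model:false
```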

Adding Custom Models

To customize:

  • Override an existing model: Add entries to patch.json (reasoning, compat, notes, maxTokens, etc.)
  • Add new models not in the API: Add entries to custom-models.json:
[
  {
    "id": "my-org/my-model",
    "name": "My Custom Model",
    "reasoning": false,
    "input": ["text"],
    "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
    "contextWindow": 131072,
    "maxTokens": 16384,
    "baseUrl": "https://api.getlilac.com/my-model-slug/v1"
  }
]

API Notes

  • Each model is accessible at https://api.getlilac.com/v1/chat/completions (unified endpoint)
  • The API is OpenAI-compatible (chat completions format)
  • All models are hosted on vLLM
  • Lilac serves models via a customized fork of vLLM tuned for idle-GPU scheduling and shared warm endpoints
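For direct use outside OMP, a request to the unified endpoint might look like the sketch below. The `buildChatRequest` and `chat` helpers are hypothetical, Bearer-token authentication is assumed because the API is OpenAI-compatible, and the global `fetch` requires Node 18 or later.

```javascript
const LILAC_BASE_URL = "https://api.getlilac.com/v1";

// Build the URL and fetch options for a chat completion request.
function buildChatRequest(apiKey, model, messages) {
  return {
    url: `${LILAC_BASE_URL}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumption: OpenAI-style Bearer auth
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send the request and return the first choice's text. Defined but not called here.
async function chat(apiKey, model, messages) {
  const { url, options } = buildChatRequest(apiKey, model, messages);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Lilac API error: HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

A call would look like `await chat(process.env.LILAC_API_KEY, "moonshotai/kimi-k2.6", [{ role: "user", content: "Hello" }])`.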

vLLM Caveats

These issues are common to all vLLM-hosted providers and affect Lilac models:

  • GLM 5.1 intermittent tool call loss: vLLM's streaming parser intermittently emits finish_reason: "tool_calls" without any delta.tool_calls chunks — even with tool_stream: true (set via zaiToolStream in compat). OMP maps this to stopReason: "toolUse" with zero toolCall blocks, causing an "abrupt stop". The extension's message_end handler converts this to a retryable error that triggers OMP's built-in auto-retry mechanism, so the agent automatically re-prompts and typically succeeds on the next attempt.
  • GLM 5.1 chain-of-thought leakage: On the current vLLM build, disabling reasoning on GLM 5.1 may still leak chain-of-thought into content, terminated by a marker. When reasoning is disabled, post-process the response to discard all text up to and including the first occurrence of that marker. See vllm-project/vllm#31319.
  • Gemma 4 reasoning parser: vLLM's reasoning parser can fail to populate the reasoning field when special tokens are stripped before the parser runs. Clients that require a clean split should post-process <|channel|>thought ... <|channel|> markers. See vllm-project/vllm#38855.
  • Gemma 4 structured output: Combining enable_thinking: false with response_format: json_schema can silently disable xgrammar-backed structured output. If you rely on structured output with Gemma 4, leave thinking enabled or validate output client-side. See vllm-project/vllm#39130.
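The tool call loss workaround from the first caveat can be sketched as a detection check plus a retry loop. This is a standalone illustration rather than OMP's message_end handler or its built-in auto-retry: `isLostToolCall`, `withRetry`, and the message shape (a `stopReason` string and a `content` array of typed blocks) are assumptions based on the description above.

```javascript
// Hypothetical check: a message that stopped for tool use but carries
// no toolCall blocks is treated as a lost tool call.
function isLostToolCall(message) {
  const toolCalls = (message.content ?? []).filter((b) => b.type === "toolCall");
  return message.stopReason === "toolUse" && toolCalls.length === 0;
}

// Re-issue the request while it keeps yielding lost tool calls.
// request is any async function that returns a message.
async function withRetry(request, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const message = await request();
    if (!isLostToolCall(message)) return message;
  }
  throw new Error(`Tool call lost on all ${maxAttempts} attempts`);
}
```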

Compat Settings

Lilac's API is OpenAI-compatible with these specifics:

  • thinkingFormat: "qwen-chat-template" — All reasoning models. Lilac uses chat_template_kwargs (with thinking and enable_thinking keys) to toggle reasoning. OMP sends both keys for forward compatibility.
  • maxTokensField: "max_completion_tokens" — All models. Lilac supports max_completion_tokens (preferred for reasoning models as it includes reasoning tokens).
  • supportsDeveloperRole: true — All models. Lilac's vLLM backend maps the developer role to system.
  • supportsStore: false — All models. Lilac doesn't support the store parameter.
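To show how settings like these could shape an outgoing request, here is a rough sketch. The object layout and the `applyCompat` adapter are hypothetical; OMP's actual handling of compat settings may differ.

```javascript
// The four settings listed above, as a plain object (shape assumed).
const LILAC_COMPAT = {
  thinkingFormat: "qwen-chat-template",
  maxTokensField: "max_completion_tokens",
  supportsDeveloperRole: true,
  supportsStore: false,
};

// Hypothetical adapter: shape a generic request for a provider's compat settings.
function applyCompat(request, compat) {
  const { maxTokens, store, thinking, ...body } = request;
  // Send the token limit under the field name the provider expects.
  if (maxTokens !== undefined) body[compat.maxTokensField] = maxTokens;
  // Drop the store parameter unless the provider supports it.
  if (compat.supportsStore && store !== undefined) body.store = store;
  // Downgrade developer messages only when the role is unsupported.
  if (!compat.supportsDeveloperRole) {
    body.messages = body.messages.map((m) =>
      m.role === "developer" ? { ...m, role: "system" } : m
    );
  }
  // Send both thinking keys, as described for qwen-chat-template.
  if (compat.thinkingFormat === "qwen-chat-template" && thinking !== undefined) {
    body.chat_template_kwargs = { thinking, enable_thinking: thinking };
  }
  return body;
}
```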

Updating Models

Run the update script to fetch the latest models from Lilac's API:

export LILAC_API_KEY=your-api-key
node scripts/update-models.js

This will:

  1. Fetch models from https://api.getlilac.com/v1/models
  2. Convert per-token pricing to per-million-tokens
  3. Preserve existing curated data (pricing, compat) for known models
  4. Apply overrides from patch.json
  5. Update models.json and the README model table

A GitHub Actions workflow runs this daily and creates a PR if models have changed.
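Step 2 is simple arithmetic, but floating-point noise makes rounding worthwhile. The sketch below assumes the API reports prices as per-token USD numbers or numeric strings; `perMillion` is a hypothetical helper, not the function used in scripts/update-models.js.

```javascript
// Convert a per-token USD price (number or numeric string) to USD per million tokens.
// Rounding to 6 decimals hides floating-point noise such as 0.7000000000000001.
function perMillion(perToken) {
  return Math.round(Number(perToken) * 1_000_000 * 1e6) / 1e6;
}

console.log(perMillion("0.0000007")); // 0.7
console.log(perMillion(0.0000035)); // 3.5
```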

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| LILAC_API_KEY | No | Your Lilac API key (fallback if not stored via /login) |

License

MIT