omp-makora-provider

v1.0.1

Makora provider plugin for OMP — Access DeepSeek V4, GLM 5.1, Kimi K2.6, Llama 3.3, Qwen 3.6, and more through the Makora inference API

🔁 omp-makora-provider

Open-weight models through Makora

DeepSeek V4, Kimi K2.6, GLM 5.1 / 5.2, Qwen 3.6 — with client-side tool call repair for OMP / pi.


Models

| Model | ID | Reasoning | Notes |
|-------|----|-----------|-------|
| DeepSeek V4 Flash | deepseek-ai/DeepSeek-V4-Flash | Yes | maxTokens 32768; include_reasoning + chat_template_kwargs.thinking via before_provider_request payload rewrite; returns reasoning field |
| DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | Yes | maxTokens 32768; chat_template_kwargs.thinking via before_provider_request payload rewrite; returns reasoning_content field |
| GLM 5.1 FP8 | zai-org/GLM-5.1-FP8 | Yes | maxTokens 16384; enable_thinking via qwen-chat-template; returns reasoning_content field; client-side tool call parsing (vLLM streaming parser bypass) |
| GLM 5.2 FP8 | zai-org/GLM-5.2-FP8 | Yes | maxTokens 16384; enable_thinking via qwen-chat-template; returns reasoning field; native tool calls work in both stream and non-stream (no client-side repair needed) |
| GPT-OSS 120B | openai/gpt-oss-120b | Yes | maxTokens 16384; reasoning always on |
| Kimi K2.6 NVFP4 | nvidia/Kimi-K2.6-NVFP4 | Yes | maxTokens 16384; vision maxImagesPerRequest 5; reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass) |
| Kimi K2.7 Code | moonshotai/Kimi-K2.7-Code | Yes | maxTokens 16384; vision maxImagesPerRequest 5; reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass) |
| Llama 3.3 70B FP8 | amd/Llama-3.3-70B-Instruct-FP8-KV | No | maxTokens 16384; custom per-slug endpoint |
| Llama 3.3 70B Instruct | meta-llama/Llama-3.3-70B-Instruct | No | maxTokens 8192; non-reasoning text-only model |
| MiniMax M3 MXFP8 | MiniMaxAI/MiniMax-M3-MXFP8 | Yes | maxTokens 16384; vision maxImagesPerRequest 5; reasoning via chat_template_kwargs.enable_thinking; returns reasoning_content field |
| Qwen 3.6 27B NVFP4 | unsloth/Qwen3.6-27B-NVFP4 | Yes | maxTokens 16384; enable_thinking via qwen-chat-template; client-side tool call parsing (vLLM streaming parser bypass) |
| Qwen 3.6 35B A3B NVFP4 | unsloth/Qwen3.6-35B-A3B-NVFP4 | Yes | maxTokens 16384; enable_thinking via qwen-chat-template; client-side tool call parsing (vLLM streaming parser bypass) |

Quickstart

Install from npm, then log in once:

# 1. Install the plugin from npm
omp plugin install omp-makora-provider

# 2. Open OMP / pi
omp

# 3. Add your Makora API key
/login makora

# 4. Pick a Makora model
/model makora

That's it. Makora models now appear in /model. No -e flag, no manual clone, no config files.

API key

/login makora prompts for your Makora API key, validates it, and stores it.

If you prefer environment variables:

export MAKORA_OPTIMIZE_TOKEN=your-api-key

Get a key at inference.makora.com.

Install sources

# npm registry (recommended)
omp plugin install omp-makora-provider

# GitHub
omp plugin install https://github.com/ryan-brosas/omp-makora-provider

# Local development
git clone https://github.com/ryan-brosas/omp-makora-provider.git
omp plugin link ./omp-makora-provider

Model Resolution

Models are discovered from the Makora /v1/models API and stored in models.json. Overrides and custom definitions are then layered on top via patch.json and custom-models.json, respectively.

| File | Purpose |
|---|---|
| models.json | Auto-generated from Makora API (model discovery). Regenerated by node scripts/update-models.js; do not edit manually |
| patch.json | Manual overrides (reasoning, compat, notes, limits, etc.) applied on top of models.json |
| custom-models.json | Models not available via the API (e.g. per-slug endpoint models) |

Models are loaded in three steps: read models.json, apply patch.json on top, then merge in custom-models.json.
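The load order can be sketched roughly as follows. This is an illustration only, not the plugin's actual loader: the function name resolveModels is hypothetical, and the sketch assumes patch.json holds an array of partial model objects matched by id, that patches for unknown ids are ignored, and that a custom entry replaces a discovered model with the same id. The real file layout and conflict rules may differ.

```javascript
// Hypothetical sketch of the three-step load order.
// Assumption: every entry (discovered, patch, custom) carries an `id` field.
function resolveModels(discovered, patches, customModels) {
  const byId = new Map();

  // 1. Start from the auto-generated models.json entries.
  for (const model of discovered) {
    byId.set(model.id, { ...model });
  }

  // 2. Apply patch.json overrides on top of models that exist.
  //    Assumption: a patch for an unknown id is skipped.
  for (const patch of patches) {
    const base = byId.get(patch.id);
    if (base) {
      byId.set(patch.id, { ...base, ...patch });
    }
  }

  // 3. Merge custom-models.json.
  //    Assumption: a custom entry replaces a model with the same id.
  for (const custom of customModels) {
    byId.set(custom.id, { ...custom });
  }

  return [...byId.values()];
}

// Example with inline data standing in for the three JSON files.
const resolved = resolveModels(
  [{ id: "zai-org/GLM-5.1-FP8", name: "GLM 5.1 FP8" }],
  [{ id: "zai-org/GLM-5.1-FP8", reasoning: true, maxTokens: 16384 }],
  [{ id: "my-org/my-model", name: "My Custom Model", maxTokens: 16384 }]
);
console.log(resolved.map((m) => m.id));
```

Keeping the patch step separate from the custom-model step means regenerating models.json never touches hand-written overrides.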

Patch metadata fields

patch.json supports the same model metadata fields consumed by the provider, including reasoning, input, contextWindow, maxTokens, vision, notes, thinkingLevelMap, and compat. Use maxTokens for safe output caps because Makora model discovery does not report max output tokens. Use vision.maxImagesPerRequest for multimodal request limits when a model declares input: ["text", "image"].
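As a rough illustration of how maxTokens and vision.maxImagesPerRequest might be enforced, here is a small sketch. The helper name applyModelLimits is hypothetical and is not part of the plugin. It assumes an OpenAI-style payload where output length is requested via max_tokens and images appear as content parts with type "image_url".

```javascript
// Hypothetical helper: apply patch.json limits to an OpenAI-style payload.
function applyModelLimits(model, payload) {
  const result = { ...payload };

  // Cap requested output tokens at the model's maxTokens, if one is set.
  // Assumption: when the request names no cap, the model's cap is used.
  if (typeof model.maxTokens === "number") {
    const requested =
      typeof payload.max_tokens === "number" ? payload.max_tokens : model.maxTokens;
    result.max_tokens = Math.min(requested, model.maxTokens);
  }

  // Count image parts across all messages and compare against the limit.
  const limit = model.vision && model.vision.maxImagesPerRequest;
  if (typeof limit === "number") {
    let images = 0;
    for (const message of payload.messages || []) {
      if (Array.isArray(message.content)) {
        images += message.content.filter((part) => part.type === "image_url").length;
      }
    }
    if (images > limit) {
      throw new Error(`Request has ${images} images; limit is ${limit}`);
    }
  }

  return result;
}

// Example: a model patched with a 16384-token cap and a 5-image limit.
const limited = applyModelLimits(
  { id: "nvidia/Kimi-K2.6-NVFP4", maxTokens: 16384, vision: { maxImagesPerRequest: 5 } },
  { max_tokens: 50000, messages: [{ role: "user", content: "hello" }] }
);
console.log(limited.max_tokens);
```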

Adding Custom Models

Do not edit models.json directly — it is auto-generated from the API. To customize:

  • Override an existing model: Add entries to patch.json (reasoning, compat, notes, maxTokens, etc.)
  • Add new models not in the API: Add entries to custom-models.json:
[
  {
    "id": "my-org/my-model",
    "name": "My Custom Model",
    "reasoning": false,
    "input": ["text"],
    "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
    "contextWindow": 131072,
    "maxTokens": 16384,
    "baseUrl": "https://inference.makora.com/my-model-slug/v1"
  }
]

API Notes

  • Each model is accessible at https://inference.makora.com/v1/chat/completions (unified endpoint)
  • Models with a baseUrl override use their per-slug endpoint instead
  • The API is OpenAI-compatible (chat completions format)
  • All models are hosted on vLLM
  • The developer role is not supported: developer messages are silently dropped, so supportsDeveloperRole is set to false for all models
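The endpoint rules above can be sketched as a small request builder. The function names are hypothetical. Bearer-token authorization is an assumption based on the API being OpenAI-compatible, and remapping developer messages to the system role is just one possible way to avoid the silent drop, not necessarily what the plugin does.

```javascript
// Unified endpoint base, taken from the notes above.
const UNIFIED_BASE_URL = "https://inference.makora.com/v1";

// Pick the per-slug baseUrl when the model defines one, else the unified base.
function chatCompletionsUrl(model) {
  const base = (model.baseUrl || UNIFIED_BASE_URL).replace(/\/+$/, "");
  return `${base}/chat/completions`;
}

// Assumption: sending developer prompts as system messages keeps them
// from being dropped. This is illustrative only.
function remapDeveloperRole(messages) {
  return messages.map((m) => (m.role === "developer" ? { ...m, role: "system" } : m));
}

// Build (but do not send) an OpenAI-style chat completions request.
function buildRequest(model, messages, apiKey) {
  return {
    url: chatCompletionsUrl(model),
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumption: standard bearer auth
    },
    body: JSON.stringify({ model: model.id, messages: remapDeveloperRole(messages) }),
  };
}

// Example: read the key from the environment variable the README names.
const apiKey = process.env.MAKORA_OPTIMIZE_TOKEN || "your-api-key";
const request = buildRequest(
  { id: "my-org/my-model", baseUrl: "https://inference.makora.com/my-model-slug/v1" },
  [{ role: "developer", content: "Be concise." }, { role: "user", content: "Hi" }],
  apiKey
);
console.log(request.url);
```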

vLLM Caveats

These issues are common to all vLLM-hosted providers and affect Makora models:

  • GLM 5.1 tool calling: vLLM's streaming tool call handling is broken for GLM — the model outputs Zhipu's native <tool_call> XML format as raw text. The message_end hook parses this into toolCall blocks so OMP / pi can execute the tools. A context hook then strips tool_calls from assistant messages before follow-up requests, converting them back to <tool_call> text to avoid a ZAI/vLLM server crash (500: 'str object' has no attribute 'items') that occurs when any assistant message contains a tool_calls field. If upstream fixes both the streaming parser and the 500 crash, the message_end hook gracefully skips (existing valid toolCall blocks are preserved), and the context hook's text-stripping is harmless (GLM natively understands <tool_call> text).

  • Kimi K2.6 + Qwen 3.6 tool calling: vLLM's streaming tool call handling is broken or missing for these models. The before_provider_request hook sets tool_choice: "none" and skip_special_tokens: false so the model's tool call tokens pass through as plain text. The message_end hook then re-parses that text into toolCall blocks:

    • Kimi K2.6: Uses <|tool_call_begin|>...<|tool_call_end|> tokens. Makora's vLLM is missing both --enable-auto-tool-choice and --tool-call-parser for this model.
    • Qwen 3.6: Uses hermes-style <function=...> XML, sometimes with delimiters. Same vLLM flag limitation as Kimi.
  • GLM 5.1 CoT leak: On some vLLM builds, disabling reasoning may still leak chain-of-thought into content terminated by a ``` marker. See vllm-project/vllm#31319.

  • DeepSeek V4 reasoning: The official DeepSeek API uses thinking: { type: "enabled" } which Makora's vLLM silently ignores. The before_provider_request hook rewrites the payload to use vLLM-native params instead:

    • DS V4 Pro: chat_template_kwargs: { thinking: true }. Returns reasoning_content.
    • DS V4 Flash: include_reasoning: true + chat_template_kwargs: { thinking: true }. include_reasoning alone returns reasoning: null on this vLLM build — both params are required. Returns reasoning.
  • GLM 5.1 reasoning: Returns reasoning_content (not reasoning). OMP / pi's OpenAI completions handler checks reasoning_content first, so this is handled correctly.

  • MiniMax M3 reasoning: Uses chat_template_kwargs.enable_thinking to toggle thinking (not chat_template_kwargs.thinking like DeepSeek). The before_provider_request hook rewrites the DeepSeek API-style thinking param into vLLM-native chat_template_kwargs: { enable_thinking: true }. Returns reasoning_content field.
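The reasoning notes above boil down to a per-model payload rewrite plus a field lookup on the response. The sketch below illustrates that idea. The function names are hypothetical and this is not the plugin's before_provider_request hook. It assumes the incoming payload carries the DeepSeek-style thinking: { type: "enabled" } param, and it leaves payloads untouched when thinking is not enabled or the model is not listed.

```javascript
// Hypothetical rewrite of the DeepSeek API-style `thinking` param into the
// vLLM-native params described above, chosen by model id.
function rewriteThinking(payload) {
  const enabled = Boolean(payload.thinking && payload.thinking.type === "enabled");
  if (!enabled) return payload; // assumption: nothing to rewrite

  // Drop the DeepSeek-style param and keep any existing template kwargs.
  const { thinking, ...rest } = payload;
  const kwargs = { ...(rest.chat_template_kwargs || {}) };

  switch (payload.model) {
    case "deepseek-ai/DeepSeek-V4-Pro":
      return { ...rest, chat_template_kwargs: { ...kwargs, thinking: true } };
    case "deepseek-ai/DeepSeek-V4-Flash":
      // Both params are required on this build, per the note above.
      return {
        ...rest,
        include_reasoning: true,
        chat_template_kwargs: { ...kwargs, thinking: true },
      };
    case "MiniMaxAI/MiniMax-M3-MXFP8":
      return { ...rest, chat_template_kwargs: { ...kwargs, enable_thinking: true } };
    default:
      return payload; // other models are left as-is in this sketch
  }
}

// Read reasoning text from a response message, checking reasoning_content
// first and then reasoning, mirroring the lookup order described above.
function extractReasoning(message) {
  return message.reasoning_content ?? message.reasoning ?? null;
}

const rewritten = rewriteThinking({
  model: "deepseek-ai/DeepSeek-V4-Flash",
  thinking: { type: "enabled" },
  messages: [{ role: "user", content: "Hi" }],
});
console.log(JSON.stringify(rewritten.chat_template_kwargs));
```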