pi-onnx

v0.2.4

Published

15 days ago

Run Hugging Face onnx-community models locally inside pi: registers a chat provider for ONNX text-generation models and a set of tools (embeddings, classification, ASR) backed by @huggingface/transformers and onnxruntime-node.


jarkkojs

pi-package pi-extension pi-provider coding-agent onnx onnxruntime huggingface onnx-community transformers local-inference

pi-onnx

Runs Hugging Face onnx-community models locally inside the pi coding agent using @huggingface/transformers.

Implements a chat provider and several tool calls:

- onnx_embed({ texts: string[] }): array of vectors (and dimensionality).
- onnx_classify({ text, labels? }): top-K labels with scores; when labels is provided, runs zero-shot classification.
- onnx_transcribe({ path, language?, task? }): transcript text and segments.
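The argument shapes listed above can be written down as TypeScript types. This is only a sketch: the type names are hypothetical, and the element types of labels and language are assumed to be strings. The result shapes are left out because the field names are not documented here.

```typescript
// Hypothetical type names; the argument fields come from the tool list above.
type AsrTask = "transcribe" | "translate";

interface OnnxEmbedArgs {
  texts: string[];
}

interface OnnxClassifyArgs {
  text: string;
  labels?: string[]; // assumption: candidate labels are plain strings
}

interface OnnxTranscribeArgs {
  path: string;
  language?: string; // assumption: a language code such as "en"
  task?: AsrTask;
}

// Example argument objects (the input values are made up for illustration).
const embedArgs: OnnxEmbedArgs = { texts: ["open a file", "read a file"] };
const sentimentArgs: OnnxClassifyArgs = { text: "The build is finally green." };
const zeroShotArgs: OnnxClassifyArgs = {
  text: "The editor crashes when I paste an image.",
  labels: ["bug", "feature request", "question"],
};
const transcribeArgs: OnnxTranscribeArgs = {
  path: "./meeting.wav",
  language: "en",
  task: "transcribe",
};
```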

Install

pi install npm:pi-onnx

Configure

Copy example-config.json from this package as a starting point:

cp example-config.json ~/.pi/agent/pi-onnx.json

Top-level

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| cacheDir | string \| null | null (HF default) | Forwarded to env.cacheDir. |
| device | "cpu" \| "webgpu" \| "wasm" \| "gpu" | "cpu" | onnxruntime execution provider hint. |
| defaultDtype | Dtype | "q4" | Per-model dtype overrides this. |
| preloadDefaultModel | boolean | false | Preload the first configured model on session start. |
| models | ModelEntry[] | [Qwen2.5-Coder-0.5B-Instruct] | Each entry becomes an onnx-community/<id> chat model. |
| discovery | object | enabled, limit 50 | Append compatible onnx-community/* models from the HF Hub. |
| tools | object | embed only | Toggles for onnx_embed / _classify / _transcribe. |

Dtype is one of "fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16".
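As a rough illustration of the scalar top-level fields and their defaults, the sketch below models them in TypeScript. The names TopLevelConfig, TOP_LEVEL_DEFAULTS, isDtype, and withDefaults are hypothetical and not part of pi-onnx; only the field names, allowed values, and defaults come from the table above.

```typescript
// Hypothetical types mirroring the documented scalar top-level fields.
type Dtype = "fp32" | "fp16" | "q8" | "int8" | "uint8" | "q4" | "bnb4" | "q4f16";
type Device = "cpu" | "webgpu" | "wasm" | "gpu";

interface TopLevelConfig {
  cacheDir: string | null;
  device: Device;
  defaultDtype: Dtype;
  preloadDefaultModel: boolean;
}

// Defaults as listed in the table.
const TOP_LEVEL_DEFAULTS: TopLevelConfig = {
  cacheDir: null,
  device: "cpu",
  defaultDtype: "q4",
  preloadDefaultModel: false,
};

const DTYPES: string[] = ["fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16"];

// Runtime check that a value read from JSON is one of the documented dtypes.
function isDtype(value: unknown): value is Dtype {
  return typeof value === "string" && DTYPES.indexOf(value) !== -1;
}

// Overlay user-provided fields on the defaults (assumed merge behavior).
function withDefaults(user: Partial<TopLevelConfig>): TopLevelConfig {
  return { ...TOP_LEVEL_DEFAULTS, ...user };
}
```

For example, withDefaults({ device: "webgpu" }) keeps defaultDtype at "q4" while switching the device hint.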

`models[]`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| id | string | — | Hugging Face repo path (onnx-community/ prefixed). |
| name | string | id | Display name shown in the model picker. |
| contextWindow | number | — | Context window size in tokens. |
| maxTokens | number | 1024 | Default max_new_tokens for completions. |
| dtype | Dtype | defaultDtype | Quantization for this model only. |

Only id is required; the onnx-community/ prefix is added if missing. Pinned models are checked against the Hugging Face Hub when possible. Repositories that are not compatible with @huggingface/transformers text generation, such as onnxruntime-genai image-text-to-text exports, are skipped instead of being offered as broken chat models.

Example:

{
  "id": "onnx-community/Qwen3-0.6B-ONNX",
  "name": "Qwen3-0.6B (ONNX, q4)",
  "contextWindow": 32768,
  "maxTokens": 2048,
  "dtype": "q4"
}
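The defaulting rules for a models[] entry (prefix added if missing, name falling back to id, maxTokens 1024, dtype falling back to defaultDtype) might look like the sketch below. resolveModelEntry and its types are hypothetical names, and using the prefixed id as the default name is an assumption.

```typescript
// Hypothetical shapes; the field names come from the models[] table.
interface ModelEntry {
  id: string;
  name?: string;
  contextWindow?: number;
  maxTokens?: number;
  dtype?: string;
}

interface ResolvedModel {
  id: string;
  name: string;
  contextWindow?: number;
  maxTokens: number;
  dtype: string;
}

const ORG_PREFIX = "onnx-community/";

function resolveModelEntry(entry: ModelEntry, defaultDtype: string = "q4"): ResolvedModel {
  // Add the onnx-community/ prefix only when it is missing.
  const id = entry.id.indexOf(ORG_PREFIX) === 0 ? entry.id : ORG_PREFIX + entry.id;
  return {
    id,
    // Assumption: the display name falls back to the prefixed id.
    name: entry.name !== undefined ? entry.name : id,
    contextWindow: entry.contextWindow,
    maxTokens: entry.maxTokens !== undefined ? entry.maxTokens : 1024,
    dtype: entry.dtype !== undefined ? entry.dtype : defaultDtype,
  };
}
```

With these rules, { "id": "Qwen3-0.6B-ONNX" } resolves to the id onnx-community/Qwen3-0.6B-ONNX with maxTokens 1024 and the global default dtype.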

`discovery`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | boolean | true | Append discovered models to models[]. |
| limit | number | 50 | Per pipeline tag. |
| pipelineTags | PipelineTag[] | ["text-generation"] | Hugging Face pipeline tags to scan. |

Discovery only registers transformers.js text-generation repositories that expose a supported onnx/model*.onnx file. It also records the matching dtype, so models such as gpt-oss-20b-ONNX use q4f16 instead of the global default q4. If a discovered model is also pinned in models without an explicit dtype, discovery fills in that dtype automatically.
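The way discovery interacts with pinned models can be sketched as a merge step. This is an illustration under assumptions, not the package's actual code: mergeDiscovered and both interfaces are hypothetical, and ids are assumed to be already prefixed.

```typescript
// Hypothetical shapes for pinned and discovered models.
interface PinnedModel {
  id: string;
  dtype?: string;
}

interface DiscoveredModel {
  id: string;
  dtype: string; // the dtype matched from the repo's onnx/model*.onnx file
}

function mergeDiscovered(pinned: PinnedModel[], discovered: DiscoveredModel[]): PinnedModel[] {
  // Index discovered dtypes by model id.
  const dtypeById: { [id: string]: string } = {};
  for (const d of discovered) dtypeById[d.id] = d.dtype;

  const seen: { [id: string]: boolean } = {};
  const merged: PinnedModel[] = [];

  // Pinned models come first; fill in a dtype only when none was set explicitly.
  for (const p of pinned) {
    seen[p.id] = true;
    const filled = p.dtype !== undefined ? p.dtype : dtypeById[p.id];
    merged.push(filled !== undefined ? { id: p.id, dtype: filled } : { id: p.id });
  }

  // Append discovered models that were not pinned.
  for (const d of discovered) {
    if (!seen[d.id]) merged.push({ id: d.id, dtype: d.dtype });
  }
  return merged;
}
```

A pinned gpt-oss-20b-ONNX entry without a dtype would pick up q4f16 from discovery, while a pinned entry with an explicit dtype is left untouched.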

`tools.embed`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | boolean | true | Toggles onnx_embed. |
| model | string | onnx-community/all-MiniLM-L6-v2 | Any feature-extraction model. |
| pooling | "mean" \| "cls" | "mean" | Pooling strategy. |
| normalize | boolean | true | L2-normalize output vectors. |
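To make the pooling and normalize options concrete, here is a small standalone sketch of mean pooling and L2 normalization over per-token vectors. It is a simplified illustration (it ignores padding masks, which a real pipeline would account for), and the function names are hypothetical. In pi-onnx the work is delegated to @huggingface/transformers.

```typescript
// Mean pooling: average the per-token vectors into one sentence vector.
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) throw new Error("need at least one token vector");
  const dim = tokenVectors[0].length;
  const sum: number[] = [];
  for (let d = 0; d < dim; d++) sum.push(0);
  for (const vec of tokenVectors) {
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
  }
  return sum.map((s) => s / tokenVectors.length);
}

// CLS pooling: take the first token's vector as the sentence vector.
function clsPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) throw new Error("need at least one token vector");
  return tokenVectors[0].slice();
}

// L2 normalization: scale the vector to unit length (zero vectors are returned unchanged).
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm);
}

// For unit-length vectors, the dot product equals cosine similarity.
function dot(a: number[], b: number[]): number {
  let total = 0;
  for (let i = 0; i < a.length; i++) total += a[i] * b[i];
  return total;
}
```

One practical consequence of normalize: true is that vectors returned by onnx_embed can be compared with a plain dot product instead of a full cosine similarity computation.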

`tools.classify`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | boolean | false | Toggles onnx_classify. |
| model | string | onnx-community/distilbert-base-uncased-finetuned-sst-2-english | Classifier or NLI model (zero-shot). |
| topK | number | 5 | Maximum labels returned. |
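The two classification modes and the topK cap can be sketched as follows. The helper names are hypothetical, and treating an empty labels array the same as a missing one is an assumption. The task strings are the pipeline task names used by @huggingface/transformers.

```typescript
interface LabelScore {
  label: string;
  score: number;
}

// Assumption: a non-empty labels array selects zero-shot classification.
function classifyTask(labels?: string[]): "text-classification" | "zero-shot-classification" {
  return labels !== undefined && labels.length > 0
    ? "zero-shot-classification"
    : "text-classification";
}

// Keep at most topK results, highest score first, without mutating the input.
function takeTopK(results: LabelScore[], topK: number = 5): LabelScore[] {
  return results
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, topK));
}
```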

`tools.transcribe`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | boolean | false | Toggles onnx_transcribe. |
| model | string | onnx-community/whisper-tiny | Any ASR model. |
| language | string \| null | null | Default language hint (e.g. "en"). |
| task | "transcribe" \| "translate" | "transcribe" | Default ASR task. |
| maxDecodedBytes | number | 268435456 | Maximum decoded f32 audio bytes to buffer in RAM. |
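To get a feel for what the maxDecodedBytes default allows, the arithmetic below converts a byte budget into audio duration. It assumes mono audio decoded at 16 kHz, which is the sample rate Whisper models expect; whether pi-onnx decodes to exactly that format is an assumption, and maxAudioSeconds is a hypothetical helper.

```typescript
// Each decoded sample is a 32-bit float.
const BYTES_PER_F32_SAMPLE = 4;

// Longest audio duration (in seconds) that fits within the byte budget.
function maxAudioSeconds(maxDecodedBytes: number, sampleRateHz: number, channels: number = 1): number {
  return maxDecodedBytes / (BYTES_PER_F32_SAMPLE * sampleRateHz * channels);
}

// Default budget: 268435456 bytes = 256 MiB.
// Assumed decode format: 16 kHz mono.
const defaultBudgetSeconds = maxAudioSeconds(268435456, 16000);
const defaultBudgetMinutes = defaultBudgetSeconds / 60; // roughly 70 minutes
```

Under those assumptions the default budget covers a little under 70 minutes of audio. Longer recordings would need a larger maxDecodedBytes or to be split before transcription.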

Limitations

- No tool calling support for ONNX chat models.
- Token counts are approximated from the tokenizer.
- The first call to a model blocks while its weights download.
- onnx_transcribe shells out to ffmpeg (must be on PATH) to decode the input audio file to a Float32Array before inference, capped by maxDecodedBytes.

License

pi-onnx is licensed under MIT. See LICENSE for more information.
