@ondeinference/cli
v0.3.1
A CLI for managing your Onde Inference account and models.
Install
```
npm install -g @ondeinference/cli
```
npm installs the right native binary for your platform automatically.
It works on:
- macOS (Apple Silicon and Intel)
- Linux (x64 and arm64)
- Windows (x64 and arm64)
Other ways to install
| Method | Command |
|---|---|
| Homebrew | brew install ondeinference/homebrew-tap/onde |
| pip | pip install onde-cli |
| uv | uv tool install onde-cli |
| Dart pub | dart pub global activate onde_cli |
| .NET tool | dotnet tool install --global Onde.Cli |
| Cargo | cargo install onde-cli |
Run it
```
onde
```
That opens the terminal UI.
From there you can:
- sign up or sign in
- create and manage apps
- assign models
- fine-tune supported local models
- export merged models to GGUF
No browser needed.
Basic keys
| Key | Action |
|---|---|
| Tab | Move between fields |
| Enter | Submit or confirm |
| Ctrl+L | Go to sign in |
| Ctrl+N | Go to create account |
| Ctrl+C | Quit |
Fine-tuning
`onde` can fine-tune Qwen2, Qwen2.5, and Qwen3 safetensors models with LoRA.
Training runs locally:
- Metal on Apple Silicon
- CPU on other platforms
So yes, no cloud training setup and no Python environment to babysit.
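If you want a sense of what LoRA actually does, here is a minimal numpy sketch of the idea (a generic illustration, not Onde's implementation): the base weights stay frozen, and training only touches two small low-rank matrices.
```python
import numpy as np

# Generic LoRA sketch, not Onde's implementation. A frozen weight matrix
# W (d_out x d_in) gets two small trainable matrices: B (d_out x r) and
# A (r x d_in), with rank r much smaller than d_in and d_out.
d_out, d_in, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen base weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable
B = np.zeros((d_out, r))                   # starts at zero: no change at step 0

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                    # forward pass with the adapter applied

# Merging (what the CLI's merge step does after training) bakes the
# adapter into the base weights:
W_merged = W + B @ A
assert np.allclose(W_merged @ x, y)
```
Only `A` and `B` are trained, so a rank-8 adapter stores a tiny fraction of the base model's parameters.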
If you want a quick mental model for what the network is doing once it starts running, Onde has a short write-up on the forward pass.
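As a companion to that, here is a toy single-head causal attention step, the core of that forward pass, in plain numpy (a generic illustration, not Onde's code):
```python
import numpy as np

# Toy single-head causal self-attention: each position mixes values from
# itself and earlier positions only.
def attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # future positions
    scores[mask] = -np.inf                                  # masked before softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d = 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d))
out = attention(x, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (4, 8)
```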
Supported base models
| Model | Size |
|---|---|
| Qwen/Qwen3-0.6B | ~1.2 GB |
| Qwen/Qwen2.5-1.5B-Instruct | ~3.0 GB |
| Qwen/Qwen3-1.7B | ~3.4 GB |
| Qwen/Qwen3-4B | ~8.0 GB |
Only safetensors models can be fine-tuned. GGUF models are already quantized, so gradient updates can't be applied to their weights.
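To see why, note that quantization snaps weights onto a small discrete grid via rounding, and rounding has zero gradient almost everywhere. A toy example (an illustration, not GGUF's exact scheme):
```python
import numpy as np

# Toy 4-bit symmetric quantization.
w = np.array([0.1234, -0.5678, 0.9012])
scale = np.abs(w).max() / 7              # signed 4-bit range is -8..7
q = np.round(w / scale).astype(np.int8)  # rounding: zero gradient almost everywhere
print(q)                                 # discrete integers: [ 1 -4  7]
print(q * scale)                         # dequantized values, no longer exactly w
```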
Training data
Use one JSON object per line. Each object needs a `text` field containing the full conversation in Qwen's chat template.
```
{"text": "<|im_start|>user\nWhat is the boiling point of water?<|im_end|>\n<|im_start|>assistant\n100°C at sea level.<|im_end|>"}
```
Running a fine-tune
- Open the Models tab.
- Pick a safetensors model with `↑`/`↓`.
- Press `f` to open the fine-tune config.
- Set your data path, LoRA rank (default `8`), epochs (default `3`), and learning rate (default `0.0001`).
- Start training.
A rank-8 adapter for the 0.6B model is about 1.5 MB, so the output stays pretty small.
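The arithmetic behind that figure is a straightforward back-of-envelope (the hidden sizes and the set of adapted matrices below are assumptions, not Onde's published layout):
```python
# One rank-r LoRA adapter on a d_out x d_in matrix adds
# r * (d_in + d_out) trainable parameters.
r, d_in, d_out = 8, 1024, 1024      # hidden size of ~1024 is an assumption
params = r * (d_in + d_out)         # 16,384 parameters
print(params * 2 / 1024, "KiB per adapted matrix at fp16")  # 32.0 KiB
# A few dozen adapted matrices across the model's layers lands in the
# low-megabyte range, consistent with the ~1.5 MB figure above.
```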
After training
- Press `m` to merge the adapter into the base weights.
- Press `g` to export the merged model to GGUF.
The exported file loads directly in the Onde SDK for on-device inference.
What is Onde?
Onde Inference is for running LLMs on the user's device. No server round-trips, no sending prompts off to somebody else's machine.
It ships as native SDKs for multiple platforms.
The CLI is for account management and local fine-tuning. The SDKs are what you ship in your app.
Debug
Logs are written to `~/.cache/onde/debug.log`.
License
Dual-licensed under MIT and Apache 2.0.
Copyright
© 2026 Onde Inference (Splitfire AB).
