@ondeinference/cli
v0.3.1
A CLI for managing your Onde Inference account and models.
Install
```
npm install -g @ondeinference/cli
```
npm installs the right native binary for your platform automatically.
It works on:
- macOS (Apple Silicon and Intel)
- Linux (x64 and arm64)
- Windows (x64 and arm64)
Other ways to install
| Method | Command |
|---|---|
| Homebrew | brew install ondeinference/homebrew-tap/onde |
| pip | pip install onde-cli |
| uv | uv tool install onde-cli |
| Dart pub | dart pub global activate onde_cli |
| .NET tool | dotnet tool install --global Onde.Cli |
| Cargo | cargo install onde-cli |
Run it
```
onde
```
That opens the terminal UI.
From there you can:
- sign up or sign in
- create and manage apps
- assign models
- fine-tune supported local models
- export merged models to GGUF
No browser needed.
Basic keys
| Key | Action |
|---|---|
| Tab | Move between fields |
| Enter | Submit or confirm |
| Ctrl+L | Go to sign in |
| Ctrl+N | Go to create account |
| Ctrl+C | Quit |
Fine-tuning
`onde` can fine-tune Qwen2, Qwen2.5, and Qwen3 safetensors models with LoRA.
Training runs locally:
- Metal on Apple Silicon
- CPU on other platforms
So yes, no cloud training setup and no Python environment to babysit.
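If you want a sense of what LoRA actually does, here is a minimal numpy sketch of the idea (a generic illustration, not Onde's implementation): the base weights stay frozen, and training only touches two small low-rank matrices.
```python
import numpy as np

# Generic LoRA sketch, not Onde's implementation. A frozen weight matrix
# W (d_out x d_in) gets two small trainable matrices: B (d_out x r) and
# A (r x d_in), with rank r much smaller than d_in and d_out.
d_out, d_in, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen base weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable
B = np.zeros((d_out, r))                   # starts at zero: no change at step 0

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                    # forward pass with the adapter applied

# Merging (what the CLI's merge step does after training) bakes the
# adapter into the base weights:
W_merged = W + B @ A
assert np.allclose(W_merged @ x, y)
```
Only `A` and `B` are trained, so a rank-8 adapter stores a tiny fraction of the base model's parameters.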
If you want a quick mental model for what the network is doing once it starts running, Onde has a short write-up on the forward pass.
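As a companion to that, here is a toy single-head causal attention step, the core of that forward pass, in plain numpy (a generic illustration, not Onde's code):
```python
import numpy as np

# Toy single-head causal self-attention: each position mixes values from
# itself and earlier positions only.
def attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # future positions
    scores[mask] = -np.inf                                  # masked before softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d = 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d))
out = attention(x, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (4, 8)
```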
Supported base models
| Model | Size |
|---|---|
| Qwen/Qwen3-0.6B | ~1.2 GB |
| Qwen/Qwen2.5-1.5B-Instruct | ~3.0 GB |
| Qwen/Qwen3-1.7B | ~3.4 GB |
| Qwen/Qwen3-4B | ~8.0 GB |
Only safetensors models can be fine-tuned. GGUF models are already quantized, so gradient updates can't be applied to their weights.
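To see why, note that quantization snaps weights onto a small discrete grid via rounding, and rounding has zero gradient almost everywhere. A toy example (an illustration, not GGUF's exact scheme):
```python
import numpy as np

# Toy 4-bit symmetric quantization.
w = np.array([0.1234, -0.5678, 0.9012])
scale = np.abs(w).max() / 7              # signed 4-bit range is -8..7
q = np.round(w / scale).astype(np.int8)  # rounding: zero gradient almost everywhere
print(q)                                 # discrete integers: [ 1 -4  7]
print(q * scale)                         # dequantized values, no longer exactly w
```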
Training data
Use one JSON object per line. Each object needs a `text` field containing the full conversation in Qwen's chat template.
```
{"text": "<|im_start|>user\nWhat is the boiling point of water?<|im_end|>\n<|im_start|>assistant\n100°C at sea level.<|im_end|>"}
```
Running a fine-tune
- Open the Models tab.
- Pick a safetensors model with `↑`/`↓`.
- Press `f` to open the fine-tune config.
- Set your data path, LoRA rank (default `8`), epochs (default `3`), and learning rate (default `0.0001`).
- Start training.
A rank-8 adapter for the 0.6B model is about 1.5 MB, so the output stays pretty small.
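The arithmetic behind that figure is a straightforward back-of-envelope (the hidden sizes and the set of adapted matrices below are assumptions, not Onde's published layout):
```python
# One rank-r LoRA adapter on a d_out x d_in matrix adds
# r * (d_in + d_out) trainable parameters.
r, d_in, d_out = 8, 1024, 1024      # hidden size of ~1024 is an assumption
params = r * (d_in + d_out)         # 16,384 parameters
print(params * 2 / 1024, "KiB per adapted matrix at fp16")  # 32.0 KiB
# A few dozen adapted matrices across the model's layers lands in the
# low-megabyte range, consistent with the ~1.5 MB figure above.
```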
After training
- Press `m` to merge the adapter into the base weights.
- Press `g` to export the merged model to GGUF.
The exported file loads directly in the Onde SDK for on-device inference.
What is Onde?
Onde Inference is for running LLMs on the user's device. No server round-trips, no sending prompts off to somebody else's machine.
It ships as native SDKs for multiple platforms.
The CLI is for account management and local fine-tuning. The SDKs are what you ship in your app.
Debug
Logs are written to `~/.cache/onde/debug.log`.
License
Dual-licensed under MIT and Apache 2.0.
Copyright
© 2026 Onde Inference (Splitfire AB).
