linkllm
v0.0.1
The unified LLM runtime — local inference, API proxy, and monitoring in one blazing-fast tool. A powerful alternative to Ollama + LiteLLM, built in Rust.
# Install and run your first model in under 60 seconds
curl -fsSL https://install.linkllm.dev | sh
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm chat mistral
What is LinkLLM?
LinkLLM is a single tool that replaces both Ollama and LiteLLM — and then goes further. It gives you:
- Local inference of any GGUF model (llama.cpp-powered, pure-Rust candle backend)
- API proxy to OpenAI, Gemini, Anthropic, Groq, and any OpenAI-compatible endpoint
- Model management — pull any model from HuggingFace with one command
- Production-ready REST API with OpenAI-compatible routes, auth, rate limiting, TLS
- Real-time monitoring dashboard right inside your terminal
- Multi-model routing with fallback chains and cost tracking
All in a single binary. No Docker required. Works on Windows, macOS, Linux, and Termux.
✨ Features
🦀 Rust-Powered Core
Built on Tokio + Axum — async from the ground up. Memory safe, no garbage collector pauses, minimal footprint.
🤖 Local Model Inference
- Run GGUF models via llama.cpp FFI bindings — same performance, Rust-safe wrapper
- Pure-Rust inference with candle (no C++ dependency)
- GPU acceleration: CUDA, ROCm, Apple Metal — auto-detected
- Quantization: Q4_K_M, Q5_K_S, Q8_0, F16, and more (rough memory math in the sketch below)
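As a rule of thumb, a quantized model's weight footprint is roughly parameters × bits-per-weight ÷ 8, plus KV-cache and runtime overhead. A minimal sketch of that arithmetic (the effective bits-per-weight figures are ballpark approximations, not LinkLLM output):

```python
# Back-of-envelope GGUF memory estimate — illustrative only, not a LinkLLM API.
# Approximate effective bits per weight for common quantization formats:
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_S": 5.5, "Q8_0": 8.5, "F16": 16.0}

def estimate_gb(params_billions: float, quant: str) -> float:
    """Weight footprint only; the KV cache and buffers add more on top."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"7B model @ {quant}: ~{estimate_gb(7, quant):.1f} GB")
```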
🌐 Universal API Proxy
Route requests to any provider through a single unified API:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, o1, gpt-4-turbo, ... |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro, ... |
| Anthropic | claude-3-5-sonnet, claude-3-opus, ... |
| Groq | llama3, mixtral (ultra-fast) |
| Together AI | 50+ open models |
| Any OpenAI-compat | Custom base URL |
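Because every provider is exposed through the same OpenAI-style endpoint, switching providers is just a model-name change (LinkLLM routes by model name, as the chat examples below show). A minimal sketch with the OpenAI Python SDK; the API key is a placeholder:

```python
from openai import OpenAI

# One client, many providers — the proxy picks the upstream from the model name.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

for model in ["mistral", "gpt-4o", "gemini-2.0-flash", "claude-3-5-sonnet"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")
```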
📦 HuggingFace Model Pull
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
linkllm pull google/gemma-2-9b-it
Resume interrupted downloads. SHA-256 integrity check. Auto-conversion to GGUF.
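The same integrity check is easy to reproduce by hand. A sketch in plain Python — the file name is a hypothetical example; compare the digest against the one published on HuggingFace:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-GB GGUF weights never load into RAM at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Default model directory per the [models] config section below.
model = Path.home() / ".linkllm" / "models" / "mistral-7b-instruct-v0.3.Q4_K_M.gguf"
print(sha256_of(model))
```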
📊 Terminal Monitoring Dashboard
linkllm monitor
Real-time TUI powered by Ink:
- Tokens/second live graph
- Latency histograms (p50 / p95 / p99)
- Active model memory usage
- Per-provider cost breakdown
- Request log (live tail)
- API key usage tracker
- Error rate + alerts
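The numbers behind the TUI are also exposed as Prometheus-compatible metrics (see `metrics_port` in the configuration section below), so you can scrape them yourself. A minimal sketch — the metric names matched here are an assumption, not documented LinkLLM names; check the endpoint output for the real ones:

```python
import urllib.request

# Scrape the Prometheus-compatible endpoint (metrics_port = 9090 by default).
with urllib.request.urlopen("http://localhost:9090/metrics") as resp:
    text = resp.read().decode()

# Print anything that looks request-related, skipping comment lines.
for line in text.splitlines():
    if line and not line.startswith("#") and "request" in line:
        name, _, value = line.rpartition(" ")
        print(f"{name} = {value}")
```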
🔐 Security-First Design
- AES-256-GCM encrypted API key store (OS keychain integration)
- TLS 1.3 by default, mTLS for production
- HMAC request signing in the Rust SDK
- JWT bearer tokens for server access
- Per-key rate limits and quotas
- Sandboxed model inference
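For context, HMAC request signing generally works like the sketch below: hash the method, path, timestamp, and body with a shared secret, then send the digest in a header. The header names and canonical string here are hypothetical illustrations — the Rust SDK handles the real scheme for you:

```python
import hashlib
import hmac
import time

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> dict:
    """Illustrative HMAC-SHA256 signing; header names are hypothetical."""
    timestamp = str(int(time.time()))
    canonical = b"\n".join([method.encode(), path.encode(), timestamp.encode(), body])
    digest = hmac.new(secret, canonical, hashlib.sha256).hexdigest()
    return {"X-Timestamp": timestamp, "X-Signature": digest}

headers = sign_request(b"shared-secret", "POST", "/v1/chat/completions",
                       b'{"model": "mistral"}')
print(headers)
```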
🔀 Multi-Model Routing
Define routing rules in linkllm.toml:
[routing]
default = "mistral"
[[routing.rules]]
match = "code"
model = "deepseek-coder"
[[routing.rules]]
match = "long-context"
model = "gemini-1.5-pro"
fallback = ["gpt-4o", "claude-3-opus"]
⚡ Quick Start
1. Install
Linux / macOS / Termux:
curl -fsSL https://install.linkllm.dev | sh
Windows (PowerShell):
irm https://install.linkllm.dev/windows | iex
Homebrew:
brew install linkllm/tap/linkllm
npm (CLI only):
npm install -g linkllm
pip (Python SDK + CLI):
pip install linkllm
From source:
git clone https://github.com/linkllm/linkllm
cd linkllm
cargo build --release
2. Pull a Model
# Pull from HuggingFace (GGUF auto-detected)
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
# Specify quantization
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
# List downloaded models
linkllm list
3. Chat in Terminal
linkllm chat mistral
linkllm chat gpt-4o # routes to OpenAI (needs API key)
linkllm chat gemini-flash # routes to Google Gemini
4. Start the Server
linkllm serve
# Server running at http://localhost:11434
# OpenAI-compatible API at http://localhost:11434/v1
5. Monitor
linkllm monitor
🔌 API
LinkLLM exposes a fully OpenAI-compatible REST API. Drop it in as a replacement for api.openai.com:
Chat Completions
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "mistral",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Python — OpenAI SDK Compatible
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="your-api-key"
)
response = client.chat.completions.create(
model="mistral",
messages=[{"role": "user", "content": "Explain Rust ownership"}],
stream=True
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Python — LinkLLM Native SDK
pip install linkllm
import linkllm
client = linkllm.Client()
# Chat with any model — local or API
response = client.chat("mistral", "What is the capital of France?")
print(response.text)
# Streaming
for token in client.stream("gpt-4o", "Write a haiku about Rust"):
print(token, end="", flush=True)
# Pull a model programmatically
client.pull("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")
# List local models
models = client.list()
for m in models:
print(f"{m.name} — {m.size_gb:.1f} GB")TypeScript / JavaScript
npm install linkllm
import { LinkLLM } from "linkllm";
const client = new LinkLLM({ baseUrl: "http://localhost:11434" });
// Chat
const response = await client.chat({
model: "mistral",
messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(response.content);
// Streaming
const stream = client.stream({
model: "gpt-4o",
messages: [{ role: "user", content: "Tell me a story" }],
});
for await (const token of stream) {
process.stdout.write(token);
}
Rust SDK
# Cargo.toml
[dependencies]
linkllm = "0.1"
tokio = { version = "1", features = ["full"] }
use linkllm::{Client, Role};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::new("http://localhost:11434")?;
let response = client
.chat("mistral")
.message(Role::User, "What is Rust?")
.send()
.await?;
println!("{}", response.content());
Ok(())
}
⚙️ Configuration
LinkLLM is configured via ~/.linkllm/config.toml:
[server]
host = "127.0.0.1"
port = 11434
tls = false
[models]
default = "mistral"
model_dir = "~/.linkllm/models"
[inference]
gpu_layers = -1 # -1 = auto (offload all to GPU)
context_size = 4096
threads = 8
[api_keys]
# Encrypted. Use `linkllm key add` to set these safely.
openai = ""
gemini = ""
anthropic = ""
groq = ""
[routing]
default = "mistral"
fallback_chain = ["mistral", "gpt-4o-mini"]
[monitoring]
enabled = true
metrics_port = 9090 # Prometheus-compatible /metrics
log_level = "info"
Managing API Keys
linkllm key add openai sk-...
linkllm key add gemini AIza...
linkllm key add anthropic sk-ant-...
linkllm key list
linkllm key rm openai
Keys are stored encrypted with AES-256-GCM, tied to your OS keychain.
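For context on what that buys you: AES-256-GCM is an authenticated cipher, so any tampering with the stored ciphertext is detected at decrypt time. A standalone sketch of the scheme using the `cryptography` package — this is not LinkLLM's key-store code:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in LinkLLM, this lives in the OS keychain
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # standard 96-bit GCM nonce; never reuse one per key
ciphertext = aesgcm.encrypt(nonce, b"sk-...", associated_data=b"openai")

# Decryption raises InvalidTag if the ciphertext or associated data was altered.
plaintext = aesgcm.decrypt(nonce, ciphertext, associated_data=b"openai")
print(plaintext)
```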
📋 CLI Reference
linkllm <command> [options]
Commands:
serve Start the LinkLLM server
chat [model] Start an interactive chat session
pull <user/model> Pull a model from HuggingFace
push <model> Push a model to the LinkLLM registry
list List all local models
rm <model> Remove a local model
show <model> Show model info and metadata
monitor Open the TUI monitoring dashboard
key <add|rm|list> Manage encrypted API keys
config <get|set> View or update configuration
run <model> Pull (if needed) and start chatting
Options:
--host Server host (default: 127.0.0.1)
--port Server port (default: 11434)
--model-dir Override model storage directory
--log-level Log verbosity: error|warn|info|debug|trace
-v, --version Print version
-h, --help Show help
🆚 Comparison
| Feature | LinkLLM | Ollama | LiteLLM |
|---|---|---|---|
| Local GGUF inference | ✅ | ✅ | ❌ |
| API proxy (OpenAI / Gemini / etc.) | ✅ | ❌ | ✅ |
| HuggingFace model pull | ✅ | Partial | ❌ |
| TUI monitoring dashboard | ✅ | ❌ | Web UI only |
| Multi-model routing + fallback | ✅ | ❌ | ✅ |
| Encrypted API key management | ✅ | ❌ | Partial |
| Rust core (memory safe) | ✅ | Go | Python |
| OpenAI-compatible REST API | ✅ | ✅ | ✅ |
| Native Rust SDK | ✅ | ❌ | ❌ |
| Pure-Rust inference (candle) | ✅ | ❌ | ❌ |
| Mobile / Termux | ✅ | Limited | Limited |
| Cost tracking per request | ✅ | ❌ | ✅ |
| Single binary, no Docker | ✅ | ✅ | ❌ |
🏗️ Architecture
┌─────────────────────────────────────────────────┐
│ User Interface Layer │
│ CLI Chat · TUI Monitor · Model Manager │
│ (TypeScript + Ink) │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ API Gateway (Rust/Axum) │
│ REST API · Auth · Rate Limiter · TLS │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ Core Engine (Rust/Tokio) │
│ Router · Pipeline · Context · Metrics │
└──────┬─────────────┬───────────────┬────────────┘
│ │ │
┌──────▼──────┐ ┌────▼─────┐ ┌──────▼──────┐
│ Local GGUF │ │ Python │ │ API Proxy │
│ llama.cpp │ │ Bridge │ │ OAI/Gemini/ │
│ + candle │ │ HF │ │ Anthropic │
└─────────────┘ └──────────┘ └─────────────┘
See the full Architecture Document for details.
📦 Packages
| Package | Registry | Install |
|---|---|---|
| linkllm (binary) | GitHub Releases | curl -fsSL https://install.linkllm.dev \| sh |
| linkllm (CLI) | npm | npm install -g linkllm |
| linkllm (Python SDK) | PyPI | pip install linkllm |
| linkllm (Rust SDK) | crates.io | cargo add linkllm |
🚀 Roadmap
- [x] Core Rust engine + Axum server
- [x] OpenAI-compatible API
- [x] llama.cpp GGUF inference
- [x] HuggingFace model pull
- [x] API proxy (OpenAI, Gemini, Anthropic)
- [x] TUI monitoring dashboard
- [x] Encrypted API key management
- [ ] Multi-model routing (in progress)
- [ ] candle pure-Rust inference
- [ ] WebUI dashboard
- [ ] Model fine-tuning support
- [ ] Plugin / middleware system
- [ ] LoRA adapter merge
- [ ] Distributed inference
- [ ] LinkLLM Cloud (hosted)
🤝 Contributing
Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.
git clone https://github.com/linkllm/linkllm
cd linkllm
# Build Rust core
cargo build
# Run tests
cargo test
# Build CLI
cd cli && npm install && npm run build
# Run Python bridge tests
cd python && pip install -e ".[dev]" && pytest
Good first issues are labeled good-first-issue on GitHub.
📄 License
MIT License © 2025 AJ Ashik
