linkllm
v0.0.1
The unified LLM runtime — local inference, API proxy, and monitoring in one blazing-fast tool. A powerful alternative to Ollama + LiteLLM, built in Rust.
# Install and run your first model in under 60 seconds
curl -fsSL https://install.linkllm.dev | sh
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm chat mistral
What is LinkLLM?
LinkLLM is a single tool that replaces both Ollama and LiteLLM — and then goes further. It gives you:
- Local inference of any GGUF model (llama.cpp-powered, pure-Rust candle backend)
- API proxy to OpenAI, Gemini, Anthropic, Groq, and any OpenAI-compatible endpoint
- Model management — pull any model from HuggingFace with one command
- Production-ready REST API with OpenAI-compatible routes, auth, rate limiting, TLS
- Real-time monitoring dashboard right inside your terminal
- Multi-model routing with fallback chains and cost tracking
All in a single binary. No Docker required. Works on Windows, macOS, Linux, and Termux.
✨ Features
🦀 Rust-Powered Core
Built on Tokio + Axum — async from the ground up. Memory safe, no garbage collector pauses, minimal footprint.
🤖 Local Model Inference
- Run GGUF models via llama.cpp FFI bindings — same performance, Rust-safe wrapper
- Pure-Rust inference with candle (no C++ dependency)
- GPU acceleration: CUDA, ROCm, Apple Metal — auto-detected
- Quantization: Q4_K_M, Q5_K_S, Q8_0, F16, and more (rough memory math in the sketch below)
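As a rule of thumb, a quantized model's weight footprint is roughly parameters × bits-per-weight ÷ 8, plus KV-cache and runtime overhead. A minimal sketch of that arithmetic (the effective bits-per-weight figures are ballpark approximations, not LinkLLM output):

```python
# Back-of-envelope GGUF memory estimate — illustrative only, not a LinkLLM API.
# Approximate effective bits per weight for common quantization formats:
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_S": 5.5, "Q8_0": 8.5, "F16": 16.0}

def estimate_gb(params_billions: float, quant: str) -> float:
    """Weight footprint only; the KV cache and buffers add more on top."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"7B model @ {quant}: ~{estimate_gb(7, quant):.1f} GB")
```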
🌐 Universal API Proxy
Route requests to any provider through a single unified API:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, o1, gpt-4-turbo, ... |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro, ... |
| Anthropic | claude-3-5-sonnet, claude-3-opus, ... |
| Groq | llama3, mixtral (ultra-fast) |
| Together AI | 50+ open models |
| Any OpenAI-compat | Custom base URL |
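Because every provider is exposed through the same OpenAI-style endpoint, switching providers is just a model-name change (LinkLLM routes by model name, as the chat examples below show). A minimal sketch with the OpenAI Python SDK; the API key is a placeholder:

```python
from openai import OpenAI

# One client, many providers — the proxy picks the upstream from the model name.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

for model in ["mistral", "gpt-4o", "gemini-2.0-flash", "claude-3-5-sonnet"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")
```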
📦 HuggingFace Model Pull
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
linkllm pull google/gemma-2-9b-it
Resume interrupted downloads. SHA-256 integrity check. Auto-conversion to GGUF.
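The same integrity check is easy to reproduce by hand. A sketch in plain Python — the file name is a hypothetical example; compare the digest against the one published on HuggingFace:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-GB GGUF weights never load into RAM at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Default model directory per the [models] config section below.
model = Path.home() / ".linkllm" / "models" / "mistral-7b-instruct-v0.3.Q4_K_M.gguf"
print(sha256_of(model))
```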
📊 Terminal Monitoring Dashboard
linkllm monitor
Real-time TUI powered by Ink:
- Tokens/second live graph
- Latency histograms (p50 / p95 / p99)
- Active model memory usage
- Per-provider cost breakdown
- Request log (live tail)
- API key usage tracker
- Error rate + alerts
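The numbers behind the TUI are also exposed as Prometheus-compatible metrics (see `metrics_port` in the configuration section below), so you can scrape them yourself. A minimal sketch — the metric names matched here are an assumption, not documented LinkLLM names; check the endpoint output for the real ones:

```python
import urllib.request

# Scrape the Prometheus-compatible endpoint (metrics_port = 9090 by default).
with urllib.request.urlopen("http://localhost:9090/metrics") as resp:
    text = resp.read().decode()

# Print anything that looks request-related, skipping comment lines.
for line in text.splitlines():
    if line and not line.startswith("#") and "request" in line:
        name, _, value = line.rpartition(" ")
        print(f"{name} = {value}")
```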
🔐 Security-First Design
- AES-256-GCM encrypted API key store (OS keychain integration)
- TLS 1.3 by default, mTLS for production
- HMAC request signing in the Rust SDK
- JWT bearer tokens for server access
- Per-key rate limits and quotas
- Sandboxed model inference
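For context, HMAC request signing generally works like the sketch below: hash the method, path, timestamp, and body with a shared secret, then send the digest in a header. The header names and canonical string here are hypothetical illustrations — the Rust SDK handles the real scheme for you:

```python
import hashlib
import hmac
import time

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> dict:
    """Illustrative HMAC-SHA256 signing; header names are hypothetical."""
    timestamp = str(int(time.time()))
    canonical = b"\n".join([method.encode(), path.encode(), timestamp.encode(), body])
    digest = hmac.new(secret, canonical, hashlib.sha256).hexdigest()
    return {"X-Timestamp": timestamp, "X-Signature": digest}

headers = sign_request(b"shared-secret", "POST", "/v1/chat/completions",
                       b'{"model": "mistral"}')
print(headers)
```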
🔀 Multi-Model Routing
Define routing rules in linkllm.toml:
[routing]
default = "mistral"
[[routing.rules]]
match = "code"
model = "deepseek-coder"
[[routing.rules]]
match = "long-context"
model = "gemini-1.5-pro"
fallback = ["gpt-4o", "claude-3-opus"]
⚡ Quick Start
1. Install
Linux / macOS / Termux:
curl -fsSL https://install.linkllm.dev | sh
Windows (PowerShell):
irm https://install.linkllm.dev/windows | iex
Homebrew:
brew install linkllm/tap/linkllm
npm (CLI only):
npm install -g linkllm
pip (Python SDK + CLI):
pip install linkllm
From source:
git clone https://github.com/linkllm/linkllm
cd linkllm
cargo build --release
2. Pull a Model
# Pull from HuggingFace (GGUF auto-detected)
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
# Specify quantization
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
# List downloaded models
linkllm list
3. Chat in Terminal
linkllm chat mistral
linkllm chat gpt-4o # routes to OpenAI (needs API key)
linkllm chat gemini-flash # routes to Google Gemini
4. Start the Server
linkllm serve
# Server running at http://localhost:11434
# OpenAI-compatible API at http://localhost:11434/v1
5. Monitor
linkllm monitor
🔌 API
LinkLLM exposes a fully OpenAI-compatible REST API. Drop it in as a replacement for api.openai.com:
Chat Completions
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "mistral",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Python — OpenAI SDK Compatible
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="your-api-key"
)
response = client.chat.completions.create(
model="mistral",
messages=[{"role": "user", "content": "Explain Rust ownership"}],
stream=True
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Python — LinkLLM Native SDK
pip install linkllm
import linkllm
client = linkllm.Client()
# Chat with any model — local or API
response = client.chat("mistral", "What is the capital of France?")
print(response.text)
# Streaming
for token in client.stream("gpt-4o", "Write a haiku about Rust"):
print(token, end="", flush=True)
# Pull a model programmatically
client.pull("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")
# List local models
models = client.list()
for m in models:
print(f"{m.name} — {m.size_gb:.1f} GB")TypeScript / JavaScript
npm install linkllm
import { LinkLLM } from "linkllm";
const client = new LinkLLM({ baseUrl: "http://localhost:11434" });
// Chat
const response = await client.chat({
model: "mistral",
messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(response.content);
// Streaming
const stream = client.stream({
model: "gpt-4o",
messages: [{ role: "user", content: "Tell me a story" }],
});
for await (const token of stream) {
process.stdout.write(token);
}
Rust SDK
# Cargo.toml
[dependencies]
linkllm = "0.1"
tokio = { version = "1", features = ["full"] }
use linkllm::{Client, Role};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::new("http://localhost:11434")?;
let response = client
.chat("mistral")
.message(Role::User, "What is Rust?")
.send()
.await?;
println!("{}", response.content());
Ok(())
}
⚙️ Configuration
LinkLLM is configured via ~/.linkllm/config.toml:
[server]
host = "127.0.0.1"
port = 11434
tls = false
[models]
default = "mistral"
model_dir = "~/.linkllm/models"
[inference]
gpu_layers = -1 # -1 = auto (offload all to GPU)
context_size = 4096
threads = 8
[api_keys]
# Encrypted. Use `linkllm key add` to set these safely.
openai = ""
gemini = ""
anthropic = ""
groq = ""
[routing]
default = "mistral"
fallback_chain = ["mistral", "gpt-4o-mini"]
[monitoring]
enabled = true
metrics_port = 9090 # Prometheus-compatible /metrics
log_level = "info"
Managing API Keys
linkllm key add openai sk-...
linkllm key add gemini AIza...
linkllm key add anthropic sk-ant-...
linkllm key list
linkllm key rm openai
Keys are stored encrypted with AES-256-GCM, tied to your OS keychain.
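For context on what that buys you: AES-256-GCM is an authenticated cipher, so any tampering with the stored ciphertext is detected at decrypt time. A standalone sketch of the scheme using the `cryptography` package — this is not LinkLLM's key-store code:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in LinkLLM, this lives in the OS keychain
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # standard 96-bit GCM nonce; never reuse one per key
ciphertext = aesgcm.encrypt(nonce, b"sk-...", associated_data=b"openai")

# Decryption raises InvalidTag if the ciphertext or associated data was altered.
plaintext = aesgcm.decrypt(nonce, ciphertext, associated_data=b"openai")
print(plaintext)
```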
📋 CLI Reference
linkllm <command> [options]
Commands:
serve Start the LinkLLM server
chat [model] Start an interactive chat session
pull <user/model> Pull a model from HuggingFace
push <model> Push a model to the LinkLLM registry
list List all local models
rm <model> Remove a local model
show <model> Show model info and metadata
monitor Open the TUI monitoring dashboard
key <add|rm|list> Manage encrypted API keys
config <get|set> View or update configuration
run <model> Pull (if needed) and start chatting
Options:
--host Server host (default: 127.0.0.1)
--port Server port (default: 11434)
--model-dir Override model storage directory
--log-level Log verbosity: error|warn|info|debug|trace
-v, --version Print version
-h, --help Show help
🆚 Comparison
| Feature | LinkLLM | Ollama | LiteLLM |
|---|---|---|---|
| Local GGUF inference | ✅ | ✅ | ❌ |
| API proxy (OpenAI / Gemini / etc.) | ✅ | ❌ | ✅ |
| HuggingFace model pull | ✅ | Partial | ❌ |
| TUI monitoring dashboard | ✅ | ❌ | Web UI only |
| Multi-model routing + fallback | ✅ | ❌ | ✅ |
| Encrypted API key management | ✅ | ❌ | Partial |
| Rust core (memory safe) | ✅ | Go | Python |
| OpenAI-compatible REST API | ✅ | ✅ | ✅ |
| Native Rust SDK | ✅ | ❌ | ❌ |
| Pure-Rust inference (candle) | ✅ | ❌ | ❌ |
| Mobile / Termux | ✅ | Limited | Limited |
| Cost tracking per request | ✅ | ❌ | ✅ |
| Single binary, no Docker | ✅ | ✅ | ❌ |
🏗️ Architecture
┌─────────────────────────────────────────────────┐
│ User Interface Layer │
│ CLI Chat · TUI Monitor · Model Manager │
│ (TypeScript + Ink) │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ API Gateway (Rust/Axum) │
│ REST API · Auth · Rate Limiter · TLS │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ Core Engine (Rust/Tokio) │
│ Router · Pipeline · Context · Metrics │
└──────┬─────────────┬───────────────┬────────────┘
│ │ │
┌──────▼──────┐ ┌────▼─────┐ ┌──────▼──────┐
│ Local GGUF │ │ Python │ │ API Proxy │
│ llama.cpp │ │ Bridge │ │ OAI/Gemini/ │
│ + candle │ │ HF │ │ Anthropic │
└─────────────┘ └──────────┘ └─────────────┘
See the full Architecture Document for details.
📦 Packages
| Package | Registry | Install |
|---|---|---|
| linkllm (binary) | GitHub Releases | curl -fsSL https://install.linkllm.dev \| sh |
| linkllm (CLI) | npm | npm install -g linkllm |
| linkllm (Python SDK) | PyPI | pip install linkllm |
| linkllm (Rust SDK) | crates.io | cargo add linkllm |
🚀 Roadmap
- [x] Core Rust engine + Axum server
- [x] OpenAI-compatible API
- [x] llama.cpp GGUF inference
- [x] HuggingFace model pull
- [x] API proxy (OpenAI, Gemini, Anthropic)
- [x] TUI monitoring dashboard
- [x] Encrypted API key management
- [ ] Multi-model routing (in progress)
- [ ] candle pure-Rust inference
- [ ] WebUI dashboard
- [ ] Model fine-tuning support
- [ ] Plugin / middleware system
- [ ] LoRA adapter merge
- [ ] Distributed inference
- [ ] LinkLLM Cloud (hosted)
🤝 Contributing
Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.
git clone https://github.com/linkllm/linkllm
cd linkllm
# Build Rust core
cargo build
# Run tests
cargo test
# Build CLI
cd cli && npm install && npm run build
# Run Python bridge tests
cd python && pip install -e ".[dev]" && pytest
Good first issues are labeled good-first-issue on GitHub.
📄 License
MIT License © 2025 AJ Ashik
