c0mpute-worker
Native CLI worker for the c0mpute.ai distributed inference network. Runs LLM inference via ollama and connects to the orchestrator via Socket.io.
Quick Start
- Install ollama and make sure it's running (`ollama serve`)
- Run the worker:

```
npx @c0mpute/worker --token <your-token>
```

On first run, the worker will automatically:

- Pull the base model (~17GB download; see the sketch after this list)
- Create a custom `c0mpute-max` model with optimized settings
- Run a speed benchmark
- Connect to the orchestrator and start serving jobs
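The first two of those steps might look roughly like the sketch below, assuming ollama's default address of http://localhost:11434. The function names and the base-model tag are placeholders for illustration, not the package's actual internals.

```ts
// Sketch only: check that ollama is reachable, then pull the base model.
const OLLAMA = "http://localhost:11434";

async function ensureOllamaRunning(): Promise<void> {
  try {
    // /api/tags lists locally available models; any response means ollama is up
    await fetch(`${OLLAMA}/api/tags`);
  } catch {
    throw new Error("ollama is not reachable; start it with `ollama serve`");
  }
}

async function pullBaseModel(model: string): Promise<void> {
  // With stream: false, /api/pull returns a single response once the download finishes
  const res = await fetch(`${OLLAMA}/api/pull`, {
    method: "POST",
    body: JSON.stringify({ name: model, stream: false }),
  });
  if (!res.ok) throw new Error(`model pull failed: ${res.status}`);
}

async function firstRun(): Promise<void> {
  await ensureOllamaRunning();
  await pullBaseModel("<base-model-tag>"); // placeholder; the worker pulls its own base model
}

firstRun();
```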
How It Works
- Verifies ollama is running locally
- Pulls and configures the model (Qwen 3.5 27B abliterated)
- Runs a speed benchmark to measure your hardware
- Connects to the c0mpute.ai orchestrator via WebSocket
- Accepts and processes inference jobs, streaming tokens back in real time (see the sketch after this list)
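A minimal sketch of that loop is shown below. The Socket.io event names ("job", "token", "job_complete") and payload shapes are assumptions made for illustration, not the actual orchestrator protocol; the ollama /api/chat endpoint does stream newline-delimited JSON as shown.

```ts
import { io } from "socket.io-client";

const OLLAMA = "http://localhost:11434";
// auth token passed the same way the CLI's --token flag would supply it
const socket = io("https://c0mpute.ai", { auth: { token: process.env.C0MPUTE_TOKEN } });

socket.on("job", async (job: { id: string; messages: unknown[] }) => {
  // Ask ollama to run the chat and stream partial responses back
  const res = await fetch(`${OLLAMA}/api/chat`, {
    method: "POST",
    body: JSON.stringify({ model: "c0mpute-max", messages: job.messages, stream: true }),
  });

  // ollama streams one JSON object per line; relay each content chunk as a token
  const decoder = new TextDecoder();
  let buffered = "";
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    buffered += decoder.decode(chunk, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const part = JSON.parse(line);
      if (part.message?.content) socket.emit("token", { jobId: job.id, token: part.message.content });
      if (part.done) socket.emit("job_complete", { jobId: job.id });
    }
  }
});
```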
Capabilities
- Thinking — model uses chain-of-thought reasoning with `<think>` tags (see the example after this list)
- Vision — accepts images (base64) alongside text messages
- Tool calling — model can invoke tools (web search, etc.) defined by the orchestrator
- Uncensored — abliterated model with no content restrictions
- Long context — 256K context window
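To make those capabilities concrete, here is roughly what a single ollama /api/chat request could look like with an image and a tool attached, plus a helper that separates a `<think>` block from the visible answer. The `web_search` tool definition is an illustrative example, not the orchestrator's actual tool schema.

```ts
// Illustrative payload shape for ollama's /api/chat; values are examples only.
const request = {
  model: "c0mpute-max",
  stream: false,
  messages: [
    {
      role: "user",
      content: "What is in this image, and what is the weather there?",
      images: ["<base64-encoded image>"], // vision: raw base64 strings per message
    },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "web_search", // example tool; real tools are defined by the orchestrator
        description: "Search the web for current information",
        parameters: {
          type: "object",
          properties: { query: { type: "string" } },
          required: ["query"],
        },
      },
    },
  ],
};

// Thinking: replies may wrap chain-of-thought in <think> tags; split it out
// so the reasoning and the visible answer can be handled separately.
function splitThinking(text: string): { thinking: string; answer: string } {
  const match = text.match(/<think>([\s\S]*?)<\/think>/);
  return {
    thinking: match ? match[1].trim() : "",
    answer: text.replace(/<think>[\s\S]*?<\/think>/, "").trim(),
  };
}
```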
Options
```
--token <token>   Authentication token from c0mpute.ai (required)
--url <url>       Orchestrator URL (default: https://c0mpute.ai)
--benchmark       Run benchmark only, then exit
--version         Show version
--help            Show help
```

Requirements
- Node.js 18+
- ollama installed and running
- GPU with 20GB+ VRAM recommended (NVIDIA RTX 3090/4090, Apple Silicon 32GB+)
- ~17GB disk space for the model
Default Model
Qwen 3.5 27B Abliterated — an uncensored 27B parameter model with 256K context window, vision support, and thinking capabilities.
Architecture
The worker delegates all inference to ollama's local HTTP API. This means:
- No CUDA/Metal build issues — ollama handles GPU acceleration
- Easy model management — ollama pulls and caches models
- Automatic GPU detection — ollama picks the best backend for your hardware
The worker is a dumb relay — it passes tool definitions to the model and relays tool calls back to the orchestrator for execution. Tools are defined and managed server-side.
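As a rough illustration of that relay, the sketch below forwards a tool call to the orchestrator and waits for the result before it is handed back to the model. The "tool_call" / "tool_result" event names and payload shapes are assumptions for illustration, not the real protocol.

```ts
import { Socket } from "socket.io-client";

interface ToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

// Forward a tool call emitted by the model to the orchestrator and wait for its output.
function relayToolCall(socket: Socket, jobId: string, call: ToolCall): Promise<string> {
  return new Promise((resolve) => {
    socket.emit("tool_call", { jobId, name: call.function.name, args: call.function.arguments });
    socket.once(`tool_result:${jobId}`, (result: string) => resolve(result));
  });
}

// The result is appended to the conversation as a { role: "tool", content: result }
// message and the chat request is re-issued, so tools themselves never run on the worker.
```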
Earnings
Workers earn credits for completing inference jobs. Earnings are based on tokens generated and your hardware tier. Check your earnings at c0mpute.ai.
