# Sage — Local LLM Coding Agent CLI
A TypeScript CLI coding agent that connects to any OpenAI-compatible local model server and performs coding tasks autonomously. Built for developers who want Claude Code-style agentic workflows powered by locally-hosted LLMs.
## What It Does
Sage is an autonomous coding assistant that can:
- Read and edit files in your codebase
- Execute shell commands
- Search for files and patterns
- Understand and respond to natural language requests
- Maintain conversation context across sessions
- Stream responses in real-time
Unlike cloud-based AI assistants, Sage runs entirely on your local infrastructure using models hosted on Ollama, llama.cpp, vLLM, LM Studio, or any other OpenAI-compatible server.
## Features
- Tool Calling: Six essential coding tools (read, write, edit, bash, glob, grep)
- Streaming Responses: Real-time token streaming with markdown rendering
- Multi-Model Support: Works with any OpenAI-compatible API endpoint
- Conversation Persistence: Save and restore conversation sessions
- Context Management: Automatic context window tracking and management
- Safety Controls: Workspace boundary enforcement and destructive command confirmations
- Minimal Dependencies: Built with raw `fetch` and native Node.js APIs
- Configuration Hierarchy: CLI args → env vars → config file → defaults
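The configuration hierarchy amounts to a merge in precedence order. A minimal sketch in TypeScript (the `resolveConfig` name and the trimmed-down `SageConfig` shape are illustrative, not Sage's actual exports):

```typescript
// Sketch of the precedence merge: CLI args win, then environment
// variables, then the config file, then built-in defaults.
interface SageConfig {
  baseUrl: string;
  model: string;
  maxTokens: number;
}

const defaults: SageConfig = {
  baseUrl: "http://localhost:11434/v1",
  model: "qwen2.5-coder:14b",
  maxTokens: 4096,
};

function resolveConfig(
  cli: Partial<SageConfig>,
  env: Partial<SageConfig>,
  file: Partial<SageConfig>
): SageConfig {
  // Later spreads override earlier ones, so spread order encodes precedence.
  return { ...defaults, ...file, ...env, ...cli };
}
```

Each layer only supplies the keys it actually sets; anything missing falls through to the next layer down.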
## Quick Start

### Using npx (no installation)

```sh
npx sage-agent
```

### Global installation

```sh
npm install -g sage-agent
sage
```

### From source

```sh
git clone https://github.com/user/sage-agent.git
cd sage-agent
npm install
npm run dev
```

## Requirements
- Node.js 18 or higher
- A local LLM server running an OpenAI-compatible API:
  - Ollama (recommended for ease of use)
  - llama.cpp server
  - vLLM
  - LM Studio
  - Text Generation WebUI (with OpenAI extension)
## Configuration

### CLI Arguments

```sh
sage --base-url http://localhost:8080/v1 \
  --model codellama \
  --max-tokens 4096 \
  --temperature 0.7 \
  --no-confirm
```

| Argument | Description | Default |
|----------|-------------|---------|
| `--base-url <url>` | API base URL | `http://localhost:11434/v1` |
| `--model <name>` | Model name | `qwen2.5-coder:14b` |
| `--max-tokens <n>` | Maximum tokens per response | 4096 |
| `--temperature <n>` | Temperature (0-2) | 0.7 |
| `--no-confirm` | Skip confirmation for destructive bash commands | false |
| `--help` | Show help message | - |
### Environment Variables

```sh
export SAGE_BASE_URL=http://localhost:11434/v1
export SAGE_MODEL=qwen2.5-coder:14b
export SAGE_MAX_TOKENS=4096
export SAGE_TEMPERATURE=0.7
export SAGE_NO_CONFIRM=false
```

### Config File

Create `~/.config/sage/config.json`:
```json
{
  "baseUrl": "http://localhost:11434/v1",
  "model": "qwen2.5-coder:14b",
  "maxTokens": 4096,
  "temperature": 0.7,
  "noConfirm": false,
  "contextWindow": 8192
}
```

## Available Tools
Sage provides six essential tools for coding tasks:
| Tool | Description | Parameters |
|------|-------------|------------|
| `read` | Read file contents with line numbers | `file_path`, optional `offset`/`limit` |
| `write` | Create or overwrite files | `file_path`, `content` |
| `edit` | Exact string replacement in files | `file_path`, `old_string`, `new_string` |
| `bash` | Execute shell commands with timeout | `command`, optional `timeout` |
| `glob` | Find files matching patterns | `pattern`, optional `path` |
| `grep` | Search file contents with regex | `pattern`, optional `path`/`glob`/`flags` |
All file operations are restricted to the current working directory for safety.
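The workspace restriction boils down to resolving each path against the working directory and rejecting anything that escapes it. A sketch of one way to implement such a check (an illustration, not Sage's actual `safety.ts`):

```typescript
import * as path from "node:path";

// Illustrative containment check: resolve the target against the
// workspace root and reject anything that ends up outside it.
function isInsideWorkspace(target: string, workspace: string = process.cwd()): boolean {
  const root = path.resolve(workspace);
  const resolved = path.resolve(root, target);
  const rel = path.relative(root, resolved);
  // An empty result is the root itself; a result starting with ".." (or an
  // absolute path, in Windows cross-drive cases) means the target escaped.
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}
```

Resolving before comparing is what defeats `../`-style traversal: `src/../../etc/passwd` normalizes to a path outside the root and is rejected.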
## Slash Commands

While in the REPL, you can use these special commands:

| Command | Description |
|---------|-------------|
| `/save [name]` | Save the current conversation (auto-generates a name if not provided) |
| `/load <name>` | Load a saved conversation session |
| `/sessions` | List all saved conversation sessions |
| `/clear` | Clear conversation history and start fresh |
| `/history` | Display the current conversation history |
| `/exit` or `/quit` | Exit Sage |
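Under the hood, `/save`- and `/load`-style persistence can be as simple as serializing the message history to JSON. A hypothetical sketch (the `.sage-sessions` directory and the function names are assumptions, not Sage's actual layout):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical persistence sketch: /save writes the message history as
// JSON under a sessions directory; /load reads it back.
type Message = { role: "system" | "user" | "assistant"; content: string };

function saveSession(name: string, history: Message[], dir = ".sage-sessions"): string {
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${name}.json`);
  fs.writeFileSync(file, JSON.stringify(history, null, 2));
  return file;
}

function loadSession(name: string, dir = ".sage-sessions"): Message[] {
  return JSON.parse(fs.readFileSync(path.join(dir, `${name}.json`), "utf8"));
}
```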
## Supported Model Servers

Sage works with any server that implements the OpenAI Chat Completions API with streaming support:
### Ollama (Recommended)

```sh
# Install Ollama from https://ollama.ai/
ollama serve                    # Runs on http://localhost:11434/v1 by default
ollama pull qwen2.5-coder:14b
sage
```

### llama.cpp server

```sh
# Build llama.cpp and run the server
./server -m model.gguf --port 8080 --ctx-size 8192
sage --base-url http://localhost:8080/v1 --model model-name
```

### vLLM

```sh
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-Coder-14B-Instruct \
  --port 8080
sage --base-url http://localhost:8080/v1 --model Qwen/Qwen2.5-Coder-14B-Instruct
```

### LM Studio

- Start LM Studio and load a model
- Enable "Local Server" in settings (usually http://localhost:1234/v1)
- Run:

```sh
sage --base-url http://localhost:1234/v1 --model <model-name>
```
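All of these servers speak the same streaming protocol: each Server-Sent Events line carries a `data: {json}` payload, and the stream ends with `data: [DONE]`. A minimal sketch of extracting tokens from such chunks (an illustration of the wire format, not Sage's actual client code):

```typescript
// Parse one SSE chunk from an OpenAI-style streaming response and
// collect the text deltas it contains. Stops at the [DONE] sentinel.
function parseSseChunk(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const delta = JSON.parse(payload)?.choices?.[0]?.delta?.content;
    if (typeof delta === "string") tokens.push(delta);
  }
  return tokens;
}
```

Because this format is shared across Ollama, llama.cpp, vLLM, and LM Studio, a single parser covers all of them.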
## Model Recommendations

Based on testing, these models work well with Sage:

### Best Overall

- Qwen 2.5 Coder 14B (`qwen2.5-coder:14b`) — Excellent tool calling, good reasoning
- Qwen 2.5 Coder 32B (`qwen2.5-coder:32b`) — Best performance if you have the VRAM

### Good Alternatives

- DeepSeek Coder V2 — Strong coding capabilities, good tool use
- Mistral Codestral — Fast, reliable for coding tasks
- Llama 3.1 8B — Lightweight option for systems with limited resources
### Model Requirements

- Models must support function/tool calling
- Recommended: 8GB+ VRAM for 7B models, 16GB+ for 14B models
## Architecture

Sage uses a simple but powerful agent loop pattern:

```
User prompt → System prompt + history → LLM API (streaming)
  → If tool_calls in response:
      Execute tools → Append results → Loop back to LLM
  → If plain text response:
      Display to user → Wait for next input
```

### Key Design Decisions

- Raw `fetch` + SSE parsing instead of the OpenAI SDK — maximizes compatibility across different server implementations
- Readline-based UI instead of React/Ink — simpler, fewer dependencies, sufficient for streaming
- Uniform tool interface — each tool exports `{ name, description, parameters, execute() }` for easy extension
- CWD-scoped operations — all file operations are restricted to the current working directory
- Confirmation prompts — destructive bash commands require user approval (unless `--no-confirm`)
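The loop above can be sketched in a few lines of TypeScript. This is a stubbed illustration of the pattern (the `agentLoop` and `callModel` names, and the demo `echo` tool, are assumptions, not Sage's actual API):

```typescript
// Stubbed sketch of the agent loop: keep calling the model, executing
// any requested tools and appending their results, until the model
// replies with plain text.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text?: string; toolCalls?: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

const tools: Record<string, (args: Record<string, unknown>) => Promise<string>> = {
  echo: async (args) => String(args.text), // demo tool
};

async function agentLoop(
  history: Message[],
  callModel: (history: Message[]) => Promise<ModelTurn>
): Promise<string> {
  for (;;) {
    const turn = await callModel(history);
    if (turn.toolCalls && turn.toolCalls.length > 0) {
      for (const call of turn.toolCalls) {
        const result = await tools[call.name](call.args);
        history.push({ role: "tool", content: result }); // feed results back
      }
      continue; // loop back to the LLM with tool output appended
    }
    return turn.text ?? ""; // plain text ends the loop
  }
}
```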
## Development

### Running from source

```sh
npm install
npm run dev            # Run with tsx (no build needed)
npm run build          # Compile TypeScript to dist/
node dist/index.js     # Run the built version
```

### Testing

```sh
npm test               # Run all tests
npm run test:watch     # Watch mode
```

### Project Structure
```
src/
  index.ts             # Entry point, CLI parsing, REPL
  agent.ts             # Core agent loop
  client.ts            # OpenAI-compatible API client
  config.ts            # Configuration loading
  context-manager.ts   # Context window management
  history.ts           # Session persistence
  tools/
    index.ts           # Tool registry
    types.ts           # Tool interface
    read.ts            # Read tool
    write.ts           # Write tool
    edit.ts            # Edit tool
    bash.ts            # Bash tool
    glob.ts            # Glob tool
    grep.ts            # Grep tool
    safety.ts          # Workspace boundaries
  ui/
    terminal.ts        # Streaming output
    markdown.ts        # Terminal markdown rendering
  types.ts             # Shared types
```

## Examples
### Basic file operations

```
> Read the package.json file
[Agent uses read tool]

> Add a new script called "lint" that runs eslint
[Agent uses edit tool to modify package.json]

> Create a new file called CHANGELOG.md with initial content
[Agent uses write tool]
```

### Code analysis

```
> Find all TypeScript files in the src directory
[Agent uses glob tool with pattern "src/**/*.ts"]

> Search for all console.log statements
[Agent uses grep tool with pattern "console\\.log"]

> Show me the implementation of the read tool
[Agent uses read tool on src/tools/read.ts]
```

### Shell commands

```
> Run the tests
[Agent uses bash tool to run "npm test"]

> Check git status
[Agent uses bash tool to run "git status"]
```

## License
MIT
## Contributing
Contributions welcome! Please open an issue or PR on GitHub.
## Troubleshooting

### Model doesn't support tool calling

Some models don't support function calling. Try one of the models from the recommended list above.

### Connection refused

Make sure your local model server is running and that `--base-url` matches your server's endpoint.

### Out of memory errors

Reduce `--max-tokens` or use a smaller model. The default context window is 8192 tokens.
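Staying within the context window usually means trimming the oldest messages once the history grows too large. A hypothetical sketch of such trimming (the ~4-characters-per-token heuristic and the function names are assumptions, not Sage's actual context manager):

```typescript
type Msg = { role: string; content: string };

// Rough token estimate (~4 characters per token) for illustration only.
function estimateTokens(m: Msg): number {
  return Math.ceil(m.content.length / 4);
}

// Drop the oldest non-system messages until the history fits the budget,
// keeping room in the context window for the model's reply.
function trimHistory(history: Msg[], contextWindow = 8192, replyReserve = 4096): Msg[] {
  const budget = contextWindow - replyReserve;
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  const total = (msgs: Msg[]) => msgs.reduce((n, m) => n + estimateTokens(m), 0);
  while (rest.length > 0 && total(system) + total(rest) > budget) {
    rest.shift(); // discard the oldest user/assistant message
  }
  return [...system, ...rest];
}
```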
### Tools not executing

Check that the model is actually emitting tool calls (the streaming output shows them). Some models need better prompting or don't support tools at all.
## Links
- GitHub: https://github.com/user/sage-agent
- npm: https://www.npmjs.com/package/sage-agent
- Issues: https://github.com/user/sage-agent/issues
