habena

v0.4.0

Published

8 days ago

Habena — keep your AI agent on a short rein. MCP middleware proxy: policy guardrails, spend caps, and human approval for AI agents.

tri3dge

mcp model-context-protocol ai-agent ai-safety guardrails llm agent-security human-in-the-loop openclaw cost-control

Habena

Keep your AI agent on a short rein.

Habena is the open-source safety layer that sits between your AI assistant (OpenClaw, Hermes, any Claude-based agent) and the real MCP servers and tools it calls. It enforces a policy engine, spend caps, and one-tap human approval on every tool call, and audits every decision to SQLite — so a runaway loop can't drain your wallet, a poisoned tool can't quietly exfiltrate your secrets, and nothing dangerous happens without your say-so. Install an assistant and guard it end-to-end. Mac-first.

Renamed from AgentGuard. The agentguard command and the ~/.agentguard/ config directory still work as deprecated aliases — nothing breaks. New installs use habena and ~/.habena/; an existing ~/.agentguard/ is detected automatically.

Status: early, working, single-operator tested. MIT, no paid tier. Not yet recommended for production fleets.

Why

LLM agents are getting powerful faster than they're getting safe. Three things that have already happened to real people:

Tool poisoning. A poisoned MCP tool description was used to exfiltrate a Cursor user's ~/.ssh/id_rsa — the malicious instructions lived in the tool metadata, invisible in the normal UI. (Invariant Labs)
Rug-pull / backdoored server. Even a "trusted" server can turn on you: a tool can present a benign description at approval time and then silently change its behavior afterward (a "rug pull"), or ship an outright backdoor, such as an official MCP server that BCCs every outbound email to its maintainer. (Invariant Labs)
Cost runaway. Always-on agents loop. There are reports of $1,000+ surprise bills from runaway agent loops — no cap, no off switch, no one watching.

Habena is the layer that catches these — policy + approval + spend cap + audit, in front of every tool call.

How it works

Agent (OpenClaw/…) → Habena (policy · budget · approval · audit) → real MCP servers (filesystem, gmail, …)

Your agent connects to Habena as its single MCP server. Habena inspects every tool call, applies your policy, logs the decision, and either forwards the call to the real downstream server, holds it for human approval, or blocks it. Allowed calls pass through transparently; everything else stops at the gate.
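The gate described above can be sketched roughly as follows. This is an illustration, not Habena's actual code: the type names, the `gate` and `evaluate` functions, the first-match-wins rule order, and the regexes over tool names such as `write_file` are all assumptions made for the example.

```typescript
// Hypothetical types: illustrative only, not Habena's real API.
type Decision = "allow" | "require_approval" | "deny";

interface ToolCall {
  agent: string;
  tool: string; // assumed tool names, e.g. "read_file" or "write_file"
}

interface Rule {
  match: RegExp; // tested against the tool name
  decision: Decision;
}

// A cautious-style rule set: reads and lists pass, writes and destructive
// operations wait for a human, everything else is denied.
const cautiousRules: Rule[] = [
  { match: /^(read|list|get|search)_/, decision: "allow" },
  { match: /^(write|edit|create|move|delete|remove)_/, decision: "require_approval" },
];

// First matching rule wins; no match falls through to deny.
function evaluate(call: ToolCall, rules: Rule[]): Decision {
  for (const rule of rules) {
    if (rule.match.test(call.tool)) return rule.decision;
  }
  return "deny";
}

interface AuditEntry {
  at: number;
  agent: string;
  tool: string;
  decision: Decision;
  outcome: "forwarded" | "blocked";
}

// The gate: decide, ask a human if needed, record, then forward or block.
async function gate(
  call: ToolCall,
  rules: Rule[],
  askHuman: (call: ToolCall) => Promise<boolean>,
  forward: (call: ToolCall) => Promise<string>,
  audit: AuditEntry[],
): Promise<string | null> {
  const decision = evaluate(call, rules);
  let permitted = decision === "allow";
  if (decision === "require_approval") {
    permitted = await askHuman(call); // the call is held until this resolves
  }
  audit.push({
    at: Date.now(),
    agent: call.agent,
    tool: call.tool,
    decision,
    outcome: permitted ? "forwarded" : "blocked",
  });
  return permitted ? forward(call) : null;
}
```

In this sketch the audit entry is written before the call is forwarded, so even a call that crashes downstream leaves a record of the decision.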

Quickstart (60 seconds)

Requires Node 20+.

Install from npm:

npm i -g habena

(Or run any command ad hoc with npx habena@latest <command>. To hack on the source instead: clone the repo, pnpm install, pnpm -F habena build, then npm link from packages/core.)

Initialize. Creates ~/.habena/config.yaml seeded with the safe cautious preset (allow read/list, require approval for writes and destructive ops, deny the rest):

habena init

Add a downstream server you can easily reproduce, such as the filesystem server rooted at a directory of your choosing:

habena downstream add filesystem ~/workspace

Register an agent + daily budget:

habena agent add --name openclaw --budget-daily 30

Start the proxy (stdio transport):

habena start

Approve from the terminal. In a second terminal, run the interactive approval queue. When a rule returns require_approval, the tool call pauses and waits here until you allow or deny it:

habena watch

Or approve from the browser. The local dashboard serves a live decision stream, the approvals queue, agents, spend, your policy, and a setup wizard:

habena dashboard    # http://localhost:7700 (first run downloads habena-web)

Point your assistant at Habena. For OpenClaw, the installer wires Habena in as the MCP proxy (it backs up your existing config and validates paths first):

habena install openclaw

The demo (what makes it click)

This runs end to end with only the commands above and the default cautious policy — no custom YAML needed. The cautious preset already requires approval for writes and destructive operations.

Set up. habena init, then habena downstream add filesystem ~/workspace, then habena start.
Watch. In a second terminal: habena watch.
Trigger. Your agent (or a test MCP client) asks the filesystem server to write or delete a file under ~/workspace. Because the cautious preset marks writes/deletes as require_approval, Habena does not forward the call — it holds it.
Decide. The held call appears in habena watch. Deny it.
Confirm it was blocked and recorded:

habena logs --decision require_approval

Every allow, deny, and held call is written to the SQLite audit log, queryable with habena logs (filter with --agent, --last 24h, --decision, --limit).
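To make the filter semantics concrete, here is a small in-memory sketch of what combining `--agent`, `--last`, `--decision`, and `--limit` could mean. The record shape, the supported duration units, and the newest-first ordering are assumptions; the real SQLite schema and query logic are not shown in this README.

```typescript
// Hypothetical record shape, not the real audit schema.
interface AuditRow {
  at: number; // epoch milliseconds
  agent: string;
  tool: string;
  decision: "allow" | "deny" | "require_approval";
}

interface LogFilter {
  agent?: string;
  decision?: AuditRow["decision"];
  last?: string; // e.g. "24h"
  limit?: number;
}

// Parse a duration such as "24h" into milliseconds.
// The unit set (s, m, h, d) is an assumption for this sketch.
function parseDuration(text: string): number {
  const m = /^(\d+)([smhd])$/.exec(text);
  if (!m) throw new Error(`unrecognized duration: ${text}`);
  const count = Number(m[1]);
  switch (m[2]) {
    case "s": return count * 1000;
    case "m": return count * 60000;
    case "h": return count * 3600000;
    default: return count * 86400000; // "d"
  }
}

// All filters are ANDed together; results are returned newest first.
function queryLogs(rows: AuditRow[], filter: LogFilter, now: number = Date.now()): AuditRow[] {
  const since = filter.last ? now - parseDuration(filter.last) : -Infinity;
  const matched = rows
    .filter((r) => r.at >= since)
    .filter((r) => filter.agent === undefined || r.agent === filter.agent)
    .filter((r) => filter.decision === undefined || r.decision === filter.decision)
    .sort((a, b) => b.at - a.at);
  return filter.limit === undefined ? matched : matched.slice(0, filter.limit);
}
```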

Phone-tap approvals work today. Point Habena at a Telegram bot and a held call buzzes your phone: an agent hits a require_approval rule → your phone buzzes → tap ⛔ Deny → the call is blocked and audited. Only your own chat id can approve, and the choices are Allow-once / Deny. Setup is a few lines of config — see docs/approval-channels.md. The habena watch CLI (and raw IPC) still work alongside it.
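The "only your own chat id can approve" rule amounts to a check like the one below. This is a minimal sketch under stated assumptions: `handleTap`, the `allow_once` / `deny` payload strings, and the state shape are hypothetical, and no real Telegram API calls are shown.

```typescript
// Hypothetical shapes: illustrative only.
interface ApprovalTap {
  chatId: number;    // chat the button press came from
  requestId: string; // which held call it refers to
  choice: string;    // raw button payload
}

interface PendingApprovals {
  ownerChatId: number;
  pending: Map<string, (approved: boolean) => void>;
}

// Returns true if the tap resolved a held call, false if it was ignored.
function handleTap(state: PendingApprovals, tap: ApprovalTap): boolean {
  // Taps from any chat other than the operator's are dropped.
  if (tap.chatId !== state.ownerChatId) return false;
  // Only the two offered choices are accepted.
  if (tap.choice !== "allow_once" && tap.choice !== "deny") return false;
  const resolve = state.pending.get(tap.requestId);
  if (!resolve) return false; // unknown request, or already decided
  state.pending.delete(tap.requestId); // one-shot: a tap cannot be replayed
  resolve(tap.choice === "allow_once");
  return true;
}
```

Deleting the pending entry before resolving it means a repeated or replayed tap for the same request is ignored rather than approving a second call.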

Status & roadmap

Early, working, single-operator tested. Habena is public because it's more useful to others than sitting on a laptop, not because it's production-grade. It's MIT licensed with no paid tier, no gated features, and no open-core split. Install with npm i -g habena (npmjs.com/package/habena).

Today: stdio MCP transport only; approvals via CLI/IPC, one-tap Telegram, or the local web dashboard (habena dashboard → localhost:7700: live decision stream, approvals queue, agents, spend, policy viewer, and a setup wizard).

Local heuristic threat detection works today. Habena scans downstream MCP tools for tool-poisoning (suspicious tool-description patterns), rug-pulls (tool-definition drift — checked between runs and mid-session on a periodic re-scan), and credential-egress (secrets in call args). Detection is heuristic/best-effort and runs entirely on your machine — no cloud feed. Each detector defaults to require_approval and is configurable via the threat: block in config.yaml (off | warn | require_approval | block; the re-scan cadence via rescan_interval, default 10m).
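The three detector ideas can be illustrated with a few lines each. The sketch below is not Habena's detection code: the function names, the specific regexes, and the choice to compare canonicalized tool definitions are assumptions picked to show the general technique.

```typescript
// Shape of a tool as listed by a downstream server (simplified for this sketch).
interface ToolDef {
  name: string;
  description: string;
  inputSchema: unknown;
}

// Serialize with sorted object keys so key order alone never counts as drift.
function canonical(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Baseline taken at approval time: tool name mapped to its canonical definition.
function snapshot(tools: ToolDef[]): Map<string, string> {
  const snap = new Map<string, string>();
  for (const t of tools) snap.set(t.name, canonical(t));
  return snap;
}

// Rug-pull check: tools whose definition changed, or that are new, since the baseline.
function driftedTools(baseline: Map<string, string>, current: ToolDef[]): string[] {
  return current.filter((t) => baseline.get(t.name) !== canonical(t)).map((t) => t.name);
}

// Tool-poisoning check: example patterns only, chosen for illustration.
const POISON_PATTERNS: RegExp[] = [
  /<important>/i,
  /ignore (all |any )?(previous|prior) instructions/i,
  /do not (tell|mention|inform)/i,
  /\.ssh\/|id_rsa/i,
];

function looksPoisoned(tool: ToolDef): boolean {
  return POISON_PATTERNS.some((p) => p.test(tool.description));
}

// Credential-egress check: well-known secret formats appearing in call arguments.
const SECRET_PATTERNS: RegExp[] = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  /\bAKIA[0-9A-Z]{16}\b/,           // AWS access key id format
  /\bgh[pousr]_[A-Za-z0-9]{36,}\b/, // GitHub token prefixes
];

function argsLeakSecret(args: unknown): boolean {
  const text = canonical(args);
  return SECRET_PATTERNS.some((p) => p.test(text));
}
```

Pattern lists like these are best-effort by nature: they catch known shapes and miss novel ones, which is why a default of require_approval rather than a silent block or allow is a reasonable fit for heuristic findings.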

What the budget block actually enforces. Habena sits between the agent and its tools, not between the agent and its LLM, so it never sees token bills directly. It enforces three mechanisms instead:

Call caps. budget.calls (per_minute/per_hour/per_day) hard-denies any call past the configured count. This is the cap that stops a looping agent.
Result-token caps. budget.result_tokens caps the estimated tokens that tool results inject into the agent's context (the measurable driver of LLM spend). Exceeding it is also a hard deny.
Dollar limits. The daily, monthly, per_session, and per_request limits are enforced against the pricing: entries, the USD-per-call prices you declare for metered tools. Because declared prices are a guess, overruns only warn by default; set on_exceed: deny or on_exceed: require_approval to block or escalate instead.

For true dollar caps on LLM spend itself, put an LLM gateway with budgets (e.g. LiteLLM) in front of your model API. Habena and a gateway compose cleanly.
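A rough sketch of how the three budget mechanisms could behave is shown below. The key names (`per_minute`, `per_hour`, `per_day`, `daily`, `on_exceed`) come from the text above; everything else is an assumption for illustration, including the `CallBudget` class, the rolling (rather than calendar) windows, and the four-characters-per-token estimate.

```typescript
type Verdict = "allow" | "warn" | "require_approval" | "deny";

interface CallLimits {
  per_minute?: number;
  per_hour?: number;
  per_day?: number;
}

const WINDOW_MS = { per_minute: 60000, per_hour: 3600000, per_day: 86400000 } as const;

// Rolling-window call counter. Rolling windows are an assumption of this sketch.
class CallBudget {
  private stamps: number[] = []; // timestamps of allowed calls
  private readonly limits: CallLimits;

  constructor(limits: CallLimits) {
    this.limits = limits;
  }

  // Returns false (a hard deny) if any configured window is already full.
  tryCall(now: number): boolean {
    // Drop timestamps older than the largest window.
    this.stamps = this.stamps.filter((t) => now - t < WINDOW_MS.per_day);
    for (const key of Object.keys(WINDOW_MS) as (keyof typeof WINDOW_MS)[]) {
      const limit = this.limits[key];
      if (limit === undefined) continue;
      const used = this.stamps.filter((t) => now - t < WINDOW_MS[key]).length;
      if (used >= limit) return false;
    }
    this.stamps.push(now); // only allowed calls consume budget
    return true;
  }
}

// Rough token estimate for a tool result: about four characters per token.
// The ratio is an assumption, not a documented Habena constant.
function estimateTokens(resultText: string): number {
  return Math.ceil(resultText.length / 4);
}

interface DollarBudget {
  daily: number; // USD
  on_exceed: "warn" | "deny" | "require_approval";
}

// Declared-price check: if this call would exceed the daily limit,
// return the configured on_exceed verdict.
function checkSpend(spentToday: number, declaredPrice: number, budget: DollarBudget): Verdict {
  return spentToday + declaredPrice > budget.daily ? budget.on_exceed : "allow";
}
```

For example, with `per_minute: 2`, a third call inside the same minute is denied, and calls are accepted again once the earlier timestamps age out of the window.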

Roadmap:

Provider-side cost ingestion — pull real LLM spend from provider usage APIs / gateways and attribute it per agent, on top of the declared per-tool pricing that ships today.
Cloud-backed threat intel — shared signatures for known-bad servers, layered on the local heuristic detection that already ships.
Mac guarded-sandbox recipe — a documented, locked-down setup for running an assistant under Habena on macOS.

Full design: docs/plans/2026-06-08-habena-design.md.

License

MIT — see LICENSE.

An open-source project by 3app.studio.
