habena
v0.4.0
Habena — keep your AI agent on a short rein. MCP middleware proxy: policy guardrails, spend caps, and human approval for AI agents.
Habena
Keep your AI agent on a short rein.
Habena is the open-source safety layer that sits between your AI assistant (OpenClaw, Hermes, any Claude-based agent) and the real MCP servers and tools it calls. It enforces a policy engine, spend caps, and one-tap human approval on every tool call, and audits every decision to SQLite — so a runaway loop can't drain your wallet, a poisoned tool can't quietly exfiltrate your secrets, and nothing dangerous happens without your say-so. Install an assistant and guard it end-to-end. Mac-first.
Renamed from AgentGuard. The agentguard command and the ~/.agentguard/ config directory still work as deprecated aliases, so nothing breaks. New installs use habena and ~/.habena/; an existing ~/.agentguard/ is detected automatically.
Status: early, working, single-operator tested. MIT, no paid tier. Not yet recommended for production fleets.
Why
LLM agents are getting powerful faster than they're getting safe. Three things that have already happened to real people:
- Tool poisoning. A poisoned MCP tool description was used to exfiltrate a Cursor user's ~/.ssh/id_rsa: the malicious instructions lived in the tool metadata, invisible in the normal UI. (Invariant Labs)
- Rug-pull / backdoored server. Even a "trusted" server can turn on you: a tool can present a benign description at approval time, then silently change its behavior afterward (a "rug pull"), or ship an outright backdoor, like an official MCP server that BCC's every outbound email to its maintainer. (Invariant Labs)
- Cost runaway. Always-on agents loop. There are reports of $1,000+ surprise bills from runaway agent loops — no cap, no off switch, no one watching.
Habena is the layer that catches these — policy + approval + spend cap + audit, in front of every tool call.
How it works
    Agent (OpenClaw/…) → Habena (policy · budget · approval · audit) → real MCP servers (filesystem, gmail, …)

Your agent connects to Habena as its single MCP server. Habena inspects every tool call, applies your policy, logs the decision, and either forwards the call to the real downstream server, holds it for human approval, or blocks it. Allowed calls pass through transparently; everything else stops at the gate.
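The forward / hold / block flow described above can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: the type names, the `gateCall` function, and the name-prefix rules are assumptions, not Habena's actual policy engine or config schema. The rules loosely mirror the cautious preset (allow read/list, hold writes and deletes, deny the rest).

```typescript
// Hypothetical sketch: these types and names are illustrative, not Habena's API.
type GateDecision = "allow" | "require_approval" | "deny";
type GateOutcome = "forward" | "hold" | "block";

interface ToolCall {
  agent: string;
  tool: string; // tool name as sent by the agent, e.g. "read_file"
}

interface PolicyRule {
  pattern: RegExp; // matched against the tool name
  decision: GateDecision;
}

interface AuditEntry {
  agent: string;
  tool: string;
  decision: GateDecision;
}

// Assumed rules resembling the cautious preset; the real preset may match differently.
const cautiousLikeRules: PolicyRule[] = [
  { pattern: /^(read|list)_/, decision: "allow" },
  { pattern: /^(write|delete)_/, decision: "require_approval" },
];

// First matching rule wins; anything unmatched falls through to deny.
function evaluateCall(call: ToolCall, rules: PolicyRule[]): GateDecision {
  for (const rule of rules) {
    if (rule.pattern.test(call.tool)) return rule.decision;
  }
  return "deny";
}

// Record the decision first, then map it to what the proxy does with the call.
function gateCall(call: ToolCall, rules: PolicyRule[], audit: AuditEntry[]): GateOutcome {
  const decision = evaluateCall(call, rules);
  audit.push({ agent: call.agent, tool: call.tool, decision });
  if (decision === "allow") return "forward";
  if (decision === "require_approval") return "hold";
  return "block";
}
```

Writing the audit entry before acting means a held or blocked call still leaves a record, which matches the "audits every decision" behavior described earlier.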
Quickstart (60 seconds)
Requires Node 20+.
Install from npm:

    npm i -g habena

(Or run any command ad hoc with npx habena@latest <command>. To hack on the source instead: clone the repo, pnpm install, pnpm -F habena build, then npm link from packages/core.)

Initialize. Creates ~/.habena/config.yaml seeded with the safe cautious preset (allow read/list, require approval for writes and destructive ops, deny the rest):

    habena init

Add a downstream you can reproduce: the filesystem server, rooted at a directory of your choosing:

    habena downstream add filesystem ~/workspace

Register an agent + daily budget:

    habena agent add --name openclaw --budget-daily 30

Start the proxy (stdio transport):

    habena start

Approve from the terminal. In a second terminal, run the interactive approval queue. When a rule returns require_approval, the tool call pauses and waits here until you allow or deny it:

    habena watch

Or approve from the browser. The local dashboard serves a live decision stream, the approvals queue, agents, spend, your policy, and a setup wizard:

    habena dashboard   # http://localhost:7700 (first run downloads habena-web)

Point your assistant at Habena. For OpenClaw, the installer wires Habena in as the MCP proxy (it backs up your existing config and validates paths first):

    habena install openclaw

The demo (what makes it click)
This runs end to end with only the commands above and the default cautious policy — no custom YAML needed. The cautious preset already requires approval for writes and destructive operations.
- Set up. habena init, then habena downstream add filesystem ~/workspace, then habena start.
- Watch. In a second terminal: habena watch.
- Trigger. Your agent (or a test MCP client) asks the filesystem server to write or delete a file under ~/workspace. Because the cautious preset marks writes/deletes as require_approval, Habena does not forward the call; it holds it.
- Decide. The held call appears in habena watch. Deny it.
- Confirm it was blocked and recorded:

    habena logs --decision require_approval

Every allow, deny, and held call is written to the SQLite audit log, queryable with habena logs (filter with --agent, --last 24h, --decision, --limit).
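To make the log filters concrete, here is a rough in-memory sketch of what filtering by agent, decision, time window, and limit amounts to. It is an assumption-laden illustration: the real audit log lives in SQLite, and `AuditRow`, `LogFilter`, `parseWindow`, and `filterAudit` are hypothetical names. Only the `24h` and `10m` duration forms appear in this README; the `s` and `d` units are guesses.

```typescript
// Hypothetical sketch: the real log is a SQLite table queried by the CLI.
interface AuditRow {
  agent: string;
  tool: string;
  decision: string;
  atMs: number; // timestamp in milliseconds
}

interface LogFilter {
  agent?: string;
  decision?: string;
  lastMs?: number; // only rows newer than this many milliseconds
  limit?: number;
}

// Parse a window such as "24h" or "10m" into milliseconds.
function parseWindow(text: string): number {
  const match = /^(\d+)([smhd])$/.exec(text);
  if (match === null) throw new Error("unrecognized duration: " + text);
  const amount = Number(match[1]);
  switch (match[2]) {
    case "s": return amount * 1000;
    case "m": return amount * 60 * 1000;
    case "h": return amount * 60 * 60 * 1000;
    default: return amount * 24 * 60 * 60 * 1000; // "d"
  }
}

// Apply every filter that is set, newest rows first, then truncate to the limit.
function filterAudit(rows: AuditRow[], filter: LogFilter, nowMs: number): AuditRow[] {
  const matched = rows.filter((row) =>
    (filter.agent === undefined || row.agent === filter.agent) &&
    (filter.decision === undefined || row.decision === filter.decision) &&
    (filter.lastMs === undefined || nowMs - row.atMs <= filter.lastMs));
  matched.sort((a, b) => b.atMs - a.atMs);
  return filter.limit === undefined ? matched : matched.slice(0, filter.limit);
}
```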
Phone-tap approvals work today. Point Habena at a Telegram bot and a held call buzzes your phone: an agent hits a require_approval rule → your phone buzzes → tap ⛔ Deny → the call is blocked and audited. Only your own chat id can approve, and the choices are Allow-once / Deny. Setup is a few lines of config; see docs/approval-channels.md. The habena watch CLI (and raw IPC) still work alongside it.
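The two guarantees in that paragraph (only the operator's chat id counts, and the only choices are Allow-once and Deny) can be sketched as a small pure function. This is a hypothetical illustration, not Habena's Telegram integration: `HeldCall`, `ApprovalTap`, and `applyTap` are invented names, and no Telegram API calls are shown.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions.
type ApprovalChoice = "allow_once" | "deny";

interface HeldCall {
  id: string;
  tool: string;
  resolved: ApprovalChoice | null; // null while the call is still waiting
}

interface ApprovalTap {
  chatId: number; // chat the button press came from
  callId: string; // which held call the button refers to
  choice: ApprovalChoice;
}

// Returns true only when the tap was accepted and applied to a waiting call.
function applyTap(held: Map<string, HeldCall>, tap: ApprovalTap, ownerChatId: number): boolean {
  // Ignore taps from any chat other than the operator's own.
  if (tap.chatId !== ownerChatId) return false;
  const call = held.get(tap.callId);
  // Unknown calls and already-decided calls cannot be resolved again.
  if (call === undefined || call.resolved !== null) return false;
  call.resolved = tap.choice;
  return true;
}
```

Refusing a second tap on an already-decided call keeps an "allow once" from being replayed.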
Status & roadmap
Early, working, single-operator tested. Habena is public because it's more useful to others than sitting on a laptop, not because it's production-grade. It's MIT licensed with no paid tier, no gated features, and no open-core split. Install with npm i -g habena (npmjs.com/package/habena).
Today: stdio MCP transport only; approvals via CLI/IPC, one-tap Telegram, or the local web dashboard (habena dashboard → localhost:7700: live decision stream, approvals queue, agents, spend, policy viewer, and a setup wizard).
Local heuristic threat detection works today. Habena scans downstream MCP tools for tool-poisoning (suspicious tool-description patterns), rug-pulls (tool-definition drift, checked between runs and mid-session on a periodic re-scan), and credential-egress (secrets in call args). Detection is heuristic/best-effort and runs entirely on your machine, with no cloud feed. Each detector defaults to require_approval and is configurable via the threat: block in config.yaml (off | warn | require_approval | block; the re-scan cadence via rescan_interval, default 10m).
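As a rough illustration of two of these detectors, the sketch below fingerprints tool definitions to notice drift and pattern-matches call arguments for secret-shaped strings. It is not Habena's detection code: the function names, the fingerprint format, and the two regexes (a PEM private-key header and the AWS access key id shape) are assumptions chosen for the example. Only the four mode values come from the text above.

```typescript
// Hypothetical sketch: detector logic and names are illustrative assumptions.
type ThreatMode = "off" | "warn" | "require_approval" | "block";
type ThreatAction = "allow" | "warn" | "require_approval" | "block";

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: string; // serialized schema, kept as text for simplicity
}

// A stable serialization of the fields an agent sees; any change alters it.
function fingerprintTool(tool: ToolDefinition): string {
  return JSON.stringify([tool.name, tool.description, tool.inputSchema]);
}

// Rug-pull check: names of known tools whose definition differs from the baseline.
// Tools absent from the baseline are ignored here; a real scanner might flag them.
function detectDrift(baseline: Map<string, string>, current: ToolDefinition[]): string[] {
  const drifted: string[] = [];
  for (const tool of current) {
    const known = baseline.get(tool.name);
    if (known !== undefined && known !== fingerprintTool(tool)) drifted.push(tool.name);
  }
  return drifted;
}

// Credential-egress check: two example secret shapes, far from exhaustive.
const secretPatterns: RegExp[] = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key header
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key id shape
];

function argsLeakSecret(args: Record<string, unknown>): boolean {
  const text = JSON.stringify(args);
  return secretPatterns.some((pattern) => pattern.test(text));
}

// Map a detection to an action according to the configured mode.
function applyThreatMode(mode: ThreatMode, detected: boolean): ThreatAction {
  if (!detected) return "allow";
  if (mode === "off") return "allow";
  return mode;
}
```

Comparing against a stored baseline is what lets the same check run both between runs and on a periodic re-scan: only the moment the current definitions are fetched changes.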
What the budget block actually enforces. Habena sits between the agent and its tools, not between the agent and its LLM, so it never sees token bills directly. Three honest mechanisms instead:
- budget.calls (per_minute / per_hour / per_day) hard-denies past a call count: the cap that stops a looping agent.
- budget.result_tokens caps the estimated tokens tool results inject into the agent's context (the measurable driver of LLM spend); also a hard deny.
- Dollar limits (daily, monthly, per_session, per_request) enforce against pricing:, the USD-per-call you declare for metered tools; since declared prices are a guess, overruns warn by default (on_exceed: deny or require_approval to block/escalate).

For true dollar caps on LLM spend itself, put an LLM gateway with budgets (e.g. LiteLLM) in front of your model API. Habena and a gateway compose cleanly.
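A minimal sketch of the first and third mechanisms follows: a sliding-window call cap that hard-denies, and a declared-price check whose overrun behavior follows an on_exceed setting. The `CallCap` class and `checkSpend` function are hypothetical names, and the sliding-window approach is an assumption; Habena may count calls differently (for example with fixed windows).

```typescript
// Hypothetical sketch: names and windowing strategy are illustrative assumptions.
type OnExceed = "warn" | "deny" | "require_approval";
type BudgetVerdict = "allow" | OnExceed;

// Sliding-window call counter: at most maxCalls within any windowMs span.
class CallCap {
  private stamps: number[] = [];
  private readonly maxCalls: number;
  private readonly windowMs: number;

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if under the cap; false is a hard deny.
  tryCall(nowMs: number): boolean {
    // Drop timestamps that have aged out of the window.
    this.stamps = this.stamps.filter((stamp) => nowMs - stamp < this.windowMs);
    if (this.stamps.length >= this.maxCalls) return false;
    this.stamps.push(nowMs);
    return true;
  }
}

// Dollar check against a declared per-call price; overruns follow on_exceed.
function checkSpend(spentUsd: number, priceUsd: number, limitUsd: number, onExceed: OnExceed): BudgetVerdict {
  return spentUsd + priceUsd <= limitUsd ? "allow" : onExceed;
}
```

In this sketch denied calls are not recorded, so a looping agent that keeps retrying does not extend its own lockout; the window clears as the earlier allowed calls age out.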
Roadmap:
- Provider-side cost ingestion — pull real LLM spend from provider usage APIs / gateways and attribute it per agent, on top of the declared per-tool pricing that ships today.
- Cloud-backed threat intel — shared signatures for known-bad servers, layered on the local heuristic detection that already ships.
- Mac guarded-sandbox recipe — a documented, locked-down setup for running an assistant under Habena on macOS.
Full design: docs/plans/2026-06-08-habena-design.md.
License
MIT — see LICENSE.
An open-source project by 3app.studio.
