taintctl

v0.0.0

Published

a month ago

Content-aware provenance layer for Claude Agent SDK. Detects dangerous content at every tool I/O boundary, propagates classification across sub-agent dispatch, and enforces fail-closed policy.

Downloads

137

0High
0Medium
0Low

jokfa

ai-security agent-security claude-agent-sdk guardrails taint-analysis prompt-injection mcp provenance

taintctl

Content-aware provenance layer for Claude Agent SDK and other agent frameworks. Detects dangerous content at every tool I/O boundary, propagates classification across sub-agent dispatch, and enforces policy with a fail-closed default.

Status: pre-alpha. Design phase. No runnable code yet.

Why this exists

Agent systems built on Claude Agent SDK (and similar orchestrator-worker patterns like LangGraph, CrewAI) recognize dangerous content unevenly across calls. When agent A reads .env and dispatches a sub-agent with that data in its prompt, sub-agent B has no signal that the data was already classified as sensitive.

Existing guardrails operate at single-LLM-call granularity and do not propagate classification state across sub-agent dispatch boundaries. Existing MCP scanners operate on static descriptions and trust-on-first-use, not runtime data flow.

taintctl fills that gap.

What's different

| Tool | What it does | Cross-subagent provenance? | Live visualization? | |---|---|:-:|:-:| | mcp-scan | Static MCP description scanning | ❌ | ❌ | | mcp-context-protector | Trust-on-first-use config pinning | ❌ | ❌ | | Lakera Guard / NeMo / guardrails-ai | Single-call content classification | ❌ | ❌ | | Claude Code permission system | Syntactic allow/deny prompts | ❌ | ❌ | | taintctl | Content classification + cross-subagent ledger + flow graph UI | ✅ | ✅ (Stage 3) |

Roadmap

| Stage | Weeks | Deliverable | |---|---|---| | 0 (Pre-code) | Week 0 | Verify SDK hook coverage, name reserved, baseline benchmarks captured | | 1 | Weeks 1-4 | Claude Agent SDK middleware + content classifiers + policy engine + terminal UI | | 2 | Weeks 5-7 | Cross-subagent provenance (channel-a fingerprint + channel-b prompt injection) | | 3 | Weeks 8-12 | Static SPA flow-graph UI + README screencast |

Full design: docs/design-2026-05-20.md Active tasks: TODO.md

Limitations (acknowledged, not hidden)

v1 only handles verbatim taint flow. When an LLM paraphrases sensitive data, channel-a (sha256 fingerprint) breaks. Channel-b (system-prompt warnings to sub-agents) is a partial mitigation but its effectiveness is an empirical question, not a guarantee.
v1 only ships a TypeScript adapter for Claude Agent SDK. Python adapter, LangGraph, AutoGen, CrewAI, OpenClaw, Hermes are v1.1+.
v1 prompt-injection detection is pattern-based. Paraphrased prompt injections will be missed. Documented as known gap, not silently broken.
Not a defense against a malicious parent agent. Standard guardrail assumption: the agent we sit inside is honest-but-naive, not adversary-controlled.

Validation

AgentDojo prompt-injection-marker subset: Stage 1 gate is recall ≥ 0.65 (deterministic detector floor)
InjecAgent: baseline numbers in CI on every PR
Multi-agent scenarios: 8-12 in benchmarks/multiagent/, derived from a fork of damn-vulnerable-MCP-server

License

MIT — see LICENSE

Related work

This is the author's second project in MCP/agent security. The first is MCP-Security-Framework, which scans MCP servers for vulnerable patterns. The two projects are complementary: MCP-Security-Framework is a static scanner; taintctl is a runtime provenance layer.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

taintctl

Why this exists

What's different

Roadmap

Limitations (acknowledged, not hidden)

Validation

License

Related work