Model Action Protocol (MAP)
Cryptographic provenance, self-healing critique, and state rollback for autonomous AI agents.
MCP gave Claude the hands. MAP gives Claude the receipt.
MAP v0.1 spec is frozen. Wire format, JCS canonicalization, and conformance fixtures are immutable at spec v0.1.0. Two reference implementations conform: TypeScript (@model-action-protocol/core, npm) and Python (model-action-protocol, PyPI). A ledger written in either language verifies byte-identical in the other, pinned by 6 frozen conformance fixtures.
The Problem
AI agents are entering Phase 3: autonomous execution. They schedule reasoning, call tools, execute multi-step processes, and verify their own results — all without a human in the loop.
But there's no liability shield.
| Phase | Era | Who's Responsible? |
|-------|-----|-------------------|
| Phase 1: Chatbots (Nov 2022) | Model generates answers from prompts | Human: they read and act on the output |
| Phase 2: Reasoning (Sep 2024) | Model reasons before answering, reduces errors | Human: still driving step by step |
| Phase 3: Agents (2025-2026) | Agent executes autonomously, verifies its own work | Nobody: the human is abstracted away |
When an agent deletes a production database, sends the wrong email, or misconfigures infrastructure — who's accountable? How do you prove what happened? How do you undo it?
And when fleets of agents work simultaneously — who authorized what, who spawned whom, and which agent broke which thing?
MAP is the OS-level liability shield for autonomous agents.
What It Does
1. Cryptographic Provenance Ledger
Every agent action is logged to an append-only, SHA-256 hash-chained ledger:
- Full state snapshots before and after every action
- Tamper-evident — change one entry, all subsequent hashes break
- Exportable as JSON for audit compliance
- Chain verification in one call
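As a rough illustration of why a hash chain is tamper-evident, here is a minimal sketch using Node's built-in crypto module. The `ChainEntry` shape and the `append` and `verify` helpers are hypothetical names for this example only; MAP's real entry format and hashing inputs are defined in spec/SPEC.md and differ from this simplification.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified entry shape (not MAP's wire format).
interface ChainEntry {
  sequence: number;
  payload: string; // serialized action data
  parentHash: string; // hash of the previous entry
  hash: string; // SHA-256 over sequence, parentHash, and payload
}

const GENESIS = "0".repeat(64);

function hashEntry(sequence: number, payload: string, parentHash: string): string {
  return createHash("sha256").update(`${sequence}\n${parentHash}\n${payload}`).digest("hex");
}

// Append-only: returns a new array with one more entry linked to the last one.
function append(chain: ChainEntry[], payload: string): ChainEntry[] {
  const last = chain[chain.length - 1];
  const parentHash = last ? last.hash : GENESIS;
  const sequence = chain.length;
  return [...chain, { sequence, payload, parentHash, hash: hashEntry(sequence, payload, parentHash) }];
}

// Recompute every hash and check every link; report the first broken entry.
function verify(chain: ChainEntry[]): { valid: boolean; brokenAt?: number } {
  let expectedParent = GENESIS;
  for (const e of chain) {
    const recomputed = hashEntry(e.sequence, e.payload, e.parentHash);
    if (e.parentHash !== expectedParent || e.hash !== recomputed) {
      return { valid: false, brokenAt: e.sequence };
    }
    expectedParent = e.hash;
  }
  return { valid: true };
}

let chain: ChainEntry[] = [];
chain = append(chain, JSON.stringify({ tool: "updatePrice", price: 299 }));
chain = append(chain, JSON.stringify({ tool: "updatePrice", price: 199 }));

// Tamper with the first entry's payload without recomputing its hash.
const tampered = chain.map((e) => (e.sequence === 0 ? { ...e, payload: "{}" } : e));

console.log(verify(chain)); // intact chain
console.log(verify(tampered)); // tampering detected at entry 0
```

If the attacker also recomputes the hash of the edited entry, the next entry's `parentHash` no longer matches, so the break simply moves one position down the chain.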
2. Self-Healing Critic Loop
After every action, a fast critic model reviews the result:
- PASS — Action is correct, continue
- CORRECTED — Error detected, auto-fixed, both logged
- FLAGGED — Dangerous action, execution halts for human review
Uses tiered model routing: expensive model executes, cheap model critiques.
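The three verdicts can be modeled as a discriminated union so that the control flow after each review is explicit. This is a sketch only: the `Verdict` type and the `nextStep` helper are hypothetical names, and the mapping simply restates the three bullets above rather than MAP's internal logic.

```typescript
// Hypothetical types for illustration; field names follow the verdicts described above.
type Verdict =
  | { verdict: "PASS"; reason: string }
  | { verdict: "CORRECTED"; reason: string; correction: { tool: string; input: Record<string, unknown> } }
  | { verdict: "FLAGGED"; reason: string };

type Step = "continue" | "apply-correction" | "halt-for-review";

// Map each critic verdict to what the executor does next.
function nextStep(v: Verdict): Step {
  switch (v.verdict) {
    case "PASS":
      return "continue";
    case "CORRECTED":
      return "apply-correction";
    case "FLAGGED":
      return "halt-for-review";
  }
}

const corrected: Verdict = {
  verdict: "CORRECTED",
  reason: "price was set to $0",
  correction: { tool: "updatePrice", input: { customerId: "acme", price: 299 } },
};

console.log(nextStep({ verdict: "PASS", reason: "ok" }));
console.log(nextStep(corrected));
console.log(nextStep({ verdict: "FLAGGED", reason: "deletes production data" }));
```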
3. Reversal Schema (COMPENSATE / RESTORE / ESCALATE)
Every MAP-compliant tool declares how its actions can be reversed:
| Strategy | When | How Rollback Works | Limitations |
|----------|------|-------------------|-------------|
| COMPENSATE | Systems that don't allow hard deletes (ERPs, accounting) | Dispatch a compensating action (e.g., credit memo for duplicate invoice) | Matches how regulated industries already work (banks post reversing entries, ERPs issue credit memos). The strongest strategy. |
| RESTORE | CRUD APIs with GET + PUT | Captures state before write via tool.capture(), pushes original state back on rollback via tool.restore() | Concurrent modification risk: if another process modifies the same record between action and rollback, restore blindly overwrites their changes (last-write-wins). Best for single-writer environments. |
| ESCALATE | Irreversible actions (wire transfers, emails, deploys) | Intercepts before execution — the tool never runs without human approval | Not a rollback strategy. This is a prevention gate. The correct answer for actions that genuinely cannot be undone. |
What rollback can't do:
- Side effects that left the system. An email was sent and read. A Slack message was delivered. No rollback fixes that — ESCALATE is the right strategy for these actions.
- Distributed state across multiple services. If an agent updated Stripe AND Salesforce AND sent a notification, rolling back one without the others leaves inconsistent state. Coordinated multi-service rollback is a v0.2 problem.
- Time-sensitive operations. A stock was sold at $100. By the time rollback runs, the price is $87. COMPENSATE can issue a reversing trade, but the economic outcome is different.
MAP makes rollback possible and structured for the majority of agent actions that are API CRUD operations. For the rest, ESCALATE gates them before execution.
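The last-write-wins risk noted for RESTORE can be reduced when the external system exposes a version number. The sketch below is an assumption about how a tool author might guard a restore, not something MAP does for you: `store`, `capture`, `write`, and `guardedRestore` are hypothetical names, and the in-memory map stands in for a real API.

```typescript
interface VersionedRecord {
  id: string;
  email: string;
  version: number;
}

// Stand-in for an external system that bumps a version on every write.
const store = new Map<string, VersionedRecord>();

function capture(id: string): VersionedRecord {
  const rec = store.get(id);
  if (!rec) throw new Error(`no record ${id}`);
  return { ...rec };
}

function write(id: string, email: string): VersionedRecord {
  const prev = capture(id);
  const next = { ...prev, email, version: prev.version + 1 };
  store.set(id, next);
  return next;
}

type RestoreResult = { restored: true } | { restored: false; reason: string };

// Restore only if nobody has written since the agent's own write.
function guardedRestore(captured: VersionedRecord, writtenVersion: number): RestoreResult {
  const current = capture(captured.id);
  if (current.version !== writtenVersion) {
    return { restored: false, reason: `record changed (v${writtenVersion} -> v${current.version})` };
  }
  store.set(captured.id, { ...captured, version: current.version + 1 });
  return { restored: true };
}

store.set("acme", { id: "acme", email: "old@example.com", version: 1 });

// Case 1: single writer, so the restore goes through.
const before = capture("acme");
const after = write("acme", "new@example.com");
const clean = guardedRestore(before, after.version);
const emailAfterClean = capture("acme").email;

// Case 2: another process writes between the action and the rollback.
const before2 = capture("acme");
const after2 = write("acme", "agent@example.com");
write("acme", "other@example.com");
const conflicted = guardedRestore(before2, after2.version);

console.log(clean, emailAfterClean, conflicted);
```

When the guard refuses, the other writer's change survives and the conflict can be handed to a human instead of being silently overwritten.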
4. State Rollback
Revert to any prior point in the ledger:
- The rollback itself is logged to the provenance chain
- Rollback doesn't delete history — it preserves the full chain and adds a revert entry
- For RESTORE tools, rollback calls tool.restore() against the external system, not just in-memory snapshot restoration
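A minimal sketch of append-only rollback follows. The `Entry` shape and this standalone `rollbackTo` function are hypothetical and much simpler than the library's method. It assumes the target entry itself is reversed along with everything after it, which matches the Python Quick Start where `rollback_to(entry.id)` fires that entry's reverser.

```typescript
type Status = "ACTIVE" | "ROLLED_BACK";

// Hypothetical, reduced entry shape for illustration.
interface Entry {
  id: string;
  tool: string;
  status: Status;
  revertsTo?: string; // set only on revert entries
}

// Nothing is deleted: affected entries are marked, and a revert entry is appended.
function rollbackTo(ledger: readonly Entry[], entryId: string): Entry[] {
  const idx = ledger.findIndex((e) => e.id === entryId);
  if (idx === -1) throw new Error(`unknown entry ${entryId}`);
  const marked = ledger.map((e, i): Entry => (i >= idx ? { ...e, status: "ROLLED_BACK" } : e));
  const revert: Entry = {
    id: `revert-${ledger.length}`,
    tool: "rollback",
    status: "ACTIVE",
    revertsTo: entryId,
  };
  return [...marked, revert];
}

const ledger: Entry[] = [
  { id: "e0", tool: "updatePrice", status: "ACTIVE" },
  { id: "e1", tool: "updatePrice", status: "ACTIVE" },
  { id: "e2", tool: "sendInvoice", status: "ACTIVE" },
];

const rolledBack = rollbackTo(ledger, "e1");
console.log(rolledBack.map((e) => `${e.id}:${e.status}`).join(" "));
```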
5. Multi-Agent Provenance (KYA — Know Your Agent)
When fleets of agents work simultaneously, MAP tracks everything:
Agent Identity. Every agent has a cryptographic identity:
{
  agentId: string;
  ownerId: string; // org/user that owns this agent
  ownerDomain: string; // e.g., "customer.com"
  capabilities: string[]; // what this agent can do
  credentialHash: string; // SHA-256 of auth credential
}

Authorization Grants. Cross-boundary trust:
{
  grantor: AgentIdentity; // Agent A (requesting)
  grantee: AgentIdentity; // Agent B (executing)
  scope: string[]; // specific actions authorized
  constraints: {}; // e.g., max amount, time window
  expiresAt?: string; // when this grant expires
  parentGrantId?: string; // delegation chain
  hash: string; // tamper-evident
}

Ephemeral Agent Lifecycle. Spawn tree tracking:
{
  agentId: string;
  parentAgentId?: string; // who spawned this agent
  spawnedAt: string;
  terminatedAt?: string; // unset while the agent is still alive
  purpose: string; // why this agent exists
  isEphemeral: boolean; // auto-terminate when done
  childAgentIds: string[]; // sub-agents spawned
}

Every ledger entry carries agentId, parentEntryId, and lineage[]: the full chain from root agent to the action.
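To make the lineage idea concrete, the sketch below derives a root-to-agent chain by walking `parentAgentId` links. `AgentNode` and `lineageOf` are hypothetical names for this example; how MAP itself populates `lineage[]` is not shown in this README.

```typescript
// Reduced lifecycle record: only the fields needed to walk the spawn tree.
interface AgentNode {
  agentId: string;
  parentAgentId?: string;
}

// Walk parent links up to the root, returning the chain ordered root -> agent.
function lineageOf(agentId: string, agents: Map<string, AgentNode>): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current: string | undefined = agentId;
  while (current !== undefined) {
    if (seen.has(current)) throw new Error(`cycle detected at ${current}`);
    seen.add(current);
    const node = agents.get(current);
    if (!node) throw new Error(`unknown agent ${current}`);
    chain.unshift(node.agentId);
    current = node.parentAgentId;
  }
  return chain;
}

const agents = new Map<string, AgentNode>([
  ["root", { agentId: "root" }],
  ["planner", { agentId: "planner", parentAgentId: "root" }],
  ["stripe-worker", { agentId: "stripe-worker", parentAgentId: "planner" }],
]);

const lineage = lineageOf("stripe-worker", agents);
console.log(lineage.join(" > "));
```

The cycle check is defensive: a well-formed spawn tree has none, but an audit tool should not loop forever on corrupted data.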
6. Human-on-the-Loop Approval
Corrections can require human sign-off before proceeding:
- pending → action awaits review
- approved → human confirmed, logged to chain
- rejected → human rejected, rollback required
Approval is a separate concern from entry status — clean separation.
MCP + MAP: The Complete Picture
| | MCP | MAP |
|---|---|---|
| Direction | Input: what the agent can see | Output: what the agent did |
| Purpose | Capability | Accountability |
| Analogy | Git (version control) | GitHub (collaboration + audit) |
MCP defines how agents read the world. MAP defines how agents safely write to it. Together they complete the picture for enterprise-grade autonomous agents.
Installation
TypeScript
npm install @model-action-protocol/core

Requirements: Node.js 20+, TypeScript 5.7+. Current version: 0.2.0 (breaking change from 0.1.x: JCS canonicalization adopted; see CHANGELOG).
Python
pip install model-action-protocol
# or with extras:
pip install "model-action-protocol[anthropic,sqlite,postgres,fastapi]"

Requirements: Python 3.10+. Current version: 0.1.0. See python/README.md and python/DESIGN.md. Walkthrough: python/examples/quickstart.ipynb. HTTP demo: python/examples/fastapi_app/.
Specification
The wire format is defined in spec/SPEC.md. Both implementations conform. Conformance fixtures live at spec/fixtures/v0.1/ — immutable, version-bumped only via a new spec release.
Persistence (Optional)
By default, the ledger lives in memory. For production, MAP ships two pluggable storage adapters. Both are optional peer dependencies — install whichever you need.
PostgreSQL
npm install pg

import { MAP } from '@model-action-protocol/core';
import { PostgresLedgerStore } from '@model-action-protocol/core/postgres';

const store = new PostgresLedgerStore({
  connectionString: process.env.DATABASE_URL,
  tableName: 'ledger_entries', // optional, default: 'ledger_entries'
  sessionId: 'default', // optional, for multi-tenant isolation
});
const map = await MAP.load({ ...config, store }, critic);

Connection pooling, JSONB entries, concurrent-write retry logic. Contributed by @mel-cell.
SQLite
npm install better-sqlite3

import { MAP } from '@model-action-protocol/core';
import { SQLiteLedgerStore } from '@model-action-protocol/core/sqlite';

const store = new SQLiteLedgerStore('./ledger.db');
const map = await MAP.load({ ...config, store }, critic);

WAL mode, prepared-statement caching, atomic transactions. Good for single-node deployments.

Use MAP.load() instead of new MAP() when using a persistent store: it reads any existing entries on startup. new MAP() skips that step.
Quick Start
import { MAP, createRuleCritic } from '@model-action-protocol/core';
import { z } from 'zod';

// Your state
const database: Record<string, any> = {
  acme: { id: "acme", name: "Acme Corp", price: 500 },
  globex: { id: "globex", name: "Globex Inc", price: 500 },
};

// 1. Create a critic
const critic = createRuleCritic([
  {
    name: 'no-zero-prices',
    check: ({ stateAfter }) => {
      const state = stateAfter as Record<string, any>;
      const bad = Object.values(state).find((c) => c.price === 0);
      if (bad) {
        return {
          verdict: 'CORRECTED',
          reason: `${bad.name} price was set to $0`,
          correction: { tool: 'updatePrice', input: { customerId: bad.id, price: 299 } },
        };
      }
      return null;
    },
  },
]);

// 2. Initialize MAP
const map = new MAP(
  { executor: 'claude-sonnet-4.6', critic: 'claude-haiku-4.5' },
  critic
);

// 3. Register tools
map.registerTool(
  'updatePrice', 'Update a customer price',
  z.object({ customerId: z.string(), price: z.number() }),
  async ({ customerId, price }) => {
    database[customerId].price = price;
    return { updated: customerId, newPrice: price };
  }
);

// 4. Connect state
map.connectState(
  () => JSON.parse(JSON.stringify(database)),
  (state) => Object.assign(database, state),
);

// 5. Execute with full provenance
await map.execute('Migrate pricing', 'updatePrice', { customerId: 'acme', price: 299 });

// 6. Rollback if needed
const ledger = map.getLedger();
await map.rollbackTo(ledger[0].id);

// 7. Export for audit
const audit = map.exportLedger();
// → { protocol: 'map', version: '0.1.0', entries: [...], stats: {...} }

// 8. Verify chain integrity
map.verifyIntegrity(); // → { valid: true }

Python: same scenario
from map import Action, CriticResult, Map, rule_critic, verify_chain

# Your state: a tiny "orders database" stand-in for a real backend.
ORDERS: dict[str, dict] = {}

def place_order(item_id: str, quantity: int) -> dict:
    order_id = f"O-{item_id}-{quantity}-{len(ORDERS)}"
    record = {"orderId": order_id, "item_id": item_id, "quantity": quantity, "status": "open"}
    ORDERS[order_id] = record
    return record

def cancel_order(action: Action, output) -> dict:
    ORDERS[output["orderId"]]["status"] = "cancelled"
    return {"orderId": output["orderId"], "cancelled": True}

# 1. Critic: flag any order over 100 units
def disallow_huge(action, sb, sa):
    if action.input.get("quantity", 0) > 100:
        return CriticResult(verdict="FLAGGED", reason="quantity over 100 needs approval")
    return None

# 2. Wire up MAP with critic + reverser
m = Map()
m.set_critic(rule_critic([disallow_huge]))
decorated = m.reversible(reverser=cancel_order)(place_order)

# 3. Execute: every call is a verifiable ledger entry
output = decorated(item_id="SKU-A", quantity=2)
entry = m.execute(Action(tool="place_order", input={"item_id": "SKU-A", "quantity": 2}, output=output))
# → entry.critic.verdict == "PASS"

# 4. Roll back: reverser fires, world state flips
m.rollback_to(entry.id)
assert ORDERS["O-SKU-A-2-0"]["status"] == "cancelled"

# 5. Audit export + chain verification
chain = [e.model_dump(by_alias=True, exclude_none=True) for e in m.get_entries()]
assert verify_chain(chain) == {"valid": True}

For a full narrated walkthrough (including LearningEngine, persistent stores, and the Anthropic SDK integration), see python/examples/quickstart.ipynb. For an HTTP service template, see python/examples/fastapi_app/.
Cross-language conformance
MAP's claim isn't "we have a TypeScript library and a Python library." It's that the wire format is the spec, both implementations conform, and ledgers are byte-identical across them. The proof is in spec/SPEC.md §6.4 — a worked example with a hand-computed SHA-256 that an automated test on each side asserts equality against.
// TypeScript — Open Source/src/snapshot.ts
import { computeEntryHash, sha256, serializeState } from "@model-action-protocol/core";
const stateHash = sha256(serializeState(null));
const entryHash = computeEntryHash(
0,
{ tool: "ping", input: {}, output: "pong" },
stateHash, stateHash,
"0".repeat(64),
{ verdict: "PASS", reason: "ok" }
);
// → "25d29bc25a183ebdb29b70b6a03ed2ad8d31033d1fb6347f656b21d7e9efb650"

# Python: same inputs, same output, byte-identical
from map import GENESIS_HASH, compute_entry_hash, state_hash
null_hash = state_hash(None)
entry_hash = compute_entry_hash(
sequence=0,
action={"tool": "ping", "input": {}, "output": "pong"},
state_before=null_hash, state_after=null_hash,
parent_hash=GENESIS_HASH,
critic={"verdict": "PASS", "reason": "ok"},
)
# → "25d29bc25a183ebdb29b70b6a03ed2ad8d31033d1fb6347f656b21d7e9efb650"

Six frozen v0.1 fixtures (in spec/fixtures/v0.1/) cover the cases that historically break cross-language hash protocols: unicode (NFC vs NFD), deep nesting, empty payloads, large payloads, integer-valued floats (RFC 8785 §3.2.2.3: JS's JSON.stringify(1.0) is "1", Python's default json.dumps(1.0) is "1.0"; we bridge it). Both impls verify all six, and a ledger written in either language verifies in the other.
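To show what canonicalization buys, here is a simplified sketch in the spirit of JCS: object keys are sorted and primitives are serialized with JSON.stringify, so key order and the 1.0 versus 1 distinction cannot change the bytes being hashed. The `canonicalize` function and `Json` type are hypothetical and are not the library's implementation; the conformance fixtures, not this sketch, define correct behavior.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical form: sorted keys, no whitespace, JS number formatting.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new Error("NaN and Infinity are not representable in JSON");
    }
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  }
  const members = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // compares UTF-16 code units
    .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
  return `{${members.join(",")}}`;
}

// Same data, different key order and float spelling: identical canonical bytes.
const a = canonicalize({ b: 1.0, a: [true, null] });
const b = canonicalize({ a: [true, null], b: 1 });

console.log(a);
console.log(a === b);
```

A Python implementation has to emit `1` for the float `1.0` to match, which is the bridging the paragraph above refers to.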
The Paved Path: Pre-Built Tool Packages
Instead of writing reversal schemas from scratch, use pre-built MAP-compliant tools. The first example ships in this repo at examples/tools-stripe — drop it directly into your project while the npm packages get published:
// Example pattern — see examples/tools-stripe in this repo for the full implementation
import { stripeTools } from './tools-stripe';
stripeTools.forEach(tool => map.addTool(tool));

Build tools with typed reversal strategies:
import { defineRestoreTool, defineCompensateTool, defineEscalateTool } from '@model-action-protocol/core';
// RESTORE: auto-capture state before write, restore on rollback
const updateCustomer = defineRestoreTool({
name: 'updateCustomer', description: 'Update customer record',
inputSchema: z.object({ id: z.string(), email: z.string() }),
execute: async (input) => api.updateCustomer(input),
capture: async (input) => api.getCustomer(input.id),
restore: async (captured) => api.updateCustomer(captured),
});
// COMPENSATE: map forward action to compensating action
const chargeCard = defineCompensateTool({
name: 'chargeCard', description: 'Charge a credit card',
inputSchema: z.object({ amount: z.number() }),
execute: async (input) => stripe.charges.create(input),
compensate: async (input, output) => stripe.refunds.create({ charge: output.id }),
});
// ESCALATE: require human approval for irreversible actions
const wireTransfer = defineEscalateTool({
name: 'wireTransfer', description: 'Send a wire transfer',
inputSchema: z.object({ amount: z.number(), to: z.string() }),
execute: async (input) => bank.sendWire(input),
approver: '[email protected]',
});

Planned tool packages:
- @model-action-protocol/tools-stripe: payments, refunds, subscriptions (example included)
- @model-action-protocol/tools-salesforce: CRM operations
- @model-action-protocol/tools-netsuite: ERP/GL operations
- @model-action-protocol/tools-hubspot: marketing automation
- @model-action-protocol/tools-aws: infrastructure operations
Using an LLM Critic (Production)
import { MAP, createLLMCritic } from '@model-action-protocol/core';
import { generateText } from 'ai';
const critic = createLLMCritic({
model: 'claude-haiku-4.5',
generateText,
});
const map = new MAP(
{ executor: 'claude-sonnet-4.6', critic: 'claude-haiku-4.5' },
critic
);

Learning Engine: The Ledger IS the Training Data
Every CORRECTED verdict, every FLAGGED action, every human Approve/Reject decision is permanently logged with full context. Over time, this becomes a dataset of "mistakes this organization's agents make" and "how humans want them corrected."
Level 1: Rule Extraction
After N identical corrections, the system proposes a new deterministic rule. No LLM needed for that check anymore — it becomes a microsecond gate.
import { LearningEngine } from '@model-action-protocol/core';
const engine = new LearningEngine();
// Analyze the ledger for repeated correction patterns
const patterns = engine.analyzePatterns(map.getLedger());
// → [{ tool: "reclassifyTransaction", count: 5, summary: "CORRECTED: SOX violation..." }]
// Propose rules from patterns observed 3+ times
const proposals = engine.proposeRules(map.getLedger(), 3);
// → [{ id: "rule_corrected:reclassify...", description: "Auto-proposed: ...", approved: false }]
// Human reviews and approves the rule
proposals.forEach(r => engine.addProposedRule(r));
engine.approveRule(proposals[0].id);
// Use learned rules as the fast tier in the tiered critic
const learnedCritic = engine.toRuleCritic();
// Plug into tiered critic — learned rules run first (microseconds),
// LLM only fires for patterns the rules haven't seen yet
import { createTieredCritic } from '@model-action-protocol/core';
const tieredCritic = createTieredCritic({
low: learnedCritic, // μs — learned rules
medium: createLLMCritic({ model: 'claude-haiku-4.5', generateText }), // 200ms
high: createLLMCritic({ model: 'claude-sonnet-4.6', generateText }), // 1-2s
});

The system gets cheaper and faster over time. Every correction that becomes a rule is one fewer LLM call.
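The counting step behind rule extraction can be sketched independently of the library. Everything here is an assumption for illustration: the `Reviewed` and `Proposal` shapes, the standalone `proposeRules` function, and the choice of tool plus reason as the pattern fingerprint. The real LearningEngine may fingerprint patterns differently.

```typescript
// Reduced view of a reviewed ledger entry.
interface Reviewed {
  tool: string;
  verdict: "PASS" | "CORRECTED" | "FLAGGED";
  reason: string;
}

interface Proposal {
  fingerprint: string;
  count: number;
  approved: boolean; // proposals start unapproved; a human must opt in
}

// Group CORRECTED entries by fingerprint and propose a rule once a pattern repeats enough.
function proposeRules(entries: Reviewed[], minCount: number): Proposal[] {
  const counts = new Map<string, number>();
  for (const e of entries) {
    if (e.verdict !== "CORRECTED") continue;
    const fingerprint = `${e.tool}::${e.reason}`;
    counts.set(fingerprint, (counts.get(fingerprint) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count >= minCount)
    .map(([fingerprint, count]) => ({ fingerprint, count, approved: false }));
}

const reviewed: Reviewed[] = [
  { tool: "reclassifyTransaction", verdict: "CORRECTED", reason: "SOX violation" },
  { tool: "reclassifyTransaction", verdict: "CORRECTED", reason: "SOX violation" },
  { tool: "reclassifyTransaction", verdict: "CORRECTED", reason: "SOX violation" },
  { tool: "updatePrice", verdict: "CORRECTED", reason: "price was set to $0" },
  { tool: "closeAccount", verdict: "FLAGGED", reason: "regulatory hold" },
];

const sketchProposals = proposeRules(reviewed, 3);
console.log(sketchProposals);
```

Only the pattern seen three times crosses the threshold; the single price correction and the FLAGGED entry produce no proposal.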
Level 2: Critic Fine-Tuning
Export the corpus of corrections with human decisions as structured training data:
const trainingData = engine.exportFineTuningData(map.getLedger());
// → [{
// input: { action, stateBefore, stateAfter },
// output: { verdict: "CORRECTED", reason: "SOX violation...", correction: {...} },
// humanApproval: "approved"
// }]

Fine-tune the Critic model on your organization's specific error patterns. The Critic doesn't just know general compliance; it knows YOUR compliance.
Level 3: Agent Self-Improvement
Give agents their own correction history so they stop repeating mistakes:
const memory = engine.exportAgentMemory(map.getLedger(), 'agent-compliance-checker');
// → [{
// tool: "closeAccount",
// whatHappened: "Called closeAccount with { accountId: '1200-004' }",
// verdict: "FLAGGED",
// lesson: "This action was FLAGGED and required human review: regulatory hold violation.
// Do not attempt this without explicit approval."
// }]
// Inject into agent's system prompt as learned context
const agentPrompt = `
You are a compliance agent. Here are lessons from your past actions:
${memory.map(m => `- ${m.lesson}`).join('\n')}
`;

Key design principle: The learning engine reads from the ledger, never modifies it. Proposed rules require human approval before activating. The human stays on the loop even for the learning system.
Data Privacy
All learning is local to your organization. A trust protocol cannot undermine trust.
| Level | Where Data Lives | Shared Across Orgs? | Used for Base Model Training? |
|-------|-----------------|--------------------|-----------------------------|
| Rule extraction | Your environment | No | No |
| Critic fine-tuning | Your private fine-tuned model | No | No |
| Agent memory | Your agent's prompt context | No | No |
- Level 2 fine-tuning is explicitly opt-in — you export the data and fine-tune on your terms
- Fine-tuned models are scoped to your organization — never cross-pollinated
- MAP does not transmit, aggregate, or share learning data between organizations
Real-Time Events
map.on((event) => {
switch (event.type) {
case 'action:start': // Before tool execution
case 'action:complete': // After execution + logging
case 'critic:verdict': // After critic review
case 'correction:applied': // After auto-correction
case 'flagged': // Dangerous action detected
case 'rollback:start': // Before rollback
case 'rollback:complete': // After rollback
case 'session:complete': // Sequence finished
case 'agent:spawned': // New agent in the fleet
case 'agent:terminated': // Agent finished its work
case 'authorization:granted': // KYA grant issued
case 'authorization:revoked': // KYA grant revoked
case 'error': // Unrecoverable error
}
});

API Reference
new MAP(config, critic)
| Config | Type | Default | Description |
|--------|------|---------|-------------|
| executor | string | required | Model for the executor agent |
| critic | string | required | Model for the critic (cheap, fast) |
| maxActions | number | 50 | Max actions before force-stop |
| autoCorrect | boolean | true | Auto-apply CORRECTED fixes |
| pauseOnFlag | boolean | true | Halt execution on FLAGGED |
| serializeState | fn | JSON.stringify | Custom state serializer |
| tags | string[] | [] | AI Gateway cost attribution tags |
Methods
| Method | Description |
|--------|-------------|
| registerTool(name, desc, schema, fn) | Register a tool with Zod schema |
| addTool(tool) | Register a pre-built MAPTool |
| connectState(getState, setState) | Connect to your environment |
| execute(goal, tool, input) | Execute one action with full provenance |
| run(goal, actions[]) | Execute a sequence |
| await rollbackTo(entryId) | Revert to a specific point (async — calls tool.restore() for RESTORE tools) |
| await rollbackToSafe() | Revert to last known good state |
| getLedger() | Get all entries |
| exportLedger() | Export audit-ready JSON |
| verifyIntegrity() | Verify hash chain |
| getStats() | Session statistics |
| on(handler) | Subscribe to events |
Ledger Entry Format
{
id: string; // UUID
sequence: number; // Position in chain
timestamp: string; // ISO 8601
action: {
tool: string;
input: Record<string, unknown>;
output: unknown;
reversalStrategy?: "COMPENSATE" | "RESTORE" | "ESCALATE";
};
stateBefore: string; // SHA-256 hash
stateAfter: string; // SHA-256 hash
snapshots: { before, after }; // Full serialized state
parentHash: string; // Previous entry's hash
hash: string; // SHA-256 of this entry
critic: {
verdict: "PASS" | "CORRECTED" | "FLAGGED";
reason: string;
correction?: { tool, input };
cost?: { inputTokens, outputTokens, model, latencyMs, costUsd };
};
status: "ACTIVE" | "ROLLED_BACK";
approval?: "pending" | "approved" | "rejected";
// Multi-agent provenance
agentId?: string; // Which agent acted
parentEntryId?: string; // Upstream agent's entry
lineage?: string[]; // Root → current agent chain
stateVersion?: number; // Optimistic concurrency
}

Architecture
Human Supervisor (one person, many agents)
│
▼
┌──────────────────────────────────────────────────┐
│ @model-action-protocol/core │
│ │
│ ┌──────────┐ ┌────────┐ ┌────────────────┐ │
│ │ Executor │→ │ Critic │→ │ Ledger │ │
│ │ Harness │ │ (fast) │ │ (SHA-256 chain)│ │
│ └──────────┘ └────────┘ └────────────────┘ │
│ │ │ │
│ ┌──────────┐ ┌────────────────┐│ │
│ │ Rollback │ │ KYA (Know Your ││ │
│ │ Engine │ │ Agent) ││ │
│ └──────────┘ └────────────────┘│ │
│ │ │
│ ┌──────────────────────────────┐│ │
│ │ Agent Lifecycle Tracking ││ │
│ │ (spawn trees, ephemeral) ││ │
│ └──────────────────────────────┘│ │
└──────────────────────────────────────────────────┘
        │                │                │
        ▼                ▼                ▼
     Agent A          Agent B          Agent C
    (Stripe)       (Salesforce)      (NetSuite)

Design Principles
| Principle | How MAP Applies It |
|-----------|---------------------|
| Messages as state | The ledger IS the execution state |
| Errors as feedback | Critic failures feed back, never crash |
| Schema-driven tools | Zod schemas validate before execution |
| Tiered model routing | Expensive execution + cheap critique |
| Append-only history | Rollback adds a revert entry, never deletes |
| Sub-agents as tool calls | Agent spawns tracked with full lineage |
The MAP Protocol
MAP (Model Action Protocol) is an open standard for agent action provenance.
MCP standardized how agents use tools (inputs). MAP standardizes how agents prove what they did (outputs).
The strategy:
- Open-source the protocol → every agent framework adopts it
- Hand it to regulatory agencies → system of record for agent provenance
- Commoditize the trust layer → build native zero-latency rollback into agent frameworks
Testing
# TypeScript (146 tests)
npm test
npm run test:fixtures # spec/fixtures/v0.1/ conformance
npm run test:python-output-conformance # TS verifies Python-written ledgers
# Python (108 tests + 3 skipped on missing env vars)
cd python
pip install -e ".[dev]"
pytest # everything except gated suites
pytest -m postgres # requires DB_HOST
pytest -m live_api # requires ANTHROPIC_API_KEY

254 tests across both implementations cover: ledger chaining and tamper detection, hash-chain verification, critic integration (PASS/CORRECTED/FLAGGED), auto-correction, RESTORE/COMPENSATE/ESCALATE reversal lifecycle, atomic stop-on-failure rollback semantics, audit export, event emission, sequence execution, tiered critic routing, custom risk classifiers, LearningEngine pattern fingerprints, tool builders / decorators, JCS canonicalization edge cases (unicode NFC vs NFD, integer-valued floats, deep nesting, empty/large payloads), cross-language conformance both directions, and SDK integration with mocked + live Anthropic clients.
Contributing
See CONTRIBUTING.md for guidelines on how to contribute to this project.
License
MIT — by deadpxl
