Agentic Context Pruning (ACP)
Reduce token usage by up to 50% through intelligent context management.
ACP optimizes LLM context windows by automatically pruning obsolete content—tool outputs, messages, and reasoning blocks—while preserving critical operational state.
📊 Context Flow Architecture
flowchart TB
subgraph Input["📥 Input Layer"]
U[User Message]
T[Tool Outputs]
M[Assistant Messages]
R[Thinking Blocks]
end
subgraph Processing["⚙️ ACP Processing"]
direction TB
Auto["Auto-Supersede"]
Manual["Manual Pruning"]
subgraph AutoStrategies["Auto-Supersede Strategies"]
H[Hash-Based<br/>Duplicates]
F[File-Based<br/>Operations]
Todo[Todo-Based<br/>Updates]
URL[Source-URL<br/>Fetches]
SQ[State Query<br/>Dedup]
end
subgraph ManualTools["Manual Tools"]
D[Discard]
Dist[Distill]
end
end
subgraph Output["📤 Optimized Context"]
Clean[Clean Context<br/>~50% smaller]
L[LLM Provider]
end
U --> Processing
T --> Auto
M --> Manual
R --> Manual
Auto --> H
Auto --> F
Auto --> Todo
Auto --> URL
Auto --> SQ
Manual --> D
Manual --> Dist
H --> Clean
F --> Clean
Todo --> Clean
URL --> Clean
SQ --> Clean
D --> Clean
Dist --> Clean
Clean --> L
style Input fill:#e1f5fe,stroke:#01579b,stroke-width:2px
style Processing fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
style AutoStrategies fill:#fff3e0,stroke:#e65100,stroke-width:1px
style ManualTools fill:#e8f5e9,stroke:#1b5e20,stroke-width:1px
style Output fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
🚀 Quick Start
Installation
npm install @tuanhung303/opencode-acp
Add to your OpenCode config:
// opencode.jsonc
{
"plugin": ["@tuanhung303/opencode-acp@latest"],
}
Basic Usage
ACP handles most pruning automatically. The following tools give agents granular control over context:
// Discard completed work
context({ action: "discard", targets: [["a1b2c3"]] })
// Distill large outputs
context({
action: "distill",
targets: [["d4e5f6", "Found 15 TypeScript files"]],
})
// Batch operations
context({
action: "discard",
targets: [["hash1"], ["hash2"], ["hash3"]],
})
📚 Documentation
| Document | Purpose |
| --------------------------------------------------------------- | ------------------------------------------ |
| Validation Guide | 43 comprehensive test cases |
| Test Harness | Ready-to-run test scripts |
| Todo Write Testing Guide | Testing todowrite & stuck task detection |
| Context Architecture | Memory management strategies |
| Decision Tree | Visual pruning flowcharts |
| Limitations Report | What cannot be pruned |
| Changelog | Version history and migration guides |
🤖 Agent Auto Mode
ACP provides the context tool for intelligent context management:
Tool Interface
context({
action: "discard" | "distill",
targets: [string, string?][] // [[target, summary?], ...]
})
Target Types
| Type | Format | Example |
| ------------------- | ----------- | ------------------ |
| Tool outputs | 6 hex chars | 44136f, 01cb91 |
| Thinking blocks | 6 hex chars | abc123 |
| Messages | 6 hex chars | def456 |
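For orientation, these target ids are the same 6-character hashes that appear in tool output as <tool_hash> tags (see the Pruning Workflow section below). Here is a minimal sketch of collecting them from raw output text; the helper name is hypothetical and not part of the plugin's API:

```ts
// Hypothetical helper, not part of the ACP API: collect prunable hashes from
// raw tool output text using the <tool_hash> tag format shown later in this README.
const HASH_TAG = /<tool_hash>([0-9a-f]{6})<\/tool_hash>/g;

function collectHashes(output: string): string[] {
  const hashes: string[] = [];
  for (const match of output.matchAll(HASH_TAG)) {
    hashes.push(match[1]); // the 6-hex-char target id, e.g. "a1b2c3"
  }
  return hashes;
}

// Usage: collectHashes(toolOutput).map((h) => [h]) yields discard-ready targets.
```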
Batch Operations
// Prune multiple items at once
context({
action: "discard",
targets: [
["44136f"], // Tool output
["abc123"], // Thinking block
["def456"], // Message
],
})
// Distill with shared summary
context({
action: "distill",
targets: [
["44136f", "Research phase complete"],
["01cb91", "Research phase complete"],
],
})
🔄 Auto-Supersede Mechanisms
ACP automatically removes redundant content through multiple strategies:
1. Hash-Based Supersede
Duplicate tool calls with identical arguments are automatically deduplicated.
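Conceptually, the strategy keys each tool call by a hash of its name and arguments and keeps only the newest occurrence. Below is a minimal sketch of that idea; the function names and 6-character key are illustrative, not the plugin's internal implementation:

```ts
import { createHash } from "node:crypto";

// Illustrative only: key each tool call by a hash of its name and arguments.
function callKey(tool: string, args: unknown): string {
  return createHash("sha256")
    .update(tool)
    .update(JSON.stringify(args) ?? "")
    .digest("hex")
    .slice(0, 6); // 6 hex chars, matching the target id format above
}

// Keep only the newest call per key; earlier identical calls are superseded.
function supersedeDuplicates(
  calls: { id: string; tool: string; args: unknown }[],
): { id: string; tool: string; args: unknown }[] {
  const latest = new Map<string, string>();
  for (const call of calls) latest.set(callKey(call.tool, call.args), call.id);
  return calls.filter((call) => latest.get(callKey(call.tool, call.args)) === call.id);
}
```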
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ BEFORE: │ │ AFTER: │
│ │ │ │
│ 1. read(package.json) #a1b2c3 │ ───► │ ...other work... │
│ 2. ...other work... │ │ 3. read(package.json) #d4e5f6◄──┐ │
│ 3. read(package.json) #d4e5f6 │ │ │
│ │ │ First call superseded (hash match) │
│ Tokens: ~15,000 │ │ Tokens: ~10,000 (-33%) │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
2. File-Based Supersede (One-File-One-View)
File operations automatically supersede previous operations on the same file.
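A minimal sketch of the "one file, one view" bookkeeping this implies: track the latest operation per path and supersede the rest. Types and names here are illustrative only:

```ts
// Illustrative only: "one file, one view" means the latest operation per path wins.
type FileOp = { id: string; tool: "read" | "write" | "edit"; filePath: string };

function supersedeFileOps(ops: FileOp[]): FileOp[] {
  const latestByPath = new Map<string, string>();
  for (const op of ops) latestByPath.set(op.filePath, op.id); // later ops overwrite earlier ones
  return ops.filter((op) => latestByPath.get(op.filePath) === op.id);
}

// read → write → edit on config.ts collapses to just the final edit's output.
```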
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ BEFORE: │ │ AFTER: │
│ │ │ │
│ 1. read(config.ts) │ ───► │ │
│ 2. write(config.ts) │ │ 3. edit(config.ts)◄────────────┐ │
│ 3. edit(config.ts) │ │ │
│ │ │ Previous operations pruned │
│ Tokens: ~18,000 │ │ Tokens: ~6,000 (-67%) │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
3. Todo-Based Supersede (One-Todo-One-View)
Todo operations automatically supersede previous todo states.
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ BEFORE: │ │ AFTER: │
│ │ │ │
│ 1. todowrite: pending │ ───► │ │
│ 2. todowrite: in_progress │ │ 3. todowrite: completed◄────────┐ │
│ 3. todowrite: completed │ │ │
│ │ │ Previous states auto-pruned │
│ Tokens: ~4,500 │ │ Tokens: ~1,500 (-67%) │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
4. Source-URL Supersede
Identical URL fetches are deduplicated—only the latest response is retained.
5. State Query Supersede
State queries (ls, find, git status) are deduplicated—only the latest results matter.
🛡️ Protected Tools
These tools are exempt from pruning to ensure operational continuity:
task, todowrite, todoread, context, batch, write, edit, plan_enter, plan_exit
Additional tools can be protected via configuration:
{
"commands": {
"protectedTools": ["my_custom_tool"],
},
}
⚙️ Configuration
ACP uses its own config file with multiple levels:
Priority: Defaults → Global → Config Dir → Project (merge order sketched below)
- Global: ~/.config/opencode/acp.jsonc
- Config Dir: $OPENCODE_CONFIG_DIR/acp.jsonc
- Project: .opencode/acp.jsonc
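A minimal sketch of that precedence, assuming a simple shallow merge where later sources override earlier ones (the plugin may merge nested keys differently):

```ts
// Illustrative only: later sources override earlier ones, per the priority order above.
type AcpConfig = Record<string, unknown>;

function resolveConfig(
  defaults: AcpConfig,
  global: AcpConfig,    // ~/.config/opencode/acp.jsonc
  configDir: AcpConfig, // $OPENCODE_CONFIG_DIR/acp.jsonc
  project: AcpConfig,   // .opencode/acp.jsonc
): AcpConfig {
  // A shallow merge is enough to show the precedence; nested keys may be merged differently.
  return { ...defaults, ...global, ...configDir, ...project };
}
```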
Default Configuration
{
"$schema": "https://raw.githubusercontent.com/tuanhung303/opencode-agent-context-pruning/master/acp.schema.json",
"enabled": true,
"autoPruneAfterTool": false,
"pruneNotification": "minimal",
"commands": {
"enabled": true,
"protectedTools": [],
},
"tools": {
"discard": { "enabled": true },
"distill": { "enabled": true },
"todoReminder": { "enabled": true },
"automataMode": { "enabled": true },
},
"strategies": {
"deduplication": { "enabled": false },
"purgeErrors": { "enabled": false },
"truncation": { "enabled": false },
"thinkingCompression": { "enabled": false },
"supersedeWrites": { "enabled": false },
},
}
Aggressive Pruning (Opt-In)
Enable for up to 50% token savings:
{
"strategies": {
"aggressivePruning": {
"pruneToolInputs": true, // Strip verbose inputs
"pruneStepMarkers": true, // Remove step markers
"pruneSourceUrls": true, // Dedup URL fetches
"pruneFiles": true, // Mask file attachments
"pruneSnapshots": true, // Keep latest snapshot
"pruneRetryParts": true, // Prune failed retries
"pruneUserCodeBlocks": true, // Truncate old code blocks
"truncateOldErrors": true, // Truncate old errors
"aggressiveFilePrune": true, // One-file-one-view
"stateQuerySupersede": true, // Dedup state queries
},
},
}
📊 Token Savings
| Metric          | Without ACP  | With ACP    | Savings |
| --------------- | ------------ | ----------- | ------- |
| Typical Session | ~80k tokens  | ~40k tokens | 50%     |
| Long Session    | ~150k tokens | ~75k tokens | 50%     |
| File-Heavy Work | ~100k tokens | ~35k tokens | 65%     |
Cache Impact: ~65% cache hit rate with ACP vs ~85% without. The token savings typically outweigh the cache miss cost, especially in long sessions.
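A rough, back-of-the-envelope illustration of that trade-off, using hypothetical prices (not taken from any provider) where cached input tokens cost one tenth of uncached ones and the session sizes come from the table above:

```ts
// Hypothetical numbers for illustration only; real pricing and cache behavior
// depend on your provider and session shape.
const PRICE_PER_1K = 1.0;        // uncached input tokens, arbitrary units
const CACHED_PRICE_PER_1K = 0.1; // assume cached reads cost 10% of uncached

function inputCost(tokens: number, cacheHitRate: number): number {
  const cached = tokens * cacheHitRate;
  const uncached = tokens - cached;
  return (cached * CACHED_PRICE_PER_1K + uncached * PRICE_PER_1K) / 1000;
}

// Long session from the table: 150k tokens at ~85% cache hits without ACP
// vs 75k tokens at ~65% cache hits with ACP.
console.log(inputCost(150_000, 0.85)); // ≈ 35.25
console.log(inputCost(75_000, 0.65));  // ≈ 31.13, still cheaper despite fewer cache hits
```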
🧪 Testing
Run the comprehensive test suite:
# Load test todos
todowrite({ /* copy from docs/VALIDATION_GUIDE.md */ })
# Run preparation
prep-0 through prep-7
# Execute tests
t1 through t43
# Generate report
report-1 through report-4
See Validation Guide for detailed test procedures.
📋 Pruning Workflow
Complete example: execute tool → find hash → prune.
Step 1: Run a tool
read({ filePath: "src/config.ts" })
// Output includes: <tool_hash>a1b2c3</tool_hash>
Step 2: Find the hash in output
... file contents ...
<tool_hash>a1b2c3</tool_hash>
Step 3: Prune when no longer needed
context({ action: "discard", targets: [["a1b2c3"]] })
// Response: 「 🗑️ discard ✓ 」- ⚙️ read
// Available: Tools(5), Messages(2), Reasoning(1)
Batch multiple targets:
context({ action: "discard", targets: [["a1b2c3"], ["d4e5f6"], ["g7h8i9"]] })
Distill with summary:
context({ action: "distill", targets: [["abc123", "Auth: chose JWT over sessions"]] })
🏗️ Architecture Overview
flowchart TD
subgraph OpenCode["OpenCode Core"]
direction TB
A[User Message] --> B[Session]
B --> C[Transform Hook]
C --> D[toModelMessages]
D --> E[LLM Provider]
end
subgraph ACP["ACP Plugin"]
direction TB
C --> F[syncToolCache]
F --> G[injectHashes]
G --> H[Apply Strategies]
H --> I[prune]
I --> C
end
style OpenCode fill:#F4F7F9,stroke:#5A6B8A,stroke-width:1.5px
style ACP fill:#E8F5F2,stroke:#9AC4C0,stroke-width:1.5px
ACP hooks into OpenCode's message flow to reduce context size before it reaches the LLM. The four stages below are sketched in code after the list:
- Sync Tool Cache - Updates internal tool state tracking
- Inject Hashes - Makes content addressable for pruning
- Apply Strategies - Runs auto-supersede mechanisms
- Prune - Applies manual and automatic pruning rules
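A minimal sketch of that four-stage flow as a message-array transform; the stage names mirror the diagram, but the signatures are illustrative rather than the plugin's actual exports:

```ts
// Illustrative pipeline only; the real hook signatures live in the plugin source.
type Message = { role: string; parts: unknown[] };
type Stage = (messages: Message[]) => Message[];

function transformHook(messages: Message[], stages: Stage[]): Message[] {
  // Run syncToolCache → injectHashes → applyStrategies → prune, in order.
  return stages.reduce((msgs, stage) => stage(msgs), messages);
}

// transformHook(messages, [syncToolCache, injectHashes, applyStrategies, prune]);
```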
📝 Commands
| Command | Description |
| ---------------- | --------------------------------- |
| /acp | List available commands |
| /acp context | Show token usage breakdown |
| /acp stats | Show aggregate pruning statistics |
| /acp sweep [n] | Prune last N tool outputs |
🔧 Advanced Features
Todo Reminder
Monitors todowrite usage and prompts when tasks are neglected:
{
"tools": {
"todoReminder": {
"enabled": true,
"initialTurns": 8, // First reminder after 8 turns without todo update
"repeatTurns": 4, // Subsequent reminders every 4 turns
"stuckTaskTurns": 12, // Threshold for stuck task detection
},
},
}
Reminder Behavior:
- First reminder: Fires after initialTurns (8) turns without todowrite (counter behavior sketched below)
- Repeat reminders: Fire every repeatTurns (4) turns thereafter
- Auto-reset: Each todowrite call resets the counter to 0
- Deduplication: Only ONE reminder exists in context at a time; new reminders replace old ones
- Stuck task detection: Tasks in in_progress for stuckTaskTurns (12) turns are flagged with guidance
- Prunable outputs: The reminder displays a list of prunable tool outputs to help with cleanup
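A minimal sketch of the counter behavior described above, assuming illustrative names and the default initialTurns/repeatTurns values; this is not the plugin's internal code:

```ts
// Illustrative only: reminder scheduling as described above.
interface ReminderState {
  turnsSinceTodo: number;
  reminded: boolean; // has the first reminder already fired?
}

function onTurn(
  state: ReminderState,
  usedTodowrite: boolean,
  cfg = { initialTurns: 8, repeatTurns: 4 },
): { state: ReminderState; remind: boolean } {
  if (usedTodowrite) return { state: { turnsSinceTodo: 0, reminded: false }, remind: false };
  const turns = state.turnsSinceTodo + 1;
  const remind = !state.reminded
    ? turns >= cfg.initialTurns                          // first reminder at turn 8
    : (turns - cfg.initialTurns) % cfg.repeatTurns === 0; // then every 4 turns: 12, 16, ...
  return { state: { turnsSinceTodo: turns, reminded: state.reminded || remind }, remind };
}
```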
Reminder Sequence:
Turn 0: todowrite() called (resets counter)
Turn 8: 🔖 First reminder (if no todowrite since turn 0)
Turn 12: 🔖 Repeat reminder
Turn 16: 🔖 Repeat reminder
...
Automata Mode
Autonomous reflection triggered by "automata" keyword:
{
"tools": {
"automataMode": {
"enabled": true,
"initialTurns": 8, // Turns before first reflection
},
},
}
Stuck Task Detection
Identifies tasks stuck in in_progress for too long:
{
"tools": {
"todoReminder": {
"stuckTaskTurns": 12, // Threshold for stuck detection
},
},
}
🚧 Limitations
- Subagents: ACP is disabled for subagent sessions
- Cache Invalidation: Pruning mid-conversation invalidates prompt caches
- Protected Tools: Some tools cannot be pruned by design
🛠️ Troubleshooting
Error: reasoning_content is missing (400 Bad Request)
Cause: Using Anthropic/DeepSeek/Kimi thinking mode with an outdated ACP version or missing reasoning sync.
Fix:
- Update to ACP v3.0.0+: npm install @tuanhung303/opencode-acp@latest
- Ensure your config has thinking-compatible settings
- See Thinking Mode Compatibility for details
Plugin Not Loading
Symptoms: Commands like /acp return "Unknown command"
Fix:
- Verify the plugin is in opencode.jsonc: "plugin": ["@tuanhung303/opencode-acp@latest"]
- Run npm run build && npm link in the plugin directory
- Restart OpenCode
High Token Usage Despite ACP
Check:
- Is aggressive pruning enabled in config? See Configuration
- Are you using protected tools excessively? (task, write, edit can't be pruned)
- Is your session >100 turns? Consider starting a fresh session
🔬 Provider Compatibility
Thinking Mode APIs (Anthropic, DeepSeek, Kimi)
ACP is fully compatible with extended thinking mode APIs that require the reasoning_content field. The context tool automatically syncs reasoning content to prevent 400 Bad Request errors.
Supported providers: Anthropic, DeepSeek, Kimi
Not required: OpenAI, Google
See the detailed technical documentation for implementation details and the root cause of the original compatibility issue.
📦 npm Package
Package: @tuanhung303/opencode-acp
License: MIT
Repository: https://github.com/tuanhung303/opencode-agent-context-pruning
Installation Methods
# Via npm
npm install @tuanhung303/opencode-acp
# Via OpenCode config
# Add to opencode.jsonc: "plugin": ["@tuanhung303/opencode-acp@latest"]
# Via URL (for agents)
curl -s https://raw.githubusercontent.com/tuanhung303/opencode-acp/master/README.md
CI/CD
- CI: Every PR triggers linting, type checking, and unit tests
- CD: Merges to main auto-publish to npm
🤝 Contributing
- Fork the repository
- Create a feature branch
- Run tests: npm test
- Submit a pull request
📄 License
MIT © tuanhung303
Read this section before modifying ACP code. These are hard-won lessons from debugging production issues.
1. Always Fetch Messages in All Code Paths
❌ WRONG:
async function executeContextToolDiscard(ctx, toolCtx, hashes) {
const { state, logger } = ctx
// Validate hashes...
if (validHashes.length === 0) {
// Early return without fetching messages
const currentParams = getCurrentParams(state, [], logger) // ← BUG: Empty array
return "No valid hashes"
}
// Only fetch messages in success path
const messages = await client.session.messages(...)
}
✅ CORRECT:
async function executeContextToolDiscard(ctx, toolCtx, hashes) {
const { client, state, logger } = ctx
// ALWAYS fetch messages first - required for thinking mode API compatibility
const messagesResponse = await client.session.messages({
path: { id: toolCtx.sessionID },
})
const messages = messagesResponse.data || messagesResponse
// ALWAYS initialize session - syncs reasoning_content
await ensureSessionInitialized(client, state, toolCtx.sessionID, logger, messages)
// Now validate hashes...
if (validHashes.length === 0) {
const currentParams = getCurrentParams(state, messages, logger) // ← Use actual messages
return "No valid hashes"
}
}
Why? Anthropic's thinking mode API requires reasoning_content on all assistant messages with tool calls. Skipping ensureSessionInitialized causes 400 errors.
2. Never Skip ensureSessionInitialized
This function syncs reasoning_content from message parts to msg.info. Without it:
error, status code: 400, message: thinking is enabled but reasoning_content is missing
in assistant tool call message at index 2
Rule: Call ensureSessionInitialized at the START of every context tool function, before any early returns.
3. Thinking Mode: Distill, Don't Discard Reasoning
❌ WRONG:
// Completely removing reasoning_content breaks API
state.prune.reasoningPartIds.push(partId)
// No replacement content → field removed → API error
✅ CORRECT:
// Convert discard to distill with minimal placeholder
if (reasoningHashes.length > 0) {
const minimalSummaries = reasoningHashes.map(() => "—")
await executeContextReasoningDistill(
ctx,
toolCtx,
reasoningHashes.map((h, i) => [h, minimalSummaries[i]]),
)
}
Why? The reasoning_content field must exist (even if minimal) for thinking mode providers.
4. Test with Non-Existing Hashes
Always test context tool changes with:
- Valid existing hashes (success path)
- Non-existing hashes like "zzzzzz" (no-op path)
- Mix of valid and invalid hashes
The no-op path is where most bugs hide because it's less tested.
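One way to make sure all three cases stay covered is a small table-driven check. The shape below is only a sketch; wire it into whatever test harness the project uses:

```ts
// Hypothetical case table; plug it into the project's actual test runner.
const discardCases: { name: string; targets: [string, string?][] }[] = [
  { name: "valid existing hash (success path)", targets: [["a1b2c3"]] },
  { name: "non-existing hash (no-op path)", targets: [["zzzzzz"]] },
  { name: "mix of valid and invalid", targets: [["a1b2c3"], ["zzzzzz"]] },
];

for (const c of discardCases) {
  // For each case, assert that the context tool neither throws nor skips
  // ensureSessionInitialized before returning (see sections 1 and 2 above).
  console.log(c.name, JSON.stringify({ action: "discard", targets: c.targets }));
}
```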
5. Provider-Specific Behavior
| Provider | Thinking Mode | reasoning_content Required | Notes |
| --------- | ----------------- | ---------------------------- | -------------------- |
| Anthropic | Extended thinking | ✅ Yes | Strict validation |
| DeepSeek | DeepThink | ✅ Yes | Similar to Anthropic |
| Kimi | K1 thinking | ✅ Yes | Similar to Anthropic |
| OpenAI | N/A | ❌ No | No thinking mode |
| Google | N/A | ❌ No | No thinking mode |
Always test with Anthropic when modifying context/pruning logic—it has the strictest API validation.
6. The Transform Hook Timing
User Message → Transform Hook → ensureReasoningContentSync → API Call
↑
Context tool runs HERE
If the context tool returns early (no-op), the transform hook may not have synced reasoning_content yet. That's why each context tool function must call ensureSessionInitialized itself.
7. Debug Checklist
When you see reasoning_content is missing errors:
- Check the error index - "at index 2" means the 3rd message in the array
- Verify message fetch - Is the function fetching messages before early returns?
- Verify session init - Is ensureSessionInitialized called in ALL code paths?
- Check the no-op path - Does the early return path have proper initialization?
- Test with invalid hash - context({ action: "discard", targets: [["zzzzzz"]] })
If you're an AI assistant reading this:
Quick Pruning Reference:
- Light pressure (<50%): No action
- Moderate (50-75%): Prune old tool outputs using context({ action: "discard", targets: [[hash]] })
- High (75-90%): Anchor critical info, then batch prune non-essential items
- Critical (>90%): Aggressively discard all disposable context
Always Protect: Active todos, current file edits, user requirements
Ask the user: "Would you like me to run a context optimization check?"
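If it helps to wire those thresholds into an agent policy, the same guidance restated as a tiny illustrative function (the bands are taken directly from the list above):

```ts
// Illustrative mapping of context pressure to the recommended action above.
function pruningAction(usedFraction: number): string {
  if (usedFraction < 0.5) return "no action";
  if (usedFraction < 0.75) return "discard old tool outputs";
  if (usedFraction < 0.9) return "anchor critical info, then batch-discard non-essentials";
  return "aggressively discard all disposable context";
}

// Example: pruningAction(0.8) -> "anchor critical info, then batch-discard non-essentials"
```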
