talking-cli

v0.3.0

A linter that audits agent skills: is your CLI mute?

Talking CLI

Tool silence is a design defect. Distributed Prompting is the fix.

License: MIT Node >= 18 Self-Audit CI

The self-audit badge shows talking-cli's own audit score (100/100). CI enforces ≥80 on every PR.

Sound familiar?

Your SKILL.md is 400 lines. Half of it describes what the agent should do after a specific tool returns — "if zero results, broaden the query," "if ambiguous, ask the user," "this field means X, not Y."

The agent loads all 400 lines every single turn, but most of that guidance only matters 10% of the time. The other 90%, it's paying attention rent on scenarios that didn't happen.

Meanwhile, your tools return raw JSON and say nothing. No hint about what just happened. No signal that results were sparse or the query was ambiguous. The tools are mute, so all the guidance gets shoved upstream into SKILL.md, which slowly bloats into a monologue describing every possible outcome — most of which the agent promptly forgets or ignores.

Talking CLI gives your tools a voice. When the agent calls, the tool talks back — with the right hint, at the right moment, inside the response. We call this Prompt-On-Call: guidance that surfaces only when a tool is called, relevant only to what just happened.

The cumulative effect is Distributed Prompting: a prompt surface spread across every tool response, not crammed into one bloated document.
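As a rough illustration of Prompt-On-Call, here is a minimal sketch of a search tool that attaches a hint only for the outcome that actually occurred. The `ToolResult` shape, the `hints` field name, the `searchTool` function, and the in-memory `CORPUS` are all hypothetical and chosen for this example; they are not talking-cli's or MCP's actual API.

```typescript
// Hypothetical shapes for illustration only; not talking-cli's or MCP's actual API.
interface SearchHit {
  title: string;
}

interface ToolResult {
  results: SearchHit[];
  // Present only when the outcome calls for guidance.
  hints?: string[];
}

// A tiny in-memory corpus standing in for a real backend (assumption).
const CORPUS: SearchHit[] = [
  { title: "Rate limiting in Node" },
  { title: "Node streams primer" },
  { title: "Rust ownership basics" },
];

// A search tool that "talks back": the hint depends on what just happened.
function searchTool(query: string): ToolResult {
  const needle = query.trim().toLowerCase();
  const results = CORPUS.filter((hit) => hit.title.toLowerCase().includes(needle));

  if (results.length === 0) {
    return {
      results,
      hints: ["No matches. Try broadening your query with fewer or shorter terms."],
    };
  }
  if (results.length > 1) {
    return {
      results,
      hints: [`${results.length} matches. If the intent is ambiguous, ask the user which one they mean.`],
    };
  }
  // Exactly one hit: nothing needs explaining, so the response carries no hints.
  return { results };
}

console.log(JSON.stringify(searchTool("python"), null, 2));
```

The "if zero results, broaden the query" sentence that would otherwise live in SKILL.md now travels with the one response it applies to, and the single-hit case carries no guidance at all.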


Standing on shoulders. CLI is the native interface for AI agents — Carmack, CodeAct (Wang et al., ICML 2024), and Karpathy crystallized it.

Progressive disclosure as a skill-loading architecture was formalized by Anthropic (Oct 2025) and is now an open standard. Anthropic also advocates "steering agents with helpful instructions in tool responses" — but only as a paragraph-level best practice. Nobody has named it, budgeted it, audited it, or proposed it as a protocol-level primitive. That gap is what Talking CLI fills.


What this project is

Talking CLI is built around one idea: Distributed Prompting — moving guidance from static SKILL.md into the moment of invocation.

  1. Methodology: PHILOSOPHY.md, covering Four Channels (C1–C4), Four Rules of Talk, a prompt budget, and five anti-patterns.
  2. Evidence: a reproducible 2×2 ablation benchmark across 15 curated tasks (published below).
  3. Standard: a proposed agent_hints convention we are taking to the MCP spec, backed by the data.

The linter (talking-cli audit / audit-mcp) is the probe, not the hero. It's how you reproduce the audit numbers on your own skill.

Core claim

Prompt Surface = SKILL.md ∪ {tool_result.hints}: two halves, one budget.

Anything you write into SKILL.md that only applies after a specific tool call is mispriced: it costs every turn and earns only on a small fraction of turns. Tool hints fix the pricing.
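The pricing argument can be made concrete with a back-of-the-envelope calculation. The sketch below is a deliberately simple model with made-up numbers: it assumes guidance in SKILL.md is paid for on every turn, and that a hint is paid for only on the turns where it is returned. It ignores that a returned hint stays in the conversation history afterwards. The `guidanceCost` function and its parameters are hypothetical.

```typescript
// Token cost of one piece of post-call guidance over a session.
// All inputs are illustrative assumptions, not measurements.
interface GuidancePlacementCost {
  inSkillMd: number; // guidance sits in SKILL.md and is loaded on every turn
  inToolHint: number; // guidance is returned only on turns where it applies
}

function guidanceCost(
  guidanceTokens: number,
  turns: number,
  relevantTurns: number,
): GuidancePlacementCost {
  if (relevantTurns > turns) {
    throw new RangeError("relevantTurns cannot exceed turns");
  }
  return {
    inSkillMd: guidanceTokens * turns,
    inToolHint: guidanceTokens * relevantTurns,
  };
}

// 40 tokens of "if zero results, broaden the query", 20 turns, relevant on 2 of them.
const exampleCost = guidanceCost(40, 20, 2);
console.log(exampleCost); // { inSkillMd: 800, inToolHint: 80 }
```

Under these assumptions the same sentence costs ten times more in SKILL.md than as a hint, which is the sense in which post-call guidance in SKILL.md is mispriced.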


How it works

The Prompt Budget Shift

graph LR
    subgraph Before ["❌ Before: Mute CLI"]
        A1[SKILL.md<br/>400+ lines] --> A2[Agent]
        A3[Tool returns<br/>raw JSON only] --> A2
        A1 -.->|"guidance shoved upstream"| A3
    end

    subgraph After ["✅ After: Distributed Prompting"]
        B1[SKILL.md<br/>&lt; 150 lines] --> B2[Agent]
        B3[Tool returns<br/>JSON + hints] --> B2
    end

    Before -->|Audit + Optimize| After

Four Heuristics, One Score

graph TD
    H1[H1 · Document Budget<br/>SKILL.md ≤ 150 lines]
    H2[H2 · Fixture Coverage<br/>error + empty scenarios]
    H3[H3 · Structured Hints<br/>hints / suggestions / guidance]
    H4[H4 · Actionable Guidance<br/>specific, actionable content]

    H1 & H2 & H3 & H4 --> Score[Total Score<br/>0–100]
    Score -->|≥ 80| Pass[✅ PASS]
    Score -->|< 80| Fail[❌ FAIL]
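
To make the four checks tangible, here is a minimal sketch of how they might be evaluated. The 150-line budget, the `hints` / `suggestions` / `guidance` field names, the 10-character floor for H4, and the 80-point pass mark come from this README. Everything else is an assumption made for the example: the `SkillSnapshot` shape, the simplified H2 check (every tool has at least one fixture), and the equal 25-point weighting. The real linter may score differently.

```typescript
// Sketch only. The equal 25-point weighting and the input shape are assumptions.
const HINT_FIELDS = ["hints", "suggestions", "guidance"] as const;

interface SkillSnapshot {
  skillMdLines: number;
  toolsWithFixtures: number;
  totalTools: number;
  fixtureOutputs: Array<Record<string, unknown>>; // parsed tool responses
}

// Collect every string found under a recognized hint field.
function hintTexts(output: Record<string, unknown>): string[] {
  const texts: string[] = [];
  for (const field of HINT_FIELDS) {
    const value = output[field];
    if (typeof value === "string") {
      texts.push(value);
    } else if (Array.isArray(value)) {
      for (const item of value) {
        if (typeof item === "string") texts.push(item);
      }
    }
  }
  return texts;
}

function auditScore(skill: SkillSnapshot): { score: number; pass: boolean } {
  const outputs = skill.fixtureOutputs;
  // H1: SKILL.md stays within the 150-line budget.
  const h1 = skill.skillMdLines <= 150;
  // H2 (simplified): every tool has at least one fixture.
  const h2 = skill.totalTools > 0 && skill.toolsWithFixtures === skill.totalTools;
  // H3: every fixture output carries at least one hint field.
  const h3 = outputs.length > 0 && outputs.every((o) => hintTexts(o).length > 0);
  // H4: every hint is at least 10 characters long (the current heuristic).
  const h4 =
    outputs.length > 0 &&
    outputs.every((o) => {
      const texts = hintTexts(o);
      return texts.length > 0 && texts.every((t) => t.trim().length >= 10);
    });

  const score = [h1, h2, h3, h4].filter(Boolean).length * 25;
  return { score, pass: score >= 80 };
}

console.log(
  auditScore({
    skillMdLines: 120,
    toolsWithFixtures: 1,
    totalTools: 1,
    fixtureOutputs: [{ results: [], hints: ["Try broadening your query with fewer filters"] }],
  }),
); // { score: 100, pass: true }
```

With equal weights, failing any single check drops the score to 75, below the pass mark. That is a property of this sketch's weighting, not a documented behavior of the tool.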

Quick Start

# Audit your skill — coach mode (plain language, actionable)
npx talking-cli audit ./my-skill

# CI mode — machine-readable, exit code driven
npx talking-cli audit ./my-skill --ci

# JSON mode — structured output for tooling
npx talking-cli audit ./my-skill --json

# Audit an MCP server — static analysis (fast, safe)
npx talking-cli audit-mcp ./my-mcp-server

# Deep audit — runtime heuristics (spawns server)
# ⚠️ Only use --deep on servers you trust. See SECURITY.md.
npx talking-cli audit-mcp ./my-mcp-server --deep

# Generate optimization plan (plan-only, never touches source files)
npx talking-cli optimize ./my-skill

# Scaffold a new skill directory with templates that pass audit
npx talking-cli init my-skill
cd my-skill
npx talking-cli audit .

All commands are fully local — no API key required.


What it looks like

Coach mode running against a bloated, mute skill:

Score: 0/100
Yikes. Your CLI is so quiet I can hear the tokens screaming in agony.

H1 · Line Count · FAIL
Your SKILL.md is 165 lines. The budget is 150.
→ Just 15 lines over. Tighten the prose and migrate post-call guidance to tool hints.

H2 · Hint Coverage · FAIL
1 tool(s) have zero fixtures. They don't speak at all: search
→ Add talking-cli-fixtures for [search]. One error, one empty-result scenario.

H3 · Structured Hints · FAIL
0/0 passed fixtures contain hint fields.
→ Make your tools return a "hints" or "suggestions" field alongside raw data.

H4 · Actionable Guidance · FAIL
0/0 hint fields have actionable content.
→ Hints should be specific. "Try broadening your query with fewer filters" is actionable.

---
Fix the issues above, then run npx talking-cli audit again to see your new score.

(The real output is colored. We just can't show chalk in a code block.)


The finding: MCP Ecosystem Audit

We ran talking-cli audit-mcp --deep against 4 official Anthropic MCP servers across 68 error / empty-result scenarios. Number of scenarios that returned actionable guidance:

0 / 68.

Static analysis of 823 Composio GitHub tools: same result. The MCP ecosystem today treats tool output as a data pipe, not a dialogue participant.

| Server | Tools | Scenarios | M3 · Guidance |
|--------|-------|-----------|---------------|
| server-filesystem | 11 | 21 | 0 |
| server-everything | 13 | 13 | 0 |
| server-memory | 9 | 9 | 0 |
| server-github | 25 | 25 | 0 |
| Total | 58 | 68 | 0 / 68 |

2×2 Ablation Benchmark (GLM-5.1)

We ran a 2×2 ablation (Full/Lean Skill × Mute/Hinting Tools) on GLM-5.1 across 15 curated tasks:

| Cell | Skill | Server | Pass Rate | Avg Input Tokens |
|------|-------|--------|-----------|------------------|
| 1 | Full Skill (873 lines) | Mute Tools | 7/15 (47%) | 122,562 |
| 2 | Full Skill | Hinting Tools | 8/15 (53%) | 96,829 |
| 3 | Lean Skill (168 lines) | Mute Tools | 8/15 (53%) | 54,078 |
| 4 | Lean Skill | Hinting Tools | 11/15 (73%) | 40,815 |

Key findings:

  • Combined effect (Cell 4 vs Cell 1): −67% tokens, +27pp pass rate (7/15 → 11/15). Both efficiency and quality improve.
  • Skill compression alone (Cell 3 vs Cell 1): −56% tokens, +7pp (7/15 → 8/15)
  • Tool hints alone (Cell 2 vs Cell 1): −21% tokens, +7pp (7/15 → 8/15)
  • Synergistic interaction on pass rate: the combined gain (+4 tasks) exceeds the sum of the individual gains (+1 and +1)
  • Verdict: GREAT SUCCESS
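
The deltas can be recomputed directly from the table. The sketch below uses only the pass counts and average input tokens reported for the four cells; the `AblationCell` type and `deltaVs` helper are names invented for this example. Pass-rate differences are computed from raw task counts, so they can differ by a point from differences of the rounded percentages.

```typescript
interface AblationCell {
  passed: number; // tasks passed out of TOTAL_TASKS
  avgInputTokens: number;
}

const TOTAL_TASKS = 15;

// Values copied from the 2×2 table above.
const fullMute: AblationCell = { passed: 7, avgInputTokens: 122562 };
const fullHinting: AblationCell = { passed: 8, avgInputTokens: 96829 };
const leanMute: AblationCell = { passed: 8, avgInputTokens: 54078 };
const leanHinting: AblationCell = { passed: 11, avgInputTokens: 40815 };

// Change of one cell relative to a baseline cell.
function deltaVs(baseline: AblationCell, cell: AblationCell) {
  return {
    // Relative change in average input tokens, rounded to a whole percent.
    tokenChangePct: Math.round((cell.avgInputTokens / baseline.avgInputTokens - 1) * 100),
    // Pass-rate change in percentage points, rounded to one decimal.
    passRatePp: Math.round(((cell.passed - baseline.passed) / TOTAL_TASKS) * 1000) / 10,
    extraTasks: cell.passed - baseline.passed,
  };
}

const combined = deltaVs(fullMute, leanHinting);
const compressionOnly = deltaVs(fullMute, leanMute);
const hintsOnly = deltaVs(fullMute, fullHinting);

// Synergy on pass rate: the combined gain exceeds the sum of the individual gains.
const synergistic = combined.extraTasks > compressionOnly.extraTasks + hintsOnly.extraTasks;

console.log({ combined, compressionOnly, hintsOnly, synergistic });
```

With 15 tasks, each task is worth about 6.7pp, so these differences rest on a small number of task outcomes.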

Why compression helps: The 873-line skill sits at P99.5 of real-world skill sizes and consumes ~8,700 tokens. That cost accumulates across turns, crowding task data toward the context window's far end where attention is weakest. SkillsBench (arXiv 2602.12670, 36,000 real-world skills) independently found that comprehensive skills at P99.5 degrade performance by 2.9pp while moderate skills improve it by 18.8pp, confirming the direction at ecosystem scale.

Historical context and reproduction instructions → benchmark/. Full methodology and limitations → docs/BENCHMARK-METHODOLOGY.md.


The Methodology

Talking CLI is the reference implementation of Distributed Prompting: every tool response is a designed prompt surface, not a data dump. Prompt-On-Call is the concrete mechanism — guidance that arrives when the tool is called, relevant to what just happened. The cumulative effect across every tool in the system is Distributed Prompting.

  • PHILOSOPHY.md — the methodology: Four Channels, Four Rules, a budget, and five anti-patterns.
  • Adversarial Case Study — where Distributed Prompting fails, and what to do about it.

What's next

  • Cross-model validation — replicating the 2×2 ablation on Claude and other providers
  • MCP spec proposal — RFC for a first-class agent_hints field in tool responses
  • H4 semantic upgrade — replacing the ≥ 10 chars heuristic with a lightweight classifier
  • Real-world validation — auditing and benchmarking real MCP servers with before/after results

See PHILOSOPHY.md §Evidence for the full benchmark data including historical DeepSeek-V3.2 results and MiniMax M2.7 validation.


License

MIT