@luutuankiet/write-pr

v0.2.4 · Published 3 days ago

Evidence-driven Pull Request templating for Claude Code agents

kenluu

claude-code claude-code-skill pr pull-request template nunjucks evidence-driven

write-pr

Evidence-driven Pull Request templating that keeps your agent context clean.

The value isn't the template itself. It's that npx renders the template from files on disk, so the agent's context never holds the 15 KB of query results, CI logs, or benchmark dumps that the reviewer actually wants to see.

The problem this solves

Writing a good PR description usually means pasting evidence into the body: a 200-row BQ (BigQuery) result, a stack trace, a before/after benchmark, a schema diff. For an agent author, every one of those bytes passes through the agent's context window on its way to the PR body. A rich PR easily burns 15–50 KB of context on data the agent reads, lightly reformats, and writes back out.

write-pr inverts this. The agent's data-gathering tools (code execution, bq, dbt, curl, cat) write their output to disk. The agent writes only a small template (pr.md.j2) that references those disk paths. Then npx @luutuankiet/write-pr render reads the evidence from disk, interpolates it into the template, and writes the final markdown. The agent never re-reads the raw evidence.
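A minimal sketch of the data-gathering side, assuming the agent's code-execution step runs TypeScript on Node: the query result goes straight to a file under evidence/, and only a one-line receipt (path and byte count) is returned to the agent. The helper name writeEvidence and the receipt format are hypothetical, not part of write-pr.

```typescript
import { mkdirSync, writeFileSync, mkdtempSync } from "node:fs";
import { join, dirname } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical helper: persist heavy tool output, hand back only a short receipt.
function writeEvidence(evidenceDir: string, relPath: string, data: unknown): string {
  const target = join(evidenceDir, relPath);
  mkdirSync(dirname(target), { recursive: true });
  const body = JSON.stringify(data, null, 2);
  writeFileSync(target, body);
  // The receipt is all the agent sees; the rows themselves stay on disk.
  return `wrote ${relPath} (${new TextEncoder().encode(body).length} bytes)`;
}

// Usage: 200 synthetic rows stand in for a real query result.
const evidenceDir = mkdtempSync(join(tmpdir(), "evidence-"));
const rows = Array.from({ length: 200 }, (_, i) => ({ id: i, status: i % 7 === 0 ? "late" : "ok" }));
const receipt = writeEvidence(evidenceDir, "queries/q.json", rows);
console.log(receipt);
```

The design point is the return value: a few dozen bytes of receipt instead of several kilobytes of rows.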

Context cost — back of the envelope

| Pattern | Agent context spent |
|---|---|
| Paste 200-row result table inline as markdown | ~6 KB |
| Paste 30-line stack trace + 50-row benchmark + 4 KB SQL appendix | ~12 KB |
| Same content via write-pr (load_json('q.json') \| md_table) | ~80 bytes per fence |

Multiply by 5–10 evidence blocks in a real refactor PR and the math is brutal. write-pr is the same idea as code execution: keep heavy intermediates out of context, push them through a script that runs once.
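To sanity-check the orders of magnitude, the sketch below builds a synthetic 200-row markdown table (the column names and values are made up for illustration) and compares its UTF-8 size with the single template expression that replaces it. It measures bytes, not tokens, so treat it as a rough proxy for context cost.

```typescript
// Synthetic rows; a real BQ result would usually have wider columns.
const rows = Array.from({ length: 200 }, (_, i) => ({ id: i, region: `region-${i % 5}`, revenue: 1000 + i }));
const header = "| id | region | revenue |\n| :--- | :--- | :--- |";
const body = rows.map((r) => `| ${r.id} | ${r.region} | ${r.revenue} |`).join("\n");

// Bytes the agent would carry if it pasted the table inline.
const inlineBytes = new TextEncoder().encode(`${header}\n${body}`).length;
// Bytes the agent carries with write-pr: one template expression.
const templateBytes = new TextEncoder().encode("[[ load_json('q.json') | md_table ]]").length;

console.log({ inlineBytes, templateBytes, ratio: Math.round(inlineBytes / templateBytes) });
```

Multiplying both sides by the number of evidence blocks keeps the ratio the same; only the absolute gap grows.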

How it works

flowchart LR
    A["Agent runs tool<br/>(bq / dbt / pytest)"] --> B["Tool writes JSON<br/>to evidence/queries/*"]
    B --> C["Agent writes<br/>pr.md.j2 template<br/>(~500 bytes)"]
    C --> D["npx write-pr render<br/>reads disk + interpolates"]
    D --> E["PR.md<br/>(rich, multi-KB,<br/>never in agent context)"]

The tool output written in step 2 and the rendered PR.md from step 5 never enter the agent's context. Only the template from step 3 does.
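One plausible way the render step could resolve load_json('perf.json') against --evidence ./evidence is sketched below, assuming paths are relative to the evidence directory and are not allowed to escape it. The names resolveEvidence, makeLoaders, loadJson, and loadText and the escape check are assumptions, not documented write-pr behavior.

```typescript
import { readFileSync, writeFileSync, mkdtempSync } from "node:fs";
import { join, resolve, relative, isAbsolute, sep } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical: resolve a template-relative path inside the evidence dir, refusing anything outside it.
function resolveEvidence(evidenceDir: string, relPath: string): string {
  const root = resolve(evidenceDir);
  const target = resolve(root, relPath);
  const rel = relative(root, target);
  if (rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error(`path escapes evidence dir: ${relPath}`);
  }
  return target;
}

// Hypothetical template globals bound to one evidence directory.
function makeLoaders(evidenceDir: string) {
  return {
    loadText: (p: string): string => readFileSync(resolveEvidence(evidenceDir, p), "utf8"),
    loadJson: (p: string): unknown => JSON.parse(readFileSync(resolveEvidence(evidenceDir, p), "utf8")),
  };
}

// Usage: the renderer, not the agent, opens perf.json.
const dir = mkdtempSync(join(tmpdir(), "evidence-"));
writeFileSync(join(dir, "perf.json"), JSON.stringify({ before: [], after: [] }));
const { loadJson } = makeLoaders(dir);
const perf = loadJson("perf.json") as { before: unknown[]; after: unknown[] };
console.log(Array.isArray(perf.before));
```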

Use cases

1. Performance regression PR (fully worked)

The agent benchmarks before and after the change, the benchmark tool writes the results to disk, and the agent writes a four-line template:

evidence/perf.json (written by the benchmark tool; agent never reads):

{
  "before": [{"metric": "p95_ms", "value": 1450}, {"metric": "errors/min", "value": 12}],
  "after":  [{"metric": "p95_ms", "value":   55}, {"metric": "errors/min", "value":  0}]
}

pr.md.j2 (written by the agent; about 100 bytes in context):

## Before / After

[% set perf = load_json('perf.json') %]
[[ perf.before | delta_table(perf.after) ]]

Rendered output (lives only on disk + GitHub, never in agent context):

## Before / After

| metric     | before | after | Δ            |
| :---       | :---   | :---  | :---         |
| p95_ms     | 1450   | 55    | -1395 (-96%) |
| errors/min | 12     | 0     | -12 (-100%)  |
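A minimal sketch of how a delta_table-style filter could compute the Δ column, assuming rows are matched on a key field (default metric) and compared numerically on a value field (default value). The function name deltaTable and those defaults are assumptions, not write-pr's implementation. It reproduces the rows above without the column padding, which markdown ignores.

```typescript
type Row = Record<string, string | number>;
interface DeltaOpts { key?: string; value?: string }

// Hypothetical sketch: before/after table with an absolute and percentage Δ.
function deltaTable(before: Row[], after: Row[], opts: DeltaOpts = {}): string {
  const key = opts.key ?? "metric";
  const value = opts.value ?? "value";
  const afterByKey = new Map(after.map((r): [string, string | number] => [String(r[key]), r[value]]));
  const lines = [`| ${key} | before | after | Δ |`, "| :--- | :--- | :--- | :--- |"];
  for (const row of before) {
    const k = String(row[key]);
    if (!afterByKey.has(k)) continue; // keys present in only one array are skipped in this sketch
    const b = Number(row[value]);
    const a = Number(afterByKey.get(k));
    const diff = a - b;
    const sign = diff > 0 ? "+" : "";
    // A percentage is undefined when the baseline is zero.
    const pct = b === 0 ? "n/a" : `${sign}${Math.round((diff / b) * 100)}%`;
    lines.push(`| ${k} | ${b} | ${a} | ${sign}${diff} (${pct}) |`);
  }
  return lines.join("\n");
}

// Usage with the perf.json data from the example.
const perf = {
  before: [{ metric: "p95_ms", value: 1450 }, { metric: "errors/min", value: 12 }],
  after: [{ metric: "p95_ms", value: 55 }, { metric: "errors/min", value: 0 }],
};
const table = deltaTable(perf.before, perf.after);
console.log(table);
```

How write-pr itself treats keys that appear on only one side, or a zero baseline, isn't stated in this README; the choices above are illustrative.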

2. Schema migration PR

The agent runs DESCRIBE before and after the migration and dumps both results to disk as [{column, type}, ...] arrays. Template:

[[ load_json('schema_before.json') | delta_table(load_json('schema_after.json'), {key: 'column', value: 'type'}) ]]

Output: a clean diff showing added / removed columns and type changes. Agent context spent: one line.
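For string-valued fields such as type, a numeric Δ doesn't apply. The sketch below shows one plausible classification (added, removed, type changed, unchanged); the schemaDelta name, the labels, the row order, and the BigQuery-style types in the usage are assumptions, since the README doesn't show delta_table's actual output for this case.

```typescript
interface Column { column: string; type: string }

// Hypothetical sketch: classify each column across two DESCRIBE dumps.
function schemaDelta(before: Column[], after: Column[]): string {
  const b = new Map(before.map((c): [string, string] => [c.column, c.type]));
  const a = new Map(after.map((c): [string, string] => [c.column, c.type]));
  // Columns from the old schema first, then newly added ones.
  const names = [...new Set([...b.keys(), ...a.keys()])];
  const lines = ["| column | before | after | Δ |", "| :--- | :--- | :--- | :--- |"];
  for (const name of names) {
    const was = b.get(name);
    const now = a.get(name);
    const change =
      was === undefined ? "added" : now === undefined ? "removed" : was === now ? "unchanged" : "type changed";
    lines.push(`| ${name} | ${was ?? "-"} | ${now ?? "-"} | ${change} |`);
  }
  return lines.join("\n");
}

// Usage with made-up schemas.
const diff = schemaDelta(
  [{ column: "id", type: "INT64" }, { column: "amount", type: "INT64" }, { column: "legacy_flag", type: "BOOL" }],
  [{ column: "id", type: "INT64" }, { column: "amount", type: "NUMERIC" }, { column: "created_at", type: "TIMESTAMP" }],
);
console.log(diff);
```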

3. CI failure attribution

The test runner dumps its report (pytest --json-report, jest --json, or dbt's run_results.json) to disk. The agent does a one-pass reshape on disk (jq or code execution), then:

[[ load_json('failures.json') | md_table | fold('Failures (12)') ]]

Reviewer sees a click-to-expand table. Agent context spent: ~200 bytes total. No raw test output passes through.
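The one-pass reshape can be sketched in TypeScript for the dbt case. It relies only on the results[] entries of run_results.json and their unique_id, status, message, and execution_time fields; the extractFailures name and the output row shape are assumptions. Only the failure count comes back to the agent, which is enough to fill in a summary like fold('Failures (12)').

```typescript
import { readFileSync, writeFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// The subset of dbt's run_results.json this sketch relies on.
interface DbtResult { unique_id: string; status: string; message: string | null; execution_time: number }

// Hypothetical reshape: keep failed nodes, flatten to table-friendly rows, return just the count.
function extractFailures(runResultsPath: string, outPath: string): number {
  const { results } = JSON.parse(readFileSync(runResultsPath, "utf8")) as { results: DbtResult[] };
  const failures = results
    .filter((r) => r.status === "error" || r.status === "fail")
    .map((r) => ({
      node: r.unique_id,
      status: r.status,
      seconds: Number(r.execution_time.toFixed(1)),
      // First line of the message only, so the markdown table stays readable.
      message: (r.message ?? "").split("\n")[0],
    }));
  writeFileSync(outPath, JSON.stringify(failures, null, 2));
  return failures.length;
}

// Usage with a tiny synthetic run_results.json.
const dir = mkdtempSync(join(tmpdir(), "ci-"));
const runResults = join(dir, "run_results.json");
writeFileSync(runResults, JSON.stringify({ results: [
  { unique_id: "model.shop.orders", status: "success", message: "OK", execution_time: 2.31 },
  { unique_id: "test.shop.not_null_orders_id", status: "fail", message: "Got 3 results, configured to fail if != 0", execution_time: 0.42 },
] }));
const count = extractFailures(runResults, join(dir, "failures.json"));
console.log(`failures.json written with ${count} row(s)`);
```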

4. Heavy appendix without polluting the PR body

Any long artifact (full compiled SQL, raw API response, 500-line log) goes through the fold filter:

[[ load_text('appendix.sql') | fold('Compiled SQL (470 lines)') ]]
[[ load_json('api_dump.json') | json_pretty | fold('Raw API response') ]]

GitHub renders these as native <details> collapsibles. Agent writes 2 lines; reviewer gets click-to-expand evidence. The 470-line SQL never enters agent context.
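A minimal sketch of a fold-style filter, assuming the piped value arrives as the first argument and the summary text as the second (the usual Nunjucks filter convention). The fold and escapeHtml names are hypothetical. The blank lines around the content matter: GitHub only renders markdown inside <details> when a blank line follows the <summary> line.

```typescript
// Hypothetical fold: wrap content in a GitHub-collapsible <details> block.
function fold(content: string, summary: string): string {
  // Blank lines around the body let GitHub render it as markdown, not raw HTML text.
  return `<details>\n<summary>${escapeHtml(summary)}</summary>\n\n${content.trim()}\n\n</details>`;
}

// Keep summary text from being parsed as HTML.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Usage: the fence marker is built at runtime so this example can itself sit inside a fence.
const fence = "`".repeat(3);
const folded = fold(`${fence}sql\nselect 1\n${fence}`, "Compiled SQL (470 lines)");
console.log(folded);
```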

Quick start

# Install the bundled skill into the current project (.claude/skills/write-pr/)
npx -y @luutuankiet/write-pr install-skill

# Or globally (~/.claude/skills/write-pr/)
npx -y @luutuankiet/write-pr install-skill --global

# Render a PR template (after the agent has written pr.md.j2 + dumped evidence)
npx -y @luutuankiet/write-pr render \
  --template pr.md.j2 \
  --evidence ./evidence \
  --out PR.md

The 6 filters

| Filter | Use |
|---|---|
| md_table | JSON array → markdown table |
| json_pretty | Value → fenced ```json block |
| gh_callout | GitHub TIP / NOTE / IMPORTANT / WARNING / CAUTION |
| code_expand | Read source file → fenced code with file:line header |
| delta_table | Two arrays → before/after table with computed Δ |
| fold | Wrap any content in <details> collapsible |
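The two simplest filters can be sketched as plain functions. This is an illustration under assumptions, not the bundled implementation: mdTable takes columns in order of first appearance across rows, leaves missing cells blank, and escapes pipes and newlines; jsonPretty uses two-space indentation. Both names are hypothetical.

```typescript
type Row = Record<string, unknown>;

// Hypothetical md_table: union of keys in first-seen order, one markdown row per object.
function mdTable(rows: Row[]): string {
  const cols: string[] = [];
  for (const r of rows) for (const k of Object.keys(r)) if (!cols.includes(k)) cols.push(k);
  const cell = (v: unknown): string =>
    v === null || v === undefined
      ? ""
      : String(typeof v === "object" ? JSON.stringify(v) : v).replace(/\|/g, "\\|").replace(/\n/g, "<br>");
  const head = `| ${cols.join(" | ")} |`;
  const rule = `| ${cols.map(() => ":---").join(" | ")} |`;
  const body = rows.map((r) => `| ${cols.map((c) => cell(r[c])).join(" | ")} |`);
  return [head, rule, ...body].join("\n");
}

// Hypothetical json_pretty: pretty-printed JSON inside a json code fence.
function jsonPretty(value: unknown): string {
  const fence = "`".repeat(3); // built at runtime so this example can itself sit in a fence
  return `${fence}json\n${JSON.stringify(value, null, 2)}\n${fence}`;
}

const table = mdTable([{ id: 1, note: "a|b" }, { id: 2, extra: true }]);
console.log(table);
console.log(jsonPretty({ a: 1 }));
```

Escaping the pipe keeps a value like a|b from splitting into an extra column.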

Full signatures + 8 rule docs (incl. good-pr-traits.md) + 10 snippet patterns ship inside the skill bundle. See skills/write-pr/SKILL.md for the agent-facing entry point.

What makes a good PR

The skill describes 7 quality traits agents should hit regardless of section structure:

Spoon-feed evidence inline: every assertion is followed by the data that proves it (table / JSON / SQL / mermaid diagram for topology, state, and flow), with no "see X" pointers
Resolve private context: translate symbolic private references (internal log/task IDs, private doc refs) AND verbal ones ("prior cycles", "baseline", "previously") into plain English plus reproducible anchors
Collapse at section boundary: every top-level ## IS the <summary> of a <details> wrap, not just inner blocks (the GitHub PR view has no TOC, so a long PR is a wall of text on first paint)
Anchor prior-run claims to a reproducible ID: "baseline failure rate" gets a job ID plus an inline result table
Text-fallback for auth-gated links: GCP / dbt Cloud / internal-dashboard URLs come with an inline summary, since reviewers outside the org see a 401
Tag hypothesis vs measurement: "likely / probably / suggests" is flagged or backed with the confirming query
File citations carry line ranges: src/foo.sql:42-58, not bare src/foo.sql
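The line-range trait pairs naturally with a code_expand-style helper. Below is a minimal sketch, assuming a path:start-end header line followed by a fence whose language comes from the file extension; the codeExpand name, its parameters, and the header layout are hypothetical, not the filter's documented signature.

```typescript
import { readFileSync, writeFileSync, mkdtempSync } from "node:fs";
import { join, extname } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical code_expand: cite label:start-end, then show exactly those 1-based lines.
function codeExpand(file: string, start: number, end: number, label: string = file): string {
  const lines = readFileSync(file, "utf8").split("\n");
  if (start < 1 || end < start || end > lines.length) throw new RangeError(`bad range ${start}-${end}`);
  const lang = extname(file).slice(1); // "sql" for foo.sql, empty if there is no extension
  const fence = "`".repeat(3);
  const snippet = lines.slice(start - 1, end).join("\n");
  return `${label}:${start}-${end}\n\n${fence}${lang}\n${snippet}\n${fence}`;
}

// Usage with a throwaway file standing in for src/foo.sql.
const dir = mkdtempSync(join(tmpdir(), "src-"));
const file = join(dir, "foo.sql");
writeFileSync(file, "-- orders model\nselect id\nfrom orders\nwhere id is not null\n");
const cited = codeExpand(file, 2, 3, "src/foo.sql");
console.log(cited);
```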

The framing is to describe traits, not to enforce a template: pick the section outline that fits the change, and the traits apply regardless. Full descriptions, Good/Bad tables, and agent-behavior heuristics are in skills/write-pr/rules/good-pr-traits.md.

Custom Jinja delimiters (important)

Templates use [[ ]] for variables, [% %] for blocks, and [# #] for comments. The default {{ }} delimiters collide with dbt's Jinja and other tools that emit double-brace syntax inside fenced evidence: that text would be evaluated as a template expression, and an undefined variable silently renders as an empty string. The custom delimiters avoid the collision without losing any Nunjucks features.

[[ load_json('q.json') | md_table ]]

[% if meta.has_breaking_change %]
[[ 'WARNING' | gh_callout(meta.breaking_change_note) ]]
[% endif %]

[# this is a comment, agents-only #]
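The gh_callout call above pipes the callout type in and passes the note as an argument. A minimal sketch of what such a filter could emit is shown below, using GitHub's alert syntax (a blockquote starting with [!TYPE]); the ghCallout name and its validation are assumptions.

```typescript
const CALLOUT_TYPES: readonly string[] = ["NOTE", "TIP", "IMPORTANT", "WARNING", "CAUTION"];

// Hypothetical gh_callout: 'WARNING' | gh_callout(note) becomes a GitHub alert blockquote.
function ghCallout(type: string, body: string): string {
  const t = type.toUpperCase();
  if (!CALLOUT_TYPES.includes(t)) throw new Error(`unknown callout type: ${type}`);
  // Prefix every line (blank ones too) so a multi-line note stays inside the alert.
  const quoted = body.split("\n").map((l) => (l ? `> ${l}` : ">")).join("\n");
  return `> [!${t}]\n${quoted}`;
}

const callout = ghCallout("WARNING", "Drops column legacy_flag.\nDownstream dashboards must migrate first.");
console.log(callout);
```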

Two template-delimiter edge cases ([[[ ]]] trap, literal-filter-in-prose) are documented in skills/write-pr/rules/template-gotchas.md with verified workarounds.
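The delimiter collision can be seen with a deliberately tiny toy renderer. This is NOT Nunjucks and NOT write-pr's code; it substitutes only bare [[ name ]] variables and copies everything else through, so dbt's double-brace text survives verbatim. In Nunjucks itself, custom delimiters are set through the tags option (blockStart, blockEnd, variableStart, variableEnd, commentStart, commentEnd) when creating an Environment. The renderToy name and the dbt snippet are illustrative.

```typescript
// Toy renderer: replaces [[ name ]] from ctx, leaves {{ ... }} and all other text untouched.
function renderToy(template: string, ctx: Record<string, string>): string {
  return template.replace(/\[\[\s*(\w+)\s*\]\]/g, (_match: string, name: string) => {
    // Fail loudly instead of silently rendering an empty string.
    if (!Object.prototype.hasOwnProperty.call(ctx, name)) throw new Error(`undefined variable: ${name}`);
    return ctx[name];
  });
}

const template = ["Model [[ model ]] now reads from:", "select * from {{ ref('stg_orders') }}"].join("\n");
const rendered = renderToy(template, { model: "fct_orders" });
console.log(rendered);
```

Throwing on unknown names is a deliberate choice here; Nunjucks offers a comparable throwOnUndefined option.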

License

MIT — see LICENSE.