@luutuankiet/write-pr
v0.2.4
Published
Evidence-driven Pull Request templating for Claude Code agents
Maintainers
Readme
write-pr
Evidence-driven Pull Request templating that keeps your agent context clean.
The value isn't the template — it's that the template
npx-compiles from disk, so agent context never holds the 15 KB of query results / CI logs / benchmark dumps the reviewer actually wants to see.
The problem this solves
Writing a good PR description usually means pasting evidence — a 200-row BQ result, a stack trace, a before/after benchmark, a schema diff — into the body. For an agent author that means every byte ends up in the agent's context window on the way to the PR body. A rich PR easily burns 15–50 KB of context that the agent reads, transforms slightly, and emits unchanged.
write-pr inverts this. The agent's data-gathering tools (code execution, bq, dbt, curl, cat) write to disk. The agent writes only a small template (pr.md.j2) that references the disk paths. Then npx @luutuankiet/write-pr render reads disk + interpolates + writes the final markdown. The agent never re-reads the raw evidence.
Context cost — back of the envelope
| Pattern | Agent context spent |
|---|---|
| Paste 200-row result table inline as markdown | ~6 KB |
| Paste 30-line stack trace + 50-row benchmark + 4 KB SQL appendix | ~12 KB |
| Same content via write-pr (load_json('q.json') \| md_table) | ~80 bytes per fence |
Multiply by 5–10 evidence blocks in a real refactor PR and the math is brutal. write-pr is the same idea as code execution: keep heavy intermediates out of context, push them through a script that runs once.
How it works
flowchart LR
A["Agent runs tool<br/>(bq / dbt / pytest)"] --> B["Tool writes JSON<br/>to evidence/queries/*"]
B --> C["Agent writes<br/>pr.md.j2 template<br/>(~500 bytes)"]
C --> D["npx write-pr render<br/>reads disk + interpolates"]
D --> E["PR.md<br/>(rich, multi-KB,<br/>never in agent context)"]Step 1's output and step 5's output never co-exist in the agent's context. Only the template (step 3) does.
Use cases
1. Performance regression PR (fully worked)
Agent benchmarks before + after, the benchmark tool writes results to disk, agent writes a 6-line template:
evidence/perf.json (written by the benchmark tool; agent never reads):
{
"before": [{"metric": "p95_ms", "value": 1450}, {"metric": "errors/min", "value": 12}],
"after": [{"metric": "p95_ms", "value": 55}, {"metric": "errors/min", "value": 0}]
}pr.md.j2 (agent writes — ~150 bytes in context):
## Before / After
[% set perf = load_json('perf.json') %]
[[ perf.before | delta_table(perf.after) ]]Rendered output (lives only on disk + GitHub, never in agent context):
## Before / After
| metric | before | after | Δ |
| :--- | :--- | :--- | :--- |
| p95_ms | 1450 | 55 | -1395 (-96%) |
| errors/min | 12 | 0 | -12 (-100%) |2. Schema migration PR
Agent runs DESCRIBE before + after the migration, dumps both to disk as [{column, type}, ...] arrays. Template:
[[ load_json('schema_before.json') | delta_table(load_json('schema_after.json'), {key: 'column', value: 'type'}) ]]Output: a clean diff showing added / removed columns and type changes. Agent context spent: one line.
3. CI failure attribution
Test runner dumps pytest --json-report / jest --json / dbt run_results.json to disk. Agent does a one-pass reshape on disk (jq, code execution), then:
[[ load_json('failures.json') | md_table | fold('Failures (12)') ]]Reviewer sees a click-to-expand table. Agent context spent: ~200 bytes total. No raw test output passes through.
4. Heavy appendix without polluting the PR body
Any long artifact (full compiled SQL, raw API response, 500-line log) goes through the fold filter:
[[ load_text('appendix.sql') | fold('Compiled SQL (470 lines)') ]]
[[ load_json('api_dump.json') | json_pretty | fold('Raw API response') ]]GitHub renders these as native <details> collapsibles. Agent writes 2 lines; reviewer gets click-to-expand evidence. The 470-line SQL never enters agent context.
Quick start
# Install the bundled skill into the current project (.claude/skills/write-pr/)
npx -y @luutuankiet/write-pr install-skill
# Or globally (~/.claude/skills/write-pr/)
npx -y @luutuankiet/write-pr install-skill --global
# Render a PR template (after the agent has written pr.md.j2 + dumped evidence)
npx -y @luutuankiet/write-pr render \
--template pr.md.j2 \
--evidence ./evidence \
--out PR.mdThe 6 filters
| Filter | Use |
|---|---|
| md_table | JSON array → markdown table |
| json_pretty | Value → fenced ```json block |
| gh_callout | GitHub TIP / NOTE / IMPORTANT / WARNING / CAUTION |
| code_expand | Read source file → fenced code with file:line header |
| delta_table | Two arrays → before/after table with computed Δ |
| fold | Wrap any content in <details> collapsible |
Full signatures + 8 rule docs (incl. good-pr-traits.md) + 10 snippet patterns ship inside the skill bundle. See skills/write-pr/SKILL.md for the agent-facing entry point.
What makes a good PR
The skill describes 7 quality traits agents should hit regardless of section structure:
- Spoon-feed evidence inline — every assertion followed by the data that proves it (table / JSON / SQL / mermaid diagram for topology + state + flow), no "see X" pointers
- Resolve private context — translate symbolic (internal log/task IDs, private doc refs) AND verbal ("prior cycles", "baseline", "previously") private references into plain english + reproducible anchors
- Collapse at section boundary — every top-level
##IS the<summary>of a<details>wrap, not just inner blocks (GitHub PR view has no TOC, so a long PR is a wall of text on first paint) - Anchor prior-run claims to a reproducible ID — "baseline failure rate" gets a job ID + inline result table
- Text-fallback for auth-gated links — GCP / dbt-Cloud / internal-dashboard URLs come with an inline summary, since reviewers outside the org see a 401
- Tag hypothesis vs measurement — "likely / probably / suggests" flagged or backed with the confirming query
- File citations carry line ranges —
src/foo.sql:42-58, not baresrc/foo.sql
Framing: describe traits, not enforce template — pick the section outline that fits the change; the traits apply regardless. Full descriptions + Good/Bad tables + agent-behavior heuristics in skills/write-pr/rules/good-pr-traits.md.
Custom Jinja delimiters (important)
Templates use [[ ]] for variables and [% %] for blocks. Default {{ }} collides with dbt-Jinja and other tools that emit double-brace syntax inside code-fence evidence (silently renders to empty string). The custom delimiters dodge that without losing any Nunjucks power.
[[ load_json('q.json') | md_table ]]
[% if meta.has_breaking_change %]
[[ 'WARNING' | gh_callout(meta.breaking_change_note) ]]
[% endif %]
[# this is a comment, agents-only #]Two template-delimiter edge cases ([[[ ]]] trap, literal-filter-in-prose) are documented in skills/write-pr/rules/template-gotchas.md with verified workarounds.
License
MIT — see LICENSE.
