ai-prompting
v0.2.1
Improve, rewrite, debug, evaluate, and compress prompts for any LLM agent. Ships an installable Skill plus a deterministic local CLI.
ai-prompting ships in two layers — an Anthropic-style Skill that any compatible agent can apply semantically, and a deterministic Node.js CLI that runs without API keys, without LLM calls, and without runtime dependencies. The CLI is the verifiable baseline. The Skill is where the real semantic gains happen.
30-second quickstart
# Improve any prompt from your shell:
npx ai-prompting improve --prompt "Help me write a sales email"
# Or pipe one in:
echo "scrivi una mail di vendita" | npx ai-prompting improve --mode final_only
# Verify install health:
npx ai-prompting doctor
In a Claude Code session (or any host that supports Skills), the same logic can be invoked as a slash command:
/ai-prompting:improve scrivi una mail al manager per chiedere un aumento --mode diagnostic
Table of contents
- What it does
- How it works
- Install
- CLI
- Slash command
- Library API
- Output modes
- The 100-point rubric
- Agent compatibility
- Examples (input → output)
- Doctor
- FAQ
- Limitations
- Contributing
- Changelog
- License
What it does
You give it a vague, padded, or unstructured prompt. It returns a clearer, cheaper, more reliable version — calibrated to the target agent, task type, and output you need.
Concretely, on every prompt it:
- Detects language (it/en) and task type (research, coding, writing, analysis, data-extraction, agentic-workflow, creative, business, education, general).
- Strips padding that hurts modern reasoning models: "you are a world-class…", "take a deep breath", "think step by step", "I will tip you…", plus Italian variants ("sei un esperto di livello mondiale", "fai un respiro profondo", "pensaci passo passo").
- Decides whether to ask for clarification (only when info is critical, contradictions exist, or the domain is high-risk: legal, medical, financial, security, production code).
- Rewrites with a canonical XML scaffold (<role>, <objective>, <context>, <task>, <constraints>, <output_format>, <quality_bar>).
- Scores the prompt on a 100-point heuristic rubric (9 categories), before/after, with confidence.
- Surfaces every assumption it made as [ASSUMPTION: …], so you can correct without rerunning.
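The padding-removal step can be pictured as a list of regular expressions applied in turn. The sketch below is illustrative only: `PADDING_PATTERNS` and `stripPadding` are hypothetical names, and the patterns cover just the phrases quoted in the list above, not the package's actual rule set.

```typescript
// Hypothetical sketch: names and patterns are assumptions that cover only
// the padding phrases quoted in this README.
const PADDING_PATTERNS: RegExp[] = [
  /you are a world-class[^.!?]*[.!?]?/gi,
  /take a deep breath[.!?]?/gi,
  /think step by step[.!?]?/gi,
  /i will tip you[^.!?]*[.!?]?/gi,
  /sei un esperto di livello mondiale[.!?]?/gi,
  /fai un respiro profondo[.!?]?/gi,
  /pensaci passo passo[.!?]?/gi,
];

function stripPadding(prompt: string): string {
  let cleaned = prompt;
  for (const pattern of PADDING_PATTERNS) {
    // Replace each padding phrase with a space so neighbors do not fuse.
    cleaned = cleaned.replace(pattern, " ");
  }
  // Collapse the whitespace left behind by removed phrases.
  return cleaned.replace(/\s+/g, " ").trim();
}

console.log(
  stripPadding("You are a world-class expert. Take a deep breath. Help me write something good.")
); // → "Help me write something good."
```

Novel padding phrases would pass through a list like this untouched, which matches the limitation noted later in this README.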
How it works
      ┌─────────────────────────────┐
      │  USER: "improve my prompt"  │
      └──────────────┬──────────────┘
                     ▼
      ┌─────────────────────────────┐
      │ classify: lang, task type,  │
      │ target agent (if declared)  │
      └──────────────┬──────────────┘
                     ▼
            ╱╲ critical info ╱╲
           ╱     missing?      ╲
            ╲   yes ◄──► no   ╱
           ▼                 ▼
    ┌──────────────┐  ┌──────────────┐
    │ ask ≤ 3 Qs   │  │ proceed with │
    │ (multi-      │  │ [ASSUMPTION] │
    │ choice)      │  │ markers      │
    └──────┬───────┘  └──────┬───────┘
           └─────────┬───────┘
                     ▼
      ┌─────────────────────────────┐
      │ strip padding               │
      │ scaffold XML                │
      │ score 0-100                 │
      └──────────────┬──────────────┘
                     ▼
      ┌─────────────────────────────┐
      │ return in chosen mode:      │
      │ final_only / compact /      │
      │ standard / diagnostic       │
      └─────────────────────────────┘

Two layers. The Skill (SKILL.md + references/) is read by a host agent (Claude Code, Codex CLI, Cursor, Gemini CLI, etc.); the agent applies the procedure with its own model. The CLI runs the same decisions deterministically and locally, with no LLM call, which makes it useful for batch jobs, CI gates, smoke tests, and as a verifiable baseline.
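The branch in the middle of the diagram can be sketched as a small pure function. This is a minimal illustration, assuming the classifier has already split gaps into critical and minor ones; `NextStep`, `decideNextStep`, and `MAX_QUESTIONS` are hypothetical names, not the package's API.

```typescript
// Hypothetical sketch of the "critical info missing?" branch.
type NextStep =
  | { kind: "ask"; questions: string[] }
  | { kind: "proceed"; assumptions: string[] };

// The diagram caps clarification at three questions.
const MAX_QUESTIONS = 3;

function decideNextStep(criticalGaps: string[], minorGaps: string[]): NextStep {
  if (criticalGaps.length > 0) {
    // Critical information is missing: ask, but never more than three questions.
    return { kind: "ask", questions: criticalGaps.slice(0, MAX_QUESTIONS) };
  }
  // Otherwise proceed and make every minor gap visible as an assumption marker.
  const assumptions = minorGaps.map((gap) => "[ASSUMPTION: " + gap + "]");
  return { kind: "proceed", assumptions };
}

console.log(decideNextStep([], ["audience is general"]));
```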
Install
Requires Node.js ≥ 18. No runtime dependencies.
# One-off use:
npx ai-prompting <command>
# Project install:
npm install --save-dev ai-prompting
# Global install:
npm install -g ai-prompting
From source:
git clone https://github.com/matteoscurati/ai-prompting.git
cd ai-prompting
npm install
npm run build
node dist/src/cli.js doctor
CLI
ai-prompting improve --prompt "Help me write something good"
ai-prompting improve --file ./prompt.txt --mode diagnostic
ai-prompting improve --prompt "..." --target claude --task research --token-budget minimal
cat prompt.txt | ai-prompting improve --mode final_only
ai-prompting doctor
ai-prompting --help
Flags
| Flag | Values | Default | Notes |
|---|---|---|---|
| --prompt <text> | inline string | — | exclusive with --file/stdin |
| --file <path> | filesystem path | — | UTF-8 |
| --mode <name> | final_only / compact / standard / diagnostic | standard | see Output modes |
| --target <agent> | claude / openai / gpt / gemini / local / coding-agent / research-agent / tool-agent | — | activates adapter rules from references/agent-compatibility.md |
| --task <type> | research / writing / coding / analysis / data-extraction / agentic-workflow / creative / business / education / general | inferred | overrides the keyword classifier |
| --token-budget <level> | minimal / balanced / generous | balanced | minimal drops <context> + <quality_bar> |
| --language <code> | it / en | auto-detect | overrides the marker-based detector |
| --audience <text> | free text | — | injected into <context>; suppresses the audience question |
| --constraints <list> | pipe-separated | — | e.g. --constraints "max 200 words|no markdown" |
| --no-score | flag | off | suppresses score block |
| --no-rationale | flag | off | suppresses per-category rationale (diagnostic) |
| --version | flag | — | prints package version |
| --help | flag | — | prints CLI help |
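Two rows of the table describe behavior concrete enough to sketch: the pipe-separated `--constraints` list and the effect of `--token-budget minimal` on the scaffold. The helpers below are hypothetical (`parseConstraints` and `tagsForBudget` are not names from the package) and model only what the table states.

```typescript
// Hypothetical sketch; only the behaviors stated in the flags table are modeled.
type TokenBudget = "minimal" | "balanced" | "generous";

const SCAFFOLD_TAGS: string[] = [
  "role", "objective", "context", "task",
  "constraints", "output_format", "quality_bar",
];

// --constraints takes a pipe-separated list, e.g. "max 200 words|no markdown".
function parseConstraints(raw: string): string[] {
  return raw
    .split("|")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// With --token-budget minimal, <context> and <quality_bar> are dropped.
function tagsForBudget(budget: TokenBudget): string[] {
  if (budget !== "minimal") {
    return SCAFFOLD_TAGS.slice();
  }
  return SCAFFOLD_TAGS.filter((tag) => tag !== "context" && tag !== "quality_bar");
}

console.log(parseConstraints("max 200 words|no markdown")); // → ["max 200 words", "no markdown"]
console.log(tagsForBudget("minimal").length); // → 5
```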
Slash command
A slash command ships at .claude/commands/improve.md for hosts that support the Agent Skills Open Standard. Install it system-wide:
npm run install-command # detects supported hosts and copies the file
npm run install-command -- --list # show targets
npm run install-command -- --host claude --force
Auto-detected: Claude Code (~/.claude/commands/), OpenAI Codex CLI (~/.codex/commands/), Cursor (~/.cursor/commands/). Add more in scripts/install-command.js (5 lines per host).
Once installed, in a Claude Code session:
/ai-prompting:improve <prompt> [--mode …] [--target …] [--task …] [--language …] [--audience "…"] [--token-budget …] [--clarify auto|always|never]
Natural-language fallbacks work too: "in italiano", "non chiedere", "solo prompt", "compatto", "in dettaglio". Flags always win over fallbacks.
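The precedence rule ("flags always win") amounts to applying fallbacks first and explicit flags last. In the sketch below, the mapping from each phrase to an option is a guess based on the wording of the phrase, and `resolveOptions` is a hypothetical name; neither is taken from the package source.

```typescript
// Hypothetical sketch: the phrase-to-option mapping is an assumption.
interface CommandOptions {
  mode?: string;
  language?: string;
  clarify?: string;
}

const FALLBACK_PHRASES: { phrase: string; option: CommandOptions }[] = [
  { phrase: "in italiano", option: { language: "it" } },
  { phrase: "non chiedere", option: { clarify: "never" } },
  { phrase: "solo prompt", option: { mode: "final_only" } },
  { phrase: "compatto", option: { mode: "compact" } },
  { phrase: "in dettaglio", option: { mode: "diagnostic" } },
];

function resolveOptions(text: string, flags: CommandOptions): CommandOptions {
  const lowered = text.toLowerCase();
  let resolved: CommandOptions = {};
  for (const entry of FALLBACK_PHRASES) {
    if (lowered.indexOf(entry.phrase) !== -1) {
      resolved = { ...resolved, ...entry.option };
    }
  }
  // Explicit flags are merged last, so they override any fallback.
  return { ...resolved, ...flags };
}

console.log(resolveOptions("scrivi una mail, compatto, in italiano", { mode: "diagnostic" }));
// → { mode: "diagnostic", language: "it" }
```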
Library API
The package exposes a small, fully-typed library. No runtime deps.
import {
improvePrompt,
scorePrompt,
runDoctor,
type PromptImproverOptions,
type ImprovementResult,
} from 'ai-prompting';
const result: ImprovementResult = improvePrompt({
originalPrompt: 'Help me write a better sales email',
taskType: 'writing',
outputMode: 'standard',
language: 'it',
audience: 'CTOs di scale-up B2B',
tokenBudget: 'minimal',
askClarifyingQuestions: 'never',
});
console.log(result.improved); // string: the rewritten prompt
console.log(result.scores); // { before, after, delta, confidence }
console.log(result.assumptions); // string[]: explicit assumptions
console.log(result.clarifications); // ClarificationQuestion[]
console.log(result.changes); // ImprovementChange[]: what was modified
Full options surface: PromptImproverOptions in dist/src/types.d.ts.
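One possible use of the score block is a CI gate. The sketch below does not import the package, so it runs on its own: `ScoreSummary` mirrors the fields named in the comment on `result.scores`, but the field types, the `passesGate` helper, and the threshold of 70 are all assumptions.

```typescript
// Hypothetical sketch: field types and the gate rule are assumptions.
interface ScoreSummary {
  before: number;
  after: number;
  delta: number;
  confidence: string;
}

// Pass only if the improved prompt reaches the minimum and did not regress.
function passesGate(scores: ScoreSummary, minAfter: number): boolean {
  return scores.after >= minAfter && scores.delta >= 0;
}

// Values taken from the first example in this README (30 → 79).
const sampleScores: ScoreSummary = { before: 30, after: 79, delta: 49, confidence: "high" };
console.log(passesGate(sampleScores, 70)); // → true
```

Because the scores are heuristic, a gate like this is better suited to catching regressions between prompt variants than to certifying quality.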
Output modes
| Mode | What you get | Use when |
|---|---|---|
| final_only | Improved prompt block only | You'll paste it elsewhere; lowest token cost |
| compact | Improved prompt + 1-line score | You want quick before/after delta |
| standard (default) | Improved prompt + change list + score + assumptions | Day-to-day work |
| diagnostic | Standard + per-category rubric breakdown | Debugging why a prompt scored low |
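The modes nest: each one adds sections to the previous one. A minimal sketch of that nesting follows, assuming a plain-text rendering; `RenderInput` and `renderResult` are hypothetical names and the real output layout may differ.

```typescript
// Hypothetical sketch of how the four modes build on each other.
type OutputMode = "final_only" | "compact" | "standard" | "diagnostic";

interface RenderInput {
  improved: string;
  before: number;
  after: number;
  changes: string[];
  assumptions: string[];
  rubricLines: string[]; // per-category breakdown, used by diagnostic only
}

function renderResult(mode: OutputMode, input: RenderInput): string {
  const scoreLine =
    "Original: " + input.before + "/100 → Improved: " + input.after + "/100";
  if (mode === "final_only") {
    return input.improved;
  }
  if (mode === "compact") {
    return [input.improved, scoreLine].join("\n");
  }
  // standard: improved prompt + change list + score + assumptions
  const sections = [input.improved].concat(input.changes, [scoreLine], input.assumptions);
  if (mode === "diagnostic") {
    // diagnostic: everything in standard plus the rubric breakdown
    return sections.concat(input.rubricLines).join("\n");
  }
  return sections.join("\n");
}
```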
The 100-point rubric
| Category | Weight | What it measures |
|---|---|---|
| Intent clarity | 15 | Vague openers penalized; explicit objective rewarded |
| Context sufficiency | 15 | Load-bearing facts present; productive length band |
| Task decomposition | 10 | Numbered/bulleted steps when order matters |
| Constraint specificity | 10 | Verifiable rules: lengths, formats, exclusions |
| Output format clarity | 15 | Schema declared; example present; structured tags |
| Tool / source instructions | 10 | When to use, when not to, error handling, citations |
| Robustness vs hallucination | 10 | "Do not invent", "if unknown say so", uncertainty |
| Token efficiency | 10 | No padding; no repeated trigrams |
| Evaluation criteria | 5 | <success_criteria> or equivalent |
Total = 100. Scores are heuristic estimates, not guaranteed performance gains. Full breakdown in references/prompt-quality-rubric.md.
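The weights in the table sum to 100, so a total can be read as a weighted sum of per-category ratings. The sketch below copies the weights from the table; rating each category from 0 to 1 and rounding the result are assumptions, and `totalScore` is a hypothetical name.

```typescript
// Weights copied from the rubric table; the scoring function is an assumption.
const RUBRIC_WEIGHTS: { [category: string]: number } = {
  "Intent clarity": 15,
  "Context sufficiency": 15,
  "Task decomposition": 10,
  "Constraint specificity": 10,
  "Output format clarity": 15,
  "Tool / source instructions": 10,
  "Robustness vs hallucination": 10,
  "Token efficiency": 10,
  "Evaluation criteria": 5,
};

// Each category is rated from 0 to 1; missing categories count as 0.
function totalScore(ratings: { [category: string]: number }): number {
  let total = 0;
  for (const category of Object.keys(RUBRIC_WEIGHTS)) {
    const weight = RUBRIC_WEIGHTS[category];
    const raw = ratings[category];
    const rating = raw === undefined ? 0 : Math.min(1, Math.max(0, raw));
    total += (weight === undefined ? 0 : weight) * rating;
  }
  return Math.round(total);
}

console.log(totalScore({ "Intent clarity": 1, "Evaluation criteria": 1 })); // → 20
```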
Agent compatibility
The package follows the Agent Skills Open Standard (Anthropic, December 2025). As of 2026, ~32 host tools read SKILL.md from the conventional directory.
| Host | Reads SKILL.md natively | Slash command directory |
|---|---|---|
| Claude Code | ✓ | ~/.claude/commands/ |
| OpenAI Codex CLI | ✓ | ~/.codex/commands/ |
| ChatGPT (Apps SDK) | ✓ via Apps SDK | uses agents/openai.yaml |
| Cursor | ✓ | ~/.cursor/commands/ |
| Gemini CLI | ✓ | ~/.gemini/commands/ (add to installer) |
| JetBrains Junie | ✓ | proprietary mechanism |
| Block Goose | ✓ | .goose/commands/ |
| AWS Kiro | ✓ | proprietary manifest |
| Older custom integrations | adapter required | call improvePrompt() as a tool |
Adapter notes per target are in references/agent-compatibility.md.
Examples (input → output)
A vague prompt with English padding
Input:
You are a world-class expert. Take a deep breath. Help me write something good.
Output (standard mode, English):
## Improved prompt
<role>Expert generalist assistant — precise and direct.</role>
<objective>Write something good.</objective>
<context>[ASSUMPTION: no additional context provided; the model should flag any load-bearing assumptions]</context>
<task>Help me write something good.</task>
<constraints>
- Explicitly distinguish facts, inferences, and assumptions.
- If critical information is missing, declare it rather than inventing.
</constraints>
<output_format>Structured, direct answer; length calibrated to task complexity.</output_format>
<quality_bar>...</quality_bar>
Original: 30/100 → Improved: 79/100 (Δ +49, confidence: high)
An Italian prompt with persona padding
Input:
Sei un esperto di livello mondiale. Aiutami a scrivere una mail per chiedere un aumento.
Output (standard mode, Italian; note the auto-detect):
- Stripped Italian padding (Sei un esperto di livello mondiale)
- Nominalized opener (Aiutami a scrivere… → objective: Scrivere una mail…)
- Task type → writing, role → Senior editor con focus su chiarezza, ritmo e adeguatezza al pubblico.
- Score 32 → 84 (Δ +52, confidence: high)
A high-risk prompt — triggers clarification
Input:
Estrai i dati dei pazienti da questo CSV e crea un report di compliance GDPR.
The CLI detects paziente + gdpr + compliance (high-risk keyword set) and surfaces:
## Domande di chiarimento
- Per quale audience? (es. tecnici, clienti finali)
- Formato di output preferito? (markdown / json / plain text / table / xml)
…and proceeds with [CLARIFY] High-risk domain detected — confirmation needed. in the assumptions.
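A keyword check of this kind can be sketched with a few regular expressions. Only the three labels (`paziente`, `gdpr`, `compliance`) come from the example above; the patterns, including the plural match for `pazienti`, and the name `findHighRiskKeywords` are assumptions.

```typescript
// Hypothetical sketch: only the three labels come from the README example.
const HIGH_RISK_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "paziente", pattern: /\bpazient[ei]\b/i }, // singular or plural
  { label: "gdpr", pattern: /\bgdpr\b/i },
  { label: "compliance", pattern: /\bcompliance\b/i },
];

// Returns the labels of every high-risk keyword found in the prompt.
function findHighRiskKeywords(prompt: string): string[] {
  return HIGH_RISK_PATTERNS
    .filter((entry) => entry.pattern.test(prompt))
    .map((entry) => entry.label);
}

console.log(
  findHighRiskKeywords("Estrai i dati dei pazienti da questo CSV e crea un report di compliance GDPR.")
); // → ["paziente", "gdpr", "compliance"]
```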
More examples in examples/.
Doctor
$ npx ai-prompting doctor
AIPrompting Doctor
✓ Node.js >= 18 (node v20.10.0)
✓ package.json valid + bin entry (version 0.1.6)
✓ SKILL.md frontmatter (name, description) (name ✓, description ✓)
✓ references/ files (5/5)
✓ bin/ai-prompting executable
✓ compiled CLI artifact (dist)
✓ smoke test (improvePrompt) (improved length=842, delta=43)
Status: OK
Node: 20.10.0 | Package: 0.1.6
If a check fails, the doctor prints a fix: line for it.
FAQ
Why ship both a Skill and a CLI? They serve different consumers. The Skill is consumed by an agent that has its own LLM — that agent gets semantic rewriting. The CLI runs locally, deterministic, no API key — useful in CI, batch jobs, smoke tests, and as a fallback when no agent is available.
Does the CLI call an LLM? No. Never. The CLI is rule-based: keyword detection, regex padding strippers, XML scaffolding, heuristic scoring. Zero API calls.
Will the score improve my real prompt performance? The score is a heuristic estimate of prompt structural quality, not a measured behavioral outcome. Reliable measurement requires a golden test set on real models. The score correlates with quality in our own evals; it is not a guarantee.
Why not use a hosted prompt-improvement service? Privacy, cost, reproducibility, and offline use. The deterministic CLI is the same on every machine. The Skill, when run by a host agent, never leaves the user's existing tool.
The CLI rewrites my Italian prompt with English structure tags. The XML tags (<role>, <task>, etc.) are the protocol for the model — they are intentionally English because that's the convention every modern LLM is trained on. The content of the prompt is in your language. If you want fully Italian tags, fork and translate; the rest of the logic doesn't depend on the tag names.
Can I add a new task type / target agent? Yes — see CONTRIBUTING.md. It's two file edits and a regression test.
What happens to "think step by step"? It's stripped. Modern reasoning models (Claude 4.x, GPT-5.x, Gemini 2.5+) reason adaptively; explicit CoT instructions are redundant or counterproductive. If you target a small/local model that does not reason internally, the adapter for --target local does not strip it.
Where do I report issues? GitHub Issues. Security: see SECURITY.md.
Limitations
- CLI = deterministic baseline. It guarantees structural scaffolding and padding removal; it does not perform semantic rewriting. The richer rewrite happens when an agent host applies the Skill.
- Scores are heuristic. Useful for relative comparison (before vs after, variant A vs B); not a direct performance metric.
- Italian language detection uses a small marker set (verbs + articles + common nouns); pass --language en|it for certainty.
- Padding patterns are explicit regexes; novel padding phrases may slip through. Open an issue with the example.
- The Skill description is dense to maximize triggering precision; tune for your host if needed.
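To see why marker-based detection is cheap but fallible, consider the sketch below. The marker list and the two-hit threshold are invented for illustration (the package's real marker set is not shown in this README), and `detectLanguage` is a hypothetical name; the explicit override mirrors the `--language` flag.

```typescript
// Hypothetical marker list and threshold; both are assumptions.
const ITALIAN_MARKERS: string[] = [
  "il", "la", "una", "un", "per", "di", "che", "scrivi", "aiutami", "sei",
];

function detectLanguage(prompt: string, override?: "it" | "en"): "it" | "en" {
  // An explicit override always wins over detection.
  if (override !== undefined) {
    return override;
  }
  const words = prompt.toLowerCase().split(/[^a-zà-ù]+/);
  let hits = 0;
  for (const word of words) {
    if (ITALIAN_MARKERS.indexOf(word) !== -1) {
      hits += 1;
    }
  }
  // Require two marker hits before declaring the prompt Italian.
  return hits >= 2 ? "it" : "en";
}

console.log(detectLanguage("scrivi una mail di vendita")); // → "it"
console.log(detectLanguage("Help me write a sales email")); // → "en"
```

A short Italian prompt with fewer than two markers would be classified as English here, which is the kind of miss the explicit flag is meant to cover.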
Contributing
See CONTRIBUTING.md for dev setup, how to add task types / agent targets / scoring categories, and the release flow.
By participating, you agree to abide by the Code of Conduct.
Changelog
See CHANGELOG.md. The project follows Semantic Versioning and Keep a Changelog.
License
MIT — see LICENSE. © 2026 Matteo Scurati.
