pi-llm-as-verifier
v0.2.2

Pi skill + extension for llm-as-verifier style pairwise, repeated, criteria-decomposed candidate selection.
Pi package for llm-as-verifier style selection and auditing.
It bundles:

- a Pi skill: `llm-as-verifier`
- a Pi extension tool: `llm_as_verifier`
- reusable prompt templates for common verifier workflows
Install

```
pi install npm:pi-llm-as-verifier
```

Or test without installing globally:

```
pi -e npm:pi-llm-as-verifier
```

What it does
This package helps Pi choose among multiple candidate artifacts using:
- pairwise comparison
- criteria decomposition
- repeated verification
- round-robin winner selection
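The steps above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual implementation: `judge` is a hypothetical callback standing in for a single LLM comparison call, and the real tool's scheduling, criteria handling, and tie-breaking may differ.

```python
from itertools import combinations
from collections import Counter

def select_winner(candidates, judge, n_verifications=3):
    """Round-robin pairwise selection: compare every candidate pair
    n_verifications times, tally wins, and return the top scorer.
    `judge(a, b)` returns the id of the preferred candidate."""
    wins = Counter({c["id"]: 0 for c in candidates})
    for a, b in combinations(candidates, 2):
        for _ in range(n_verifications):
            wins[judge(a, b)] += 1
    return wins.most_common(1)[0][0]

# Toy judge that prefers the longer content (illustration only; the
# real verifier would prompt an LLM with the task and criteria).
judge = lambda a, b: a["id"] if len(a["content"]) >= len(b["content"]) else b["id"]
cands = [{"id": "patch-a", "content": "short"},
         {"id": "patch-b", "content": "much longer patch"}]
print(select_winner(cands, judge))  # patch-b
```

Repeating each comparison smooths out single-call noise, which is the point of the repeated-verification step.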
It supports three backends:
- `gemini-python` - a Python runner inspired by the upstream paper/repo
- `zai-coding-plan` - a single ZAI model through Pi's model registry
- `pi-model-ensemble` - multiple Pi models rotated across repeated attempts
Tool usage
Use the `llm_as_verifier` tool with:

- `task`
- `candidates`
- `criteria` (optional)
- `context` (optional)
- `evidencePaths` (optional)
- `outputPath`
Multi-model repeated attempts
For mixed-model verification, use:
```
backend: "pi-model-ensemble"
models: ["openai:gpt-5.4", "google:gemini-2.5-flash", "minimax:MiniMax-M2.7-highspeed"]
```
If nVerifications is omitted in ensemble mode, it defaults to the number of configured verifier models so each model gets one pass.
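That defaulting and rotation behavior can be sketched as below; `plan_attempts` is a hypothetical helper, not part of the package's API:

```python
def plan_attempts(models, n_verifications=None):
    """If n_verifications is omitted, default to one pass per configured
    verifier model; otherwise rotate models round-robin across attempts."""
    n = n_verifications if n_verifications is not None else len(models)
    return [models[i % len(models)] for i in range(n)]

models = ["openai:gpt-5.4", "google:gemini-2.5-flash", "minimax:MiniMax-M2.7-highspeed"]
plan_attempts(models)     # three attempts, one per model
plan_attempts(models, 5)  # rotation wraps back to the first models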
Weighted voting by model
For ensemble runs, you can bias some verifier models more strongly:
```json
{
  "backend": "pi-model-ensemble",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.5 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.8 }
  ]
}
```

Confidence reporting
Ensemble and ZAI-backed runs return richer breakdowns in the `details` field, including:
- criterion confidence
- pairwise confidence
- disagreement scores
- per-model breakdowns
- weighted model metadata
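A weighted vote and a simple disagreement score can be computed along these lines. This is a sketch under assumptions: `weighted_tally` is a hypothetical helper, and the package's actual confidence and disagreement formulas may differ.

```python
from collections import defaultdict

def weighted_tally(votes, weights):
    """votes: list of (model, candidate_id) pairs; weights: model -> weight.
    Returns the winner, per-candidate weighted scores, and a disagreement
    score defined here as 1 minus the winner's share of total weight."""
    scores = defaultdict(float)
    for model, cand in votes:
        scores[cand] += weights.get(model, 1.0)  # unlisted models count as 1.0
    total = sum(scores.values())
    winner = max(scores, key=scores.get)
    disagreement = 1.0 - scores[winner] / total
    return winner, dict(scores), disagreement

votes = [("openai:gpt-5.4", "patch-a"),
         ("google:gemini-2.5-flash", "patch-a"),
         ("minimax:MiniMax-M2.7-highspeed", "patch-b")]
weights = {"openai:gpt-5.4": 1.5,
           "google:gemini-2.5-flash": 1.0,
           "minimax:MiniMax-M2.7-highspeed": 0.8}
winner, scores, d = weighted_tally(votes, weights)
# winner == "patch-a" with a weighted score of 2.5 out of 3.3
```

A disagreement near 0 means the verifiers converged; a value near the losing share of weight signals a split worth auditing manually.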
Example
```json
{
  "backend": "pi-model-ensemble",
  "task": "Choose the strongest patch for the bug fix.",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.3 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.9 }
  ],
  "candidates": [
    {
      "id": "patch-a",
      "content": "..."
    },
    {
      "id": "patch-b",
      "content": "..."
    }
  ],
  "criteria": [
    {
      "name": "Correctness",
      "description": "Check whether the patch directly fixes the requested behavior."
    },
    {
      "name": "Requirements adherence",
      "description": "Check whether exact task constraints are satisfied."
    },
    {
      "name": "Empirical verification",
      "description": "Check whether the candidate is supported by concrete test or runtime evidence."
    }
  ]
}
```

Prompt templates
This package also ships prompt templates:
- `/compare-patches`
- `/audit-candidate`
- `/ensemble-verifier`
These expand into ready-made instructions for common verifier workflows.
Auth and setup
Gemini Python backend
Install:
```
pip install google-genai
```

Provide one of:

- `GEMINI_API_KEY`
- `GOOGLE_API_KEY`
- `VERTEX_API_KEY`
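A key-resolution helper might look like the following. This is an illustrative sketch: `resolve_gemini_key` is hypothetical, and the precedence order shown is an assumption, not documented behavior of this package.

```python
import os

def resolve_gemini_key():
    """Return the first configured key among the supported environment
    variables (precedence order here is an assumption)."""
    for var in ("GEMINI_API_KEY", "GOOGLE_API_KEY", "VERTEX_API_KEY"):
        key = os.environ.get(var)
        if key:
            return key
    raise RuntimeError("no Gemini API key configured")
```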
Pi registry backends
For `zai-coding-plan` and `pi-model-ensemble`, configure model auth in Pi for whichever providers you want to use.
Smoke tests
Python-runner smoke test:
```
/lav-smoke
```

Weighted ensemble smoke test:

```
/lav-ensemble-smoke
```

Package contents

- .pi/extensions/llm-as-verifier/index.ts
- .agents/skills/llm-as-verifier/SKILL.md
- .agents/skills/llm-as-verifier/scripts/lav_runner.py
- .agents/skills/llm-as-verifier/examples/code-patch-selection.json
- .agents/skills/llm-as-verifier/examples/weighted-ensemble-selection.json
- prompts/*.md (bundled references and examples)
