@drawcall/praxis
v0.1.14
Define, train, and use optimized LLM prompts
praxis
knowledge through action
Praxis turns a schema + examples into an optimized LLM prompt. You define the task, Praxis finds the best prompt automatically using AX (TypeScript DSPy), and you get a portable model.config.json that works with any LLM.
Install
```sh
npm install @drawcall/praxis
```

Or add it as a skill for your coding agent (Claude Code, Cursor, Copilot, Codex, Gemini CLI, and more):

```sh
npx skills add drawcall-ai/praxis
```

Workflow: Define → Train (optional) → Build → Run
1. Define
Create model.definition.ts with a schema, examples, and a metric:
```ts
import { z } from 'zod';
import { defineModel } from '@drawcall/praxis';

export default defineModel({
  name: 'Sentiment Analyzer', // optional: human-readable label shown in the web UI
  student: 'google/gemini-3-flash-preview',
  version: '1.0', // optional: bump to trigger retraining when examples or metrics change
  teacher: 'google/gemini-3.1-pro-preview', // optional: stronger model for optimization
  description: 'Classify product review sentiment with confidence.', // optional: task description
  input: z.object({
    reviewText: z.string().describe('The product review to analyze'),
  }),
  output: z.object({
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    confidence: z.number().describe('Confidence score 0-1'),
  }),
  examples: [
    { input: { reviewText: 'Love this product!' }, output: { sentiment: 'positive', confidence: 0.95 } },
    { input: { reviewText: 'Terrible quality.' }, output: { sentiment: 'negative', confidence: 0.92 } },
    { input: { reviewText: 'It works fine.' }, output: { sentiment: 'neutral', confidence: 0.75 } },
    // ... at least 10 examples
  ],
  metric: ({ modelOutput, exampleOutput }) => {
    if (!exampleOutput) return null;
    return modelOutput.sentiment === exampleOutput.sentiment ? 1 : 0;
  },
});
```

2. Train (optional)
```sh
npx praxis train
```

Auto-discovers model.definition.ts (or .js) anywhere in the project via glob, optimizes the prompt, and writes model.config.json next to the definition file. You can also pass an explicit path: npx praxis train -d path/to/model.definition.ts.

Training requires a metric and at least 10 examples. Without training, Praxis generates a default instruction from the schema.

Options: --output, -o <path> · --optimizer <ace|gepa|auto> · --split <ratio> · --force

Training is skipped if the config is already up to date. If version is set in the definition, changing the schema, student, or teacher without bumping the version produces an error; bump the version to retrain.
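For intuition on what `--split <ratio>` does, here is a minimal sketch of a ratio-based train/validation split. This is an illustration of the concept only, not Praxis internals; the `Example` type and `splitExamples` helper are hypothetical names that mirror the shape used in model.definition.ts:

```ts
// Illustrative only: how a train/validation split by ratio works.
// `Example` mirrors the { input, output? } shape used in model.definition.ts.
type Example = { input: Record<string, unknown>; output?: Record<string, unknown> };

function splitExamples(examples: Example[], ratio: number): { train: Example[]; val: Example[] } {
  // e.g. ratio 0.8: the first 80% of examples go to the training set
  const cut = Math.floor(examples.length * ratio);
  return { train: examples.slice(0, cut), val: examples.slice(cut) };
}

const examples: Example[] = Array.from({ length: 10 }, (_, i) => ({ input: { text: `review ${i}` } }));
const { train, val } = splitExamples(examples, 0.8);
console.log(train.length, val.length); // 8 2
```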
3. Build
```ts
import { buildRequest } from '@drawcall/praxis';
import modelDefinition from './model.definition.js';
import modelConfig from './model.config.json'; // optional

const request = buildRequest(modelDefinition, { reviewText: '...' }, modelConfig);
// → { messages, schema, student, metric? }
```

4. Run
```ts
import { generateText } from '@drawcall/praxis';
import modelDefinition from './model.definition.js';

const { output, score } = await generateText({
  definition: modelDefinition,
  input: { reviewText: 'Amazing product!' },
});
```

Or from the CLI:

```sh
npx praxis run --reviewText "Great product!"
```

Multiple metrics
Return a Record<string, number> from metric to evaluate on multiple dimensions. Praxis auto-selects the GEPA optimizer for multi-objective optimization:
```ts
export default defineModel({
  student: 'google/gemini-3-flash-preview',
  input: z.object({
    reviewText: z.string().describe('The product review to evaluate'),
  }),
  output: z.object({
    quality: z.enum(['helpful', 'unhelpful']),
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    hasSpecificDetails: z.boolean(),
  }),
  examples: [
    { input: { reviewText: 'Battery lasts 8 hours, charges in 30 min.' }, output: { quality: 'helpful', sentiment: 'positive', hasSpecificDetails: true } },
    { input: { reviewText: 'Bad.' }, output: { quality: 'unhelpful', sentiment: 'negative', hasSpecificDetails: false } },
    // ... at least 10 examples
  ],
  metric: ({ modelOutput, exampleOutput }) => {
    if (!exampleOutput) return null;
    return {
      quality: modelOutput.quality === exampleOutput.quality ? 1 : 0,
      sentiment: modelOutput.sentiment === exampleOutput.sentiment ? 1 : 0,
      details: modelOutput.hasSpecificDetails === exampleOutput.hasSpecificDetails ? 1 : 0,
    };
  },
});
```

generateText returns per-metric scores:
```ts
const { output, score } = await generateText({
  definition: modelDefinition,
  input: { reviewText: '...' },
});
// score → { quality: 1, sentiment: 1, details: 0 }
```

Async examples
Examples can be an async function instead of a plain array — useful for loading from a database or generating synthetic data:
```ts
export default defineModel({
  student: 'google/gemini-3-flash-preview',
  input: z.object({ text: z.string() }),
  output: z.object({ label: z.enum(['a', 'b', 'c']) }),
  examples: async () => {
    const rows = await fetchTrainingData();
    return rows.map(r => ({ input: { text: r.text }, output: { label: r.label } }));
  },
  metric: ({ modelOutput, exampleOutput }) => {
    if (!exampleOutput) return null;
    return modelOutput.label === exampleOutput.label ? 1 : 0;
  },
});
```

You can also use top-level await for simpler cases; no special syntax is needed.
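For instance, the data can be resolved once at module load with top-level await (ESM modules only). A sketch, where `fetchTrainingData` is a placeholder for your own loader:

```ts
// Sketch: resolve training data with top-level await instead of an async
// `examples` function. `fetchTrainingData` stands in for your own loader.
const fetchTrainingData = async () =>
  [{ text: 'good', label: 'a' }, { text: 'bad', label: 'b' }];

const rows = await fetchTrainingData(); // top-level await (ESM modules only)
const examples = rows.map(r => ({ input: { text: r.text }, output: { label: r.label } }));
console.log(examples.length); // 2
```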
Versioning
The optional version field signals when to retrain. Schema, student, and teacher changes are detected automatically, but changes to metric logic or example content are not. Bump the version to trigger retraining:
```ts
export default defineModel({
  student: 'google/gemini-3-flash-preview',
  version: '1.1', // was '1.0': bumped because the metric changed
  // ...
});
```

If version is set and the schema, student, or teacher change without a version bump, praxis train errors; this prevents accidental drift. Use --force to bypass.
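Why can't metric changes be detected automatically? Serializable fields like the student and schema can be fingerprinted and compared between runs, but function bodies and runtime-loaded examples have no stable serialization. The following is a hypothetical sketch of that idea, not Praxis's actual mechanism:

```ts
import { createHash } from 'node:crypto';

// Hypothetical sketch: fingerprint only the serializable fields of a definition.
// Functions (like `metric`) are excluded because closures and runtime data
// cannot be compared reliably, which is why a manual version bump is needed.
function fingerprint(def: { student: string; teacher?: string; schemaShape: string }): string {
  return createHash('sha256')
    .update(JSON.stringify([def.student, def.teacher ?? '', def.schemaShape]))
    .digest('hex');
}

const before = fingerprint({ student: 'google/gemini-3-flash-preview', schemaShape: 'reviewText:string' });
const after = fingerprint({ student: 'google/gemini-3-flash-preview', schemaShape: 'reviewText:string' });
console.log(before === after); // true: identical fields, so no retraining is triggered
```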
Examples without expected output
The output field on examples is optional. This is useful for metrics that evaluate the prediction on its own (e.g. length, format, safety) without comparing to a ground truth:
```ts
export default defineModel({
  student: 'google/gemini-3-flash-preview',
  input: z.object({ text: z.string() }),
  output: z.object({ summary: z.string() }),
  examples: [
    { input: { text: 'A long article about climate change...' } },
    { input: { text: 'Breaking news: new discovery in physics...' } },
    // no output needed: the metric evaluates the prediction directly
  ],
  metric: ({ modelOutput }) => {
    // Score based on output quality, not comparison to an expected value
    return modelOutput.summary.length > 10 && modelOutput.summary.length < 200 ? 1 : 0;
  },
});
```

If a metric returns null during training, Praxis throws an error; this usually means the metric expects exampleOutput but the examples don't provide it.
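A metric can also mix both modes: compare against exampleOutput when it is present, and fall back to an intrinsic check otherwise, so training never sees a null score. A sketch of this pattern (a design choice, not a Praxis requirement):

```ts
// Sketch of a mixed metric: use ground truth when available, otherwise
// score the prediction on its own. The types mirror the summary example above.
type Out = { summary: string };

const metric = ({ modelOutput, exampleOutput }: { modelOutput: Out; exampleOutput?: Out }) => {
  if (exampleOutput) {
    // ground truth available: exact-match scoring
    return modelOutput.summary === exampleOutput.summary ? 1 : 0;
  }
  // no ground truth: intrinsic length check, never returns null
  const len = modelOutput.summary.length;
  return len > 10 && len < 200 ? 1 : 0;
};

console.log(metric({ modelOutput: { summary: 'A concise summary of the article.' } })); // 1
```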
Runnable examples
See the examples/ directory for runnable projects:
- sentiment — Single metric: classify review sentiment
- review-quality — Multiple metrics: evaluate quality, sentiment, and detail
Run them:
```sh
pnpm install
pnpm --filter @drawcall/example-sentiment dev
pnpm --filter @drawcall/example-review-quality dev
```

CLI
| Command | Description |
|---------|-------------|
| `praxis train [-d <path>] [-f]` | Optimize prompts (auto-discovers definition via glob) |
| `praxis run [-d <path>] [-c <config>]` | Run inference (auto-discovers definition and config) |
| `praxis view [-d <path>] [-p <port>]` | Launch web UI to inspect eval runs and test manually |
| `praxis validate [-d <path>] [-c <config>]` | Check config matches definition (schema, student, version) |
Features
- Schema-driven: define inputs and outputs with Zod, get type-safe results
- Automatic optimization: finds the best prompt through systematic search
- Portable output: `model.config.json` works with any LLM provider via OpenRouter
- Works without training: definitions alone produce results; training makes them better
- Single or multi-metric: return a `number` for one score, or a `Record<string, number>` for multi-objective optimization
- CLI + code: run from the terminal or import into your app
- Agent-friendly: install as a skill and let your coding agent write definitions
Requirements
- Node.js >= 18
- `OPENROUTER_KEY` in `.env` (openrouter.ai)
License
MIT
