# prompt-workflow-kit

v0.2.0

A thin, event-driven framework for orchestrating LLM steps in TypeScript applications.
## The Problem
I started with simple LLM integrations—text extraction, parsing unstructured input, automating small decisions in my SaaS workflows. The results were good. Each success sparked ideas for more LLM-powered automation.
But what started simple became complex:
- Prompt chaining got messy. One step's output feeds into another's input. Error handling between steps. Conditional branching. Soon my business logic was buried under LLM orchestration code.
- Testing became difficult. How do you test a 4-step LLM chain? Mock everything? Run real calls and hope for deterministic outputs?
- Custom retry logic everywhere. Rate limits, transient failures, validation errors—each step needed its own retry wrapper.
- Observability required custom logging classes. I needed to trace requests from start to finish, correlate steps within a single execution, understand where things failed.
- Prompt management scattered. System prompts in one file, user prompt builders in another, schemas somewhere else. No consistent structure.
I needed something to bring order to this chaos.
## Why Not Use...?

### Agent SDKs (Pydantic AI, Google Agent SDK, LangGraph)
These frameworks are designed for fully autonomous agents—systems that decide their own next steps, call tools dynamically, and orchestrate themselves.
That's not what I needed. My users still go through predictable flows. They submit input, it gets classified, details get extracted, validation happens, it routes to the next stage. The flow is structured. The LLM enhances each step with intelligence—it doesn't replace the flow itself.
Agentic SDKs are overkill when you want LLM-powered micro-decisions within existing application flows.
### Workflow Platforms (Temporal, Cloudflare Workers, Convex, Hatchet)
These are powerful systems for durable execution, long-running workflows, and at-scale orchestration. But they come with baggage:
- You adapt to their runtime and primitives
- Self-hosting means running their Docker infrastructure
- Platform lock-in for what might be a few LLM calls in your existing app
I didn't want to restructure my application around a workflow platform. I wanted a library I could drop into my existing TypeScript codebase.
## Where This Fits

| Solution | Use When |
|----------|----------|
| Agent SDKs | Building autonomous agents that orchestrate themselves |
| Workflow Platforms | Need durable execution, long-running workflows, at-scale orchestration |
| This library | Enhancing existing apps with LLM steps, human-in-the-loop, structured flows |
This library fills the gap between "raw LLM API calls" and "full-blown agent/workflow platform."
## Key Features

- Schema-validated outputs — Zod schemas ensure type-safe LLM responses. No more `JSON.parse()` and hope.
- Two-phase tool execution — Steps with tools run an agentic reasoning phase, then format results to your schema.
- Built-in verifiers — Code functions or LLM judges that validate step outputs and trigger automatic retries on failure.
- Event-based observability — Request-level tracing with execution ID correlation across all steps.
- No runtime lock-in — Pure TypeScript. Runs anywhere Node runs.
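The "two-phase" pattern can be sketched in plain TypeScript. This is an illustrative sketch, not the library's implementation: `reason` and `format` stand in for the two model calls, and the turn/transcript shapes are assumptions made for the example.

```ts
// Sketch of two-phase tool execution: phase 1 lets the model call tools
// freely and build up a transcript; phase 2 formats the transcript into
// the final schema-constrained output.
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;
type ModelTurn =
  | { type: "tool-call"; tool: string; args: Record<string, unknown> }
  | { type: "done"; text: string };

async function twoPhaseRun(
  reason: (transcript: string[]) => Promise<ModelTurn>, // phase-1 model call
  format: (transcript: string[]) => Promise<unknown>,   // phase-2 model call
  tools: Record<string, ToolFn>,
  maxTurns = 5,
): Promise<unknown> {
  const transcript: string[] = [];
  for (let i = 0; i < maxTurns; i++) {
    const turn = await reason(transcript);
    if (turn.type === "done") {
      transcript.push(turn.text);
      break;
    }
    // Execute the requested tool and record the result for the next turn
    const result = await tools[turn.tool](turn.args);
    transcript.push(`${turn.tool} -> ${JSON.stringify(result)}`);
  }
  return format(transcript);
}
```

The point of separating the phases is that the reasoning loop can run with tools enabled, while the final call only has to map an existing transcript onto your Zod schema.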
## Quick Start

### Define a Step
```ts
import { createStep } from "prompt-workflow-kit";
import { z } from "zod";

const analyzeSentiment = createStep({
  name: "analyze-sentiment",
  model: "gpt-4o-mini",
  systemPrompt: `You analyze the sentiment of text. Classify it as positive,
negative, or neutral, and explain your reasoning.`,
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    reasoning: z.string(),
    confidence: z.enum(["high", "medium", "low"]),
  }),
  buildUserPrompt: (text, meta) => `Analyze this text: "${text}"`,
  // Optional: add a verifier for quality control
  verifier: {
    systemPrompt: `Verify the sentiment classification is correct and reasoning is sound.`,
    maxAttempts: 3,
  },
});

// Verifiers can also be plain functions — no LLM call needed
const extractAmount = createStep({
  name: "extract-amount",
  model: "gpt-4o-mini",
  systemPrompt: `Extract the dollar amount from the text.`,
  schema: z.object({ amount: z.number(), currency: z.string() }),
  buildUserPrompt: (text) => text,
  verifier: {
    verify: (output) => {
      if (output.amount <= 0) return { isCorrect: false, feedback: "Amount must be positive" };
      return { isCorrect: true };
    },
    maxAttempts: 3,
  },
});
```

### Create a Workflow
Extend `BaseWorkflow<TInput, TEvents>` with your input type (any object with an `id: string`) and domain event interface:
```ts
import { BaseWorkflow, type WorkflowInput } from "prompt-workflow-kit";

interface FeedbackInput extends WorkflowInput {
  text: string;
  source: string;
}

interface FeedbackEvents {
  "analysis-complete": (inputId: string, sentiment: string) => void;
  "failed": (error: Error, input: FeedbackInput) => void;
}

class AnalyzeFeedback extends BaseWorkflow<FeedbackInput, FeedbackEvents> {
  protected async execute(context) {
    const input = context.getInput();
    // runStep() handles error checking, metadata storage, and event emission
    const sentiment = await this.runStep(analyzeSentiment, input.text);
    this.emit("analysis-complete", input.id, sentiment.sentiment);
  }
}
```

### Run with Observability
```ts
import { getObservabilityHub } from "prompt-workflow-kit";

// Register an observer
getObservabilityHub().registerPartialObserver({
  name: "console-logger",
  onWorkflowStarted(event) {
    console.log(`Workflow started: ${event.executionId}`);
  },
  onStepCompleted(event) {
    console.log(`Step ${event.stepName} completed in ${event.durationMs}ms`);
  },
  onWorkflowCompleted(event) {
    console.log(`Workflow done. Steps: ${event.stepsExecuted.join(" → ")}`);
  },
});

// Run the workflow
const workflow = new AnalyzeFeedback();
await workflow.process({ id: "feedback-1", text: "Great product!", source: "survey" });
```

## Observability Philosophy
Running evals upfront is good, but it's not the complete story.
Real validation comes from production + human feedback. You can't anticipate every edge case in your test suite. The weird customer notes, the ambiguous requests, the inputs that technically parse but don't make sense—these emerge in production.
This library provides execution-level tracing:
- Every workflow run has an `executionId` that correlates all steps
- Events fire for workflow start/complete/error, step start/complete/error, tool calls, verifier results
- You register observers to send these wherever you need—your database, logging service, monitoring dashboard
Use production traces to build synthetic datasets for evals. The observability system gives you the raw material; you decide how to use it.
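As one way to collect that raw material, an observer could append step events as JSON Lines for later use in eval datasets. This is a sketch, not part of the library; the event fields (`executionId`, `stepName`, `durationMs`) follow the observer examples in this README and are assumptions beyond that.

```ts
import { appendFileSync } from "node:fs";

// Serialize a step-completed event to one JSONL record.
function toTraceLine(event: { executionId: string; stepName: string; durationMs: number }): string {
  return JSON.stringify({ ...event, recordedAt: new Date().toISOString() });
}

// An observer that appends every completed step to a trace file.
const jsonlTraceObserver = {
  name: "jsonl-trace",
  onStepCompleted(event: { executionId: string; stepName: string; durationMs: number }) {
    appendFileSync("traces.jsonl", toTraceLine(event) + "\n");
  },
};

// Register with: getObservabilityHub().registerPartialObserver(jsonlTraceObserver);
```

Each line in `traces.jsonl` is then one step execution, easy to filter by `executionId` or replay as a synthetic eval case.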
## Core Concepts

### Workflow
Orchestrates steps, manages context accumulation, emits domain events. Extend `BaseWorkflow<TInput, TEvents>` and implement `execute()`:
```ts
class MyWorkflow extends BaseWorkflow<MyInput, MyEvents> {
  protected async execute(context, executionContext) {
    const input = context.getInput(); // typed as MyInput
    // Your step orchestration logic
  }
}
```

### Step
A single LLM call with:
- Zod schema for output validation
- System and user prompts
- Optional tools (for agentic steps)
- Optional verifier (for quality control with retry)
```ts
const myStep = createStep({
  name: "my-step",
  systemPrompt: "...",
  schema: MyOutputSchema,
  buildUserPrompt: (input, meta) => "...",
  tools: { /* optional AI SDK tools */ },
  verifier: { /* optional verifier config */ },
});
```

### Verifiers
Verifiers validate step output and retry the step if validation fails. Two modes:
Code-based — a plain function, no LLM call. Use for deterministic checks like range validation, regex matching, or business rules:
```ts
verifier: {
  verify: (output, input) => {
    if (output.score < 0 || output.score > 100)
      return { isCorrect: false, feedback: "Score out of range" };
    return { isCorrect: true };
  },
  maxAttempts: 3,
}
```

LLM-based — an LLM judge that evaluates output quality. Use for subjective checks like "is this summary accurate?":
```ts
verifier: {
  systemPrompt: "Verify the summary is factually accurate and covers all key points.",
  maxAttempts: 3,
}
```

If both `verify` and `systemPrompt` are provided, `verify` takes precedence. The retry loop, observability events, and `verified` flag on `StepResult` work identically for both modes.
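The shared retry loop can be sketched as follows. This is an assumed outline, not the library's actual implementation; `attempt` stands in for the step's LLM call, and feeding `feedback` into the next attempt is the behavior the verifier config implies.

```ts
// Sketch of the verify-and-retry loop shared by both verifier modes:
// run the step, check the output, and pass the verifier's feedback
// into the next attempt until it passes or attempts run out.
type VerifyResult = { isCorrect: boolean; feedback?: string };

async function runWithVerifier<T>(
  attempt: (feedback?: string) => Promise<T>,
  verify: (output: T) => VerifyResult | Promise<VerifyResult>,
  maxAttempts: number,
): Promise<{ output: T; verified: boolean; attempts: number }> {
  let feedback: string | undefined;
  let output!: T;
  for (let i = 1; i <= maxAttempts; i++) {
    output = await attempt(feedback);
    const result = await verify(output);
    if (result.isCorrect) return { output, verified: true, attempts: i };
    feedback = result.feedback; // surface the failure reason to the retry
  }
  return { output, verified: false, attempts: maxAttempts };
}
```

Note the loop returns the last output with `verified: false` rather than throwing, which matches the idea of a `verified` flag on the step result.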
### Context
Accumulates metadata across steps. Each step can read previous step outputs and add its own:
```ts
context.setStepMeta("step-name", stepOutput);
const previousOutput = context.getMeta()["previous-step"];
```

### ObservabilityHub
Singleton that dispatches structured events to registered observers. Fire-and-forget by default; call `flush()` before shutdown:
```ts
const hub = getObservabilityHub();
hub.registerPartialObserver({ name: "my-observer", onStepCompleted(e) { ... } });
// ... workflow runs, events fire ...
await hub.flush();
```

## Installation

```sh
pnpm add prompt-workflow-kit
```

Requires `zod` and `ai` (Vercel AI SDK) as peer dependencies.
## Examples

- Order processing workflow — Multi-step workflow with classification, extraction, validation, and department routing. Shows `BaseWorkflow<Order, ProcessDesignNoteEvents>` with domain-specific input.
- Prompt chain — Generic YAML-driven prompt chaining. Shows how different input shapes plug into the same framework.
- Usage example — Complete example with domain events and observability.

## License

MIT
