
prompt-workflow-kit

v0.2.0

A thin, event-driven framework for orchestrating LLM steps in TypeScript applications.

The Problem

I started with simple LLM integrations—text extraction, parsing unstructured input, automating small decisions in my SaaS workflows. The results were good. Each success sparked ideas for more LLM-powered automation.

But what started simple became complex:

  • Prompt chaining got messy. One step's output feeds into another's input. Error handling between steps. Conditional branching. Soon my business logic was buried under LLM orchestration code.
  • Testing became difficult. How do you test a 4-step LLM chain? Mock everything? Run real calls and hope for deterministic outputs?
  • Custom retry logic everywhere. Rate limits, transient failures, validation errors—each step needed its own retry wrapper.
  • Observability required custom logging classes. I needed to trace requests from start to finish, correlate steps within a single execution, understand where things failed.
  • Prompt management scattered. System prompts in one file, user prompt builders in another, schemas somewhere else. No consistent structure.

I needed something to bring order to this chaos.

Why Not Use...?

Agent SDKs (Pydantic AI, Google Agent SDK, LangGraph)

These frameworks are designed for fully autonomous agents—systems that decide their own next steps, call tools dynamically, and orchestrate themselves.

That's not what I needed. My users still go through predictable flows. They submit input, it gets classified, details get extracted, validation happens, it routes to the next stage. The flow is structured. The LLM enhances each step with intelligence—it doesn't replace the flow itself.

Agentic SDKs are overkill when you want LLM-powered micro-decisions within existing application flows.

Workflow Platforms (Temporal, Cloudflare Workers, Convex, Hatchet)

These are powerful systems for durable execution, long-running workflows, and at-scale orchestration. But they come with baggage:

  • You adapt to their runtime and primitives
  • Self-hosting means running their Docker infrastructure
  • Platform lock-in for what might be a few LLM calls in your existing app

I didn't want to restructure my application around a workflow platform. I wanted a library I could drop into my existing TypeScript codebase.

Where This Fits

| Solution | Use When |
|----------|----------|
| Agent SDKs | Building autonomous agents that orchestrate themselves |
| Workflow Platforms | Need durable execution, long-running workflows, at-scale orchestration |
| This library | Enhancing existing apps with LLM steps, human-in-the-loop, structured flows |

This library fills the gap between "raw LLM API calls" and "full-blown agent/workflow platform."

Key Features

  • Schema-validated outputs — Zod schemas ensure type-safe LLM responses. No more JSON.parse() and hope.
  • Two-phase tool execution — Steps with tools run an agentic reasoning phase, then format results to your schema.
  • Built-in verifiers — Code functions or LLM judges that validate step outputs and trigger automatic retries on failure.
  • Event-based observability — Request-level tracing with execution ID correlation across all steps.
  • No runtime lock-in — Pure TypeScript. Runs anywhere Node runs.

Quick Start

Define a Step

import { createStep } from "prompt-workflow-kit";
import { z } from "zod";

const analyzeSentiment = createStep({
  name: "analyze-sentiment",
  model: "gpt-4o-mini",
  systemPrompt: `You analyze the sentiment of text. Classify it as positive,
negative, or neutral, and explain your reasoning.`,
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    reasoning: z.string(),
    confidence: z.enum(["high", "medium", "low"]),
  }),
  buildUserPrompt: (text, meta) => `Analyze this text: "${text}"`,
  // Optional: add a verifier for quality control
  verifier: {
    systemPrompt: `Verify the sentiment classification is correct and reasoning is sound.`,
    maxAttempts: 3,
  },
});

// Verifiers can also be plain functions — no LLM call needed
const extractAmount = createStep({
  name: "extract-amount",
  model: "gpt-4o-mini",
  systemPrompt: `Extract the dollar amount from the text.`,
  schema: z.object({ amount: z.number(), currency: z.string() }),
  buildUserPrompt: (text) => text,
  verifier: {
    verify: (output) => {
      if (output.amount <= 0) return { isCorrect: false, feedback: "Amount must be positive" };
      return { isCorrect: true };
    },
    maxAttempts: 3,
  },
});

Create a Workflow

Extend BaseWorkflow<TInput, TEvents> with your input type (any object with an id: string) and domain event interface:

import { BaseWorkflow, type WorkflowInput } from "prompt-workflow-kit";

interface FeedbackInput extends WorkflowInput {
  text: string;
  source: string;
}

interface FeedbackEvents {
  "analysis-complete": (inputId: string, sentiment: string) => void;
  "failed": (error: Error, input: FeedbackInput) => void;
}

class AnalyzeFeedback extends BaseWorkflow<FeedbackInput, FeedbackEvents> {
  protected async execute(context) {
    const input = context.getInput();

    // runStep() handles error checking, metadata storage, and event emission
    const sentiment = await this.runStep(analyzeSentiment, input.text);

    this.emit("analysis-complete", input.id, sentiment.sentiment);
  }
}

Run with Observability

import { getObservabilityHub } from "prompt-workflow-kit";

// Register an observer
getObservabilityHub().registerPartialObserver({
  name: "console-logger",
  onWorkflowStarted(event) {
    console.log(`Workflow started: ${event.executionId}`);
  },
  onStepCompleted(event) {
    console.log(`Step ${event.stepName} completed in ${event.durationMs}ms`);
  },
  onWorkflowCompleted(event) {
    console.log(`Workflow done. Steps: ${event.stepsExecuted.join(" → ")}`);
  },
});

// Run the workflow
const workflow = new AnalyzeFeedback();
await workflow.process({ id: "feedback-1", text: "Great product!", source: "survey" });

Observability Philosophy

Running evals upfront is good, but it's not the complete story.

Real validation comes from production + human feedback. You can't anticipate every edge case in your test suite. The weird customer notes, the ambiguous requests, the inputs that technically parse but don't make sense—these emerge in production.

This library provides execution-level tracing:

  • Every workflow run has an executionId that correlates all steps
  • Events fire for workflow start/complete/error, step start/complete/error, tool calls, verifier results
  • You register observers to send these wherever you need—your database, logging service, monitoring dashboard

Use production traces to build synthetic datasets for evals. The observability system gives you the raw material; you decide how to use it.
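As a minimal sketch of that loop, the observer below buffers completed-step traces and groups them by execution ID so they can later be drained into an eval dataset. The observer shape follows the registerPartialObserver example above; the event fields shown and the createTraceRecorder helper itself are illustrative assumptions, not part of the library's API.

```typescript
// Shape assumed from the onStepCompleted example above
type StepCompletedEvent = { executionId: string; stepName: string; durationMs: number };

export function createTraceRecorder() {
  const records: StepCompletedEvent[] = [];
  return {
    name: "trace-recorder",
    // Buffer every completed step as it fires
    onStepCompleted(event: StepCompletedEvent) {
      records.push(event);
    },
    // Group buffered traces by executionId for eval-dataset construction
    byExecution(): Map<string, StepCompletedEvent[]> {
      const grouped = new Map<string, StepCompletedEvent[]>();
      for (const r of records) {
        const list = grouped.get(r.executionId) ?? [];
        list.push(r);
        grouped.set(r.executionId, list);
      }
      return grouped;
    },
  };
}
```

Register it with getObservabilityHub().registerPartialObserver(createTraceRecorder()), then periodically drain byExecution() into whatever store feeds your eval pipeline.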

Core Concepts

Workflow

Orchestrates steps, manages context accumulation, emits domain events. Extend BaseWorkflow<TInput, TEvents> and implement execute():

class MyWorkflow extends BaseWorkflow<MyInput, MyEvents> {
  protected async execute(context, executionContext) {
    const input = context.getInput(); // typed as MyInput
    // Your step orchestration logic
  }
}

Step

A single LLM call with:

  • Zod schema for output validation
  • System and user prompts
  • Optional tools (for agentic steps)
  • Optional verifier (for quality control with retry)

const myStep = createStep({
  name: "my-step",
  systemPrompt: "...",
  schema: MyOutputSchema,
  buildUserPrompt: (input, meta) => "...",
  tools: { /* optional AI SDK tools */ },
  verifier: { /* optional verifier config */ },
});
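A tool-enabled step (the two-phase execution mentioned under Key Features) might be sketched as follows. This assumes an AI SDK v4-style tool() helper (description, parameters, execute) and a hypothetical lookupOrder data-access function; check your ai version, since v5 renames parameters to inputSchema:

```typescript
import { createStep } from "prompt-workflow-kit";
import { tool } from "ai";
import { z } from "zod";

// Hypothetical data access; replace with your own lookup
async function lookupOrder(orderId: string) {
  return { orderId, status: "shipped" };
}

const checkOrderStatus = createStep({
  name: "check-order-status",
  model: "gpt-4o-mini",
  systemPrompt: "Answer questions about order status using the lookup tool.",
  schema: z.object({
    status: z.enum(["pending", "shipped", "delivered", "unknown"]),
    summary: z.string(),
  }),
  buildUserPrompt: (question: string) => question,
  tools: {
    // The agentic phase may call this tool before the formatting
    // phase shapes the final answer to the schema above
    lookupOrder: tool({
      description: "Look up an order by its ID",
      parameters: z.object({ orderId: z.string() }),
      execute: async ({ orderId }) => lookupOrder(orderId),
    }),
  },
});
```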

Verifiers

Verifiers validate step output and retry the step if validation fails. Two modes:

Code-based — a plain function, no LLM call. Use for deterministic checks like range validation, regex matching, or business rules:

verifier: {
  verify: (output, input) => {
    if (output.score < 0 || output.score > 100)
      return { isCorrect: false, feedback: "Score out of range" };
    return { isCorrect: true };
  },
  maxAttempts: 3,
}

LLM-based — an LLM judge that evaluates output quality. Use for subjective checks like "is this summary accurate?":

verifier: {
  systemPrompt: "Verify the summary is factually accurate and covers all key points.",
  maxAttempts: 3,
}

If both verify and systemPrompt are provided, verify takes precedence. The retry loop, observability events, and verified flag on StepResult work identically for both modes.

Context

Accumulates metadata across steps. Each step can read previous step outputs and add its own:

context.setStepMeta("step-name", stepOutput);
const previousOutput = context.getMeta()["previous-step"];

ObservabilityHub

Singleton that dispatches structured events to registered observers. Fire-and-forget by default; call flush() before shutdown:

const hub = getObservabilityHub();
hub.registerPartialObserver({ name: "my-observer", onStepCompleted(e) { ... } });
// ... workflow runs, events fire ...
await hub.flush();

Installation

pnpm add prompt-workflow-kit

Requires zod and ai (Vercel AI SDK) as peer dependencies.
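Assuming pnpm, the peers can be installed alongside the package in one command (package names as listed above):

```shell
pnpm add prompt-workflow-kit zod ai
```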

Examples

  • Order processing workflow — Multi-step workflow with classification, extraction, validation, and department routing. Shows BaseWorkflow<Order, ProcessDesignNoteEvents> with domain-specific input.
  • Prompt chain — Generic YAML-driven prompt chaining. Shows how different input shapes plug into the same framework.
  • Usage example — Complete example with domain events and observability.

License

MIT