# prompt-workflow-kit

v0.2.0

A thin, event-driven framework for orchestrating LLM steps in TypeScript applications.
## The Problem
I started with simple LLM integrations—text extraction, parsing unstructured input, automating small decisions in my SaaS workflows. The results were good. Each success sparked ideas for more LLM-powered automation.
But what started simple became complex:
- Prompt chaining got messy. One step's output feeds into another's input. Error handling between steps. Conditional branching. Soon my business logic was buried under LLM orchestration code.
- Testing became difficult. How do you test a 4-step LLM chain? Mock everything? Run real calls and hope for deterministic outputs?
- Custom retry logic everywhere. Rate limits, transient failures, validation errors—each step needed its own retry wrapper.
- Observability required custom logging classes. I needed to trace requests from start to finish, correlate steps within a single execution, understand where things failed.
- Prompt management scattered. System prompts in one file, user prompt builders in another, schemas somewhere else. No consistent structure.
I needed something to bring order to this chaos.
## Why Not Use...?

### Agent SDKs (Pydantic AI, Google Agent SDK, LangGraph)
These frameworks are designed for fully autonomous agents—systems that decide their own next steps, call tools dynamically, and orchestrate themselves.
That's not what I needed. My users still go through predictable flows. They submit input, it gets classified, details get extracted, validation happens, it routes to the next stage. The flow is structured. The LLM enhances each step with intelligence—it doesn't replace the flow itself.
Agentic SDKs are overkill when you want LLM-powered micro-decisions within existing application flows.
### Workflow Platforms (Temporal, Cloudflare Workers, Convex, Hatchet)
These are powerful systems for durable execution, long-running workflows, and at-scale orchestration. But they come with baggage:
- You adapt to their runtime and primitives
- Self-hosting means running their Docker infrastructure
- Platform lock-in for what might be a few LLM calls in your existing app
I didn't want to restructure my application around a workflow platform. I wanted a library I could drop into my existing TypeScript codebase.
## Where This Fits

| Solution | Use When |
|----------|----------|
| Agent SDKs | Building autonomous agents that orchestrate themselves |
| Workflow Platforms | Need durable execution, long-running workflows, at-scale orchestration |
| This library | Enhancing existing apps with LLM steps, human-in-the-loop, structured flows |
This library fills the gap between "raw LLM API calls" and "full-blown agent/workflow platform."
## Key Features

- Schema-validated outputs — Zod schemas ensure type-safe LLM responses. No more `JSON.parse()` and hope.
- Two-phase tool execution — Steps with tools run an agentic reasoning phase, then format results to your schema.
- Built-in verifiers — Code functions or LLM judges that validate step outputs and trigger automatic retries on failure.
- Event-based observability — Request-level tracing with execution ID correlation across all steps.
- No runtime lock-in — Pure TypeScript. Runs anywhere Node runs.
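The "two-phase" pattern can be sketched in plain TypeScript. This is an illustrative sketch, not the library's implementation: `reason` and `format` stand in for the two model calls, and the turn/transcript shapes are assumptions made for the example.

```ts
// Sketch of two-phase tool execution: phase 1 lets the model call tools
// freely and build up a transcript; phase 2 formats the transcript into
// the final schema-constrained output.
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;
type ModelTurn =
  | { type: "tool-call"; tool: string; args: Record<string, unknown> }
  | { type: "done"; text: string };

async function twoPhaseRun(
  reason: (transcript: string[]) => Promise<ModelTurn>, // phase-1 model call
  format: (transcript: string[]) => Promise<unknown>,   // phase-2 model call
  tools: Record<string, ToolFn>,
  maxTurns = 5,
): Promise<unknown> {
  const transcript: string[] = [];
  for (let i = 0; i < maxTurns; i++) {
    const turn = await reason(transcript);
    if (turn.type === "done") {
      transcript.push(turn.text);
      break;
    }
    // Execute the requested tool and record the result for the next turn
    const result = await tools[turn.tool](turn.args);
    transcript.push(`${turn.tool} -> ${JSON.stringify(result)}`);
  }
  return format(transcript);
}
```

The point of separating the phases is that the reasoning loop can run with tools enabled, while the final call only has to map an existing transcript onto your Zod schema.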
## Quick Start

### Define a Step
```ts
import { createStep } from "prompt-workflow-kit";
import { z } from "zod";

const analyzeSentiment = createStep({
  name: "analyze-sentiment",
  model: "gpt-4o-mini",
  systemPrompt: `You analyze the sentiment of text. Classify it as positive,
negative, or neutral, and explain your reasoning.`,
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    reasoning: z.string(),
    confidence: z.enum(["high", "medium", "low"]),
  }),
  buildUserPrompt: (text, meta) => `Analyze this text: "${text}"`,
  // Optional: add a verifier for quality control
  verifier: {
    systemPrompt: `Verify the sentiment classification is correct and reasoning is sound.`,
    maxAttempts: 3,
  },
});

// Verifiers can also be plain functions — no LLM call needed
const extractAmount = createStep({
  name: "extract-amount",
  model: "gpt-4o-mini",
  systemPrompt: `Extract the dollar amount from the text.`,
  schema: z.object({ amount: z.number(), currency: z.string() }),
  buildUserPrompt: (text) => text,
  verifier: {
    verify: (output) => {
      if (output.amount <= 0) return { isCorrect: false, feedback: "Amount must be positive" };
      return { isCorrect: true };
    },
    maxAttempts: 3,
  },
});
```

### Create a Workflow
Extend `BaseWorkflow<TInput, TEvents>` with your input type (any object with an `id: string`) and domain event interface:
```ts
import { BaseWorkflow, type WorkflowInput } from "prompt-workflow-kit";

interface FeedbackInput extends WorkflowInput {
  text: string;
  source: string;
}

interface FeedbackEvents {
  "analysis-complete": (inputId: string, sentiment: string) => void;
  "failed": (error: Error, input: FeedbackInput) => void;
}

class AnalyzeFeedback extends BaseWorkflow<FeedbackInput, FeedbackEvents> {
  protected async execute(context) {
    const input = context.getInput();
    // runStep() handles error checking, metadata storage, and event emission
    const sentiment = await this.runStep(analyzeSentiment, input.text);
    this.emit("analysis-complete", input.id, sentiment.sentiment);
  }
}
```

### Run with Observability
```ts
import { getObservabilityHub } from "prompt-workflow-kit";

// Register an observer
getObservabilityHub().registerPartialObserver({
  name: "console-logger",
  onWorkflowStarted(event) {
    console.log(`Workflow started: ${event.executionId}`);
  },
  onStepCompleted(event) {
    console.log(`Step ${event.stepName} completed in ${event.durationMs}ms`);
  },
  onWorkflowCompleted(event) {
    console.log(`Workflow done. Steps: ${event.stepsExecuted.join(" → ")}`);
  },
});

// Run the workflow
const workflow = new AnalyzeFeedback();
await workflow.process({ id: "feedback-1", text: "Great product!", source: "survey" });
```

## Observability Philosophy
Running evals upfront is good, but it's not the complete story.
Real validation comes from production + human feedback. You can't anticipate every edge case in your test suite. The weird customer notes, the ambiguous requests, the inputs that technically parse but don't make sense—these emerge in production.
This library provides execution-level tracing:
- Every workflow run has an `executionId` that correlates all steps
- Events fire for workflow start/complete/error, step start/complete/error, tool calls, verifier results
- You register observers to send these wherever you need—your database, logging service, monitoring dashboard
Use production traces to build synthetic datasets for evals. The observability system gives you the raw material; you decide how to use it.
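As one way to collect that raw material, an observer could append step events as JSON Lines for later use in eval datasets. This is a sketch, not part of the library; the event fields (`executionId`, `stepName`, `durationMs`) follow the observer examples in this README and are assumptions beyond that.

```ts
import { appendFileSync } from "node:fs";

// Serialize a step-completed event to one JSONL record.
function toTraceLine(event: { executionId: string; stepName: string; durationMs: number }): string {
  return JSON.stringify({ ...event, recordedAt: new Date().toISOString() });
}

// An observer that appends every completed step to a trace file.
const jsonlTraceObserver = {
  name: "jsonl-trace",
  onStepCompleted(event: { executionId: string; stepName: string; durationMs: number }) {
    appendFileSync("traces.jsonl", toTraceLine(event) + "\n");
  },
};

// Register with: getObservabilityHub().registerPartialObserver(jsonlTraceObserver);
```

Each line in `traces.jsonl` is then one step execution, easy to filter by `executionId` or replay as a synthetic eval case.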
## Core Concepts

### Workflow
Orchestrates steps, manages context accumulation, emits domain events. Extend `BaseWorkflow<TInput, TEvents>` and implement `execute()`:
```ts
class MyWorkflow extends BaseWorkflow<MyInput, MyEvents> {
  protected async execute(context, executionContext) {
    const input = context.getInput(); // typed as MyInput
    // Your step orchestration logic
  }
}
```

### Step
A single LLM call with:
- Zod schema for output validation
- System and user prompts
- Optional tools (for agentic steps)
- Optional verifier (for quality control with retry)
```ts
const myStep = createStep({
  name: "my-step",
  systemPrompt: "...",
  schema: MyOutputSchema,
  buildUserPrompt: (input, meta) => "...",
  tools: { /* optional AI SDK tools */ },
  verifier: { /* optional verifier config */ },
});
```

### Verifiers
Verifiers validate step output and retry the step if validation fails. Two modes:
Code-based — a plain function, no LLM call. Use for deterministic checks like range validation, regex matching, or business rules:
```ts
verifier: {
  verify: (output, input) => {
    if (output.score < 0 || output.score > 100)
      return { isCorrect: false, feedback: "Score out of range" };
    return { isCorrect: true };
  },
  maxAttempts: 3,
}
```

LLM-based — an LLM judge that evaluates output quality. Use for subjective checks like "is this summary accurate?":
```ts
verifier: {
  systemPrompt: "Verify the summary is factually accurate and covers all key points.",
  maxAttempts: 3,
}
```

If both `verify` and `systemPrompt` are provided, `verify` takes precedence. The retry loop, observability events, and `verified` flag on `StepResult` work identically for both modes.
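The shared retry loop can be sketched as follows. This is an assumed outline, not the library's actual implementation; `attempt` stands in for the step's LLM call, and feeding `feedback` into the next attempt is the behavior the verifier config implies.

```ts
// Sketch of the verify-and-retry loop shared by both verifier modes:
// run the step, check the output, and pass the verifier's feedback
// into the next attempt until it passes or attempts run out.
type VerifyResult = { isCorrect: boolean; feedback?: string };

async function runWithVerifier<T>(
  attempt: (feedback?: string) => Promise<T>,
  verify: (output: T) => VerifyResult | Promise<VerifyResult>,
  maxAttempts: number,
): Promise<{ output: T; verified: boolean; attempts: number }> {
  let feedback: string | undefined;
  let output!: T;
  for (let i = 1; i <= maxAttempts; i++) {
    output = await attempt(feedback);
    const result = await verify(output);
    if (result.isCorrect) return { output, verified: true, attempts: i };
    feedback = result.feedback; // surface the failure reason to the retry
  }
  return { output, verified: false, attempts: maxAttempts };
}
```

Note the loop returns the last output with `verified: false` rather than throwing, which matches the idea of a `verified` flag on the step result.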
### Context
Accumulates metadata across steps. Each step can read previous step outputs and add its own:
```ts
context.setStepMeta("step-name", stepOutput);
const previousOutput = context.getMeta()["previous-step"];
```

### ObservabilityHub
Singleton that dispatches structured events to registered observers. Fire-and-forget by default; call `flush()` before shutdown:
```ts
const hub = getObservabilityHub();
hub.registerPartialObserver({ name: "my-observer", onStepCompleted(e) { ... } });
// ... workflow runs, events fire ...
await hub.flush();
```

## Installation

```sh
pnpm add prompt-workflow-kit
```

Requires `zod` and `ai` (Vercel AI SDK) as peer dependencies.
## Examples

- Order processing workflow — Multi-step workflow with classification, extraction, validation, and department routing. Shows `BaseWorkflow<Order, ProcessDesignNoteEvents>` with domain-specific input.
- Prompt chain — Generic YAML-driven prompt chaining. Shows how different input shapes plug into the same framework.
- Usage example — Complete example with domain events and observability.

## License

MIT
