@silyze/browsary-openai-provider
v2.0.14
Published
OpenAI provider for the Browsary pipeline framework
Readme
Browsary OpenAI Provider
A ChatGPT-powered browser-automation agent that analyzes page structure and generates JSON-driven automation pipelines. Built on top of OpenAI’s API, Puppeteer-core, and the Browsary pipeline framework.
Installation
npm install @silyze/browsary-openai-providerQuick Start
import OpenAI from "openai";
import { OpenAiProvider, OpenAiConfig } from "@silyze/browsary-openai-provider";
import { PipelineProvider } from "@silyze/browsary-pipeline";
import { createConsoleLogger } from "@silyze/logger";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pipelineProvider = new PipelineProvider();
const logger = createConsoleLogger();
const config: OpenAiConfig = {
openAi: openai,
pipelineProvider,
logger,
models: {
agent: process.env.AGENT_MODEL,
},
};
const aiProvider = new OpenAiProvider(config, async (page, fnName, params) => {
// implement browser function calls (e.g., puppeteer page[fnName](...))
// return JSON-serializable result
});
const page = await browser.newPage();
const userPrompt = "Search for 'open source licenses' and list links";
let previousPipeline = {};
let conversation =
await aiProvider.prompt(page, {
userPrompt,
previousPipeline,
onPipelineUpdate: (pipeline) => {
const json = pipeline.toJSON();
previousPipeline = json;
console.log("Updated pipeline", json);
},
onStatusUpdate: (status) => logger.log("info", "status", status),
});
// Persist `conversation.state` somewhere durable if you need to resume later.
while (!conversation.isComplete) {
conversation = await aiProvider.continuePrompt(page, {
conversation,
previousPipeline,
onPipelineUpdate: (pipeline) => {
const json = pipeline.toJSON();
previousPipeline = json;
console.log("Updated pipeline", json);
},
onStatusUpdate: (status) => logger.log("info", "status", status),
});
}Example CLI Tool
The repository ships with an interactive CLI demo that showcases prompting, pausing, persisting, and resuming a Browsary + OpenAI conversation.
npm run start:exampleThe CLI will:
- ask for your high-level prompt and automatically gather clarifications if the request is vague, so the agent always has enough context;
- resume from
.browsary-cli/conversation.<hash>.jsonwhen you restart it, letting you continue work after a pause or crash; - stream status updates, persist every pipeline revision, and let you inject new instructions mid-run (
addMessages); - support
pause/resumecontrol requests so you can safely stop the agent, then continue once you're ready.
All cached prompts, conversations, and pipelines live under .browsary-cli/ to keep the workspace tidy.
Configuration
OpenAiConfig
| Property | Type | Description |
| ------------------ | ----------------------------------------- | ----------------------------------------------------- |
| openAi | OpenAI \| Promise<OpenAI> | An instantiated OpenAI client (or a promise thereof). |
| pipelineProvider | PipelineProvider | Compiles and runs generated pipelines. |
| logger | Logger | Scoped logger for debug/info/error events. |
| models? | { agent?: string; generate?: string; analyze?: string; status?: string } | Override default OpenAI model identifiers (the agent model powers the unified workflow, while status controls AI narration for tool-call updates). |
Supported identifiers currently include gpt-4o-mini, gpt-5-nano, gpt-5-mini, gpt-5, and gpt-5.1.
Function-call status updates use a lightweight prompt on the status model (defaults to the analyze/generate choice) to narrate what each tool invocation is attempting and how it finished. Override models.status if you want a different model for these blurbs, or omit onStatusUpdate to silence them entirely.
API Reference
Class: OpenAiProvider
Extends AiProvider<Page, OpenAiConfig>. It orchestrates a single-phase agent that can browse, reason, chat, and emit pipelines within one resumable conversation.
constructor(config: OpenAiConfig, functionCall: (page: Page, name: string, params: any) => Promise<unknown>)
- config: See Configuration.
- functionCall: Invoked on ChatGPT function calls to map to Puppeteer actions.
prompt(page: Page, params: PromptParams): Promise<AiAgentConversationState<OpenAiConversationState>>
Starts a new conversation. The provider will:
- Expose browsing, pipeline, and communication tools simultaneously so the agent can explore, validate, emit, or simply chat in a single loop.
- Emit
onStatusUpdatefor browsing milestones, chat messages, pending questions, output payloads, verification steps, and AI-authored narratives explaining each browser/tool call. - Invoke
onPipelineUpdateonly when the compiled pipeline differs fromparams.previousPipeline. - Return a conversation state object (matching
OpenAiConversationState) that captures agent metadata, pending questions, and pipeline history for later resumption.
continuePrompt(page: Page, params: ContinuePromptParams): Promise<AiAgentConversationState<OpenAiConversationState>>
Resumes a prior conversation. Pass the previously returned conversation plus the pipeline you currently have deployed. The provider will:
- Restore the agent's pending questions, metadata, and pipeline history from
conversation.state. - Clear cached artifacts when
controlRequestsinject new instructions. - Respect pause/resume requests:
{ type: "pause", reason?: string }stops after the current safe point.{ type: "resume" }continues a paused conversation.{ type: "addMessages", messages: unknown[] }appends extra user instructions (clears cached agent/pipeline data).
- Return the updated state so you can persist it again.
onMessages receives the raw OpenAI message history, enabling streaming/debug UI if desired.
Prompts & Schemas
Unified Agent Prompt & Tooling
agentPrompt(options): System prompt guiding the single-phase agent. It describes how to browse, chat, compile/emit pipelines, request user input, and provide direct output data without switching phases.analyzeTools: Reusable browsing functions exposed to the agent:querySelectorquerySelectorAllgotoclicktypeurl
- Schema utilities (unchanged):
validatePipelineSchema(obj): booleangetPipelineValidationErrors(): Ajv.ErrorObject[]genericNodeSchema/pipelineSchema: JSON schemas describing valid Browsary pipelines.
Types
OpenAiConfigOpenAiConversationState— serializable snapshot you should persist betweenprompt/continuePromptcalls.PromptParams/ContinuePromptParams— include callbacks (onPipelineUpdate,onStatusUpdate?,onMessages?),previousPipeline, optionalcontrolRequests, and (forprompt) the initialuserPrompt.AiAgentConversationState<TState>— returned by both prompt methods; use itsstate+idto resume later.Pipeline: Mapping of step names to node definitions with inputs/outputs/dependsOn
Refer to the TypeScript definitions for complete details.
Environment Variables
OPENAI_API_KEY— Your API key for OpenAI.OPENAI_DEFAULT_MODEL— Default model for the unified agent (used if no overrides are supplied).OPENAI_DEFAULT_GENERATE_MODEL— Override model for the agent/pipeline compilation phase.
