llm-runtime
v0.6.6
Runtime layer for application-owned LLM workflows with tool orchestration, MCP integration, and skill loading.
llm-runtime
llm-runtime is a TypeScript runtime layer for application-owned LLM workflows. It gives a host app one package boundary for provider calls, tool execution, MCP discovery, skills, and bounded agentic completion.
The package is intentionally not a full agent product. Your app still owns UI, persistence, permissions, transcript storage, workspace lifetime, and business policy. llm-runtime owns the provider/tool loop mechanics that should not be reimplemented in every harness.
For a human-oriented walkthrough of the codebase, start with the local project wiki: .wiki/index.md.
Installation
npm install llm-runtime
The package is ESM-only, targets Node.js 18+, and exposes a single root entrypoint.
Public Surface
The root entrypoint exports four runtime functions:
- generate(...)
- complete(...)
- streamComplete(...)
- createRuntime(...)
It also exports the public type set for providers, messages, tools, runtime options, completion results, stream events, MCP config, and provider config.
Lower-level loop internals, direct provider clients, recovery prompts, validation helpers, and tool-resolution helpers are not part of the root public API. If an app needs stable reusable dependencies and tool execution helpers, create a runtime and use the methods on that runtime instance.
Providers
Supported provider names:
- openai
- anthropic
- google
- azure
- xai
- openai-compatible
- ollama
Provider configuration can be passed per call, through a provider map, or through a reusable runtime:
import { generate } from 'llm-runtime';

const response = await generate({
  provider: 'openai',
  model: 'gpt-5',
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
  messages: [
    { role: 'user', content: 'Summarize this in one paragraph.' },
  ],
});
console.log(response.content);
Runtime Model
Use createRuntime(...) when your app wants stable provider config, MCP config, skill roots, defaults, and registry state across many calls.
Put stable harness state in the runtime:
- provider config
- MCP config or MCP registry
- skill roots or skill registry
- default reasoningEffort
- default toolPermission
Keep request-local state per call:
- provider
- model
- messages
- temperature
- maxTokens
- context.workingDirectory
- context.abortSignal
- webSearch
- per-call builtIns, extraTools, or tools
Single-Turn Generation
generate(...) performs one provider call. It resolves the requested tool surface and passes it to the provider, but it does not execute returned tool calls or continue the conversation. Use it when the host wants to own the loop.
The returned LLMResponse is either:
- type: 'text' with content
- type: 'tool_calls' with tool_calls
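Because the response is a tagged union, a host that owns the loop can branch on `type`. The sketch below is self-contained and does not import the package: `SketchResponse`, `SketchToolCall`, and `describeResponse` are hypothetical names, and the tool-call fields (`id`, `name`, `arguments`) are assumptions rather than the package's exported LLMResponse type.

```typescript
// Hypothetical local mirror of the two response variants described above.
// The real exported LLMResponse type may carry more or different fields.
type SketchToolCall = { id: string; name: string; arguments: string };

type SketchResponse =
  | { type: 'text'; content: string }
  | { type: 'tool_calls'; tool_calls: SketchToolCall[] };

// Branch on the discriminant so TypeScript narrows each variant.
function describeResponse(response: SketchResponse): string {
  if (response.type === 'text') {
    return `text: ${response.content}`;
  }
  const names = response.tool_calls.map((call) => call.name).join(', ');
  return `tool calls requested: ${names}`;
}
```

A host that calls generate(...) would execute the requested tools itself in the second branch and decide whether to call the model again.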
Use complete(...) when the runtime should own repeated model calls, tool execution, and terminal control-tool handling.
In short: generate(...) asks the model once; complete(...) keeps working until the runtime reaches a completion, user-input, blocked, or bounded-stop condition.
Example:
import { createRuntime } from 'llm-runtime';

const runtime = createRuntime({
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
  skillRoots: ['/app/skills', '/workspace/.codex/skills'],
  defaults: {
    reasoningEffort: 'medium',
    toolPermission: 'auto',
  },
  mcpConfig: {
    servers: {
      docs: {
        command: 'node',
        args: ['docs-server.js'],
        transport: 'stdio',
      },
    },
  },
});

const response = await runtime.generate({
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    { role: 'user', content: 'Read the project and identify the main runtime boundary.' },
  ],
  context: {
    workingDirectory: process.cwd(),
  },
  builtIns: {
    read_file: true,
    list_files: true,
    search_files: true,
  },
});
console.log(response.content);
await runtime.dispose();
Completion
complete(...) owns a bounded model/tool loop. It retries weak non-progressing responses, executes known tools, injects the runtime completion contract, and terminates through internal control tools:
- final_answer
- blocked
Those control tools are runtime-reserved. Do not define app tools with those names.
The loop continues under these rules:
- normal tool call: execute the tool, append the tool result, and call the model again
- final_answer: stop with status: 'completed'
- ask_user_input: stop with status: 'tool_calls' so the host can ask/resume
- known custom tool without an executor: stop with status: 'tool_calls' so the host can run it and resume
- blocked: stop with status: 'failed'
- plain narration or intent text: keep going; narration is not completion
- empty text: retry according to emptyTextRetryLimit
- missing required action evidence: reject premature final text or final_answer and continue with recovery guidance
- host mutating tool exposed: require a host mutating tool result before accepting final completion
- repeated identical tool calls or maxIterations: stop with the corresponding bounded failure
- host cancellation through context.abortSignal: abort the active model/tool path when the host decides the task should stop
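A subset of these rules can be modeled in a few lines. The sketch below is a deliberately simplified, synchronous illustration and not the package's implementation: it covers only normal tool calls, final_answer, blocked, narration, and the iteration bound. Every name in it (`ModelStep`, `LoopResult`, `runLoopSketch`) is hypothetical, as are the `answer` and `reason` argument keys, and it treats an unknown tool as a failure instead of handing it back to the host.

```typescript
// Simplified, synchronous model of a bounded tool loop. Every name here is
// hypothetical; the real complete(...) is asynchronous and applies more rules.
type ModelStep =
  | { kind: 'tool'; name: string; args: Record<string, unknown> }
  | { kind: 'text'; content: string };

type LoopResult =
  | { status: 'completed'; output: string }
  | { status: 'failed'; reason: string }
  | { status: 'max_iterations' };

type ToolExecutor = (args: Record<string, unknown>) => string;

function runLoopSketch(
  nextStep: (transcript: string[]) => ModelStep,
  tools: Record<string, ToolExecutor>,
  maxIterations: number,
): LoopResult {
  const transcript: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const step = nextStep(transcript);
    if (step.kind === 'text') {
      // Narration is not completion: record it and ask the model again.
      transcript.push(`assistant: ${step.content}`);
      continue;
    }
    if (step.name === 'final_answer') {
      // Assumed argument key: `answer`.
      return { status: 'completed', output: String(step.args.answer ?? '') };
    }
    if (step.name === 'blocked') {
      // Assumed argument key: `reason`.
      return { status: 'failed', reason: String(step.args.reason ?? 'blocked') };
    }
    const execute = tools[step.name];
    if (!execute) {
      // Simplification: the real runtime hands known host tools back to the host.
      return { status: 'failed', reason: `unknown tool: ${step.name}` };
    }
    // Normal tool call: run it, append the result, and loop.
    transcript.push(`tool ${step.name}: ${execute(step.args)}`);
  }
  return { status: 'max_iterations' };
}
```

Driving the sketch with a scripted `nextStep` that narrates, calls one tool, and then calls final_answer ends with status 'completed'; a model that only ever narrates ends with status 'max_iterations'.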
builtIns only changes which package-owned tools are available. It does not decide whether the loop itself runs. builtIns: false with host-supplied extraTools or tools still gives complete(...) a valid loop: host tools remain executable, and the runtime still injects final_answer and blocked.
complete(...) returns:
- status: 'completed' with output when the model reaches final_answer
- status: 'tool_calls' when the host must handle a tool call, commonly required user input
- status: 'failed' when the run is blocked, invalid, or otherwise cannot complete
- status: 'max_iterations' when loop bounds stop the run
Example:
import { createRuntime } from 'llm-runtime';

const runtime = createRuntime({
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
});

const result = await runtime.complete({
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    { role: 'user', content: 'Inspect the workspace and tell me where runtime completion is implemented.' },
  ],
  context: {
    workingDirectory: process.cwd(),
  },
  builtIns: {
    read_file: true,
    search_files: true,
    path_exists: true,
  },
  maxIterations: 12,
});

if (result.status === 'completed') {
  console.log(result.output);
}
if (result.status === 'tool_calls') {
  console.log(result.toolCalls);
}
streamComplete(...) runs the same completion path and yields lifecycle events. It emits model/tool events plus provider text, reasoning, tool-call argument, and final answer deltas when the provider adapter supplies them:
- model_start
- text_delta
- reasoning_delta
- tool_call_delta
- answer_delta
- assistant_message
- tool_start
- tool_result
- tool_error
- tool_calls
- completed
- failed
- raw
for await (const event of runtime.streamComplete({
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    { role: 'user', content: 'Use tools if needed, then give the final answer.' },
  ],
  builtIns: {
    read_file: true,
    search_files: true,
    path_exists: true,
  },
})) {
  if (event.type === 'text_delta' || event.type === 'answer_delta') {
    process.stdout.write(event.delta);
  }
  if (event.type === 'completed') {
    console.log(event.result.output);
  }
}
Tools
Tool sources are merged into one model-facing surface:
- built-in runtime tools
- app-provided extraTools
- app-provided tools
- MCP tools discovered from configured servers
Built-in tool names are reserved:
- shell_cmd
- load_skill
- ask_user_input
- web_fetch
- read_file
- write_file
- list_files
- search_files
- create_directory
- path_exists
Built-ins default to all package-owned tools for host convenience. Pass false to disable them, or pass a narrow map when the task should expose less:
- omitting builtIns enables every built-in tool
- builtIns: false enables no built-in tools
- builtIns: true enables every built-in tool
- pass an explicit per-tool map such as { read_file: true, search_files: true }
- string shorthand modes such as builtIns: 'all' and builtIns: 'read-only' are not supported
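The resolution rules above can be restated as a small function. This is a sketch of the documented behavior, not the package's code: `BUILT_IN_NAMES`, `BuiltInsOption`, and `enabledBuiltIns` are hypothetical names, and it assumes that a tool missing from an explicit map is treated as disabled.

```typescript
// Names copied from the reserved built-in list in this README.
const BUILT_IN_NAMES = [
  'shell_cmd', 'load_skill', 'ask_user_input', 'web_fetch', 'read_file',
  'write_file', 'list_files', 'search_files', 'create_directory', 'path_exists',
] as const;

type BuiltInName = (typeof BUILT_IN_NAMES)[number];
type BuiltInsOption = boolean | Partial<Record<BuiltInName, boolean>> | undefined;

// Hypothetical helper: mirrors the documented rules, not the package's code.
function enabledBuiltIns(option: BuiltInsOption): BuiltInName[] {
  // Omitted or `true`: every built-in is enabled.
  if (option === undefined || option === true) return [...BUILT_IN_NAMES];
  // `false`: no built-ins are enabled.
  if (option === false) return [];
  // Assumption: names missing from an explicit map are treated as disabled.
  return BUILT_IN_NAMES.filter((name) => option[name] === true);
}
```

Typing the option as a boolean or a per-tool map also makes the unsupported string shorthands a compile-time error in this sketch.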
Use small, task-specific maps:
const readOnlyBuiltIns = {
  load_skill: true,
  list_files: true,
  search_files: true,
  read_file: true,
  path_exists: true,
};

const writeFileBuiltIns = {
  ...readOnlyBuiltIns,
  create_directory: true,
  write_file: true,
};

const commandBuiltIns = {
  ...writeFileBuiltIns,
  shell_cmd: true,
};
Opt into write or command tools only when the task needs them. Do not use a broad preset for ordinary file inspection:
const result = await runtime.complete({
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    { role: 'user', content: 'Run the project test command and summarize the result.' },
  ],
  context: {
    workingDirectory: process.cwd(),
  },
  builtIns: {
    read_file: true,
    search_files: true,
    path_exists: true,
    create_directory: true,
    write_file: true,
    shell_cmd: true,
  },
});
toolPermission: 'read' is a hard read-only boundary for package-owned mutating tools. It blocks write_file, create_directory, and shell_cmd even if those built-ins are exposed.
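The read-only boundary can be pictured as a check applied before a built-in runs, independent of what builtIns exposes. The sketch below is an assumption-labeled illustration only: `MUTATING_BUILT_INS`, `ToolPermissionSketch`, and `isBlockedByPermission` are hypothetical names, and the permission type lists only the two values that appear in this README ('read' and 'auto').

```typescript
// Tools this README lists as blocked under toolPermission: 'read'.
const MUTATING_BUILT_INS = new Set(['write_file', 'create_directory', 'shell_cmd']);

// Hypothetical type: only 'read' and 'auto' appear in this README;
// the package may define further values.
type ToolPermissionSketch = 'read' | 'auto';

// Hypothetical check: returns true when a built-in call should be refused.
function isBlockedByPermission(toolName: string, permission: ToolPermissionSketch): boolean {
  return permission === 'read' && MUTATING_BUILT_INS.has(toolName);
}
```

Under this model, exposing write_file in builtIns while the permission is 'read' still yields a refusal, which matches the "hard boundary" wording above.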
Prefer structured workspace tools over shell_cmd for routine file work:
- list_files for directory listing
- search_files for glob-like discovery
- read_file for paginated file reads
- path_exists for file or directory checks
- create_directory for recursive directory creation when enabled
App tools can be passed as extraTools or tools. They are additive; they cannot override reserved built-ins or completion control tools.
const result = await runtime.complete({
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    { role: 'user', content: 'Look up customer c_123 and summarize the account state.' },
  ],
  extraTools: [
    {
      name: 'lookup_customer',
      description: 'Look up a customer by id.',
      evidenceKind: 'read',
      parameters: {
        type: 'object',
        properties: {
          customerId: { type: 'string' },
        },
        required: ['customerId'],
        additionalProperties: false,
      },
      execute: async ({ customerId }) => {
        return { customerId, plan: 'enterprise', status: 'active' };
      },
    },
  ],
});
Runtime instances also expose resolveTools(...), executeToolCall(...), and executeToolCalls(...) for hosts that need to inspect or run the effective tool surface outside complete(...).
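Since app tools cannot override reserved built-ins or completion control tools, a host may want to check its own tool names before passing them in. The helper below is a hypothetical pre-flight check, not something the package exports; how the runtime itself reacts to a conflicting name is not specified here.

```typescript
// Reserved names taken from this README: built-ins plus completion control tools.
const RESERVED_TOOL_NAMES = new Set([
  'shell_cmd', 'load_skill', 'ask_user_input', 'web_fetch', 'read_file',
  'write_file', 'list_files', 'search_files', 'create_directory', 'path_exists',
  'final_answer', 'blocked',
]);

// Hypothetical pre-flight check a host could run on its own tool definitions.
// Returns the names that collide with a reserved name, in input order.
function findReservedNameConflicts(tools: Array<{ name: string }>): string[] {
  return tools
    .map((tool) => tool.name)
    .filter((name) => RESERVED_TOOL_NAMES.has(name));
}
```

An empty result means none of the host's tool names collide with the reserved set listed in this document.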
Human Input
ask_user_input is the public human-input tool contract. It uses a structured questions[] payload:
{
  type?: "single-select" | "multiple-select";
  allowSkip?: boolean;
  questions: Array<{
    header: string;
    id: string;
    question: string;
    options: Array<{
      id: string;
      label: string;
      description?: string;
    }>;
  }>;
}
If you use a narrow builtIns map, include ask_user_input when the model is allowed to ask the host for a human decision:
builtIns: {
  ask_user_input: true,
}
When completion needs host-owned user input, it returns status: 'tool_calls'. The host should surface the question, then resume by appending a normal tool-result message for the pending tool call and calling complete(...) again with the updated message list.
const resumedMessages = [
  ...result.messages,
  {
    role: 'tool',
    tool_call_id: result.toolCalls![0].id,
    content: JSON.stringify({
      answers: {
        scope: 'all',
      },
    }),
  },
];
MCP And Skills
MCP servers are configured through mcpConfig. Both servers and legacy mcpServers shapes are accepted. URL-based servers default to streamable-http; stdio servers require a command.
const runtime = createRuntime({
  mcpConfig: {
    servers: {
      search: {
        url: 'https://example.com/mcp',
        headers: {
          Authorization: `Bearer ${process.env.MCP_TOKEN}`,
        },
      },
    },
  },
});
Skills are discovered from configured skillRoots and loaded through the load_skill built-in. Later skill roots have higher precedence when duplicate skill ids are found.
Skills add instruction context. They are not executable tools.
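The precedence rule for duplicate skill ids amounts to "last root wins". The sketch below illustrates that rule in isolation; `SkillEntry` and `resolveSkillPrecedence` are hypothetical names, and the package's real skill type and discovery logic may differ.

```typescript
// Hypothetical shape for a discovered skill; the package's real type may differ.
type SkillEntry = { id: string; root: string };

// Input is one list of skills per root, in the same order as skillRoots.
// Later roots overwrite earlier ones, so the last root listed wins on duplicate ids.
function resolveSkillPrecedence(skillsByRoot: SkillEntry[][]): Map<string, SkillEntry> {
  const resolved = new Map<string, SkillEntry>();
  for (const skills of skillsByRoot) {
    for (const skill of skills) {
      resolved.set(skill.id, skill);
    }
  }
  return resolved;
}
```

With skillRoots of ['/app/skills', '/workspace/.codex/skills'], a skill id present in both roots would resolve to the workspace copy under this model.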
Web Search
Pass webSearch per call:
const response = await generate({
  provider: 'openai',
  model: 'gpt-5',
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
  messages: [
    { role: 'user', content: 'Use current public information to answer.' },
  ],
  webSearch: {
    searchContextSize: 'medium',
  },
});
Provider behavior:
- openai, anthropic, and google receive provider-native web search options
- azure, openai-compatible, xai, and ollama ignore unsupported web search on the current chat path and return a web_search_ignored warning
- Gemini Google Search grounding is not combined with function calling; when both tools and webSearch are requested for google, tools win and web search is ignored with a warning
- searchContextSize is forwarded for OpenAI-style requests and ignored by Anthropic and Gemini
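The provider behavior above reads naturally as a decision table. The function below is a sketch of that table, not the package's code: `ProviderName`, `WebSearchDecision`, and `decideWebSearch` are hypothetical names, and reusing the `web_search_ignored` code for the google-with-tools case is an assumption, since the text only says "a warning" there.

```typescript
type ProviderName =
  | 'openai' | 'anthropic' | 'google' | 'azure' | 'xai' | 'openai-compatible' | 'ollama';

type WebSearchDecision = { enabled: boolean; warning?: string };

// Hypothetical decision table mirroring the bullets above; not the package's code.
function decideWebSearch(provider: ProviderName, hasTools: boolean): WebSearchDecision {
  if (provider === 'google' && hasTools) {
    // Tools win over Google Search grounding.
    // Assumption: the same warning code is reused for this case.
    return { enabled: false, warning: 'web_search_ignored' };
  }
  if (provider === 'openai' || provider === 'anthropic' || provider === 'google') {
    // These providers receive provider-native web search options.
    return { enabled: true };
  }
  // azure, openai-compatible, xai, and ollama ignore web search on this path.
  return { enabled: false, warning: 'web_search_ignored' };
}
```

A host could use a table like this to tell the user up front that a web search request will be dropped for the selected provider.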
Cleanup
Call runtime.dispose() when a runtime owns MCP clients:
const runtime = createRuntime({ mcpConfig });
try {
  await runtime.complete(request);
} finally {
  await runtime.dispose();
}
The host still owns temporary workspaces, transcript persistence, app-specific registries, and any resources it injected into the runtime.
Local Development
npm run build
npm run check
npm test
Useful scripts:
- npm run build compiles src/ into dist/
- npm run check runs TypeScript without emitting files
- npm test runs the Vitest suite in tests/llm
- npm run test:watch runs Vitest in watch mode
- npm run test:e2e runs the live provider showcase
- npm run test:e2e:dry-run validates showcase wiring without live provider calls
- npm run test:e2e:azure runs Azure live-provider coverage
- npm run test:e2e:azure:dry-run validates Azure showcase wiring
- npm run test:e2e:gemini runs Gemini live-provider coverage
- npm run test:e2e:gemini:dry-run validates Gemini showcase wiring
- npm run test:e2e:turn-loop runs runtime completion showcase coverage
- npm run test:e2e:turn-loop:dry-run validates turn-loop showcase wiring
- npm run test:e2e:hardening runs deterministic hardening coverage without a live provider
- npm run test:e2e:host-owned runs deterministic host-owned tool-call coverage without a live provider
- npm run test:e2e:host-owned:gemini runs the host-owned tool-call coverage against Gemini 2.5 Flash by default
Live showcase runners expect a repo-local .env with the relevant provider credentials.
