@mikesaintsg/inference

Zero-dependency, adapter-based LLM inference library for browser and Node.js applications.

Features

  • Session-Based Conversations — Maintain conversation history with automatic context management
  • Ephemeral Generation — Stateless single-shot completions for one-off tasks
  • Unified Streaming — Async iteration over tokens with abort control and events
  • Token Batching — Coalesce tokens for smoother UI updates with boundary detection
  • Abort Coordination — Coordinated cancellation across multiple operations
  • Timeout Monitoring — Detect token stalls and enforce a total request timeout
  • Token Counting — Estimate tokens for context window management
  • Context Integration — Generate from BuiltContext (contextbuilder)
  • Model Orchestrator — Progressive model loading with tier-based generation
  • Intent Detection — Classify user input into search, question, action, or navigation
  • Circuit Breaker — Prevent cascading failures with the circuit breaker pattern
  • Retry Logic — Configurable retry with exponential backoff for transient failures
  • Rate Limiting — Concurrency control with acquire/release semantics
  • Telemetry Support — Observability with spans, metrics, and logging
  • Zero dependencies — Built on native fetch, EventSource, and browser/Node APIs
  • TypeScript first — Full type safety with generics and strict types

Installation

npm install @mikesaintsg/inference

Quick Start

import { createEngine } from '@mikesaintsg/inference'
import { createOpenAIProviderAdapter } from '@mikesaintsg/adapters'

// Create engine with provider adapter (required first parameter)
const engine = createEngine(
	createOpenAIProviderAdapter({
		apiKey: process.env.OPENAI_API_KEY,
		model: 'gpt-4o',
	}),
)

// Create a conversation session
const session = engine.createSession({
	system: 'You are a helpful assistant.',
})

// Add a message and generate response
session.addMessage('user', 'Hello!')
const result = await session.generate()
console.log(result.text)

// Stream responses
const handle = session.stream()
for await (const token of handle) {
	process.stdout.write(token)
}

Documentation

📚 Full API Guide — Comprehensive documentation with examples

API Overview

Factory Functions

| Function                             | Description                   |
|--------------------------------------|-------------------------------|
| `createEngine(provider, options?)`   | Create an inference engine    |
| `createTokenBatcher(options?)`       | Create a token batcher for UI |
| `createTokenCounter(options?)`       | Create a token counter        |
| `createAbortScope()`                 | Create an abort scope         |
| `createTimeoutMonitor(options?)`     | Create a timeout monitor      |
| `createModelOrchestrator(options)`   | Create a model orchestrator   |
| `createIntentDetector(options)`      | Create an intent detector     |
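
Most of these factories are shown in the Examples section below. The abort scope and timeout monitor are not, so here is a minimal sketch of wiring them into a streaming call; the option and method names (`stallTimeoutMs`, `totalTimeoutMs`, `onTimeout`, `recordToken`, `signal`) are assumptions, since only the factory signatures are documented here.

import { createAbortScope, createTimeoutMonitor } from '@mikesaintsg/inference'

// Hypothetical wiring: abort the stream on a token stall or total timeout.
const scope = createAbortScope()
const monitor = createTimeoutMonitor({
	stallTimeoutMs: 10_000, // assumed option: no token for 10s counts as a stall
	totalTimeoutMs: 60_000, // assumed option: hard cap on the whole request
})

monitor.onTimeout(() => scope.abort()) // assumed method names

const handle = session.stream({ signal: scope.signal }) // assumed option
for await (const token of handle) {
	monitor.recordToken() // assumed method: resets the stall timer
	process.stdout.write(token)
}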

EngineInterface

| Method                                    | Description                      |
|-------------------------------------------|----------------------------------|
| `createSession(options?)`                 | Create a conversation session    |
| `generate(messages, options?)`            | Ephemeral generation (stateless) |
| `stream(messages, options?)`              | Ephemeral streaming (stateless)  |
| `generateFromContext(context, options?)`  | Generate from BuiltContext       |
| `streamFromContext(context, options?)`    | Stream from BuiltContext         |
| `countTokens(text, model)`                | Count tokens in text             |
| `countMessages(messages, model)`          | Count tokens in messages         |
| `fitsInContext(content, model, max?)`     | Check if content fits            |
| `getContextWindowSize(model)`             | Get model context size           |
| `abort(requestId)`                        | Abort a request by ID            |
| `getDeduplicationStats()`                 | Get request deduplication stats  |
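
The token-counting methods let you validate content against a model's context window before sending it. A minimal sketch using the signatures above; `longPrompt` is a placeholder, and the numeric return values are assumptions based on the descriptions.

const model = 'gpt-4o'

// Estimate tokens and check the prompt fits before generating.
const tokens = engine.countTokens(longPrompt, model)
const windowSize = engine.getContextWindowSize(model)

if (engine.fitsInContext(longPrompt, model)) {
	const result = await engine.generate([
		{ id: '1', role: 'user', content: longPrompt, createdAt: Date.now() },
	])
	console.log(result.text)
} else {
	console.warn(`Prompt uses ~${tokens} of ${windowSize} tokens; trim it first.`)
}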

SessionInterface

| Method                                  | Description                     |
|-----------------------------------------|---------------------------------|
| `getId()`                               | Get session ID                  |
| `getSystem()`                           | Get system prompt               |
| `getHistory()`                          | Get message history             |
| `addMessage(role, content)`             | Add message to history          |
| `addToolResult(callId, name, result)`   | Add tool result message         |
| `removeMessage(id)`                     | Remove message from history     |
| `clear()`                               | Clear all messages              |
| `truncateHistory(count)`                | Keep last N messages            |
| `generate(options?)`                    | Generate response with context  |
| `stream(options?)`                      | Stream response with context    |
| `getTokenBudgetState()`                 | Get current token budget state  |
| `fitsInBudget(content)`                 | Check if content fits in budget |
| `onMessageAdded(callback)`              | Subscribe to message additions  |
| `onMessageRemoved(callback)`            | Subscribe to message removals   |
| `onTokenBudgetChange(callback)`         | Subscribe to budget changes     |
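
Together, the history and budget methods support long-running conversations. A minimal sketch of trimming history as the budget fills up; the shape of the state passed to `onTokenBudgetChange` (a status field reflecting the configured thresholds) is an assumption.

// Trim old messages when the token budget runs hot.
session.onTokenBudgetChange((state) => {
	if (state.status === 'critical') { // assumed shape
		session.truncateHistory(10) // keep only the last 10 messages
	}
})

session.onMessageAdded(() => {
	console.log(`History now has ${session.getHistory().length} messages`)
})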


Examples

Session-Based Conversation

import { createEngine } from '@mikesaintsg/inference'
import { createOpenAIProviderAdapter } from '@mikesaintsg/adapters'

const engine = createEngine(
	createOpenAIProviderAdapter({ apiKey }),
)

const session = engine.createSession({
	system: 'You are a helpful coding assistant.',
	tokenBudget: {
		model: 'gpt-4o',
		warningThreshold: 0.7,
		criticalThreshold: 0.9,
	},
})

// Conversation history is maintained automatically
session.addMessage('user', 'What is TypeScript?')
const result1 = await session.generate()

session.addMessage('user', 'Show me an example')
const result2 = await session.generate() // Has full context

Streaming with Abort

const handle = session.stream()

// Option 1: async iteration
for await (const token of handle) {
	process.stdout.write(token)
}

// Option 2: event subscriptions (instead of iterating)
handle.onToken((token) => updateUI(token))
handle.onComplete((result) => finalizeUI(result))
handle.onError((error) => showError(error))

// Abort at any time
handle.abort()

Error Handling

import { InferenceError, isRateLimitError } from '@mikesaintsg/inference'

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

try {
	const result = await session.generate()
	console.log(result.text)
} catch (error) {
	if (isRateLimitError(error)) {
		console.error('Rate limit exceeded, retrying...')
		await sleep(60_000)
	} else if (error instanceof InferenceError) {
		console.error(`[${error.code}]: ${error.message}`)
	}
}

Circuit Breaker Integration

import { createEngine } from '@mikesaintsg/inference'
import { createCircuitBreaker } from '@mikesaintsg/adapters'

const circuitBreaker = createCircuitBreaker({
	failureThreshold: 5,
	resetTimeoutMs: 30_000,
})

const engine = createEngine(provider, {
	circuitBreaker,
})

// Requests are blocked when circuit is open
circuitBreaker.onStateChange((state) => {
	console.log('Circuit state:', state)
})

Retry Logic

import { createEngine } from '@mikesaintsg/inference'
import { createRetryAdapter } from '@mikesaintsg/adapters'

const retry = createRetryAdapter({
	maxAttempts: 3,
	initialDelayMs: 1000,
	maxDelayMs: 30_000,
	backoffMultiplier: 2,
})

const engine = createEngine(provider, {
	retry,
})

// Failed requests are automatically retried with exponential backoff
const result = await engine.generate([
	{ id: '1', role: 'user', content: 'Hello!', createdAt: Date.now() },
])

Rate Limiting

import { createEngine } from '@mikesaintsg/inference'
import { createRateLimitAdapter } from '@mikesaintsg/adapters'

const rateLimit = createRateLimitAdapter({
	requestsPerMinute: 60,
	maxConcurrent: 10,
})

const engine = createEngine(provider, {
	rateLimit,
})

// Requests automatically wait for a slot before starting
// Slots are released after request completes (success or error)
const handle = engine.stream([
	{ id: '1', role: 'user', content: 'Hello!', createdAt: Date.now() },
])
await handle.result()

Telemetry Integration

import { createEngine } from '@mikesaintsg/inference'
import { createTelemetryAdapter } from '@mikesaintsg/adapters'

const telemetry = createTelemetryAdapter({
	serviceName: 'my-app',
})

const engine = createEngine(provider, {
	telemetry,
})

// Spans are created for each generation request
// Metrics are recorded for latency

Token Batching for UI

import { createTokenBatcher } from '@mikesaintsg/inference'

const batcher = createTokenBatcher({
	batchSize: 5,
	flushIntervalMs: 50,
	flushOnBoundary: 'sentence',
})

batcher.onBatch((batch) => {
	appendToUI(batch.text)
})

// Feed tokens from a stream handle (e.g. from session.stream())
for await (const token of handle) {
	batcher.push(token)
}

batcher.end()

Generate from BuiltContext

import { createEngine } from '@mikesaintsg/inference'
import { createContextBuilder } from '@mikesaintsg/contextbuilder'

// Build context using contextbuilder (tokenAdapter and budget come from your app)
const builder = createContextBuilder(tokenAdapter, { budget })
builder.addSection('system', 'You are helpful.')
builder.addRetrieval(searchResults)
const context = builder.build()

// Generate from built context
const result = await engine.generateFromContext(context)
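
Intent Detection

The engine also ships an intent detector (see Features and the factory table above). A minimal sketch; the `detect` method, its options, and the result shape are assumptions based on the documented intent labels.

import { createIntentDetector } from '@mikesaintsg/inference'

// Classify free-form input into the documented intents:
// 'search' | 'question' | 'action' | 'navigation'.
const detector = createIntentDetector({
	intents: ['search', 'question', 'action', 'navigation'], // assumed option
})

const intent = await detector.detect('How do I reset my password?') // assumed method
if (intent === 'question') {
	session.addMessage('user', 'How do I reset my password?')
	const result = await session.generate()
	console.log(result.text)
}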

Ecosystem Integration

| Package                        | Integration                                               |
|--------------------------------|-----------------------------------------------------------|
| `@mikesaintsg/core`            | Shared types (`Message`, `GenerationResult`, `BuiltContext`) |
| `@mikesaintsg/adapters`        | Provider adapters (OpenAI, Anthropic, etc.)               |
| `@mikesaintsg/vectorstore`     | Vector storage with embedding adapters                    |
| `@mikesaintsg/contextbuilder`  | Advanced context assembly (`BuiltContext`)                |
| `@mikesaintsg/contextprotocol` | Tool registry and routing                                 |

See Integration with Ecosystem for details.


Browser Support

| Browser | Minimum Version |
|---------|-----------------|
| Chrome  | 90+             |
| Firefox | 90+             |
| Safari  | 15+             |
| Edge    | 90+             |


License

MIT © mikesaintsg