noosphere

v0.9.3

Published

2 months ago

Unified AI creation engine — text, image, video, audio across all providers

noosphere

Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.

One import. Every model. Every modality.

Features

7 modalities — LLM, image, video, TTS, STT, music, and embeddings
OpenAI media — GPT-Image-1/1.5, DALL-E 2/3, Sora 2/Pro (video), TTS-1/HD, Whisper — all auto-fetched from OPENAI_API_KEY
Google media — Imagen 4.0 (image), Veo 2/3/3.1 (video), Gemini TTS — all auto-fetched from GEMINI_API_KEY
Always up-to-date models — Dynamic auto-fetch from ALL provider APIs at runtime (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
Dynamic descriptions — Model descriptions fetched from source (Ollama library, HuggingFace READMEs, CivitAI API) — no hardcoded strings
Modality-filtered sync — syncModels('llm') only fetches LLM providers, avoiding unnecessary requests
867+ media endpoints — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
30+ HuggingFace tasks — LLM, image, TTS, translation, summarization, classification, and more
Local-first architecture — Auto-detects Ollama, ComfyUI, Whisper, AudioCraft, Piper, and Kokoro on your machine
Org-aware logos — HuggingFace models show the real org logo (Meta, Google, NVIDIA) instead of generic HF logo
Pre-request token counting — Count tokens before sending, for ALL providers (OpenAI/Groq/Ollama via tiktoken, Google/Anthropic via API)
Full pi-ai access — Agent loop with tool calling, preprocessor (compaction hook), calculateCost, direct stream/complete APIs — all re-exported
Agentic capabilities — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
Failover & retry — Automatic retries with exponential backoff and cross-provider failover
Usage tracking — Real-time cost, latency, and token tracking across all providers
TypeScript-first — Full type definitions with ESM and CommonJS support

Install

npm install noosphere

Quick Start

import { Noosphere } from 'noosphere';

const ai = new Noosphere();

// Chat with any LLM
const response = await ai.chat({
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.content);

// Generate an image with GPT-Image-1 (OpenAI) — just needs OPENAI_API_KEY
const image = await ai.image({
  prompt: 'A sunset over mountains',
  provider: 'openai-media',
});
// image.buffer contains the PNG data

// Generate an image with Imagen 4.0 (Google) — just needs GEMINI_API_KEY
const googleImage = await ai.image({
  prompt: 'A sunset over mountains',
  provider: 'google-media',
});
// googleImage.buffer contains the PNG data

// Generate an image with DALL-E 3
const dalle = await ai.image({
  prompt: 'A sunset over mountains',
  provider: 'openai-media',
  model: 'dall-e-3',
  width: 1024,
  height: 1024,
});
console.log(dalle.url);

// Generate a video
const video = await ai.video({
  prompt: 'Ocean waves crashing on rocks',
  duration: 5,
});
console.log(video.url);

// Text-to-speech with OpenAI TTS — just needs OPENAI_API_KEY
const audio = await ai.speak({
  text: 'Welcome to Noosphere',
  voice: 'alloy',
  format: 'mp3',
});
// audio.buffer contains the audio data

Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)

Noosphere automatically discovers the latest models from EVERY provider's API at runtime — across all 4 modalities (LLM, image, video, TTS). When Google releases a new Gemini model, when OpenAI drops GPT-5, when FAL adds a new video model, when a new image model trends on HuggingFace — you get them immediately, without updating Noosphere or any dependency.

Provider Logos — SVG & PNG for Every Model

Every model returned by the auto-fetch includes a logo field with CDN URLs to the provider's official logo — SVG (vector) and PNG (512×512), hosted on DigitalOcean Spaces. For aggregator providers (OpenRouter, HuggingFace), logos are resolved to the real upstream provider — so an x-ai/grok-4 model gets the xAI logo, not OpenRouter's.

const models = await ai.getModels('llm');

for (const model of models) {
  console.log(model.id, model.logo);
  // "gpt-5"          { svg: "https://...cdn.../openai.svg", png: "https://...cdn.../openai.png" }
  // "claude-opus-4-6" { svg: "https://...cdn.../anthropic.svg", png: "https://...cdn.../anthropic.png" }
  // "gemini-2.5-pro"  { svg: "https://...cdn.../google.svg", png: "https://...cdn.../google.png" }
}

// Use directly in your UI:
// <img src={model.logo.svg} alt={model.provider} />
// <img src={model.logo.png} width={48} height={48} />;
}

// Providers also have logos:
const providers = await ai.getProviders();
providers.forEach(p => console.log(p.id, p.logo));

28 providers covered (23 SVG + 28 PNG):

| Provider | SVG | PNG | Source | |---|---|---|---| | OpenAI, Anthropic, Google, Groq, Mistral, xAI | ✓ | ✓ | Official brand assets | | OpenRouter, Cerebras, Meta, DeepSeek | ✓ | ✓ | Official brand assets | | Microsoft, NVIDIA, Qwen, Cohere, Perplexity | ✓ | ✓ | Official brand assets | | Amazon, Together, Fireworks, Replicate | ✓ | ✓ | Official brand assets | | HuggingFace, Ollama, Nebius, Novita | ✓ | ✓ | Official brand assets | | FAL, ComfyUI, Piper, Kokoro, SambaNova | ✗ | ✓ | GitHub avatars (512×512) |

You can also import the logo registry directly:

import { getProviderLogo, PROVIDER_LOGOS, getAllProviderLogos } from 'noosphere';

const logo = getProviderLogo('anthropic');
// { svg: "https://...cdn.../anthropic.svg", png: "https://...cdn.../anthropic.png" }

// Get all logos as a map:
const allLogos = getAllProviderLogos();
console.log(Object.keys(allLogos));
// ['openai', 'anthropic', 'google', 'groq', 'mistral', 'xai', 'openrouter', ...]

For HuggingFace models with multiple inference providers, per-provider logos are available in capabilities.inferenceProviderLogos:

const hfModels = await ai.getModels('llm');
const qwen = hfModels.find(m => m.id === 'Qwen/Qwen2.5-72B-Instruct');

console.log(qwen.capabilities.inferenceProviderLogos);
// {
//   "together": { svg: "https://...cdn.../together.svg", png: "https://...cdn.../together.png" },
//   "fireworks-ai": { svg: "https://...cdn.../fireworks-ai.svg", png: "https://...cdn.../fireworks-ai.png" },
// }

The Problem It Solves

Traditional AI libraries rely on static model catalogs hardcoded at build time. The @mariozechner/pi-ai dependency ships with ~246 LLM models in a pre-generated models.generated.js file. HuggingFace providers typically hardcode 3-5 default models. When a provider releases a new model, you'd have to wait for the library maintainer to update, publish, and then you'd npm update. This lag can be days or weeks.

Noosphere solves this for every provider and every modality simultaneously.

How It Works — Complete Auto-Fetch Architecture

Noosphere has 3 independent auto-fetch systems that work in parallel, one for each provider layer:

┌─────────────────────────────────────────────────────────────┐
│                   NOOSPHERE AUTO-FETCH                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─── Pi-AI Provider (LLM) ─────────────────────────────┐  │
│  │  8 parallel API calls on first chat()/stream():       │  │
│  │  OpenAI, Anthropic, Google, Groq, Mistral,            │  │
│  │  xAI, OpenRouter, Cerebras                            │  │
│  │  → Merges with static pi-ai catalog (246 models)      │  │
│  │  → Constructs synthetic Model objects for new ones     │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌─── FAL Provider (Image/Video/TTS) ───────────────────┐  │
│  │  1 API call on listModels():                          │  │
│  │  GET https://api.fal.ai/v1/models/pricing             │  │
│  │  → Returns ALL 867+ endpoints with live pricing       │  │
│  │  → Auto-classifies modality from model ID + unit      │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌─── HuggingFace Provider (LLM/Image/TTS) ────────────┐  │
│  │  3 parallel API calls on listModels():                │  │
│  │  GET huggingface.co/api/models?pipeline_tag=...       │  │
│  │  → text-generation (top 50 trending, inference-ready) │  │
│  │  → text-to-image (top 50 trending, inference-ready)   │  │
│  │  → text-to-speech (top 30 trending, inference-ready)  │  │
│  │  → Includes inference provider mapping + pricing      │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs

On the first chat() or stream() call, Pi-AI queries every LLM provider's model listing API in parallel:

| Provider | API Endpoint | Auth | Model Filter | API Protocol | |---|---|---|---|---| | OpenAI | GET /v1/models | Bearer token | gpt-*, o1*, o3*, o4*, chatgpt-*, codex-* | openai-responses | | Anthropic | GET /v1/models?limit=100 | x-api-key + anthropic-version: 2023-06-01 | claude-* | anthropic-messages | | Google | GET /v1beta/models?key=KEY | API key in URL | gemini-*, gemma-* + must support generateContent | google-generative-ai | | Groq | GET /openai/v1/models | Bearer token | All (Groq only serves chat models) | openai-completions | | Mistral | GET /v1/models | Bearer token | Exclude *embed* | openai-completions | | xAI | GET /v1/models | Bearer token | grok* | openai-completions | | OpenRouter | GET /api/v1/models | Bearer token | All (all OpenRouter models are usable) | openai-completions | | Cerebras | GET /v1/models | Bearer token | All (Cerebras only serves chat models) | openai-completions |

How new LLM models become usable: When a model isn't in the static catalog, Noosphere constructs a synthetic Model object with the correct API protocol, base URL, and inherited cost data:

// New model "gpt-4.5-turbo" discovered from OpenAI's /v1/models:
{
  id: 'gpt-4.5-turbo',
  name: 'gpt-4.5-turbo',
  api: 'openai-responses',              // Correct protocol for OpenAI
  provider: 'openai',
  baseUrl: 'https://api.openai.com/v1',
  reasoning: false,                      // Inferred from model ID prefix
  input: ['text', 'image'],
  cost: { input: 2.5, output: 10, ... },  // Inherited from template model
  contextWindow: 128000,                   // From template or API response
  maxTokens: 16384,
}
// This object is passed directly to pi-ai's complete()/stream() — works immediately

Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API

FAL already provides a fully dynamic catalog. On listModels(), it fetches from https://api.fal.ai/v1/models/pricing:

// FAL returns an array with ALL available endpoints + live pricing:
[
  { modelId: "fal-ai/flux-pro/v1.1-ultra", price: 0.06, unit: "per_image" },
  { modelId: "fal-ai/kling-video/v2/master/text-to-video", price: 0.10, unit: "per_second" },
  { modelId: "fal-ai/kokoro/american-english", price: 0.002, unit: "per_1k_chars" },
  // ... 867+ endpoints total
]

// Modality is auto-inferred from model ID + pricing unit:
// - unit contains 'char' OR id contains 'tts'/'kokoro'/'elevenlabs' → TTS
// - unit contains 'second' OR id contains 'video'/'kling'/'sora'/'veo' → Video
// - Everything else → Image

Result: Every FAL model is always current — new endpoints appear the moment FAL publishes them. Pricing is always accurate because it comes directly from their API.

Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API

Instead of 3 hardcoded defaults, HuggingFace now fetches trending inference-ready models from the Hub API across all 3 modalities:

GET https://huggingface.co/api/models
  ?pipeline_tag=text-generation       ← LLM models
  &inference_provider=all             ← Only models available via inference API
  &sort=trendingScore                 ← Most popular first
  &limit=50                           ← Top 50
  &expand[]=inferenceProviderMapping  ← Include provider routing + pricing

| Pipeline Tag | Modality | Limit | What It Fetches | |---|---|---|---| | text-generation | LLM | 50 | Top 50 trending chat/completion models with active inference endpoints | | text-to-image | Image | 50 | Top 50 trending image generation models (SDXL, Flux, etc.) | | text-to-speech | TTS | 30 | Top 30 trending TTS models with active inference endpoints |

What the Hub API returns per model:

{
  "id": "Qwen/Qwen2.5-72B-Instruct",
  "pipeline_tag": "text-generation",
  "likes": 1893,
  "downloads": 4521987,
  "inferenceProviderMapping": [
    {
      "provider": "together",
      "providerId": "Qwen/Qwen2.5-72B-Instruct-Turbo",
      "status": "live",
      "providerDetails": {
        "context_length": 32768,
        "pricing": { "input": 1.2, "output": 1.2 }
      }
    },
    {
      "provider": "fireworks-ai",
      "providerId": "accounts/fireworks/models/qwen2p5-72b-instruct",
      "status": "live"
    }
  ]
}

Noosphere extracts from this:

Model ID → id field
Pricing → first provider with providerDetails.pricing
Context window → first provider with providerDetails.context_length
Inference providers → list of available providers (Together, Fireworks, Groq, etc.)

Three requests fire in parallel (Promise.allSettled) with a 10-second timeout each. If any fails, the 3 hardcoded defaults are always available as fallback.

Resilience Guarantees (All Layers)

| Guarantee | Pi-AI (LLM) | FAL (Image/Video/TTS) | HuggingFace (LLM/Image/TTS) | |---|---|---|---| | Timeout | 8s per provider | No custom timeout | 10s per pipeline_tag | | Parallelism | 8 concurrent requests | 1 request (returns all) | 3 concurrent requests | | Failure handling | Promise.allSettled | Returns [] on error | Promise.allSettled | | Fallback | Static pi-ai catalog (246 models) | Empty list (provider still usable by model ID) | 3 hardcoded defaults | | Caching | One-time fetch, cached in memory | Per listModels() call | One-time fetch, cached in memory | | Auth required | Yes (per-provider API keys) | Yes (FAL key) | Optional (works without token) |

Total Model Coverage

| Source | Modalities | Model Count | Update Frequency | |---|---|---|---| | Pi-AI static catalog | LLM | ~246 | On npm update | | Pi-AI dynamic fetch | LLM | All models across 8 providers | Every session | | FAL pricing API | Image, Video, TTS | 867+ | Every listModels() call | | HuggingFace Hub API | LLM, Image, TTS | Top 130 trending | Every session | | ComfyUI /object_info | Image | Local checkpoints | Every listModels() call | | Local TTS /voices | TTS | Local voices | Every listModels() call |

Force Refresh

const ai = new Noosphere();

// Models are auto-fetched on first call — no action needed:
await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately

// Trigger a full sync across ALL providers:
const result = await ai.syncModels();
// result = { synced: 1200+, byProvider: { 'pi-ai': 300, 'fal': 867, 'huggingface': 130, ... }, errors: [] }

// Get all models for a specific modality:
const imageModels = await ai.getModels('image');
// Returns: FAL image models + HuggingFace image models + ComfyUI models

Why Hybrid (Static + Dynamic)?

| Approach | Pros | Cons | |---|---|---| | Static catalog only | Accurate costs, fast startup | Stale within days, miss new models | | Dynamic only | Always current | No cost data, no context window info, slow startup | | Hybrid (Noosphere) | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |

Local Models — Run Everything on Your Machine

Noosphere has comprehensive local model support across all modalities — LLM, image, video, TTS, STT, and music. Auto-discovers what's installed, catalogs what's available to download, and provides a unified API for everything.

Quick Start

const ai = new Noosphere();
await ai.syncModels();

// 774 models discovered — cloud + local, all modalities
const all = await ai.getModels();

// Filter by what you can run locally
const localModels = all.filter(m => m.local || m.status === 'installed');

// What's installed vs what's available to download
const installed = all.filter(m => m.status === 'installed');   // 39 models ready to use
const available = all.filter(m => m.status === 'available');   // 251 models you can download

// Chat with a local Ollama model — same API as cloud
const result = await ai.chat({
  model: 'qwen3:8b',
  provider: 'ollama',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(result.content);   // "Hello! How can I help?"
console.log(result.usage);     // { cost: 0, input: 24, output: 198, unit: 'tokens' }

// Install a new model from Ollama library
await ai.installModel('deepseek-r1:14b');

// Uninstall
await ai.uninstallModel('deepseek-r1:14b');

8 Providers, 5 Modalities, 774+ Models

| Provider | Modality | Models | Source | Auto-Detect | |---|---|---|---|---| | pi-ai | LLM | 482 | OpenAI, Anthropic, Google, Groq, Mistral, xAI, OpenRouter, Cerebras | API keys | | openai-media | image, video, tts, stt | 12 | GPT-Image-1/1.5, DALL-E 2/3, Sora 2/Pro, TTS-1/HD, Whisper | OPENAI_API_KEY | | google-media | image, video, tts | 10 | Imagen 4.0, Veo 2/3/3.1, Gemini TTS (Flash/Pro) | GEMINI_API_KEY | | ollama | LLM, embedding | 70 | 38 installed + 32 from Ollama web catalog | localhost:11434 | | hf-local | image, video, tts, stt, music | 220 | HuggingFace catalog (FLUX, SDXL, Wan2.2, Whisper, MusicGen) | Always (no API key) | | huggingface | LLM, image, tts | dynamic | HuggingFace Inference API | HUGGINGFACE_TOKEN | | comfyui | image, video | dynamic | Installed checkpoints + CivitAI catalog | localhost:8188 | | openai-compat | LLM | dynamic | llama.cpp, LM Studio, vLLM, LocalAI, KoboldCpp, Jan, TabbyAPI | Scans ports | | fal | image, video, tts | 867+ | FAL.ai (Flux, SDXL, Kling, Sora 2, Kokoro, ElevenLabs) | FAL_KEY | | piper | TTS | 2+ | Piper voices installed locally | Binary detection | | whisper-local | STT | 8 | Whisper/Faster-Whisper (tiny → large-v3) | Python detection | | audiocraft | music | 5 | MusicGen (small/medium/large/melody) + AudioGen | Python detection |

Modality-Filtered Sync — Only Fetch What You Need

Sync only the providers relevant to a specific modality instead of fetching everything. This avoids unnecessary network requests (e.g., fetching 270+ HuggingFace READMEs when you only need LLMs).

// Sync only LLM providers (Ollama, pi-ai, openai-compat, huggingface)
await ai.syncModels('llm');

// Sync only image providers (hf-local, comfyui, fal, huggingface)
await ai.syncModels('image');

// Sync only STT providers (whisper-local, hf-local)
await ai.syncModels('stt');

// Sync everything (backward compatible)
await ai.syncModels();

Which providers sync for each modality:

| Modality | Providers Synced | |---|---| | llm | pi-ai, ollama, openai-compat, huggingface (cloud) | | image | openai-media (GPT-Image-1, DALL-E), google-media (Imagen 4.0), hf-local, comfyui, fal, huggingface (cloud) | | video | openai-media (Sora 2/Pro), google-media (Veo 2/3/3.1), hf-local, comfyui, fal | | tts | openai-media (TTS-1, TTS-1-HD), google-media (Gemini TTS), hf-local, fal, piper, kokoro, huggingface (cloud) | | stt | openai-media (Whisper), hf-local, whisper-local | | music | hf-local (MusicGen, AudioLDM, etc.), audiocraft | | embedding | ollama |

Models by Modality

const models = await ai.getModels();

// Filter by modality
const llm    = models.filter(m => m.modality === 'llm');    // 552 (cloud + Ollama local)
const image  = models.filter(m => m.modality === 'image');  // 101 (FLUX, SDXL, SD3, PixArt...)
const tts    = models.filter(m => m.modality === 'tts');    //  61 (MusicGen, Bark, Piper, Kokoro...)
const video  = models.filter(m => m.modality === 'video');  //  30 (Wan2.2, CogVideoX, AnimateDiff...)
const stt    = models.filter(m => m.modality === 'stt');    //  30 (Whisper, wav2vec2...)

Ollama Provider — Local LLM

Full integration with Ollama's API:

// Auto-detected on startup — no config needed
// Models include full metadata from Ollama

const ollamaModels = models.filter(m => m.provider === 'ollama');
for (const m of ollamaModels) {
  console.log(m.id);                      // "llama3.3:70b"
  console.log(m.status);                  // "installed" | "available" | "running"
  console.log(m.localInfo.parameterSize); // "70.6B"
  console.log(m.localInfo.quantization);  // "Q4_K_M"
  console.log(m.localInfo.sizeBytes);     // 42520413916
  console.log(m.localInfo.family);        // "llama"
  console.log(m.logo);                    // { svg: "...meta.svg", png: "...meta.png" }
}

// Chat with streaming
const stream = ai.stream({
  model: 'qwen3:8b',
  provider: 'ollama',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
});

for await (const event of stream) {
  if (event.type === 'text_delta') process.stdout.write(event.delta);
}

const finalResult = await stream.result();

// Model management
await ai.installModel('deepseek-r1:14b');     // Downloads from Ollama library
await ai.uninstallModel('old-model:7b');       // Removes from disk

// Hardware info
const hw = await ai.getHardware();
// { ollama: true, runningModels: [{ name: 'qwen3:8b', size: 5200000000, ... }] }

OpenAI-Compatible Provider — Any Local Server

Connects to ANY server that implements the OpenAI API:

// Auto-detects servers on common ports:
// llama.cpp (:8080), LM Studio (:1234), vLLM (:8000)
// LocalAI (:8080), TabbyAPI (:5000), KoboldCpp (:5001), Jan (:1337)

// Or configure manually:
const ai = new Noosphere({
  openaiCompat: [
    { baseUrl: 'http://localhost:1234/v1', name: 'LM Studio' },
    { baseUrl: 'http://192.168.1.100:8080/v1', name: 'Remote llama.cpp' },
  ],
});

HuggingFace Local Catalog

Auto-fetches the top models by downloads for each modality:

const imageModels = models.filter(m => m.provider === 'hf-local' && m.modality === 'image');
// → FLUX.1-dev, FLUX.1-schnell, SDXL, SD 3.5, PixArt-Σ, Playground v2.5, Kolors...

const videoModels = models.filter(m => m.provider === 'hf-local' && m.modality === 'video');
// → Wan2.2-T2V, CogVideoX-5b, AnimateDiff, Stable Video Diffusion...

const ttsModels = models.filter(m => m.provider === 'hf-local' && m.modality === 'tts');
// → MusicGen, Stable Audio Open, Bark, ACE-Step...

const sttModels = models.filter(m => m.provider === 'hf-local' && m.modality === 'stt');
// → Whisper large-v3, Whisper large-v3-turbo, wav2vec2...

Models already downloaded to ~/.cache/huggingface/hub/ are automatically detected as status: 'installed'.

ComfyUI — Dynamic Workflow Engine

When ComfyUI is running, noosphere discovers all installed checkpoints, LoRAs, and models:

// Auto-detected on localhost:8188
const comfyModels = models.filter(m => m.provider === 'comfyui');
// → All checkpoints (SD 1.5, SDXL, FLUX, Pony, etc.)

// Also fetches top models from CivitAI as "available"
const civitai = comfyModels.filter(m => m.status === 'available');

Model Descriptions — Dynamic from Source

Every model includes a description field fetched dynamically from its source — no hardcoded strings:

const models = await ai.getModels('llm');

for (const m of models) {
  console.log(m.name, m.description);
  // "llama3.1"  "Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B"
  // "qwen3"     "Qwen3 is the latest generation of large language models in Qwen series"
  // "gemma3"    "The current, most capable model that runs on a single GPU"
}

const imageModels = await ai.getModels('image');
for (const m of imageModels) {
  console.log(m.name, m.description);
  // "stable-diffusion-xl-base-1.0"  "Stable Diffusion XL (SDXL) is a latent text-to-image..."
  // "FLUX.1-dev"                     "FLUX.1 [dev] is a 12 billion parameter rectified flow..."
}

| Provider | Description Source | |---|---| | Ollama | Scraped from ollama.com/library page | | HuggingFace Local | Parsed from each model's README.md on HuggingFace Hub | | CivitAI/ComfyUI | Extracted from CivitAI API response | | Whisper | Parsed from OpenAI's Whisper README on HuggingFace | | AudioCraft | Parsed from Meta's AudioCraft README on HuggingFace |

All description fetches are parallel and fail-safe — if a source is unreachable, models are returned without descriptions. No API keys required.

Model Status & Local Info

Every local model includes rich metadata:

interface ModelInfo {
  id: string;
  provider: string;
  name: string;
  description?: string;          // Dynamic from source (Ollama library, HF README, CivitAI)
  modality: 'llm' | 'image' | 'video' | 'tts' | 'stt' | 'music' | 'embedding';
  status?: 'installed' | 'available' | 'downloading' | 'running' | 'error';
  local: boolean;
  logo?: { svg?: string; png?: string };
  localInfo?: {
    sizeBytes: number;
    family?: string;              // "llama", "gemma3", "qwen2"
    parameterSize?: string;       // "70.6B", "7B", "3.2B"
    quantization?: string;        // "Q4_K_M", "Q8_0", "F16"
    format?: string;              // "gguf", "safetensors", "onnx"
    digest?: string;
    modifiedAt?: string;
    running?: boolean;
    runtime: string;              // "ollama", "diffusers", "comfyui", "piper", "whisper"
  };
  capabilities: {
    contextWindow?: number;
    maxTokens?: number;
    supportsVision?: boolean;
    supportsStreaming?: boolean;
  };
}

Web Catalogs (Auto-Fetched)

| Source | API | What it provides | |---|---|---| | Ollama Library | ollama.com/api/tags | 215+ LLM families with sizes and quantizations | | HuggingFace | huggingface.co/api/models?pipeline_tag=... | Top models per modality (image, video, TTS, STT) | | CivitAI | civitai.com/api/v1/models | SD/SDXL/FLUX checkpoints with previews |

Auto-Detection — Zero Config

Noosphere auto-detects all local runtimes on startup:

| Runtime | Detection Method | Default Port | |---|---|---| | Ollama | GET localhost:11434/api/version | 11434 | | ComfyUI | GET localhost:8188/system_stats | 8188 | | llama.cpp | GET localhost:8080/health | 8080 | | LM Studio | GET localhost:1234/v1/models | 1234 | | vLLM | GET localhost:8000/v1/models | 8000 | | KoboldCpp | GET localhost:5001/v1/models | 5001 | | TabbyAPI | GET localhost:5000/v1/models | 5000 | | Jan | GET localhost:1337/v1/models | 1337 | | Piper | Binary in PATH | — | | Whisper | Python package detection | — | | AudioCraft | Python package detection | — |

📄 Full research: docs/LOCAL_AI_RESEARCH.md — 44KB covering 12+ runtimes across all modalities

Pre-Request Token Counting

Count tokens before sending a request to any provider. Know the cost upfront.

// Via Noosphere instance (auto-routes by model)
const result = await ai.countTokens({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing.' },
  ],
  model: 'gpt-4o',
});
console.log(result.tokens);   // 26
console.log(result.method);   // "tiktoken" (instant, local)
console.log(result.provider); // "openai"

// Google — exact count via API
const google = await ai.countTokens({
  messages: [{ role: 'user', content: 'Hello!' }],
  model: 'gemini-2.5-flash',
});
console.log(google.tokens);   // 3
console.log(google.method);   // "api" (exact)

Token counting by provider:

| Provider | Method | Speed | Accuracy | |---|---|---|---| | OpenAI (GPT-4o, o1, o3, o4, GPT-5) | tiktoken (local) | Instant | Exact | | Google (Gemini) | /countTokens API | ~200ms | Exact | | Anthropic (Claude) | /messages/count_tokens API | ~200ms | Exact | | Groq (Llama, Mixtral, Gemma) | tiktoken (local) | Instant | Exact | | Cerebras (Llama) | tiktoken (local) | Instant | Exact | | Mistral (Mistral, Mixtral, Codestral) | tiktoken (local) | Instant | Close approx | | xAI (Grok) | tiktoken (local) | Instant | Close approx | | OpenRouter (all models) | tiktoken (local) | Instant | Close approx | | Ollama (all local models) | tiktoken (local) | Instant | Close approx |

You can also use standalone functions without a Noosphere instance:

import {
  countTokensOpenAI, countTokensGoogle, countTokensAnthropic,
  countTokensGroq, countTokensMistral, countTokensXai,
  countTokensCerebras, countTokensOpenRouter, countTokensOllama,
} from 'noosphere';

// Local (instant, no API key needed)
const tokens = countTokensOpenAI(messages, 'gpt-4o');       // 26
const groq   = countTokensGroq(messages, 'llama-3.3-70b');  // 26
const ollama = countTokensOllama(messages, 'qwen3:8b');     // 26

// API-based (exact, needs key)
const google = await countTokensGoogle(messages, GEMINI_KEY, 'gemini-2.5-flash');     // 16
const claude = await countTokensAnthropic(messages, ANTHROPIC_KEY, 'claude-sonnet-4-20250514'); // exact

Agent Loop & pi-ai Access

Noosphere re-exports the full pi-ai library for direct access to agent loops, tool calling, cost calculation, and streaming APIs.

Preprocessor — Context Compaction

The preprocessor hook runs before every LLM call in the agent loop. Use it to manage context window limits — truncate old messages, summarize conversations, or implement sliding window strategies.

import { agentLoop, getPiModel, setApiKey, countTokensOpenAI } from 'noosphere';
import type { AgentLoopConfig, AgentContext, PiMessage } from 'noosphere';

setApiKey('openai', process.env.OPENAI_API_KEY!);

const config: AgentLoopConfig = {
  model: getPiModel('openai', 'gpt-4o'),

  // Preprocessor runs before each LLM call
  preprocessor: async (messages) => {
    // Strategy 1: Simple sliding window — keep last N messages
    if (messages.length > 50) {
      return messages.slice(-20);
    }
    return messages;
  },
};

// Start the agent loop
const context: AgentContext = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [],
};

const userMessage = {
  role: 'user' as const,
  content: 'Hello!',
  timestamp: Date.now(),
};

const stream = agentLoop(userMessage, context, config);

for await (const event of stream) {
  if (event.type === 'message_update' && event.assistantMessageEvent.type === 'text_delta') {
    process.stdout.write(event.assistantMessageEvent.delta);
  }
}

Preprocessor Strategies

Token-aware compaction — count tokens and trim to fit the context window:

preprocessor: async (messages) => {
  const model = getPiModel('openai', 'gpt-4o');
  const maxTokens = model.contextWindow * 0.8; // leave 20% for response

  let totalTokens = 0;
  const kept: PiMessage[] = [];

  // Keep messages from newest to oldest until we hit the limit
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    const content = 'content' in msg && typeof msg.content === 'string' ? msg.content : '';
    const msgTokens = countTokensOpenAI([{ role: msg.role, content }], 'gpt-4o');

    if (totalTokens + msgTokens > maxTokens) break;
    totalTokens += msgTokens;
    kept.unshift(msg);
  }

  return kept;
},

Summarization compaction — summarize old messages, keep recent ones:

preprocessor: async (messages) => {
  if (messages.length <= 20) return messages;

  // Summarize the older messages using the LLM itself
  const oldMessages = messages.slice(0, -10);
  const recentMessages = messages.slice(-10);

  const summary = await piCompleteSimple(getPiModel('openai', 'gpt-4o-mini'), {
    systemPrompt: 'Summarize this conversation in 2-3 sentences.',
    messages: oldMessages,
  });

  // Replace old messages with a summary message
  const summaryText = summary.content
    .filter((c): c is { type: 'text'; text: string } => c.type === 'text')
    .map((c) => c.text)
    .join('');

  return [
    { role: 'user' as const, content: `[Previous conversation summary: ${summaryText}]`, timestamp: Date.now() },
    ...recentMessages,
  ];
},

Tool Calling

Define tools for the agent loop with typed parameters:

import { Type } from '@sinclair/typebox';
import type { AgentTool } from 'noosphere';

const weatherTool: AgentTool = {
  name: 'get_weather',
  label: 'Get Weather',
  description: 'Get the current weather for a location',
  parameters: Type.Object({
    location: Type.String({ description: 'City name' }),
  }),
  execute: async (toolCallId, params) => {
    const weather = await fetchWeather(params.location);
    return { output: JSON.stringify(weather), details: weather };
  },
};

const context: AgentContext = {
  systemPrompt: 'You are a helpful assistant with weather access.',
  messages: [],
  tools: [weatherTool],
};

const stream = agentLoop(userMessage, context, config);

for await (const event of stream) {
  if (event.type === 'tool_execution_start') {
    console.log(`Calling ${event.toolName}...`);
  }
  if (event.type === 'tool_execution_end') {
    console.log(`Result: ${event.result}`);
  }
}

Cost Calculation

import { calculateCost, getPiModel } from 'noosphere';

const model = getPiModel('openai', 'gpt-4o');
const usage = { input: 1000, output: 500, cacheRead: 0, cacheWrite: 0 };
const cost = calculateCost(model, usage);
console.log(cost.total);  // $0.00625
console.log(cost.input);  // $0.0025
console.log(cost.output); // $0.00375

Direct Stream/Complete APIs

import { piComplete, piStream, piCompleteSimple, setApiKey, getPiModel } from 'noosphere';

setApiKey('openai', process.env.OPENAI_API_KEY!);
const model = getPiModel('openai', 'gpt-4o');

// Simple completion
const result = await piCompleteSimple(model, {
  systemPrompt: 'You are helpful.',
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }],
});

// Streaming
const stream = piStream(model, {
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }],
});

for await (const event of stream) {
  if (event.type === 'text_delta') process.stdout.write(event.delta);
}

Configuration

API keys are resolved from the constructor config or environment variables (config takes priority):

const ai = new Noosphere({
  keys: {
    openai: 'sk-...',
    anthropic: 'sk-ant-...',
    google: 'AIza...',
    fal: 'fal-...',
    huggingface: 'hf_...',
    groq: 'gsk_...',
    mistral: '...',
    xai: '...',
    openrouter: 'sk-or-...',
  },
});

Or set environment variables:

| Variable | Provider | |---|---| | OPENAI_API_KEY | OpenAI | | ANTHROPIC_API_KEY | Anthropic | | GEMINI_API_KEY | Google Gemini | | FAL_KEY | FAL.ai | | HUGGINGFACE_TOKEN | Hugging Face | | GROQ_API_KEY | Groq | | MISTRAL_API_KEY | Mistral | | XAI_API_KEY | xAI (Grok) | | OPENROUTER_API_KEY | OpenRouter |

Full Configuration Reference

const ai = new Noosphere({
  // API keys (or use env vars above)
  keys: { /* ... */ },

  // Default models per modality
  defaults: {
    llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
    image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
    video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
    tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
  },

  // Local service configuration
  autoDetectLocal: true,  // env: NOOSPHERE_AUTO_DETECT_LOCAL
  local: {
    ollama: { enabled: true, host: 'http://localhost', port: 11434 },
    comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
    piper: { enabled: true, host: 'http://localhost', port: 5500 },
    kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
    custom: [],  // additional LocalServiceConfig[]
  },

  // Retry & failover
  retry: {
    maxRetries: 2,           // default: 2
    backoffMs: 1000,         // default: 1000 (exponential: 1s, 2s, 4s...)
    failover: true,          // default: true — try other providers on failure
    retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
  },

  // Timeouts per modality (ms)
  timeout: {
    llm: 30000,    // 30s
    image: 120000, // 2min
    video: 300000, // 5min
    tts: 60000,    // 1min
  },

  // Model discovery cache (minutes)
  discoveryCacheTTL: 60,  // env: NOOSPHERE_DISCOVERY_CACHE_TTL

  // Real-time usage callback
  onUsage: (event) => {
    console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
  },
});

Local Service Environment Variables

| Variable | Default | Description | |---|---|---| | OLLAMA_HOST | http://localhost | Ollama server host | | OLLAMA_PORT | 11434 | Ollama server port | | COMFYUI_HOST | http://localhost | ComfyUI server host | | COMFYUI_PORT | 8188 | ComfyUI server port | | PIPER_HOST | http://localhost | Piper TTS server host | | PIPER_PORT | 5500 | Piper TTS server port | | KOKORO_HOST | http://localhost | Kokoro TTS server host | | KOKORO_PORT | 5501 | Kokoro TTS server port | | NOOSPHERE_AUTO_DETECT_LOCAL | true | Enable/disable local service auto-detection | | NOOSPHERE_DISCOVERY_CACHE_TTL | 60 | Model cache TTL in minutes |

API Reference

`new Noosphere(config?)`

Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).

Generation Methods

`ai.chat(options): Promise<NoosphereResult>`

Generate text with any LLM. Supports 246+ models across 8 providers.

const result = await ai.chat({
  provider: 'anthropic',                // optional — auto-resolved if omitted
  model: 'claude-sonnet-4-20250514',    // optional — uses default or first available
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Explain quantum computing' },
  ],
  temperature: 0.7,     // optional (0-2)
  maxTokens: 1024,      // optional
  jsonMode: false,       // optional
});

console.log(result.content);          // response text
console.log(result.thinking);         // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
console.log(result.usage.cost);       // cost in USD
console.log(result.usage.input);      // input tokens
console.log(result.usage.output);     // output tokens
console.log(result.latencyMs);        // response time in ms

`ai.stream(options): NoosphereStream`

Stream LLM responses token-by-token. Same options as chat().

const stream = ai.stream({
  messages: [{ role: 'user', content: 'Write a story' }],
});

for await (const event of stream) {
  switch (event.type) {
    case 'text_delta':
      process.stdout.write(event.delta!);
      break;
    case 'thinking_delta':
      console.log('[thinking]', event.delta);
      break;
    case 'done':
      console.log('\n\nUsage:', event.result!.usage);
      break;
    case 'error':
      console.error(event.error);
      break;
  }
}

// Or consume the full result
const result = await stream.result();

// Abort at any time
stream.abort();

`ai.image(options): Promise<NoosphereResult>`

Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.

const result = await ai.image({
  provider: 'fal',                              // optional
  model: 'fal-ai/flux-2-pro',                   // optional
  prompt: 'A futuristic cityscape at sunset',
  negativePrompt: 'blurry, low quality',         // optional
  width: 1024,                                   // optional
  height: 768,                                   // optional
  seed: 42,                                      // optional — reproducible results
  steps: 30,                                     // optional — inference steps (more = higher quality)
  guidanceScale: 7.5,                            // optional — prompt adherence (higher = stricter)
});

console.log(result.url);                // image URL (FAL)
console.log(result.buffer);             // image Buffer (HuggingFace, ComfyUI)
console.log(result.media?.width);       // actual dimensions
console.log(result.media?.height);
console.log(result.media?.format);      // 'png'

`ai.video(options): Promise<NoosphereResult>`

Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).

const result = await ai.video({
  provider: 'fal',
  model: 'fal-ai/kling-video/v2/master/text-to-video',
  prompt: 'A bird flying through clouds',
  imageUrl: 'https://...',    // optional — image-to-video
  duration: 5,                // optional — seconds
  fps: 24,                    // optional
  width: 1280,                // optional
  height: 720,                // optional
});

console.log(result.url);                // video URL
console.log(result.media?.duration);    // actual duration
console.log(result.media?.fps);         // frames per second
console.log(result.media?.format);      // 'mp4'

`ai.speak(options): Promise<NoosphereResult>`

Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.

const result = await ai.speak({
  provider: 'fal',
  model: 'fal-ai/kokoro/american-english',
  text: 'Hello world',
  voice: 'af_heart',        // optional — voice ID
  language: 'en',            // optional
  speed: 1.0,                // optional
  format: 'mp3',             // optional — 'mp3' | 'wav' | 'ogg'
});

console.log(result.buffer);  // audio Buffer
console.log(result.url);     // audio URL (FAL)

Discovery Methods

`ai.getProviders(modality?): Promise<ProviderInfo[]>`

List available providers, optionally filtered by modality.

const providers = await ai.getProviders('llm');
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]

`ai.getModels(modality?): Promise<ModelInfo[]>`

List all available models with full metadata.

const models = await ai.getModels('image');
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilities

`ai.getModel(provider, modelId): Promise<ModelInfo | null>`

Get details about a specific model.

`ai.syncModels(): Promise<SyncResult>`

Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.

Usage Tracking

`ai.getUsage(options?): UsageSummary`

Get aggregated usage statistics with optional filtering.

const usage = ai.getUsage({
  since: '2024-01-01',    // optional — ISO date or Date object
  until: '2024-12-31',    // optional
  provider: 'openai',     // optional — filter by provider
  modality: 'llm',        // optional — filter by modality
});

console.log(usage.totalCost);        // total USD spent
console.log(usage.totalRequests);    // number of requests
console.log(usage.byProvider);       // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
console.log(usage.byModality);       // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }

Lifecycle

`ai.registerProvider(provider): void`

`ai.dispose(): Promise<void>`

Cleanup all provider resources, clear model cache, and reset usage tracker.

NoosphereResult

Every generation method returns a NoosphereResult:

interface NoosphereResult {
  content?: string;        // LLM response text
  thinking?: string;       // reasoning/thinking output (supported models)
  url?: string;            // media URL (images, videos, audio from cloud providers)
  buffer?: Buffer;         // media binary data (local providers, HuggingFace)
  provider: string;        // which provider handled the request
  model: string;           // which model was used
  modality: Modality;      // 'llm' | 'image' | 'video' | 'tts'
  latencyMs: number;       // request duration in milliseconds
  usage: {
    cost: number;          // cost in USD
    input?: number;        // input tokens/characters
    output?: number;       // output tokens
    unit?: string;         // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
  };
  media?: {
    width?: number;        // image/video width
    height?: number;       // image/video height
    duration?: number;     // video/audio duration in seconds
    format?: string;       // 'png' | 'mp4' | 'mp3' | 'wav'
    fps?: number;          // video frames per second
  };
}

Providers In Depth

Pi-AI — LLM Gateway (246+ models)

Provider ID: pi-ai Modalities: LLM (chat + streaming) Library: @mariozechner/pi-ai

A unified gateway that routes to 8 LLM providers through 4 different API protocols:

| API Protocol | Providers | |---|---| | anthropic-messages | Anthropic | | google-generative-ai | Google | | openai-responses | OpenAI (reasoning models) | | openai-completions | OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |

Anthropic Models (19)

| Model | Context | Reasoning | Vision | Input Cost | Output Cost | |---|---|---|---|---|---| | claude-opus-4-0 | 200k | Yes | Yes | $15/M | $75/M | | claude-opus-4-1 | 200k | Yes | Yes | $15/M | $75/M | | claude-sonnet-4-20250514 | 200k | Yes | Yes | $3/M | $15/M | | claude-sonnet-4-5-20250929 | 200k | Yes | Yes | $3/M | $15/M | | claude-3-7-sonnet-20250219 | 200k | Yes | Yes | $3/M | $15/M | | claude-3-5-sonnet-20241022 | 200k | No | Yes | $3/M | $15/M | | claude-haiku-4-5-20251001 | 200k | No | Yes | $0.80/M | $4/M | | claude-3-5-haiku-20241022 | 200k | No | Yes | $0.80/M | $4/M | | claude-3-haiku-20240307 | 200k | No | Yes | $0.25/M | $1.25/M | | ...and 10 more variants | | | | | |

OpenAI Models (24)

| Model | Context | Reasoning | Vision | Input Cost | Output Cost | |---|---|---|---|---|---| | gpt-5 | 200k | Yes | Yes | $10/M | $30/M | | gpt-5-mini | 200k | Yes | Yes | $2.50/M | $10/M | | gpt-4.1 | 128k | No | Yes | $2/M | $8/M | | gpt-4.1-mini | 128k | No | Yes | $0.40/M | $1.60/M | | gpt-4.1-nano | 128k | No | Yes | $0.10/M | $0.40/M | | gpt-4o | 128k | No | Yes | $2.50/M | $10/M | | gpt-4o-mini | 128k | No | Yes | $0.15/M | $0.60/M | | o3-pro | 200k | Yes | Yes | $20/M | $80/M | | o3-mini | 200k | Yes | Yes | $1.10/M | $4.40/M | | o4-mini | 200k | Yes | Yes | $1.10/M | $4.40/M | | codex-mini-latest | 200k | Yes | No | $1.50/M | $6/M | | ...and 13 more variants | | | | | |

Google Gemini Models (19)

| Model | Context | Reasoning | Vision | Cost | |---|---|---|---|---| | gemini-2.5-flash | 1M | Yes | Yes | $0.15-0.60/M | | gemini-2.5-pro | 1M | Yes | Yes | $1.25-10/M | | gemini-2.0-flash | 1M | No | Yes | $0.10-0.40/M | | gemini-2.0-flash-lite | 1M | No | Yes | $0.025-0.10/M | | gemini-1.5-flash | 1M | No | Yes | $0.075-0.30/M | | gemini-1.5-pro | 2M | No | Yes | $1.25-5/M | | ...and 13 more variants | | | | |

xAI Grok Models (20)

| Model | Context | Reasoning | Vision | Input Cost | |---|---|---|---|---| | grok-4 | 256k | Yes | Yes | $5/M | | grok-4-fast | 256k | Yes | Yes | $3/M | | grok-3 | 131k | No | Yes | $3/M | | grok-3-fast | 131k | No | Yes | $5/M | | grok-3-mini-fast-latest | 131k | Yes | No | $0.30/M | | grok-2-vision | 32k | No | Yes | $2/M | | ...and 14 more variants | | | | |

Groq Models (15)

| Model | Context | Cost | |---|---|---| | llama-3.3-70b-versatile | 128k | $0.59/M | | llama-3.1-8b-instant | 128k | $0.05/M | | mistral-saba-24b | 32k | $0.40/M | | qwen-qwq-32b | 128k | $0.29/M | | deepseek-r1-distill-llama-70b | 128k | $0.75/M | | ...and 10 more | | |

Cerebras Models (3)

gpt-oss-120b, qwen-3-235b-a22b-instruct-2507, qwen-3-coder-480b

Zai Models (5)

glm-4.6, glm-4.5, glm-4.5-flash, glm-4.5v, glm-4.5-air

OpenRouter (141 models)

Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via ai.getModels('llm').

The Pi-AI Engine — Deep Dive

Noosphere's LLM provider is powered by @mariozechner/pi-ai, part of the Pi mono-repo by Mario Zechner (badlogic). Pi is NOT a wrapper like LangChain or Mastra — it's a micro-framework for agentic AI (~15K LOC, 4 npm packages) that was built from scratch as a minimalist alternative to Claude Code.

Pi consists of 4 packages in 3 tiers:

TIER 1 — FOUNDATION
  @mariozechner/pi-ai             LLM API: stream(), complete(), model registry
                                  0 internal deps, talks to 20+ providers

TIER 2 — INFRASTRUCTURE
  @mariozechner/pi-agent-core     Agent loop, tool execution, lifecycle events
                                  Depends on pi-ai

  @mariozechner/pi-tui            Terminal UI with differential rendering
                                  Standalone, 0 internal deps

TIER 3 — APPLICATION
  @mariozechner/pi-coding-agent   CLI + SDK: sessions, compaction, extensions
                                  Depends on all above

Noosphere uses @mariozechner/pi-ai (Tier 1) directly for LLM access. But the full Pi ecosystem provides capabilities that can be layered on top.

How Pi Keeps 200+ Models Updated

Pi does NOT hardcode models. It has an auto-generation pipeline that runs at build time:

STEP 1: FETCH (3 sources in parallel)
┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐
│   models.dev     │  │   OpenRouter     │  │  Vercel AI    │
│   /api.json      │  │   /v1/models     │  │  Gateway      │
│                  │  │                  │  │  /v1/models   │
│ Context windows  │  │ Pricing ($/M)    │  │ Capability    │
│ Capabilities     │  │ Availability     │  │ tags          │
│ Tool support     │  │ Provider routing │  │               │
└────────┬─────────┘  └────────┬─────────┘  └──────┬────────┘
         └─────────┬───────────┴────────────────────┘
                   ▼
STEP 2: MERGE & DEDUPLICATE
         Priority: models.dev > OpenRouter > Vercel
         Key: provider + modelId
                   │
                   ▼
STEP 3: FILTER
         ✅ tool_call === true
         ✅ streaming supported
         ✅ system messages supported
         ✅ not deprecated
                   │
                   ▼
STEP 4: NORMALIZE
         Costs → $/million tokens
         API type → one of 4 protocols
         Input modes → ["text"] or ["text","image"]
                   │
                   ▼
STEP 5: PATCH (manual corrections)
         Claude Opus: cache pricing fix
         GPT-5.4: context window override
         Kimi K2.5: hardcoded pricing
                   │
                   ▼
STEP 6: GENERATE TypeScript
         → models.generated.ts (~330KB)
         → 200+ models with full type safety

Each generated model entry looks like:

{
  id: "claude-opus-4-6",
  name: "Claude Opus 4.6",
  api: "anthropic-messages",
  provider: "anthropic",
  baseUrl: "https://api.anthropic.com",
  reasoning: true,
  input: ["text", "image"],
  cost: {
    input: 15,          // $15/M tokens
    output: 75,         // $75/M tokens
    cacheRead: 1.5,     // prompt cache hit
    cacheWrite: 18.75,  // prompt cache write
  },
  contextWindow: 200_000,
  maxTokens: 32_000,
} satisfies Model<"anthropic-messages">

When a new model is released (e.g., Gemini 3.0), it appears in models.dev/OpenRouter → the script captures it → a new Pi version is published → Noosphere updates its dependency.

4 API Protocols — How Pi Talks to Every Provider

Pi abstracts all LLM providers into 4 wire protocols. Each protocol handles the differences in request format, streaming format, auth headers, and response parsing:

| Protocol | Providers | Key Differences | |---|---|---| | anthropic-messages | Anthropic, AWS Bedrock | system as top-level field, content as [{type:"text", text:"..."}] blocks, x-api-key auth, anthropic-beta headers | | openai-completions | OpenAI, xAI, Groq, Cerebras, OpenRouter, Ollama, vLLM | system as message with role:"system", content as string, Authorization: Bearer auth, tool_calls array | | openai-responses | OpenAI (reasoning models) | New Responses API with server-side context, store: true, reasoning summaries | | google-generative-ai | Google Gemini, Vertex AI | systemInstruction.parts[{text}], role "model" instead of "assistant", functionCall instead of tool_calls, thinkingConfig |

The core function streamSimple() detects which protocol to use based on model.api and handles all the formatting/parsing transparently:

// What happens inside Pi when you call Noosphere's chat():
async function* streamSimple(
  model: Model,           // includes model.api to determine protocol
  context: Context,       // { systemPrompt, messages, tools }
  options?: StreamOptions  // { signal, onPayload, thinkingLevel, ... }
): AsyncIterable<AssistantMessageEvent> {
  // 1. Format request according to model.api protocol
  // 2. Open SSE/WebSocket stream
  // 3. Parse provider-specific chunks
  // 4. Emit normalized events:
  //    → text_delta, thinking_delta, tool_call, message_end
}

Agentic Capabilities

These are the capabilities people get access to through the Pi-AI engine:

1. Tool Use / Function Calling

Full structured tool calling supported across all major providers. Tool definitions use TypeBox schemas with runtime validation via AJV:

import { type Tool, StringEnum } from '@mariozechner/pi-ai';
import { Type } from '@sinclair/typebox';

// Define a tool with typed parameters
const searchTool: Tool = {
  name: 'web_search',
  description: 'Search the web for information',
  parameters: Type.Object({
    query: Type.String({ description: 'Search query' }),
    maxResults: Type.Optional(Type.Number({ default: 5 })),
    type: StringEnum(['web', 'images', 'news'], { description: 'Search type' }),
  }),
};

// Pass tools in context — Pi handles the rest
const context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Search for recent AI news' }],
  tools: [searchTool],
};

How tool calling works internally:

User prompt → LLM → "I need to call web_search"
                         │
                         ▼
              Pi validates arguments with AJV
              against the TypeBox schema
                         │
                   ┌─────┴─────┐
                   │ Valid?     │
                   ├─Yes───────┤
                   │ Execute   │
                   │ tool      │
                   ├───────────┤
                   │ No        │
                   │ Return    │
                   │ validation│
                   │ error to  │
                   │ LLM       │
                   └───────────┘
                         │
                         ▼
              Tool result → back into context → LLM continues

Provider-specific tool_choice control:

Anthropic: "auto" | "any" | "none" | { type: "tool", name: "specific_tool" }
OpenAI: "auto" | "none" | "required" | { type: "function", function: { name: "..." } }
Google: "auto" | "none" | "any"

Partial JSON streaming: During streaming, Pi parses tool call arguments incrementally using partial JSON parsing. This means you can see tool arguments being built in real-time, not just after the tool call completes.

2. Reasoning / Extended Thinking

Pi provides unified thinking support across all providers that support it. Thinking blocks are automatically extracted, separated from regular text, and streamed as distinct events:

| Provider | Models | Control Parameters | How It Works | |---|---|---|---| | Anthropic | Claude Opus, Sonnet 4+ | thinkingEnabled: boolean, thinkingBudgetTokens: number | Extended thinking blocks in response, separate thinking content type | | OpenAI | o1, o3, o4, GPT-5 | reasoningEffort: "minimal" \| "low" \| "medium" \| "high" | Reasoning via Responses API, reasoningSummary: "auto" \| "detailed" \| "concise" | | Google | Gemini 2.5 Flash/Pro | thinking.enabled: boolean, thinking.budgetTokens: number | Thinking via thinkingConfig, mapped to effort levels | | xAI | Grok-4, Grok-3-mini | Native reasoning | Automatic when model supports it |

Cross-provider thinking portability: When switching models mid-conversation, Pi converts thinking blocks between formats. Anthropic thinking blocks become <thinking> tagged text when sent to OpenAI/Google, and vice versa.

// Thinking is automatically extracted in Noosphere responses:
const result = await ai.chat({
  model: 'claude-opus-4-6',
  messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
});

console.log(result.thinking);  // "Let me work through this... 15! = 15 × 14 × 13!..."
console.log(result.content);   // "15! / 13! = 15 × 14 = 210"

// During streaming, thinking arrives as separate events:
const stream = ai.stream({ messages: [...] });
for await (const event of stream) {
  if (event.type === 'thinking_delta') console.log('[THINKING]', event.delta);
  if (event.type === 'text_delta') console.log('[RESPONSE]', event.delta);
}

3. Vision / Multimodal Input

Models with input: ["text", "image"] accept images alongside text. Pi handles the encoding and format differences per provider:

// Send images to vision-capable models
const messages = [{
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image', data: base64PngString, mimeType: 'image/png' },
  ],
}];

// Supported MIME types: image/png, image/jpeg, image/gif, image/webp
// Images are silently ignored when sent to non-vision models

Vision-capable models include: All Claude models, all GPT-4o/GPT-5 models, Gemini models, Grok-2-vision, Grok-4, and select Groq models.

4. Agent Loop — Autonomous Tool Execution

The @mariozechner/pi-agent-core package provides a complete agent loop that automatically cycles through prompt → LLM → tool call → result → repeat until the task is done:

import { agentLoop } from '@mariozechner/pi-ai';

const events = agentLoop(userMessage, agentContext, {
  model: getModel('anthropic', 'claude-opus-4-6'),
  tools: [searchTool, readFileTool, writeFileTool],
  signal: abortController.signal,
});

for await (const event of events) {
  switch (event.type) {
    case 'agent_start':           // Agent begins
    case 'turn_start':            // New LLM turn begins
    case 'message_start':         // LLM starts responding
    case 'message_update':        // Text/thinking delta received
    case 'tool_execution_start':  // About to execute a tool
    case 'tool_execution_end':    // Tool finished, result available
    case 'message_end':           // LLM finished this message
    case 'turn_end':              // Turn complete (may loop if tools were called)
    case 'agent_end':             // All done, final messages available
  }
}

The agent loop state machine:

[User sends prompt]
        │
        ▼
  ┌─[Build Context]──▶ [Check Queues]──▶ [Stream LLM]◄── streamFn()
  │                                           │
  │                                     ┌─────┴──────┐
  │                                     │            │
  │                                   text      tool_call
  │                                     │            │
  │                                     ▼            ▼
  │                                  [Done]    [Execute Tool]
  │                                                  │
  │                                            tool result
  │                                                  │
  └──────────────────────────────────────────────────┘
                                    (loops back to Stream LLM)

Key design decisions:

Tools execute sequentially by default (parallelism can be added on top)
The streamFn is injectable — you can wrap it with middleware to modify requests per-provider
Tool arguments are validated at runtime using TypeBox + AJV before execution
Aborted/failed responses preserve partial content and usage data
Tool results are automatically added to the conversation context

5. The `streamFn` Pattern — Injectable Middleware

This is Pi's most powerful architectural feature. The streamFn is the function that actually talks to the LLM, and it can be wrapped with middleware like Express.js request handlers:

import type { StreamFn } from '@mariozechner/pi-agent-core';
import { streamSimple } from '@mariozechner/pi-ai';

// Start with Pi's base streaming function
let fn: StreamFn = streamSimple;

// Wrap it with middleware that modifies requests per-provider
fn = createMyCustomWrapper(fn, {
  // Add custom headers for Anthropic
  onPayload: (payload) => {
    if (model.provider === 'anthropic') {
      payload.headers['anthropic-beta'] = 'fine-grained-tool-streaming-2025-05-14';
    }
  },
});

// Each wrapper calls the previous one, forming a chain:
// request → wrapper3 → wrapper2 → wrapper1 → streamSimple → API

This pattern is what allows projects like OpenClaw to stack 16 provider-specific wrappers on top of Pi's base streaming — adding beta headers for Anthropic, WebSocket transport for OpenAI, thinking sanitization for Google, reasoning effort headers for OpenRouter, and more — without modifying Pi's source code.

6. Session Management (via pi-coding-agent)

The @mariozechner/pi-coding-agent package provides persistent session management with JSONL-based storage:

import { createAgentSession, SessionManager } from '@mariozechner/pi-coding-agent';

// Create a session with full persistence
const session = await createAgentSession({
  model: 'claude-opus-4-6',
  tools: myTools,
  sessionManager,  // handles JSONL persistence
});

const result = await session.run('Build a REST API');
// Session is automatically saved to:
// ~/.pi/agent/sessions/session_abc123.jsonl

Session file format (append-only JSONL):

{"role":"user","content":"Build a REST API","timestamp":1710000000}
{"role":"assistant","content":"I'll create...","model":"claude-opus-4-6","usage":{...}}
{"role":"toolResult","toolCallId":"tc_001","toolName":"bash","content":"OK"}
{"type":"compaction","summary":"The user asked to build...","preservedMessages":[...]}

Session operations:

create() — new session
open(id) — restore existing session
continueRecent() — continue the most recent session
forkFrom(id) — create a branch (new JSONL referencing parent)
inMemory() — RAM-only session (for SDK/testing)

7. Context Compaction — Automatic Context Window Management

When the conversation approaches the model's context window limit, Pi automatically compacts the history:

1. DETECT: Calculate inputTokens + outputTokens vs model.contextWindow
2. TRIGGER: Proactively before overflow, or as recovery after overflow error
3. SUMMARIZE: Send history to LLM with a compaction prompt
4. WRITE: Append compaction entry to JSONL:
   {"type":"compaction","summary":"...","preservedMessages":[last N messages]}
5. CONTINUE: Context is now summary + recent messages instead of full history

The JSONL file is never rewritten — compaction entries are appended, maintaining a complete audit trail.

8. Cost Tracking — Cache-Aware Pricing

Pi tracks costs per-request with cache-aware pricing for providers that support prompt caching:

// Every model has 4 cost dimensions:
{
  input: 15,          // $15 per 1M input tokens
  output: 75,         // $75 per 1M output tokens
  cacheRead: 1.5,     // $1.50 per 1M cached prompt tokens (read)
  cacheWrite: 18.75,  // $18.75 per 1M cached prompt tokens (write)
}

// Usage tracking on every response:
{
  input: 1500,        // tokens consumed as input
  output: 800,        // tokens generated
  cacheRead: 5000,    // prompt cache hits
  cacheWrite: 1500,   // prompt cache writes
  cost: {
    total: 0.082,     // total cost in USD
    input: 0.0225,
    output: 0.06,
    cacheRead: 0.0075,
    cacheWrite: 0.028,
  },
}

Anthropic and OpenAI support prompt caching. For providers without caching, cacheRead and cacheWrite are always 0.

9. Extension System (via pi-coding-agent)

Pi supports a plugin system where extensions can register tools, commands, and lifecycle hooks:

// Extensions are TypeScript mo

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

noosphere

Features

Install

Quick Start

Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)

Provider Logos — SVG & PNG for Every Model

The Problem It Solves

How It Works — Complete Auto-Fetch Architecture

Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs

Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API

Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API

Resilience Guarantees (All Layers)

Total Model Coverage

Force Refresh

Why Hybrid (Static + Dynamic)?

Local Models — Run Everything on Your Machine

Quick Start

8 Providers, 5 Modalities, 774+ Models

Modality-Filtered Sync — Only Fetch What You Need

Models by Modality

Ollama Provider — Local LLM

OpenAI-Compatible Provider — Any Local Server

HuggingFace Local Catalog

ComfyUI — Dynamic Workflow Engine

Model Descriptions — Dynamic from Source

Model Status & Local Info

Web Catalogs (Auto-Fetched)

Auto-Detection — Zero Config

Pre-Request Token Counting

Agent Loop & pi-ai Access

Preprocessor — Context Compaction

Preprocessor Strategies

Tool Calling

Cost Calculation

Direct Stream/Complete APIs

Configuration

Full Configuration Reference

Local Service Environment Variables

API Reference

new Noosphere(config?)

Generation Methods

ai.chat(options): Promise<NoosphereResult>

ai.stream(options): NoosphereStream

ai.image(options): Promise<NoosphereResult>

ai.video(options): Promise<NoosphereResult>

ai.speak(options): Promise<NoosphereResult>

Discovery Methods

ai.getProviders(modality?): Promise<ProviderInfo[]>

ai.getModels(modality?): Promise<ModelInfo[]>

ai.getModel(provider, modelId): Promise<ModelInfo | null>

ai.syncModels(): Promise<SyncResult>

Usage Tracking

ai.getUsage(options?): UsageSummary

Lifecycle

ai.registerProvider(provider): void

ai.dispose(): Promise<void>

NoosphereResult

Providers In Depth

Pi-AI — LLM Gateway (246+ models)

Anthropic Models (19)

OpenAI Models (24)

Google Gemini Models (19)

xAI Grok Models (20)

Groq Models (15)

Cerebras Models (3)

Zai Models (5)

OpenRouter (141 models)

The Pi-AI Engine — Deep Dive

How Pi Keeps 200+ Models Updated

4 API Protocols — How Pi Talks to Every Provider

Agentic Capabilities

1. Tool Use / Function Calling

2. Reasoning / Extended Thinking

3. Vision / Multimodal Input

`new Noosphere(config?)`

`ai.chat(options): Promise<NoosphereResult>`

`ai.stream(options): NoosphereStream`

`ai.image(options): Promise<NoosphereResult>`

`ai.video(options): Promise<NoosphereResult>`

`ai.speak(options): Promise<NoosphereResult>`

`ai.getProviders(modality?): Promise<ProviderInfo[]>`

`ai.getModels(modality?): Promise<ModelInfo[]>`

`ai.getModel(provider, modelId): Promise<ModelInfo | null>`

`ai.syncModels(): Promise<SyncResult>`

`ai.getUsage(options?): UsageSummary`

`ai.registerProvider(provider): void`

`ai.dispose(): Promise<void>`

5. The `streamFn` Pattern — Injectable Middleware