ai-image

v0.0.9

Published

22 days ago

Unified image generation + CLIP vision analysis. 11 providers (OpenAI, Replicate, Stability, FAL, Together, BFL, Google, Fireworks, Ollama, local, OKSLOP). Zero-shot classification, embeddings, and stock photo search.

ai-image

Unified image generation + CLIP vision analysis. 10 providers, local Apple Silicon support, CLI and programmatic API.

Features

Image Generation — OpenAI, Replicate, Stability AI, FAL.ai, Together AI, BFL, Google Imagen, Fireworks, Ollama, and local (MFLUX on Apple Silicon)
CLIP Classification — zero-shot image classification using Apple's MobileCLIP-S1 (local, no API key)
CLIP Embeddings — 512-dim image and text embeddings for similarity search, clustering, etc.
Stock Photo Search — search AI-generated stock photos via OKSLOP (no API key required)
CLI + API — use from the command line or as a TypeScript library
Auto-managed local server — Python server auto-installs and auto-starts on first local/CLIP call, sleeps after idle

Installation

npm install ai-image

Or use directly with npx:

npx ai-image generate --prompt "A sunset over mountains"
npx ai-image classify --image photo.jpg --labels cat dog bird
npx ai-image search "sunset over mountains"

CLIP: Classify & Embed (local, no API key)

CLIP features use Apple's MobileCLIP-S1 — 4.8x faster than standard CLIP, runs locally on Mac. The model (~85MB) downloads automatically on first use.

CLI

# Zero-shot classification
ai-image classify --image photo.jpg --labels cat dog bird
#  cat                  95.2%  █████████████████████████████
#  dog                   3.1%  █
#  bird                  1.7%

# Save classification to JSON
ai-image classify --image photo.jpg --labels cat dog bird -o result.json

# Get image embedding (512 dims)
ai-image embed --image photo.jpg
ai-image embed --image photo.jpg -o embedding.json

# Get text embedding
ai-image embed --text "a photo of a sunset"

# JSON output (for scripting)
ai-image classify --image photo.jpg --labels cat dog --json
ai-image embed --image photo.jpg --json

Programmatic API

import { classifyImage, embedImage, embedText } from "ai-image";

// Zero-shot classification
const result = await classifyImage("photo.jpg", ["cat", "dog", "bird"]);
// { labels: [{ label: "cat", score: 0.95 }, ...], elapsed: 0.06 }

// Image embedding
const { embedding } = await embedImage("photo.jpg");
// embedding: number[] (512 dims, normalized)

// Text embedding
const { embedding: textEmb } = await embedText("a sunset over the ocean");

// Cosine similarity (embeddings are pre-normalized, so dot product = cosine sim)
const similarity = embedding.reduce((sum, v, i) => sum + v * textEmb[i], 0);

Or using the class API:

import { ImageGenerator } from "ai-image";

const gen = new ImageGenerator({ provider: "local" });

const result = await gen.classifyImage({
  imagePath: "photo.jpg",
  labels: ["cat", "dog", "bird"],
});

const { embedding } = await gen.embed({ imagePath: "photo.jpg" });
const { embedding: textEmb } = await gen.embed({ text: "a cat" });

Image Generation

CLI

# OpenAI (default)
ai-image generate --prompt "A sunset over mountains"

# Local Apple Silicon (no API key needed)
ai-image generate --prompt "A robot in a garden" -p local

# Other providers
ai-image generate --prompt "Cyberpunk city" -p replicate
ai-image generate --prompt "Abstract art" -p stability
ai-image generate --prompt "Ocean waves" -p fal
ai-image generate --prompt "Forest path" -p together
ai-image generate --prompt "Mountain lake" -p bfl
ai-image generate --prompt "Space station" -p google
ai-image generate --prompt "Desert sunset" -p fireworks
ai-image generate --prompt "City skyline" -p ollama

# Options
ai-image generate --prompt "..." --size 1024x1024 --quality high --format png
ai-image generate --prompt "..." --seed 42 --steps 20 --guidance-scale 7.5
ai-image generate --prompt "..." -o output.png --json

Programmatic API

import { generateImage, ImageGenerator } from "ai-image";

// One-shot
const result = await generateImage("A sunset over mountains", {
  provider: "openai",
  size: "1024x1024",
});
console.log(result.filePaths);

// Class API (reuse client)
const gen = new ImageGenerator({
  provider: "openai",
  outputDir: "./images",
});
const result = await gen.generate({
  prompt: "A futuristic city",
  quality: "high",
  n: 2,
});

Metadata Sidecar Files

Generate a .metadata.json file alongside each image with generation parameters and CLIP data. Works with any provider — the local CLIP server auto-starts when needed.

CLI

# Basic metadata (generation params + CLIP embedding)
ai-image generate --prompt "A cat on a roof" -p openai --metadata

# With CLIP classification labels
ai-image generate --prompt "A cat on a roof" -p openai --metadata --metadata-labels cat dog building sky

# Works with any provider
ai-image generate --prompt "A sunset" -p local --metadata --metadata-labels landscape nature urban
ai-image generate --prompt "A sunset" -p replicate --metadata

Programmatic API

const result = await generator.generate({
  prompt: "A cat on a roof",
  metadata: true,
  metadataLabels: ["cat", "dog", "building", "sky"],
});
// Writes: A_cat_on_a_roof.metadata.json alongside A_cat_on_a_roof.png

Output format

{
  "image": "A_cat_on_a_roof.png",
  "provider": "openai",
  "model": "gpt-image-1",
  "prompt": "A cat on a roof",
  "size": "1024x1024",
  "seed": 42,
  "elapsed": 5200,
  "generatedAt": "2026-03-15T04:53:52.191Z",
  "clip": {
    "embedding": [0.01, -0.03, "... 512 dims"],
    "labels": [
      { "label": "cat", "score": 0.82 },
      { "label": "building", "score": 0.15 },
      { "label": "sky", "score": 0.02 },
      { "label": "dog", "score": 0.01 }
    ]
  }
}

clip.embedding — 512-dim normalized vector (always included when local server is available)
clip.labels — only included when --metadata-labels / metadataLabels is provided
If the local server can't start (no Python, not macOS), metadata is still written without the clip field

Stock Photo Search (OKSLOP)

Search AI-generated stock photos from okslop.com. Works without an API key (30 req/min, 500/day). Register at okslop.com/developers for higher limits.

CLI

# Basic search
ai-image search "sunset over mountains"

# With options
ai-image search "office workspace" --per-page 20 --orientation landscape
ai-image search "nature" --sort popular --page 2

# JSON output (for scripting)
ai-image search "cats" --json

# Save to file
ai-image search "abstract art" -o results.json

# With API key for higher rate limits
ai-image search "food photography" -k fk_live_your_key_here

Programmatic API

import { searchPhotos } from "ai-image";

// Basic search (no API key needed)
const result = await searchPhotos("sunset over mountains");
console.log(result.total); // total matches
for (const photo of result.results) {
  console.log(photo.urls.regular); // image URL
  console.log(photo.description);  // photo description
  console.log(photo.user.name);    // contributor name
}

// With options
const result = await searchPhotos("nature", {
  perPage: 20,
  orientation: "landscape",
  sort: "popular",
  apiKey: "fk_live_...", // optional, or set OKSLOP_API_KEY env var
});

Setup

API Keys

Create a .env file or set environment variables:

OPENAI_API_KEY=...          # OpenAI
REPLICATE_API_TOKEN=...     # Replicate
STABILITY_API_KEY=...       # Stability AI
FAL_KEY=...                 # FAL.ai
TOGETHER_API_KEY=...        # Together AI
BFL_API_KEY=...             # Black Forest Labs
GOOGLE_API_KEY=...          # Google Imagen
FIREWORKS_API_KEY=...       # Fireworks AI
OKSLOP_API_KEY=...          # Stock photo search (optional, works without key)
# Ollama and local need no API key

Local Server (Apple Silicon)

The local provider and all CLIP features use a Python server that runs locally on Apple Silicon. It auto-installs to ~/.ai-image/ on first use (requires Python 3.11+).

# Manual start (optional — auto-starts when needed)
~/.ai-image/server/.venv/bin/ai-image-server

# Clean reinstall
rm -rf ~/.ai-image

Supported Providers

| Provider | Models | API Key | |----------|--------|---------| | OpenAI | GPT Image 1, DALL-E 2 | OPENAI_API_KEY | | Replicate | SDXL, Flux, SD3, any model | REPLICATE_API_TOKEN | | Stability AI | SD 3.5 Large/Turbo | STABILITY_API_KEY | | FAL.ai | Flux Dev/Schnell/Pro | FAL_KEY | | Together AI | Flux Schnell (free), SDXL | TOGETHER_API_KEY | | BFL | Flux Pro 1.1/Dev | BFL_API_KEY | | Google | Imagen 4/4 Fast/4 Ultra | GOOGLE_API_KEY | | Fireworks | Flux Schnell FP8 | FIREWORKS_API_KEY | | Ollama | Flux 2 Klein, Z-Image | none (local) | | Local | FLUX.2 Klein 4B (MFLUX) | none (Apple Silicon) | | OKSLOP | Stock photo search | OKSLOP_API_KEY (optional) |

Run ai-image models or ai-image models --json for the full list.

TypeScript Types

All types are exported:

import type {
  // Generation
  ImageGenerator,
  ImageGeneratorOptions,
  GenerateOptions,
  GenerateResult,
  Provider,
  // CLIP
  ClassifyOptions,
  ClassifyResult,
  EmbedOptions,
  EmbedResult,
  // Search
  SearchOptions,
  SearchResult,
  SearchPhoto,
} from "ai-image";

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

ai-image

Features

Installation

CLIP: Classify & Embed (local, no API key)

CLI

Programmatic API

Image Generation

CLI

Programmatic API

Metadata Sidecar Files

CLI

Programmatic API

Output format

Stock Photo Search (OKSLOP)

CLI

Programmatic API

Setup

API Keys

Local Server (Apple Silicon)

Supported Providers

TypeScript Types

License