@polytts/core

Version: v0.1.2

Core runtime and adapter contracts for polytts.

Maintainer: dengqing

Keywords: adapter, onnx, polytts, runtime, text-to-speech, tts

Use this package when you need custom adapters, custom catalogs, or direct runtime orchestration. Most app code should start with polytts, @polytts/browser, or @polytts/node instead.

Install

npm install @polytts/core

Usage

import { createTTSRuntime } from "@polytts/core";
import { officialAdapters } from "@polytts/browser-adapters";
import { officialCatalog } from "@polytts/presets";

const runtime = createTTSRuntime({
  adapters: officialAdapters,
  catalogs: [officialCatalog],
  initialModelId: "browser-speech",
});

await runtime.prepare("browser-speech");
await runtime.speak("Hello from the core runtime.");

Custom models

There are two ways to extend polytts with new models.

Add a model to an existing family

If the model fits one of the built-in adapters (Piper, Kokoro, KittenTTS, Supertonic), add a catalog entry and keep using the official adapter. This is the usual path for adding another Piper voice bundle, a new KittenTTS checkpoint, or any ONNX export that matches an existing adapter's contract.

import { createStaticCatalog, type ModelSpec } from "@polytts/core";
import { createBrowserTTSRuntime } from "@polytts/browser";

const myPiperVoice: ModelSpec = {
  id: "piper-en_US-custom-medium",
  adapterId: "piper",
  name: "Piper Custom",
  family: "piper",
  revision: "v1.0.0",
  license: "mit",
  languages: ["en-US"],
  voiceMode: "per-voice-model",
  distribution: {
    kind: "managed-assets",
    sizeBytes: 63_000_000,
    assets: [
      {
        name: "en_US-custom-medium.onnx",
        url: "https://example.com/en_US-custom-medium.onnx",
        size: 63_000_000,
      },
      {
        name: "en_US-custom-medium.onnx.json",
        url: "https://example.com/en_US-custom-medium.onnx.json",
        size: 8_192,
      },
    ],
  },
  voices: [
    {
      id: "en_US-custom-medium",
      name: "Custom",
      language: "en-US",
    },
  ],
  defaultVoiceId: "en_US-custom-medium",
};

const runtime = createBrowserTTSRuntime({
  extraCatalogs: [createStaticCatalog([myPiperVoice])],
});

Guidelines

Use the built-in adapter id (piper, kokoro, kitten, or supertonic)
Reuse the existing family value (for example "piper") if the model should be grouped with that family in the simple API
Use voiceMode: "per-voice-model" when each downloadable bundle is a separate voice model
Use voiceMode: "multi" when one model exposes multiple voices internally
For per-voice-model families such as Piper, switch variants by model id rather than assuming one shared multi-voice model
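To make the last two guidelines concrete, here is a small self-contained sketch of how app code might find the model that serves a requested voice. SpecLike is a hypothetical trimmed copy of a few documented ModelSpec fields, resolveModelForVoice is a hypothetical helper (not a @polytts/core export), and the model and voice ids are made up.

```typescript
// Hypothetical, trimmed-down view of a few documented ModelSpec fields.
type VoiceMode = "single" | "multi" | "per-voice-model";

interface SpecLike {
  id: string;
  family: string;
  voiceMode: VoiceMode;
  voices?: { id: string }[];
}

// Hypothetical helper: find which model id serves `voiceId` within `family`.
// With "per-voice-model" specs each voice lives in its own model, so different
// voices resolve to different model ids; with a "multi" spec, every declared
// voice resolves to the same model id.
function resolveModelForVoice(
  specs: SpecLike[],
  family: string,
  voiceId: string,
): string | undefined {
  const match = specs.find(
    (spec) =>
      spec.family === family &&
      (spec.voices ?? []).some((voice) => voice.id === voiceId),
  );
  return match?.id;
}

// Illustrative catalog entries; all ids are made up.
const specs: SpecLike[] = [
  {
    id: "piper-en_US-alpha-medium",
    family: "piper",
    voiceMode: "per-voice-model",
    voices: [{ id: "en_US-alpha-medium" }],
  },
  {
    id: "piper-en_US-beta-medium",
    family: "piper",
    voiceMode: "per-voice-model",
    voices: [{ id: "en_US-beta-medium" }],
  },
  {
    id: "demo-multi-v1",
    family: "demo",
    voiceMode: "multi",
    voices: [{ id: "voice-a" }, { id: "voice-b" }],
  },
];

console.log(resolveModelForVoice(specs, "piper", "en_US-beta-medium")); // piper-en_US-beta-medium
console.log(resolveModelForVoice(specs, "demo", "voice-b")); // demo-multi-v1
```

Switching Piper voices therefore means preparing a different model id, while switching voices inside the multi-voice entry keeps the same model loaded.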

Add a new runtime family

If the model needs a different inference path, create a new adapter and register it alongside the built-in ones.

Model instance

Every adapter creates model instances. There are two kinds:

SynthesizingModelInstance (kind: "synthesizing") — returns audio data via generate(). Optionally supports stream() for incremental chunks. This is the most common type.
SpeakingModelInstance (kind: "speaking") — plays audio directly via speak() (e.g. the Web Speech API). No audio data is returned.
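Because both instance types carry a literal kind, calling code can narrow on it. The sketch below uses simplified local interfaces that only mimic the documented shapes (the real types in @polytts/core have more members), and the speak() signature is an assumption modeled on generate(); sayText is a hypothetical helper.

```typescript
// Simplified stand-ins for the documented instance types; the real interfaces
// in @polytts/core have more members (load, listVoices, dispose, ...).
interface AudioDataLike {
  sampleRate: number;
  channels: Float32Array[];
}

interface SynthesizingLike {
  readonly kind: "synthesizing";
  generate(text: string, voiceId: string, signal: AbortSignal): Promise<AudioDataLike>;
}

// Assumed speak() signature, mirroring generate(); check the real type.
interface SpeakingLike {
  readonly kind: "speaking";
  speak(text: string, voiceId: string, signal: AbortSignal): Promise<void>;
}

type InstanceLike = SynthesizingLike | SpeakingLike;

// Narrowing on `kind`: synthesizing models hand back PCM for the caller to
// play or save; speaking models play the audio themselves and return nothing.
async function sayText(
  model: InstanceLike,
  text: string,
  voiceId: string,
  signal: AbortSignal,
): Promise<AudioDataLike | undefined> {
  if (model.kind === "synthesizing") {
    return model.generate(text, voiceId, signal); // narrowed to SynthesizingLike
  }
  await model.speak(text, voiceId, signal); // narrowed to SpeakingLike
  return undefined;
}
```

The undefined result is what lets a caller tell that there is no buffer to cache or export when the model speaks directly.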

import { type ModelSpec, type SynthesizingModelInstance } from "@polytts/core";

class MyModel implements SynthesizingModelInstance {
  readonly kind = "synthesizing" as const;
  readonly modelId: string;
  readonly adapterId: string;

  constructor(private readonly spec: ModelSpec) {
    this.modelId = spec.id;
    this.adapterId = spec.adapterId;
  }

  async load(_signal: AbortSignal): Promise<void> {
    // Initialize your runtime here (load ONNX, WASM, etc.).
  }

  async generate(
    text: string,
    _voiceId: string,
    _signal: AbortSignal,
  ): Promise<{ sampleRate: number; channels: Float32Array[] }> {
    // Return PCM audio data for the synthesized speech.
    throw new Error(`Not implemented for: ${text}`);
  }

  listVoices() {
    return this.spec.voices ?? [];
  }

  dispose(): void {}
}
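While wiring up a new adapter, it can help to return a known signal before the real model works. The sketch below is a hypothetical placeholder for generate(): it ignores the voice and produces a quiet mono sine tone whose length grows with the text, in the { sampleRate, channels } shape used above. It assumes samples in [-1, 1] (the Web Audio convention); the 22 050 Hz rate, 440 Hz pitch, and 50 ms per character are arbitrary choices, not polytts defaults.

```typescript
interface PcmAudio {
  sampleRate: number;
  channels: Float32Array[]; // one Float32Array per channel
}

// Hypothetical placeholder synthesizer: a mono 440 Hz tone, at least 0.2 s
// long, 50 ms per input character. Only useful for testing playback plumbing.
function placeholderTone(text: string, signal: AbortSignal, sampleRate = 22_050): PcmAudio {
  if (signal.aborted) throw new Error("aborted");
  const seconds = Math.max(0.2, text.length * 0.05);
  const samples = new Float32Array(Math.round(seconds * sampleRate));
  for (let i = 0; i < samples.length; i++) {
    // 0.2 amplitude keeps the tone quiet and well inside [-1, 1].
    samples[i] = 0.2 * Math.sin((2 * Math.PI * 440 * i) / sampleRate);
  }
  return { sampleRate, channels: [samples] };
}
```

Inside MyModel, generate() could return placeholderTone(text, _signal) until the real inference path is in place.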

Adapter

The adapter tells the runtime how to create model instances and whether it can run on the current platform.

import {
  createStaticCatalog,
  createTTSRuntime,
  type TTSAdapter,
  type SynthesizingModelInstance,
} from "@polytts/core";
import { officialAdapters, officialCatalog } from "@polytts/browser";

const myAdapter: TTSAdapter<SynthesizingModelInstance> = {
  id: "my-runtime",
  name: "My Runtime",
  isSupported: (_spec) => typeof Worker !== "undefined",
  createModel(spec) {
    return new MyModel(spec);
  },
};
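The adapter above only checks for Worker. If a model declares requirements (the optional field described as covering wasm, webgpu, and worker in the ModelSpec reference), an adapter might check each one. RequirementsLike is an assumed shape, since this README does not spell out ModelRequirements; meetsRequirements is a hypothetical helper, and the environment is passed in so the check can be tested outside a browser.

```typescript
// Assumed shape; check the real ModelRequirements type before relying on it.
interface RequirementsLike {
  wasm?: boolean;
  webgpu?: boolean;
  worker?: boolean;
}

// The globals the check looks at, made injectable for testing.
interface EnvLike {
  WebAssembly?: unknown;
  Worker?: unknown;
  navigator?: { gpu?: unknown };
}

// Hypothetical capability check an adapter's isSupported() could delegate to.
function meetsRequirements(
  req: RequirementsLike | undefined,
  env: EnvLike = globalThis as unknown as EnvLike,
): boolean {
  if (!req) return true; // no declared requirements
  if (req.wasm && typeof env.WebAssembly === "undefined") return false;
  if (req.worker && typeof env.Worker === "undefined") return false;
  if (req.webgpu && env.navigator?.gpu === undefined) return false;
  return true;
}
```

If the real shape matches the assumption, isSupported could become something like (spec) => meetsRequirements(spec.requirements as RequirementsLike | undefined).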

Catalog

| distribution.kind | Meaning |
| ------------------- | --------------------------------------------------- |
| "managed-assets" | Runtime downloads and caches assets automatically |
| "adapter-managed" | Adapter owns download logic via custom install() |
| "none" (or omit) | No downloadable assets needed |

const myCatalog = createStaticCatalog([
  {
    id: "my-runtime-v1",
    adapterId: "my-runtime",
    name: "My Runtime V1",
    family: "my-runtime",
    revision: "v1",
    license: "custom",
    languages: ["en-US"],
    voiceMode: "single",
    distribution: { kind: "none" },
  },
]);
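The three distribution.kind values above behave like a discriminated union. The sketch below models them with simplified local types (the real ModelDistribution type may carry more fields) and computes how many bytes the runtime itself would download; downloadBytes is a hypothetical helper, and treating an omitted distribution like "none" follows the table.

```typescript
interface AssetLike {
  name: string;
  url: string;
  size: number; // bytes
}

// Simplified, assumed model of the documented distribution kinds.
type DistributionLike =
  | { kind: "managed-assets"; sizeBytes?: number; assets: AssetLike[] }
  | { kind: "adapter-managed" }
  | { kind: "none" };

// Hypothetical helper: bytes the runtime would fetch for a model. Returns
// undefined for "adapter-managed", where the adapter's own install() decides
// what to download.
function downloadBytes(dist: DistributionLike | undefined): number | undefined {
  if (!dist) return 0; // omitted distribution behaves like "none"
  switch (dist.kind) {
    case "managed-assets":
      return dist.assets.reduce((total, asset) => total + asset.size, 0);
    case "adapter-managed":
      return undefined;
    case "none":
      return 0;
  }
}
```

Switching on kind lets the compiler flag any branch left unhandled if another distribution kind is added later.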

Wire it up

const runtime = createTTSRuntime({
  adapters: [...officialAdapters, myAdapter],
  catalogs: [officialCatalog, myCatalog],
});
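When mixing official and custom pieces, two easy mistakes are a catalog entry whose adapterId matches no registered adapter and two entries sharing an id. How the runtime reacts to either is not documented here, so the following is a hypothetical pre-flight check (checkWiring is not a @polytts/core export) that works on plain arrays, such as the array you pass to createStaticCatalog.

```typescript
interface AdapterLike {
  id: string;
}

interface EntryLike {
  id: string;
  adapterId: string;
}

// Hypothetical pre-flight check; returns human-readable problems (empty if none).
function checkWiring(adapters: AdapterLike[], entries: EntryLike[]): string[] {
  const problems: string[] = [];
  const adapterIds = new Set(adapters.map((adapter) => adapter.id));
  const seenModelIds = new Set<string>();
  for (const entry of entries) {
    if (!adapterIds.has(entry.adapterId)) {
      problems.push(`model "${entry.id}" uses unknown adapter "${entry.adapterId}"`);
    }
    if (seenModelIds.has(entry.id)) {
      problems.push(`duplicate model id "${entry.id}"`);
    }
    seenModelIds.add(entry.id);
  }
  return problems;
}
```

For example, assuming each official adapter exposes an id as myAdapter does, checkWiring([...officialAdapters, myAdapter], entries) should return an empty array before you call createTTSRuntime.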

ModelSpec reference

Required fields for every ModelSpec:

| Field | Type | Description |
| ----------- | ------------------------------------------ | ----------------------------------------- |
| id | string | Unique model identifier |
| adapterId | string | Which adapter runs this model |
| name | string | Human-readable display name |
| family | string | Grouping key (e.g. "piper", "kokoro") |
| revision | string | Version string for cache invalidation |
| license | string | SPDX license identifier |
| languages | string[] | BCP-47 language codes |
| voiceMode | "single" \| "multi" \| "per-voice-model" | How the model handles voices |

Optional fields:

| Field | Type | Description |
| ---------------- | ------------------------- | ------------------------------------------ |
| distribution | ModelDistribution | Asset delivery strategy (see above) |
| voices | Voice[] | Pre-declared voice list |
| defaultVoiceId | string | Default voice to select |
| requirements | ModelRequirements | Runtime needs (wasm, webgpu, worker) |
| config | Record<string, unknown> | Adapter-specific configuration |
| description | string | Model description |
| homepage | string | Project URL |
| tags | string[] | Searchable tags |
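If catalog entries come from an untyped source such as a remote JSON file, the compiler cannot enforce the required fields above. A minimal runtime check could be a type guard written against the required-field table. hasRequiredSpecFields is hypothetical, rejecting empty strings is a choice made here, and it does not validate optional fields, SPDX identifiers, or BCP-47 syntax.

```typescript
type VoiceMode = "single" | "multi" | "per-voice-model";

// Mirrors the required-field table; optional fields are not checked.
interface RequiredSpecFields {
  id: string;
  adapterId: string;
  name: string;
  family: string;
  revision: string;
  license: string;
  languages: string[];
  voiceMode: VoiceMode;
}

const STRING_FIELDS = ["id", "adapterId", "name", "family", "revision", "license"] as const;
const VOICE_MODES: readonly string[] = ["single", "multi", "per-voice-model"];

// Hypothetical type guard covering only the required ModelSpec fields.
function hasRequiredSpecFields(value: unknown): value is RequiredSpecFields {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    // Every string field must be present and non-empty.
    STRING_FIELDS.every((key) => typeof v[key] === "string" && v[key] !== "") &&
    // languages must be an array of strings (tag syntax is not checked).
    Array.isArray(v.languages) &&
    v.languages.every((lang) => typeof lang === "string") &&
    // voiceMode must be one of the three documented values.
    typeof v.voiceMode === "string" &&
    VOICE_MODES.includes(v.voiceMode)
  );
}
```

Entries that fail the guard can be dropped or reported before they reach createStaticCatalog.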

Exports

Key exports from @polytts/core:

createTTSRuntime — create a low-level runtime
createStaticCatalog — create a catalog from a model array
MemoryAssetStore — in-memory asset store (useful for testing)
Types: TTSRuntime, TTSAdapter, TTSModelInstance, SynthesizingModelInstance, SpeakingModelInstance, ModelSpec, ModelCatalog, AssetStore, AudioData, Voice
