@karaoke-cms/enrich

v0.20.10

Published

a month ago

AI enrichment pipeline for karaoke-cms

krishanm

astro cms ai enrichment

AI enrichment pipeline for karaoke-cms. Reads published vault content and writes AI-generated metadata (title, reading_time, tags, description, related) back into frontmatter.

Usage

CLI

# Enrich all published files in your vault
KARAOKE_VAULT=./my-vault karaoke-enrich

# Enrich only specific files (e.g. from a git pre-commit hook)
karaoke-enrich --staged path/to/post.md another/post.md

Library

import { run, defineProvider } from '@karaoke-cms/enrich'

// Use a built-in provider (reads API key from env)
const result = await run({ vault: '/absolute/path/to/vault' })
console.log(`Enriched ${result.enriched} files, wrote ${result.written}`)

// Use a custom provider — no API key required
const myProvider = defineProvider({
  async enrich(body) {
    return { tags: ['custom'], description: 'Custom description.' }
  }
})

const customResult = await run({ vault: '/path/to/vault', provider: myProvider })

Configuration

Options can be passed to run(); those with an env var listed below can also be set via environment variables:

| Option | Env var | Default |
|---|---|---|
| vault | KARAOKE_VAULT | — (required) |
| provider | ENRICH_PROVIDER | 'openai' |
| maxEnrich | MAX_ENRICHMENTS_PER_RUN | 20 |
| relatedMax | RELATED_MAX | 3 |
| dryRun | DRY_RUN | false |
| cachePath | ENRICH_CACHE_PATH | {cwd}/.enrich-cache.json |
| stagedFiles | — | null (all files) |
| config | — | undefined (library only unless you pass it) |
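
The option/env var/default columns can be read as a three-step lookup. The sketch below is one way that resolution might work, not the package's actual code: `resolveOptions`, `toInt`, and the `Env` type are hypothetical names, and the precedence (explicit option, then env var, then default) and the parsing of `DRY_RUN` are assumptions. It only covers the options that have an env var.

```typescript
// Hypothetical sketch: the real package may resolve options differently.
type Env = Record<string, string | undefined>

interface ResolvedOptions {
  vault: string
  provider: string
  maxEnrich: number
  relatedMax: number
  dryRun: boolean
  cachePath: string
}

// Assumption: an explicit option beats an env var, which beats the default.
function resolveOptions(
  opts: Partial<ResolvedOptions>,
  env: Env,
  cwd: string,
): ResolvedOptions {
  const vault = opts.vault ?? env.KARAOKE_VAULT
  if (!vault) throw new Error('vault is required (option or KARAOKE_VAULT)')

  return {
    vault,
    provider: opts.provider ?? env.ENRICH_PROVIDER ?? 'openai',
    maxEnrich: opts.maxEnrich ?? toInt(env.MAX_ENRICHMENTS_PER_RUN, 20),
    relatedMax: opts.relatedMax ?? toInt(env.RELATED_MAX, 3),
    // Assumption: only the literal string "true" enables dry-run mode.
    dryRun: opts.dryRun ?? (env.DRY_RUN === 'true'),
    cachePath: opts.cachePath ?? env.ENRICH_CACHE_PATH ?? `${cwd}/.enrich-cache.json`,
  }
}

// Parse a base-10 integer, falling back when the value is missing or invalid.
function toInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10)
  return Number.isNaN(parsed) ? fallback : parsed
}
```

In a Node script this would be called as `resolveOptions({}, process.env, process.cwd())`.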

Karaoke config (CLI and library)

When you run karaoke-enrich from your Astro project root, the CLI tries to load karaoke.config.ts, then karaoke.config.mjs, then karaoke.config.js, and passes the default export to run({ config }). That keeps enrichment scan paths aligned with your site: custom blog() folder (meta.folder), extra docs() collection directories, etc.

If no config file exists or loading fails, enrich falls back to scanning only {vault}/blog and {vault}/docs, which matches the behaviour of earlier releases.

In code, pass config explicitly:

// Pass the same object shape as karaoke.config.ts exports (e.g. from your own loader).
await run({ vault: '/path/to/vault', config: { modules: [...] }, provider: myProvider })
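
The lookup order described above can be sketched as a small helper. This is an illustration only: `findConfigFile`, `defaultScanDirs`, and the injected `exists` predicate are hypothetical names, and how the CLI actually loads the file is not shown here.

```typescript
// Candidate file names, in the order the CLI is described as trying them.
const CONFIG_CANDIDATES = ['karaoke.config.ts', 'karaoke.config.mjs', 'karaoke.config.js']

// Hypothetical helper: returns the first candidate that exists, or null when
// none does (the caller would then fall back to the default scan directories).
function findConfigFile(
  projectRoot: string,
  exists: (path: string) => boolean,
): string | null {
  for (const name of CONFIG_CANDIDATES) {
    const candidate = `${projectRoot}/${name}`
    if (exists(candidate)) return candidate
  }
  return null
}

// Scan roots used when no config could be loaded.
function defaultScanDirs(vault: string): string[] {
  return [`${vault}/blog`, `${vault}/docs`]
}
```

Passing `existsSync` from `node:fs` as the `exists` argument would check the real file system; injecting it keeps the helper easy to test.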

What it does

For each published Markdown file that hasn't been enriched yet (or whose body has changed):

Title — generates a title for untitled documents, writes title
Reading time — counts words, writes reading_time in minutes
Tags — asks the AI for 3–6 tags, writes tags array
Description — asks the AI for a 20–30 word summary, writes description
Related posts — computes tag overlap across all posts, writes related (slugs)
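
The two steps that need no AI call, reading time and related posts, can be sketched as follows. The names `readingTime`, `relatedSlugs`, and `Post` are hypothetical, and the 200 words-per-minute rate, the rounding, and the tie-breaking are assumptions; the package may compute these differently.

```typescript
// Assumption: 200 words per minute; the package's actual rate is not documented here.
const WORDS_PER_MINUTE = 200

// Reading time in whole minutes, never less than 1.
function readingTime(body: string): number {
  const words = body.trim().split(/\s+/).filter(Boolean).length
  return Math.max(1, Math.ceil(words / WORDS_PER_MINUTE))
}

interface Post {
  slug: string
  tags: string[]
}

// Rank other posts by the number of shared tags and keep the top `max` slugs.
// Posts with no tag in common are excluded; ties are broken alphabetically.
function relatedSlugs(target: Post, all: Post[], max = 3): string[] {
  const targetTags = new Set(target.tags)
  return all
    .filter((p) => p.slug !== target.slug)
    .map((p) => ({ slug: p.slug, overlap: p.tags.filter((t) => targetTags.has(t)).length }))
    .filter((p) => p.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap || a.slug.localeCompare(b.slug))
    .slice(0, max)
    .map((p) => p.slug)
}
```

The default of 3 for `max` mirrors the documented `relatedMax` default.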

Results are cached in .enrich-cache.json (gitignored) — unchanged files are skipped. A per-run cap prevents runaway API costs.
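
A minimal sketch of the skip-and-cap logic, assuming the cache maps each file path to a hash of the body that was last enriched. The cache layout, `hashBody`, and `selectForEnrichment` are hypothetical; the real .enrich-cache.json format is not documented here.

```typescript
// Stand-in content hash (FNV-1a style, 32-bit). A real implementation would more
// likely use a cryptographic hash; this keeps the sketch dependency-free.
function hashBody(body: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < body.length; i++) {
    h ^= body.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return (h >>> 0).toString(16)
}

// Hypothetical cache shape: file path -> hash of the body last enriched.
type Cache = Record<string, string>

interface FileEntry {
  path: string
  body: string
}

// Pick the files whose body changed since the last run, up to `maxEnrich`.
function selectForEnrichment(files: FileEntry[], cache: Cache, maxEnrich = 20): FileEntry[] {
  const changed = files.filter((f) => cache[f.path] !== hashBody(f.body))
  return changed.slice(0, maxEnrich)
}
```

Files left over after the cap would simply be picked up on a later run, since their cache entries still do not match.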

Privacy gate: only processes files with publish: true. Private vault notes are never sent to an AI provider.
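
The gate can be pictured as a filter that runs before any provider call. The sketch below uses a deliberately simple line match instead of a YAML parser, and `isPublished` and `publishedOnly` are hypothetical names; files without frontmatter are treated as private here, which is an assumption.

```typescript
// Minimal frontmatter check: looks for a top-level `publish: true` line between
// the opening and closing `---` fences. A real implementation would use a YAML parser.
function isPublished(markdown: string): boolean {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown)
  if (!match) return false // no frontmatter: treat as private (assumption)
  return (match[1] ?? '').split(/\r?\n/).some((line) => /^publish:\s*true\s*$/.test(line))
}

// Keep only files that pass the gate; everything else never reaches a provider.
function publishedOnly<T extends { content: string }>(files: T[]): T[] {
  return files.filter((f) => isPublished(f.content))
}
```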

Custom providers

Any object with an enrich(body) method works as a provider:

import { run, defineProvider } from '@karaoke-cms/enrich'

const llamaProvider = defineProvider({
  async enrich(body) {
    const result = await callYourLlama(body)
    return { tags: result.keywords, description: result.summary }
  }
})

await run({ vault: '/path/to/vault', provider: llamaProvider })

Built-in providers (openai, anthropic) are also importable directly:

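// Alias the two enrich functions so both can be imported in the same file
import { enrich as enrichWithOpenAI } from '@karaoke-cms/enrich/providers/openai'
import { enrich as enrichWithAnthropic, parseEnrichResponse } from '@karaoke-cms/enrich/providers/anthropic'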
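
Since a provider is just an object with an enrich(body) method, it does not have to call an AI service at all. The sketch below is a hypothetical offline provider that derives tags from word frequency and uses the first words of the body as a description. `EnrichResult` is inferred from the examples in this README and may not match the package's real type; `topKeywords`, `firstWords`, and `offlineProvider` are illustrative names.

```typescript
// Shape inferred from the README examples; the package's real type may have more fields.
interface EnrichResult {
  tags: string[]
  description: string
}

const STOP_WORDS = new Set(['with', 'that', 'this', 'from', 'have', 'your'])

// Most frequent words of four or more letters, ties broken alphabetically.
function topKeywords(body: string, count = 3): string[] {
  const freq = new Map<string, number>()
  for (const word of body.toLowerCase().match(/[a-z]{4,}/g) ?? []) {
    if (!STOP_WORDS.has(word)) freq.set(word, (freq.get(word) ?? 0) + 1)
  }
  return Array.from(freq.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, count)
    .map(([word]) => word)
}

// First `maxWords` words of the body as a crude description.
function firstWords(body: string, maxWords = 25): string {
  return body.trim().split(/\s+/).slice(0, maxWords).join(' ')
}

// A plain object with an async enrich(body) method and no network calls.
const offlineProvider = {
  async enrich(body: string): Promise<EnrichResult> {
    return { tags: topKeywords(body), description: firstWords(body) }
  },
}
```

It would be passed as `provider: offlineProvider` in the same way as the providers above, optionally wrapped in defineProvider. This can be handy for tests or dry runs where no API key is available.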

Pre-commit hook

Add to .githooks/pre-commit (or wherever your project configures hooks):

#!/usr/bin/env zsh
staged_md=("${(@f)$(git diff --cached --name-only | grep '\.md$' || true)}")
[[ ${#staged_md[@]} -eq 0 ]] && exit 0
[[ -z "${OPENAI_API_KEY:-}" && -z "${ANTHROPIC_API_KEY:-}" ]] && exit 0

pnpm exec karaoke-enrich --staged "${staged_md[@]}" || true
git add "${staged_md[@]}" .enrich-cache.json

What's new in 0.18.2

No user-facing changes in this release.

What's new in 0.18.0

AI-generated titles: untitled documents now receive an AI-generated title in frontmatter, so every published page has a meaningful heading without manual effort.
Shorter descriptions: the description field now targets 20–30 words instead of longer summaries, producing tighter copy for meta tags and link previews.

What's new in 0.17.0

No user-facing changes in this release.

Return value

run() returns a structured result:

{
  enriched: number   // files where the AI provider was called
  written:  number   // files written to disk
  skipped:  number   // files skipped (cache hit, no change needed)
  errors:   Array<{ path: string, error: Error }>
}
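
One way to consume this result in a script is to turn it into log lines and an exit code. The `RunResult` interface below copies the shape shown above; `report` is a hypothetical helper, and treating any error as a non-zero exit is a design choice of this sketch, not documented package behaviour.

```typescript
// Shape copied from the Return value section.
interface RunResult {
  enriched: number
  written: number
  skipped: number
  errors: Array<{ path: string; error: Error }>
}

// Hypothetical helper: builds a log summary and a suggested process exit code.
function report(result: RunResult): { lines: string[]; exitCode: number } {
  const lines = [
    `enriched=${result.enriched} written=${result.written} skipped=${result.skipped}`,
    ...result.errors.map(({ path, error }) => `failed: ${path}: ${error.message}`),
  ]
  return { lines, exitCode: result.errors.length > 0 ? 1 : 0 }
}
```

A caller might print each line and then set `process.exitCode` to the returned `exitCode` so that CI fails when any file could not be enriched.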
