# Kaizen
Kaizen is an agentic eval platform for AI systems. It helps a coding agent create a system definition, curate a Langfuse-backed dataset, write an eval script, run a baseline, and iterate on variants while Kaizen records scored runs under `kaizen/.kaizen/runs/`.
## Install In A Target Repo
```
npm install -g @percepta/kaizen
kaizen init
kaizen guide
kaizen create system <system-id>
kaizen create view <system-id> --type trace
kaizen create view <system-id> --type dataset-item
kaizen run --system <system-id> --variant baseline --diagnostic --hypothesis "starting baseline"
kaizen studio
```

For one-off use:

```
npx @percepta/kaizen init
```

Kaizen is installed inside the customer repo. The customer-owned footprint is intentionally small:
- `kaizen/config.ts`
- `kaizen/systems/<system-id>/system.md`
- `kaizen/systems/<system-id>/eval.py|ts`
- `kaizen/systems/<system-id>/trace.tsx` (optional)
- `kaizen/systems/<system-id>/dataset-item.tsx` (optional)
- `kaizen/systems/<system-id>/rubric.md` (optional)
- `kaizen/.kaizen/runs/`
Package-owned agent guidance is printed with `kaizen guide`. Customer-specific durable notes belong in `kaizen/systems/<system-id>/system.md`; Kaizen does not create repo-level agent markdown such as KAIZEN.md, AGENTS.md, or CLAUDE.md.
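As a loose, hypothetical sketch of such notes, reusing the `dataset_version` and `linear_project` keys that appear later in this README; the real template comes from `kaizen create system`, and the key placement shown here is an assumption:

```md
<!-- Hypothetical system.md sketch; the actual template comes from `kaizen create system`. -->
<!-- Key placement is assumed: dataset_version is referenced under Lifecycle, linear_project under Environment. -->
dataset_version: <dataset-name>
linear_project: https://linear.app/<workspace>/project/<project-slug>

Notes:
- What the system does and where its entry points live.
- Known failure modes worth probing in evals.
```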
## Lifecycle
- Run `kaizen init` once in the target repo.
- Run `kaizen create system <system-id>` and fill in `kaizen/systems/<system-id>/system.md`.
- Use Studio Data to create or select a Langfuse dataset, add useful source traces, and label dataset items.
- Replace `kaizen/systems/<system-id>/eval.py|ts` with a real eval that reads the dataset named by `dataset_version`.
- Run a diagnostic baseline, then a full baseline.
- Run variants with `kaizen run`, inspect `kaizen log`, and use Studio to compare runs and failures (see the sketch after this list).
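A typical loop might look like the following sketch; the variant name and hypothesis strings are illustrative, and it assumes a full baseline is the same `kaizen run` command without `--diagnostic`:

```
# Diagnostic baseline, then a full baseline (assumed: same command minus --diagnostic).
kaizen run --system <system-id> --variant baseline --diagnostic --hypothesis "starting baseline"
kaizen run --system <system-id> --variant baseline --hypothesis "full baseline"

# An illustrative variant run, then inspect scored runs.
kaizen run --system <system-id> --variant prompt-v2 --hypothesis "tighter system prompt"
kaizen log
```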
The eval script emits NDJSON events to `--out-fd`; the runner owns process supervision, `kaizen/.kaizen/runs/`, crash recording, and automatic promotion. For Langfuse-backed evals, the eval should also link each dataset item to the fresh trace generated by that run and write the primary metric as a trace score.
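As a minimal sketch of the emission side only, assuming the runner passes a numeric descriptor as `--out-fd <n>`; the event names and fields below are invented for illustration, and the real schema comes from `kaizen guide`:

```ts
// eval.ts - illustrative sketch only; event shapes are assumptions, not Kaizen's schema.
import { writeSync } from "node:fs";

// Find the file descriptor the runner passed as `--out-fd <n>` (fall back to stdout).
const flag = process.argv.indexOf("--out-fd");
const outFd = flag >= 0 ? Number(process.argv[flag + 1]) : 1;

// NDJSON: one JSON object per line, written to the runner-owned descriptor.
function emit(event: Record<string, unknown>): void {
  writeSync(outFd, JSON.stringify(event) + "\n");
}

// Hypothetical per-item result and run summary events.
emit({ type: "item_result", itemId: "item-001", score: 0.75 });
emit({ type: "summary", metric: "accuracy", value: 0.75 });
```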
## Custom Views
Custom views are plain React components co-located with the system:
```
kaizen create view <system-id> --type trace
kaizen create view <system-id> --type dataset-item
```

`trace.tsx` receives the full Langfuse trace payload plus actions for writing scores. `dataset-item.tsx` receives the dataset item, the linked source trace when available, and actions for updating the dataset item or linking run items. Browser-side credentials are not required; Studio proxies the write actions through local API routes.
Run `kaizen guide views` for the exact prop and action interfaces.
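As a rough sketch of what a `trace.tsx` could look like, with prop and action names that are assumptions rather than Kaizen's actual interfaces (use `kaizen guide views` for those):

```tsx
// trace.tsx - illustrative only; prop and action names are assumed, not Kaizen's API.
import React from "react";

interface TraceViewProps {
  trace: { id: string; input?: unknown; output?: unknown }; // assumed trace payload shape
  actions: { writeScore: (name: string, value: number) => Promise<void> }; // assumed score action
}

export default function TraceView({ trace, actions }: TraceViewProps) {
  return (
    <div>
      <h3>Trace {trace.id}</h3>
      <pre>{JSON.stringify(trace.output, null, 2)}</pre>
      {/* Score writes are proxied through Studio's local API routes, not browser credentials. */}
      <button onClick={() => void actions.writeScore("human_quality", 1)}>Looks good</button>
    </div>
  );
}
```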
## Developing This Repo
```
pnpm install
pnpm --filter @percepta/kaizen dev:studio
```

This starts Studio at http://localhost:6789 against `examples/demo-workspace`, a local fixture for package development. The CLI lives in `src/`; the bundled Next.js Studio lives in `dashboard/`.
Useful scripts:
| Script | What it does |
| ------------------------------------------- | ---------------------------------- |
| pnpm --filter @percepta/kaizen dev:studio | Start Studio with the demo fixture |
| pnpm --filter @percepta/kaizen dev:next | Start only the Next.js dev server |
| pnpm --filter @percepta/kaizen typecheck | Typecheck the package |
| pnpm --filter @percepta/kaizen test | Run package tests |
## Environment
Create `.env.local` in the workspace repo root:
```
LANGFUSE_HOST=https://...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LINEAR_API_KEY=lin_api_...
LINEAR_TEAM_KEY=ENG
```

Langfuse credentials power the Data surface and custom view actions. `LINEAR_API_KEY` and `LINEAR_TEAM_KEY` power `kaizen ideas --system <id>`.
System Ideas configuration should use a stable Linear project URL or ID in `system.md`:

```
linear_project: https://linear.app/<workspace>/project/<project-slug>
```

## Publishing
Publishing `@percepta/kaizen` to npm is automated with Changesets. For changes that affect the published package, add a changeset:
```
pnpm changeset
```