# Kaizen
Kaizen is an agentic eval platform for AI systems. It helps a coding agent create a system definition, curate a Langfuse-backed dataset, write an eval script, run a baseline, and iterate on variants while Kaizen records scored runs under `kaizen/.kaizen/runs/`.
## Install In A Target Repo
```
npm install -g @percepta/kaizen
kaizen init
kaizen guide
kaizen create system <system-id>
kaizen create view <system-id> --type trace
kaizen create view <system-id> --type dataset-item
kaizen run --system <system-id> --variant baseline --diagnostic --hypothesis "starting baseline"
kaizen studio
```

For one-off use:

```
npx @percepta/kaizen init
```

Kaizen is installed inside the customer repo. The customer-owned footprint is intentionally small:
- `kaizen/config.ts`
- `kaizen/systems/<system-id>/system.md`
- `kaizen/systems/<system-id>/eval.py|ts`
- `kaizen/systems/<system-id>/trace.tsx` (optional)
- `kaizen/systems/<system-id>/dataset-item.tsx` (optional)
- `kaizen/systems/<system-id>/rubric.md` (optional)
- `kaizen/.kaizen/runs/`
Package-owned agent guidance is printed with `kaizen guide`. Customer-specific durable notes belong in `kaizen/systems/<system-id>/system.md`; Kaizen does not create repo-level agent markdown such as KAIZEN.md, AGENTS.md, or CLAUDE.md.
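As a loose, hypothetical sketch of such notes, reusing the `dataset_version` and `linear_project` keys that appear later in this README; the real template comes from `kaizen create system`, and the key placement shown here is an assumption:

```md
<!-- Hypothetical system.md sketch; the actual template comes from `kaizen create system`. -->
<!-- Key placement is assumed: dataset_version is referenced under Lifecycle, linear_project under Environment. -->
dataset_version: <dataset-name>
linear_project: https://linear.app/<workspace>/project/<project-slug>

Notes:
- What the system does and where its entry points live.
- Known failure modes worth probing in evals.
```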
## Lifecycle
- Run `kaizen init` once in the target repo.
- Run `kaizen create system <system-id>` and fill in `kaizen/systems/<system-id>/system.md`.
- Use Studio Data to create or select a Langfuse dataset, add useful source traces, and label dataset items.
- Replace `kaizen/systems/<system-id>/eval.py|ts` with a real eval that reads the dataset named by `dataset_version`.
- Run a diagnostic baseline, then a full baseline.
- Run variants with `kaizen run`, inspect `kaizen log`, and use Studio to compare runs and failures (see the sketch after this list).
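A typical loop might look like the following sketch; the variant name and hypothesis strings are illustrative, and it assumes a full baseline is the same `kaizen run` command without `--diagnostic`:

```
# Diagnostic baseline, then a full baseline (assumed: same command minus --diagnostic).
kaizen run --system <system-id> --variant baseline --diagnostic --hypothesis "starting baseline"
kaizen run --system <system-id> --variant baseline --hypothesis "full baseline"

# An illustrative variant run, then inspect scored runs.
kaizen run --system <system-id> --variant prompt-v2 --hypothesis "tighter system prompt"
kaizen log
```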
The eval script emits NDJSON events to `--out-fd`; the runner owns process supervision, `kaizen/.kaizen/runs/`, crash recording, and automatic promotion. For Langfuse-backed evals, the eval should also link each dataset item to the fresh trace generated by that run and write the primary metric as a trace score.
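As a minimal sketch of the emission side only, assuming the runner passes a numeric descriptor as `--out-fd <n>`; the event names and fields below are invented for illustration, and the real schema comes from `kaizen guide`:

```ts
// eval.ts - illustrative sketch only; event shapes are assumptions, not Kaizen's schema.
import { writeSync } from "node:fs";

// Find the file descriptor the runner passed as `--out-fd <n>` (fall back to stdout).
const flag = process.argv.indexOf("--out-fd");
const outFd = flag >= 0 ? Number(process.argv[flag + 1]) : 1;

// NDJSON: one JSON object per line, written to the runner-owned descriptor.
function emit(event: Record<string, unknown>): void {
  writeSync(outFd, JSON.stringify(event) + "\n");
}

// Hypothetical per-item result and run summary events.
emit({ type: "item_result", itemId: "item-001", score: 0.75 });
emit({ type: "summary", metric: "accuracy", value: 0.75 });
```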
## Custom Views
Custom views are plain React components co-located with the system:
```
kaizen create view <system-id> --type trace
kaizen create view <system-id> --type dataset-item
```

`trace.tsx` receives the full Langfuse trace payload plus actions for writing scores. `dataset-item.tsx` receives the dataset item, the linked source trace when available, and actions for updating the dataset item or linking run items. Browser-side credentials are not required; Studio proxies the write actions through local API routes.
Run `kaizen guide views` for the exact prop and action interfaces.
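As a rough sketch of what a `trace.tsx` could look like, with prop and action names that are assumptions rather than Kaizen's actual interfaces (use `kaizen guide views` for those):

```tsx
// trace.tsx - illustrative only; prop and action names are assumed, not Kaizen's API.
import React from "react";

interface TraceViewProps {
  trace: { id: string; input?: unknown; output?: unknown }; // assumed trace payload shape
  actions: { writeScore: (name: string, value: number) => Promise<void> }; // assumed score action
}

export default function TraceView({ trace, actions }: TraceViewProps) {
  return (
    <div>
      <h3>Trace {trace.id}</h3>
      <pre>{JSON.stringify(trace.output, null, 2)}</pre>
      {/* Score writes are proxied through Studio's local API routes, not browser credentials. */}
      <button onClick={() => void actions.writeScore("human_quality", 1)}>Looks good</button>
    </div>
  );
}
```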
## Developing This Repo
```
pnpm install
pnpm --filter @percepta/kaizen dev:studio
```

This starts Studio at http://localhost:6789 against `examples/demo-workspace`, a local fixture for package development. The CLI lives in `src/`; the bundled Next.js Studio lives in `dashboard/`.
Useful scripts:
| Script | What it does |
| ------------------------------------------- | ---------------------------------- |
| pnpm --filter @percepta/kaizen dev:studio | Start Studio with the demo fixture |
| pnpm --filter @percepta/kaizen dev:next | Start only the Next.js dev server |
| pnpm --filter @percepta/kaizen typecheck | Typecheck the package |
| pnpm --filter @percepta/kaizen test | Run package tests |
## Environment
Create `.env.local` in the workspace repo root:
```
LANGFUSE_HOST=https://...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LINEAR_API_KEY=lin_api_...
LINEAR_TEAM_KEY=ENG
```

Langfuse credentials power the Data surface and custom view actions. `LINEAR_API_KEY` and `LINEAR_TEAM_KEY` power `kaizen ideas --system <id>`.
System Ideas configuration should use a stable Linear project URL or ID in `system.md`:

```
linear_project: https://linear.app/<workspace>/project/<project-slug>
```

## Publishing
Publishing `@percepta/kaizen` to npm is automated with Changesets. For changes that affect the published package, add a changeset:
```
pnpm changeset
```