@nopixie/crucible
v1.0.0
Published
Code-first LLM evaluation & red-teaming framework
Downloads
65
Readme
@nopixie/crucible
Code-first LLM evaluation & red-teaming for engineers who want to know if their prompts hold up.
Crucible is a single npm package — a library plus a CLI — for writing evals and red-team checks as code, running them, and seeing whether your prompts survive contact with adversarial inputs. No dashboard to babysit, no separate service: your evals live next to your code and run in your toolchain.
Status
Stable (1.0.0). The public API follows semver — breaking changes ship as a
major version bump. See docs/usage.md for
what's implemented today versus still landing.
Install
npm install @nopixie/crucibleRequires Node.js 20+ and an ESM project (
"type": "module").
Quick start
// greeting.eval.ts
import { createOllamaProvider, describe, expect, test } from '@nopixie/crucible';
const provider = createOllamaProvider({ model: 'llama3.1' });
describe('greeting', () => {
test('responds with a greeting', async () => {
const { output } = await provider.callApi('Say hello in one sentence.');
expect(output).toMatchRegex(/hello/i);
});
});npx crucible run greeting.eval.tsEval files (*.eval.ts) are discovered automatically; crucible run exits
non-zero on any failure, so it drops into CI as a guardrail.
See docs/usage.md for the full guide: writing suites, deterministic and model-graded assertions, providers (Anthropic, OpenAI, Bedrock, Ollama, custom), response caching, the CLI reference, and current API limitations.
Examples
Runnable examples live in examples/:
Contributing
See CONTRIBUTING.md for the development workflow, branch and commit conventions, and the MR checklist.
Development
npm install # install dependencies
npm run type-check # tsc --noEmit
npm run lint # eslint
npm run format # prettier --write
npm test # vitest
npm run build # emit dist/Releasing
Versioning and npm publishing use Changesets, automated on GitLab CI. Any MR that changes published behaviour must include a changeset:
npm run changeset # pick a bump type + write a summary, then commit the fileOn merge to main, CI consumes the changesets (bumping the version and updating
CHANGELOG.md), tags the release, and publishes to npm automatically. Full
details: .claude/rules/publishing.md.
License
MIT
