apastra

v1.0.1

PromptOps — version, evaluate, and ship AI prompts as disciplined software assets

Downloads

67

Apastra - Lightweight Prompt Evaluation

Git-native prompt operations for teams that want to treat prompts like software assets.

Apastra keeps prompt specs, datasets, evaluators, suites, and baselines as plain files in your repo. Your coding agent reads those files to scaffold changes, run evals, compare against baselines, and validate contracts without requiring a hosted platform.

What Is This?

Apastra is a file-based protocol and skill pack for prompt engineering workflows.

| If you want to... | Apastra gives you... |
|---|---|
| Version prompts like code | YAML prompt specs with stable IDs, variables, and output contracts |
| Test prompt behavior repeatedly | Datasets, evaluators, and suites stored in Git |
| Catch quality regressions before shipping | Baselines, scorecards, and regression reports |
| Stay local-first | Agent-driven workflows with optional GitHub Actions automation |
| Keep things inspectable | Plain files, schema validation, and reviewable diffs |

Documentation

Quick Start

1. Install the skill pack

Two install paths — pick whichever fits your project.

Option A — Git clone (language-agnostic, recommended):

git clone --single-branch --depth 1 https://github.com/BintzGavin/apastra.git .agent/skills/apastra
.agent/skills/apastra/setup

Option B — npm:

npm install apastra

Either path installs to the same layout:

  • .agent/skills/apastra/ — SKILL.md instructions your agent loads
  • .agent/scripts/apastra/ — deterministic Python runtime + shell validators

The setup script auto-installs pyyaml and jsonschema (falls back to clear manual-install guidance on PEP-668 environments). npm's postinstall.sh does the same.

2. Scaffold your first prompt workflow

Ask your agent:

"Use the apastra-scaffold skill to create a prompt spec, dataset, evaluator, and suite for summarizing text"

You will get a repo-native setup like:

promptops/
├── prompts/summarize-v1.yaml
├── datasets/summarize-smoke.jsonl
├── evaluators/contains-keywords.yaml
└── suites/summarize-smoke.yaml

3. Run an eval

Ask your agent:

"Use the apastra-eval skill to run the summarize-smoke suite"

The agent loads the suite, renders the prompt for each case, calls the model, scores the outputs, and reports a scorecard.

Suite: summarize-smoke
Status: PASS

Metrics:
  keyword_recall: 0.85 (threshold: 0.60)
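
As a rough illustration of how a scorecard status like the one above could be derived, the sketch below compares each metric against its threshold. The function name `suite_status` and the rule that a missing metric fails the suite are assumptions made for this example, not apastra's actual runtime.

```python
def suite_status(metrics: dict[str, float], thresholds: dict[str, float]) -> str:
    """Return "PASS" when every thresholded metric meets its minimum (assumed rule)."""
    for name, minimum in thresholds.items():
        # Assumption: a metric that was never reported counts as a failure.
        if metrics.get(name, float("-inf")) < minimum:
            return "FAIL"
    return "PASS"


print(suite_status({"keyword_recall": 0.85}, {"keyword_recall": 0.60}))  # PASS
```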

4. Save a baseline

Ask your agent:

"Use the apastra-baseline skill to set the current results as the baseline"

Future evals can now detect regressions automatically when prompt quality drops below the accepted threshold.

That is enough to start using apastra locally. CI and release automation are available when you want them, but they are not required to get value from the repo.

Note for AI agents: This README is the quickstart. For the full architectural model and design principles, start with docs/vision.md.

Included Skills

| Skill | What it does |
|---|---|
| apastra-getting-started | Project setup and onboarding walkthrough |
| apastra-eval | Run evaluations from suites, score outputs, and compare baselines |
| apastra-baseline | Establish and manage known-good baselines |
| apastra-scaffold | Generate prompt specs, datasets, evaluators, and suites |
| apastra-validate | Validate protocol files against JSON schemas |
| apastra-red-team | Generate adversarial test cases |
| apastra-setup-ci | Install the GitHub Actions workflows for regression gating and release |

All skills install together — there is no per-skill install path. Once installed under .agent/skills/apastra/, your agent discovers each sub-skill by its SKILL.md.

Core Concepts

Prompt Spec

A YAML file defining a prompt with a stable ID, input variables, a template, and an optional output contract.

id: summarize-v1
variables:
  text: { type: string }
template: "Summarize: {{text}}"
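
To make the template step concrete, here is a minimal sketch of rendering a `{{variable}}` template from a case's inputs. The `render` helper and its strict missing-variable behavior are hypothetical; the real renderer under `.agent/scripts/apastra/` may differ.

```python
import re

# Matches {{name}} placeholders, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, inputs: dict[str, str]) -> str:
    """Substitute each {{name}} with inputs[name]; fail loudly on missing variables."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"missing variable: {name}")
        return str(inputs[name])

    return PLACEHOLDER.sub(substitute, template)


print(render("Summarize: {{text}}", {"text": "The fox jumps over the dog."}))
# Summarize: The fox jumps over the dog.
```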

Dataset

A .jsonl file of test cases — one JSON object per line with a case_id and inputs.

{"case_id": "case-1", "inputs": {"text": "..."}, "expected_outputs": {"should_contain": ["key", "words"]}}
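
A dataset in this shape can be read with a few lines of standard-library Python. The sketch below is illustrative only: `load_cases` is a hypothetical helper, and it assumes `case_id` and `inputs` are the only required fields, as the description above suggests.

```python
import json


def load_cases(jsonl_text: str) -> list[dict]:
    """Parse JSONL text into case dicts, checking the two fields named above."""
    cases = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        case = json.loads(line)
        for field in ("case_id", "inputs"):
            if field not in case:
                raise ValueError(f"line {line_number}: missing {field!r}")
        cases.append(case)
    return cases


sample = '{"case_id": "case-1", "inputs": {"text": "..."}}\n'
print([case["case_id"] for case in load_cases(sample)])  # ['case-1']
```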

Evaluator

A scoring rule — deterministic checks, schema validation, or AI judge grading.

id: keyword-check
type: deterministic
metrics: [keyword_recall]
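
The README does not define `keyword_recall`, so the following is only one plausible reading: the fraction of `should_contain` keywords that appear in the model output, compared case-insensitively. Treat the formula and the empty-list convention as assumptions.

```python
def keyword_recall(output: str, keywords: list[str]) -> float:
    """Fraction of expected keywords found in the output (assumed definition)."""
    if not keywords:
        return 1.0  # assumption: nothing expected means nothing missed
    text = output.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


print(keyword_recall("The key idea", ["key", "words"]))  # 0.5
```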

Inline Assertions (Quick Mode)

For simple checks, skip the evaluator file entirely — put assertions directly on your test cases:

{"case_id": "case-1", "inputs": {"text": "..."}, "assert": [{"type": "contains", "value": "summary"}, {"type": "is-json"}]}

Built-in assertion types: equals, contains, icontains, contains-any, contains-all, regex, starts-with, is-json, contains-json, similar, llm-rubric, factuality, latency, cost. Negate any with not- prefix (e.g. not-contains).
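
A sketch of how a few of these assertion types and the `not-` prefix might be evaluated follows. It covers only a subset of the types listed, and the `CHECKS` table and `check_assertion` function are hypothetical names used for illustration; the semantics shown are assumptions, not the shipped implementation.

```python
import json
import re


def _is_json(output: str, _value) -> bool:
    try:
        json.loads(output)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Subset of the assertion types named above, with assumed semantics.
CHECKS = {
    "equals": lambda out, value: out == value,
    "contains": lambda out, value: value in out,
    "icontains": lambda out, value: value.lower() in out.lower(),
    "regex": lambda out, value: re.search(value, out) is not None,
    "starts-with": lambda out, value: out.startswith(value),
    "is-json": _is_json,
}


def check_assertion(output: str, assertion: dict) -> bool:
    """Evaluate one assertion dict, honoring an optional "not-" prefix."""
    kind = assertion["type"]
    negate = kind.startswith("not-")
    if negate:
        kind = kind[len("not-"):]
    result = CHECKS[kind](output, assertion.get("value"))
    return not result if negate else result


output = '{"summary": "A fox jumps."}'
print(check_assertion(output, {"type": "contains", "value": "summary"}))  # True
print(check_assertion(output, {"type": "not-contains", "value": "dog"}))  # True
```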

Quick Eval (Single File)

For rapid iteration, combine prompt + cases + assertions into one file (promptops/evals/my-eval.yaml):

id: summarize-quick
prompt: "Summarize in {{max_length}} words: {{text}}"
cases:
  - id: short
    inputs: { text: "The fox jumps over the dog.", max_length: "10" }
    assert:
      - type: icontains
        value: "fox"
thresholds:
  pass_rate: 1.0

Graduate to the full spec/dataset/evaluator/suite structure as complexity grows.
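
For the `pass_rate` threshold in a quick eval, one reasonable interpretation is the share of cases whose assertions all passed. The helpers below are a hypothetical sketch under that assumption.

```python
def pass_rate(case_results: list[bool]) -> float:
    """Share of cases that passed; an empty run is treated as 0.0 (assumption)."""
    if not case_results:
        return 0.0
    return sum(case_results) / len(case_results)


def meets_threshold(case_results: list[bool], threshold: float) -> bool:
    return pass_rate(case_results) >= threshold


print(meets_threshold([True], 1.0))         # True
print(meets_threshold([True, False], 1.0))  # False
```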

Suite

A test configuration that ties everything together: which datasets, which evaluators, which models.

id: smoke
name: Smoke Suite
datasets: [summarize-smoke]
evaluators: [keyword-check]
model_matrix: [default]
thresholds: { keyword_recall: 0.6 }

Baseline & Regression

A baseline is a saved scorecard from a passing run. Future evals compare against it. If quality drops beyond allowed thresholds, it's a regression.
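
The comparison can be pictured as a per-metric drop check. In this sketch, `find_regressions` and the `allowed_drop` mapping are hypothetical stand-ins for whatever the regression policy files actually express.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    allowed_drop: dict[str, float],
) -> dict[str, float]:
    """Return metrics whose drop from baseline exceeds the allowed amount."""
    regressions = {}
    for metric, baseline_value in baseline.items():
        # Assumption: a metric missing from the current run scores 0.0.
        drop = baseline_value - current.get(metric, 0.0)
        if drop > allowed_drop.get(metric, 0.0):
            regressions[metric] = drop
    return regressions


baseline = {"keyword_recall": 0.85}
policy = {"keyword_recall": 0.05}
print(bool(find_regressions(baseline, {"keyword_recall": 0.84}, policy)))  # False
print(bool(find_regressions(baseline, {"keyword_recall": 0.70}, policy)))  # True
```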

File Structure

In your project (after install)

.agent/
├── skills/apastra/       # Agent-facing SKILL.md files (eval, baseline, scaffold, …)
└── scripts/apastra/      # Deterministic runtime (Python + shell validators)
promptops/                # Created by the scaffold skill on first use
├── prompts/              # Prompt specs (YAML)
├── datasets/             # Test cases (JSONL)
├── evaluators/           # Scoring rules (YAML)
├── suites/               # Test configurations (YAML)
└── policies/             # Regression policies (allowed thresholds)
derived-index/
├── baselines/            # Known-good scorecards
└── regressions/          # Regression reports

In this repo (what gets shipped)

promptops/ here contains the runtime source that lands in your project's .agent/scripts/apastra/ at install time — schemas, validators, resolver, runs, harnesses. You do not copy this directory into your project directly; setup / postinstall.sh does that.

How the Agent Runs Evals

Your IDE agent is the harness. When you ask it to run an eval:

flowchart TD
  A[Read suite spec] --> B[Load dataset cases + evaluators]
  B --> C[For each case: render prompt template]
  C --> D[Call the model with rendered prompt]
  D --> E[Score output using evaluators]
  E --> F[Aggregate into scorecard]
  F --> G{Baseline exists?}
  G -->|Yes| H[Compare against baseline]
  G -->|No| I[Report scorecard only]
  H --> J[Regression report: PASS/FAIL]

Deterministic steps (prompt rendering, digest computation, scorecard normalization, baseline comparison, schema validation) are delegated to Python + shell scripts under .agent/scripts/apastra/. Your agent handles the LLM-dependent parts: calling the model and grading with judge evaluators. No hosted service, no SaaS dependency — just files, scripts, and your agent.
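
Digest computation is one of the deterministic steps that is easy to illustrate. The sketch below hashes a canonical JSON form of a record so that key order does not change the result; the `sha256:` prefix and the canonicalization choices are assumptions, not the documented apastra format.

```python
import hashlib
import json


def content_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = content_digest({"id": "summarize-v1", "template": "Summarize: {{text}}"})
b = content_digest({"template": "Summarize: {{text}}", "id": "summarize-v1"})
print(a == b)  # True: key order does not affect the digest
```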


Scaling Up (Optional)

When you're ready for more structure, apastra supports:

GitHub Actions CI

Five pre-built workflows gate merges and automate promotions:

| Workflow | Trigger | What it does |
|---|---|---|
| regression-gate.yml | Pull requests | Blocks merge if regression is detected |
| promote.yml | Manual or release publish | Creates append-only promotion records |
| deliver.yml | After promotion | Syncs approved versions to delivery targets |
| immutable-release.yml | Tag push | Creates immutable GitHub releases |
| auto-merge.yml | CI pass | Auto-merges PRs that pass all checks |

Git-First Consumption

Apps can pin prompts by commit SHA, tag, or semver — npm and pip both support Git dependencies natively:

# consumption.yaml
version: "1.0"
prompts:
  summarize-v1:
    pin: "abc123"  # commit SHA, tag, or semver

Resolution order: local override → workspace → git ref → packaged artifact.
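
The resolution order reads as a first-match lookup across four sources. The following sketch models that with plain dictionaries; the source keys, the `resolve` function, and the example locations are all hypothetical and only illustrate the precedence.

```python
# Precedence taken from the resolution order stated above.
RESOLUTION_ORDER = ("local_override", "workspace", "git_ref", "packaged_artifact")


def resolve(prompt_id: str, sources: dict[str, dict[str, str]]) -> tuple[str, str]:
    """Return (source_name, location) from the first source that has the prompt."""
    for source_name in RESOLUTION_ORDER:
        location = sources.get(source_name, {}).get(prompt_id)
        if location is not None:
            return source_name, location
    raise LookupError(f"prompt not found in any source: {prompt_id}")


sources = {
    "git_ref": {"summarize-v1": "git:abc123"},            # illustrative value
    "workspace": {"summarize-v1": "workspace-copy.yaml"},  # illustrative value
}
print(resolve("summarize-v1", sources))  # ('workspace', 'workspace-copy.yaml')
```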

Governed Releases

| Packaging | When to use |
|---|---|
| Git ref (tag/SHA) | Default — zero publishing overhead |
| GitHub Release asset | Governed releases with optional immutability |
| OCI artifact | Org-wide digest-addressed distribution |


Principles

  • Files in Git are the source of truth — not a database, not a platform
  • Your agent is the harness — no framework lock-in
  • Append-only artifacts — never mutate old results; create new records
  • Reproducibility by default — content digests, environment metadata
  • Local-first, CI-optional — start with zero infrastructure

Planned Expansions

| Skill / Feature | What it does |
|---|---|
| apastra-audit | Scans your codebase for hardcoded, untested prompts and reports "prompt debt" — proves value in 60 seconds on an existing project |
| apastra-drift | Canary suites that run on a schedule to catch post-ship quality erosion when model providers update silently |
| apastra-compare | Multi-model evaluation — run a suite against N models and get a cost/quality/latency comparison scorecard |
| apastra-review | "Paranoid staff prompt engineer" — reviews prompt specs for ambiguity, injection surface, variable hygiene, and output contract completeness |
| apastra-optimize | Analyzes token usage, suggests prompt compression, estimates cost reduction |
| Community prompt packs | Curated starter packs (summarization, extraction, classification, code review) installable as git dependencies with pre-built baselines |
| Observability adapters | Lightweight bridges to emit run artifacts to Langfuse, OpenTelemetry, and other existing observability systems |

Planned Refinements

  • Simplified minimal mode — auto-detected when ≤3 prompt specs exist; only prompts/, evals/, and baselines/ directories
  • Project-level config: promptops.config.yaml for default model, temperature, thresholds, and auto-baseline behavior
  • MCP integration — support MCP tool definitions in prompt specs and provide an MCP server adapter for agent discovery
  • First-class cost tracking — total cost in every run manifest, cost delta in regression reports, cost_budget field on suites
  • Approachable terminology — "your agent" everywhere user-facing; "harness" reserved for technical specs

License

Apache-2.0