@adia-ai/a2ui-corpus

v0.6.47

Published

AdiaUI A2UI training corpus — canonical v0.9 catalog + chunks + eval fixtures + feedback + gap registry. Consumed by the compose engine's retrieval layer + the MCP pipeline.

Downloads

17,164

Readme

@adia-ai/a2ui-corpus

Corpus and operational-learning artifacts for the gen-UI pipeline — chunks (the canonical retrieval surface), feedback, gaps, eval fixtures. Pure data plus the scripts that maintain it. No runtime.

The pipeline reads this package; this package never reads the pipeline. See @adia-ai/a2ui-compose for engine code, @adia-ai/web-components for UI atoms, @adia-ai/a2ui-runtime for the A2UI runtime (renderer, registry, streams, wiring), @adia-ai/a2ui-mcp for the MCP server.

Install

npm install @adia-ai/a2ui-corpus

Pure data — typically consumed transitively by @adia-ai/a2ui-compose and @adia-ai/a2ui-mcp, which list it as a runtime dependency. Direct installs are useful for offline eval tooling or building bespoke retrieval layers.

Dependency direction

a2ui-compose   ──reads──▶  a2ui-corpus
a2ui-mcp       ──reads──▶  a2ui-compose, a2ui-corpus
web-components ──used-by──▶  apps/, playgrounds/, catalog/ (chunk sources)

No back-writes. No circular reads. Web-components ships UI atoms only; corpus lives here; runtime lives in a2ui-compose. The chunk pipeline harvests data-chunk-tagged regions from the apps that consume web-components, so the read-direction stays one-way.
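The "no circular reads" claim can be checked mechanically. The sketch below transcribes the two read edges from the diagram above into a plain object and runs a depth-first cycle check. `hasCycle` is a hypothetical helper written for illustration; it is not part of this package.

```javascript
// Read-direction graph transcribed from the diagram above.
// Each key reads from the packages listed in its array.
const reads = {
  'a2ui-compose': ['a2ui-corpus'],
  'a2ui-mcp': ['a2ui-compose', 'a2ui-corpus'],
  'a2ui-corpus': [],
};

// Hypothetical helper: returns true if any package can reach itself.
function hasCycle(graph) {
  const state = new Map(); // node -> 'visiting' | 'done'
  const visit = (node) => {
    if (state.get(node) === 'done') return false;
    if (state.get(node) === 'visiting') return true; // back edge found
    state.set(node, 'visiting');
    for (const dep of graph[node] ?? []) {
      if (visit(dep)) return true;
    }
    state.set(node, 'done');
    return false;
  };
  return Object.keys(graph).some((node) => visit(node));
}

console.log(hasCycle(reads)); // false
```

If the corpus ever started reading from a2ui-compose, the same check would return `true`.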

Glossary

The corpus has converged on one retrievable concept: the chunk.

| Term | Source of truth | Granularity | What it carries | Engine that consumes it |
| --- | --- | --- | --- | --- |
| chunk | chunks/<id>.json (one file per chunk) + _index.json | A single labeled HTML region, harvested via data-chunk markers | html, intent, domain, kind (block / page / panel), keywords, metadata (when annotated), template (transpiled A2UI tree, when annotated) | chunk-zettel + monolithic-pro + zettel (composition-library wraps annotated chunks) |

Annotated vs raw chunks: every chunk has source/page provenance, but only chunks carrying data-chunk-{domain,description,keywords,kind} attributes on their source HTML become retrievable as compositions (the harvester's transpile pass produces a template + lifts the metadata block onto them). Raw chunks remain as substrate for nested-expand reference resolution but don't compete in retrieval.
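As a rough illustration of the annotated/raw split, the sketch below partitions chunk records by whether they carry both a `template` and a `metadata` block. Those two field names come from the glossary table; the predicate itself and the sample records are assumptions, not the harvester's actual logic.

```javascript
// Assumption: an annotated chunk is one that has both `template` and `metadata`.
function isRetrievable(chunk) {
  return Boolean(chunk && chunk.template && chunk.metadata);
}

// Splits a list of chunk records into retrievable and raw groups.
function partitionChunks(chunks) {
  const retrievable = [];
  const raw = [];
  for (const chunk of chunks) {
    (isRetrievable(chunk) ? retrievable : raw).push(chunk);
  }
  return { retrievable, raw };
}

// Made-up sample records, for illustration only.
const sample = [
  { id: 'login-form', html: '<form></form>', metadata: { domain: 'auth' }, template: { type: 'Form' } },
  { id: 'footer-links', html: '<footer></footer>' },
];

const { retrievable, raw } = partitionChunks(sample);
console.log(retrievable.map((c) => c.id), raw.map((c) => c.id));
```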

Grounding rule (locked v0.4.6, enforced v0.4.7): every retrievable unit MUST trace to a real page under site/pages/, apps/, playgrounds/, or catalog/ via its data-chunk-* annotations. No hand-authored ungrounded JSON lives in corpus/.
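A minimal sketch of what a grounding check could look like, assuming each chunk records its origin as a repo-relative path in `source.page`. That field path is an assumption made for illustration; only the four allowed roots come from the rule above.

```javascript
// Roots named by the grounding rule.
const GROUNDED_ROOTS = ['site/pages/', 'apps/', 'playgrounds/', 'catalog/'];

// Assumption: provenance is stored as a repo-relative path in chunk.source.page.
function isGrounded(chunk) {
  const page = chunk?.source?.page;
  return typeof page === 'string' && GROUNDED_ROOTS.some((root) => page.startsWith(root));
}

console.log(isGrounded({ source: { page: 'apps/demo/app/login/login.contents.html' } })); // true
console.log(isGrounded({ source: { page: 'corpus/hand-authored.json' } })); // false
```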

Historical note: Fragments (atomic A2UI sub-trees with named slots, §37, 2026-05-12), patterns (hand-authored full-canvas A2UI templates, patterns/ dir), and compositions (hand-authored multi-section A2UI surfaces, compositions/ dir) were earlier corpus formats. Fragments retired in v0.4.4 (§37). Patterns + compositions retired in v0.4.7 (§72 — the v0.4.6 carryover of §65). Their retrievable equivalents were either (a) already represented in the chunks corpus via annotated data-chunk-* regions, or (b) deemed non-grounded under the locked rule and DELETE'd in §73 Mode-C triage (target for v0.4.7). See docs/journal/2026/05/2026-05-12.md §§ 36-§42, §65, §72 for the multi-arc retirement narrative.

Layout

a2ui/corpus/
├── chunks/                  retrievable + raw chunks (~190 entries)
│                              — one JSON per chunk; carries `source`,
│                              `metadata` (when annotated), `template`
│                              (transpiled A2UI tree, when annotated).
│                              Harvested by `scripts/build/harvest-chunks.mjs`
│                              from data-chunk markers across site/pages/*,
│                              apps/*, playgrounds/*, catalog/*.
│   └── _index.json            harvester output — name + by-kind tallies +
│                              normalized chunk list (shape consumed by
│                              chunk-loader + composition-library)
│
├── evals/                   held-out.jsonl + eval fixtures
├── feedback/                daily JSONL — user feedback events
├── gaps/                    gap registry — prompts with missing coverage
├── scripts/                 maintenance tooling (extract, ingest, feedback, ticket)
│   └── chunk-library.js       in-memory loader + keyword/semantic search over chunks/
│
├── catalog-a2ui_0_9.json       aggregated artifact — what a2ui-compose + a2ui-mcp read
├── catalog-a2ui_0_9_rules.txt  natural-language composition rules (per component)
├── common_types.json           shared A2UI type shapes
├── chunk-embeddings.json       pre-computed embeddings for chunk semantic search (0.0.4+)
├── functions.json              declarative wiring-engine function catalog
├── manifest.json               extraction metadata (what / when / counts)
├── pattern-specs.md            written specs for each pattern category (historical reference)
└── data-flow.md                how the signal sources feed the pipeline

What's committed vs generated vs published

| Kind | Committed | Published to npm | Source of truth |
|---|:---:|:---:|---|
| Chunks (chunks/) | ✓ | ✓ | scripts/build/harvest-chunks.mjs (from data-chunk markers across site/pages, apps, playgrounds, catalog) |
| catalog-a2ui_0_9.json | ✓ | ✓ | npm run components (assembled from yamls in packages/web-components/) |
| chunk-embeddings.json | ✓ | ✗ since 0.2.1 | scripts/build/embeddings-chunks.mjs |
| Feedback JSONL | ✓ | ✓ | Written by @adia-ai/a2ui-retrieval at runtime |
| Gap registry | ✓ | ✓ | Written by @adia-ai/a2ui-retrieval at runtime |

Extracted artifacts are committed for convenience (avoids a build step to read the pipeline), but the scripts are authoritative — regenerate via npm run harvest:chunks (chunks), npm run components (catalog), or npm run build:embeddings:chunks (chunk embeddings) if anything drifts.

Why embeddings ship via git, not npm

chunk-embeddings.json (~20 MB) is committed in git so the monorepo's own pipeline runs without a network round-trip, but it is excluded from the published npm tarball. Previously, every npm i @adia-ai/a2ui-corpus pulled ~20 MB of pre-computed float arrays that consumers had no reliable way to locate: the chunk-embedding-retriever resolves the file via a relative path that breaks under a node_modules/@adia-ai/a2ui-corpus/ install layout.

The companion pattern-embeddings.json retired in v0.4.7 §72 along with the patterns/ source directory. Its only consumer was concept-mapper.js (dead post-v0.4.6 §64 retirement of pattern-library.js). The pattern-embeddings build script (scripts/build/embeddings.mjs) and the matching embedding-retriever.js were retired in the same arc.

Consumers who want embedding-based retrieval either:

  1. Regenerate locally — npm run build:embeddings:chunks produces the chunk-embeddings.json file in your node_modules/@adia-ai/a2ui-corpus/ checkout. Requires API access to your embedding provider.
  2. Use the keyword-only fallback — chunk-library.searchChunks() works without embeddings; the embedding-aware searchChunksAsync path falls through to keyword scoring when the index file is absent (chunk-embedding-retriever.js returns null gracefully).
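The fall-through described in option 2 can be sketched as follows. Both functions are hypothetical stand-ins written for this example; the real `searchChunks` / `searchChunksAsync` signatures and scoring are not documented here and may differ.

```javascript
// Hypothetical keyword scorer: counts query terms found in each chunk's keywords.
function keywordSearch(chunks, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      score: terms.filter((term) => chunk.keywords.includes(term)).length,
    }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score);
}

// Hypothetical wrapper: tries the embedding path first and falls back to
// keyword scoring when the retriever yields null (index file absent).
async function searchWithFallback(chunks, query, embeddingSearch) {
  const hits = await embeddingSearch(query);
  return hits ?? keywordSearch(chunks, query);
}

// Made-up corpus and a retriever that behaves as if no index is present.
const demoChunks = [
  { id: 'login-form', keywords: ['login', 'form', 'auth'] },
  { id: 'pricing-table', keywords: ['pricing', 'table'] },
];
searchWithFallback(demoChunks, 'login form', async () => null)
  .then((hits) => console.log(hits));
```

The nullish-coalescing fallback keeps an empty embedding result (`[]`) distinct from "no index available" (`null`), which is the case the list above describes.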

Embedding model pinning

The provider and model recorded in each *-embeddings.json header are the source of truth at query time. The retrievers (chunk-embedding-retriever.js, embedding-retriever.js) re-resolve the same embedder from those header fields. They do not auto-pick a different provider when the recorded one's API key is unset, for two reasons: cross-model cosine similarity is meaningless, and a same-provider/different-model pairing emits different-dimension vectors that cosine() short-circuits to 0 (a silent retrieval failure).
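To make the silent-failure mode concrete, here is a minimal cosine similarity that returns 0 on a dimension mismatch. It is a sketch of the behavior described above, not the package's actual `cosine()` implementation.

```javascript
// Cosine similarity with a guard: vectors of different length score 0.
function cosine(a, b) {
  if (a.length !== b.length) return 0; // dimension mismatch, no meaningful score
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A 1536-dim index vector compared with a 1024-dim query vector.
const indexVector = new Array(1536).fill(0.1);
const queryVector = new Array(1024).fill(0.1);
console.log(cosine(indexVector, queryVector)); // 0
console.log(cosine([1, 0], [1, 0])); // 1
```

Every query against a mismatched index scores 0, so results look empty rather than raising an error, which is why the retrievers re-bind to the recorded model instead of switching providers.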

Currently pinned defaults:

| Provider | Model | Dims | Env |
|---|---|---:|---|
| openai | text-embedding-3-small | 1536 | OPENAI_API_KEY |
| voyage | voyage-3-lite | 1024 | VOYAGE_API_KEY |

detectProvider() (in packages/a2ui/retrieval/embedding/embedding-provider.js) prefers Voyage when both keys are present (denser vectors, lower cost). The build:embeddings* scripts record the chosen provider/model into the .json header, so subsequent reads always re-bind to the same model.
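The preference order can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the code in embedding-provider.js: the function name `detectProviderSketch`, the return shape, and the `null` result when no key is set are all hypothetical. The provider, model, dimension, and env values are copied from the table above.

```javascript
// Pinned defaults, copied from the table above.
const PINNED = {
  voyage: { model: 'voyage-3-lite', dims: 1024, env: 'VOYAGE_API_KEY' },
  openai: { model: 'text-embedding-3-small', dims: 1536, env: 'OPENAI_API_KEY' },
};

// Hypothetical sketch: checks Voyage first, then OpenAI.
// `env` is a plain object such as process.env.
function detectProviderSketch(env) {
  for (const provider of ['voyage', 'openai']) {
    if (env[PINNED[provider].env]) {
      return { provider, ...PINNED[provider] };
    }
  }
  return null; // assumption: no key set means no provider
}

console.log(detectProviderSketch({ OPENAI_API_KEY: 'sk-x', VOYAGE_API_KEY: 'pa-x' }).provider); // voyage
```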

When upgrading to a new model (e.g. text-embedding-3-small → text-embedding-3-large):

  1. Update the default in embedding-provider.js.
  2. Rebuild the chunk index (npm run build:embeddings:chunks).
  3. Verify with npm run check:embeddings-fresh that both index headers record the new model.
  4. Re-run npm run eval:diff -- --engine zettel to confirm the new model doesn't regress retrieval quality (different models score queries differently — thresholds in chunk-synthesizer.js may need a re-look, though the absolute keyword score floor remains independent).

Don't mix models. If one index records voyage-3-lite and the other records text-embedding-3-small, the retrievers will load both fine but the rankings will be incomparable across the two corpora.
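A quick consistency check along these lines could compare the headers before any ranking is trusted. The `provider` and `model` field names below are assumptions about the header shape, and `modelsMatch` is a hypothetical helper, not something this package exports.

```javascript
// Hypothetical helper: true when every header records the same provider and model.
function modelsMatch(headers) {
  const [first, ...rest] = headers;
  return rest.every(
    (header) => header.provider === first.provider && header.model === first.model,
  );
}

// Made-up headers for illustration.
const mixed = [
  { provider: 'voyage', model: 'voyage-3-lite' },
  { provider: 'openai', model: 'text-embedding-3-small' },
];
console.log(modelsMatch(mixed)); // false
```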

Scripts

All run from repo root via npm:

npm run harvest:chunks       # full re-harvest of chunks/ from data-chunk markers
npm run components           # regenerate v0.9 sidecars + assemble catalog-a2ui_0_9.json
npm run components -- --verify   # fail if catalog/sidecars are stale vs yamls

npm run feedback:report      # human-readable feedback digest
npm run feedback:promote     # promote high-confidence feedback → new training data

npm run ticket               # open ticket tracker
npm run ticket:list          # list open tickets
npm run ticket:create        # create a ticket against corpus/pipeline

Script inventory (scripts/):

| Script | Purpose |
|---|---|
| chunk-library.js | In-memory loader + keyword/semantic search over chunks/ |
| feedback-report.js | Aggregates feedback JSONL into a readable digest |
| feedback-promote.js | Moves high-confidence feedback into training data |
| ticket.mjs | Corpus/pipeline issue tracker |

Retired in v0.4.7 §72 (with the patterns/ + compositions/ dirs): extract.js, ingest.js, run-pipeline.mjs, build-pattern-index.mjs. These fed the pattern-library retrieval surface (pattern-library.js, retired v0.4.6 §64) and the exemplar→chunks pipeline (retired v0.4.4 §36). Pattern embeddings (scripts/build/embeddings.mjs) and the grounded-corpus triage audit (scripts/audit/grounded-corpus-triage.mjs) also retired in the same arc — the corpus is now one-format (chunks-only) and harvester-driven.

Repo-side build scripts (not in tarball; run from the workspace root):

| Script | Purpose |
|---|---|
| npm run harvest:chunks | Walks site/pages/, apps/, playgrounds/, catalog/, harvests every [data-chunk] element, writes chunks/<name>.json + _index.json |
| npm run build:embeddings:chunks | Generates chunk-embeddings.json (~190 chunks × 1536d) |

Exports

// Catalog — the aggregated read-target for engines.
// Carries per-component aliases under `components[name].x-adiaui.synonyms.tags`.
import catalog from '@adia-ai/a2ui-corpus';

// Chunk corpus (since 0.0.3 / 0.0.4)
import chunkIndex from '@adia-ai/a2ui-corpus/chunks';   // _index.json (metadata only)
import { searchChunks, searchChunksAsync, getChunk }
  from '@adia-ai/a2ui-corpus/chunk-library';            // in-memory query API

Authoring order — demo page → data-chunk marker → training

When adding coverage for a new intent:

  1. Live demo page — author the HTML in apps/<name>/app/<demo>/<demo>.contents.html (or under playgrounds/ / catalog/) using pure primitive composition. See repo-root AGENTS.md.
  2. Tag the reusable region — add data-chunk="<slug>" + data-chunk-kind="<kind>" (block / page / panel / field) on the element. The harvester extracts the bounding HTML on the next build. See docs/specs/genui-chunk-marker.md for the marker convention.
  3. Harvest + ingest: npm run harvest:chunks writes chunks/<slug>.json and refreshes _index.json. npm run pipeline does the full extract → ingest → catalog refresh.
  4. Verify: npm run eval:diff -- --engine zettel should still hold coverage ≥ 83%, avgScore ≥ 88 (per the regression floors in AGENTS.md).

See data-flow.md for the full pipeline (chunks → feedback).

Regression floors

The pipeline must hold these thresholds — tracked in the held-out benchmark:

  • Fragment reuse ratio ≥ 29.9% — 167 refs / 559 composition nodes
  • Zettel: coverage 100%, avgScore ≥ 88, MRR ≥ 0.94
  • Monolithic: coverage 100%, avgScore ≥ 95
  • Dogfood: 20/20 intents at avg ≥ 95
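A floor check of this kind can be sketched in a few lines. The floor values are the Zettel thresholds listed above; the metric key names (`coverage`, `avgScore`, `mrr`), the sample run, and the `floorViolations` helper are assumptions made for illustration and do not reflect the eval tooling's real output format.

```javascript
// Zettel floors from the list above.
const ZETTEL_FLOORS = { coverage: 100, avgScore: 88, mrr: 0.94 };

// Hypothetical helper: returns one message per metric below its floor.
// A missing metric counts as a violation.
function floorViolations(metrics, floors) {
  return Object.entries(floors)
    .filter(([name, floor]) => !(metrics[name] >= floor))
    .map(([name, floor]) => `${name}: ${metrics[name]} < ${floor}`);
}

// Made-up run where MRR regressed.
const run = { coverage: 100, avgScore: 91.2, mrr: 0.9 };
console.log(floorViolations(run, ZETTEL_FLOORS)); // [ 'mrr: 0.9 < 0.94' ]

// The fragment reuse figure is 167 refs over 559 nodes, rounded to one decimal.
console.log(((167 / 559) * 100).toFixed(1)); // 29.9
```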

What this package does NOT contain

  • Pipeline runtime — gen-ui/
  • UI custom elements — web-components/
  • MCP transport — gen-ui-mcp/
  • Site / playground UI — /site/

If a file here is .js / .mjs, it's a maintenance script, not runtime. Runtime readers go through gen-ui/retrieval/*.

License

MIT