@autoview/cli v0.4.3 (published 13 days ago)

Read any OpenAPI document, prove it can be driven as a product, and emit the surfaces that drive it — a human frontend and an agent tool surface (MCP) — each verified.

AutoView

Read any OpenAPI document, prove it can actually be driven as a product, and emit the surfaces that drive it — a human frontend, an agent tool surface (MCP), an RFP QA verdict — each verified to work.

              ┌─▶  frontend/        a human admin UI (Next.js + shadcn/ui)
swagger.json ─┼─▶  --emit mcp       an agent tool surface (runnable MCP server)
   (any)      ├─▶  --emit report    consumability verdict (can this API be driven?)
              ├─▶  --emit qa --rfp  RFP QA: does the API implement each requirement?
              └─▶  verify           proof it works (real browser / real agent)

AutoView reads an OpenAPI 3.x document into a deterministic semantic model of the API — its resources, their hierarchy, which endpoint produces the id another endpoint consumes, what is a read vs a mutation — and projects that one model into the surfaces that consume the API. Then it verifies them: the frontend's user journeys in a real browser, the agent's tasks against the live backend, and a static consumability report that flags ids no endpoint can supply.

The point is not "a pretty generated screen" (that is a commodity). The point is proof: that your API is consumable as a product, demonstrated rather than asserted.

AutoView is standalone, and the whole READ → MAP → EMIT pipeline is LLM-free and deterministic: same document, same output, every time. The domain map, the screen set, and the page render are all derived structurally, so even a large swagger generates whole, un-sliced (Box's 234 operations → 142 pages, DigitalOcean's 287 → 106, both typecheck-clean). A model you supply is used only for three optional steps: a typecheck-recovery pass during frontend generation, structuring a prose RFP for the QA report, and agent-task verification.

Origin note: AutoView was first written inside the AutoBE monorepo and is published as @autoview/cli. All @autobe/* runtime dependencies have been removed.

The one idea

Everything is one deterministic IR (intermediate representation — the API's semantic model) and several emits off it. Build the IR once; every surface and every check is a projection of it.

                       ┌── frontend   (human view)
swagger ─▶  IR  ───────┼── tool surface / MCP  (agent view)
            │          └── consumability report (verdict)
            │
            └── verification  (proves the emits actually work)

The IR carries, per operation: the resource it belongs to and the resource hierarchy (sales → questions → comments), its role (list / detail / create / update / delete / action → read vs write), and the producer→consumer chain (the saleId that questions.list needs is produced by sales.list's id). That model is what a flat "OpenAPI → tools" or "OpenAPI → CRUD UI" dump lacks, and it is what makes the surfaces navigable and the report possible.
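
As a rough illustration of what such a per-operation record could look like, here is a minimal sketch. The type and field names (`IrOperation`, `IrProducerLink`, `callOrder`) are hypothetical and are not AutoView's actual types; treating `action` as a write is also an assumption.

```typescript
// Hypothetical shapes for illustration only; these are not AutoView's real types.
type IrRole = "list" | "detail" | "create" | "update" | "delete" | "action";

interface IrProducerLink {
  param: string;      // input the operation needs, e.g. "saleId"
  producedBy: string; // accessor of the operation that returns the value
  field: string;      // where the value sits in that operation's response
}

interface IrOperation {
  accessor: string;       // e.g. "questions.list"
  resourcePath: string[]; // resource hierarchy, outermost first
  role: IrRole;
  producers: IrProducerLink[];
}

// list and detail are reads; treating "action" as a write is an assumption.
function isReadRole(role: IrRole): boolean {
  return role === "list" || role === "detail";
}

// Order in which a UI or an agent would have to call operations to reach `target`.
function callOrder(target: IrOperation): string[] {
  return [...target.producers.map((link) => link.producedBy), target.accessor];
}

const questionsList: IrOperation = {
  accessor: "questions.list",
  resourcePath: ["sales", "questions"],
  role: "list",
  producers: [{ param: "saleId", producedBy: "sales.list", field: "id" }],
};

console.log(callOrder(questionsList).join(" -> ")); // sales.list -> questions.list
```

A flat tool dump would carry only the `accessor`; the `producers` array is the part that lets a consumer work out which call has to come first.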

What it emits

| Emit | Command | For | LLM |
| --- | --- | --- | --- |
| Frontend | autoview swagger.json --out ./frontend | humans: a Next.js + shadcn/ui admin (sidebar nav, dense tables, detail views with actions, forms) | optional |
| Agent tool surface (MCP) | autoview swagger.json --emit mcp | AI agents: a runnable MCP server; each tool carries read/write annotations and producer hints, and executes the real API over HTTP | none |
| Consumability report | autoview swagger.json --emit report | API authors / CI: a deterministic verdict; every id a tool needs should be obtainable from another tool | none |
| RFP QA report | autoview swagger.json --emit qa --rfp <reqs> | QA / product: does the API implement and drive each requirement? (coverage + missing capabilities) | none¹ |
| Preview | autoview swagger.json --preview | a quick look: the endpoint tree + schemas as one HTML page | none |

All five read the same IR. The frontend and the MCP server are the same resource model rendered for two different consumers; the reports are that model checked for navigability (consumability) and against a spec (QA).

¹ The QA report itself is LLM-free; only turning a prose RFP into the structured requirement list needs a model. A structured JSON RFP runs with no key.

Two different things are called a "mode" — don't conflate them.
Emit mode — what AutoView produces, chosen on the autoview command line (the rows above: frontend / --emit mcp / --emit report / --emit qa / --preview).
Data mode — how the generated frontend runs, chosen later at npm run dev time: simulate (typia-mock data, no backend — NEXT_PUBLIC_API_SIMULATE=true) or live (real backend — NEXT_PUBLIC_API_HOST=<host>). It is just an env toggle on the generated app, switchable without regenerating.
For a demo you only touch one emit (frontend) and one data mode at a time (simulate to always have populated screens; live when the backend is up).
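
To make the data-mode toggle concrete, the following sketch shows one way an app could read the two environment variables. `resolveDataMode` is a hypothetical helper, not the generated app's actual code, and the fallback to mock data when neither variable is set is an assumption.

```typescript
type DataMode = { kind: "simulate" } | { kind: "live"; host: string };

// Hypothetical helper: decides the data mode from the two env variables named above.
function resolveDataMode(env: Record<string, string | undefined>): DataMode {
  if (env["NEXT_PUBLIC_API_SIMULATE"] === "true") return { kind: "simulate" };
  const host = env["NEXT_PUBLIC_API_HOST"];
  if (host !== undefined && host !== "") return { kind: "live", host };
  // Assumption: with neither variable set, fall back to mock data.
  return { kind: "simulate" };
}

const demoMode = resolveDataMode({ NEXT_PUBLIC_API_SIMULATE: "true" });
const liveMode = resolveDataMode({ NEXT_PUBLIC_API_HOST: "https://your-backend-host" });
console.log(demoMode.kind, liveMode.kind); // simulate live
```

Because the decision depends only on the environment, switching between the two modes needs a restart of the dev server, not a regeneration.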

From an AutoBE backend to a UI (demo workflow)

AutoBE generates a backend and its OpenAPI document. AutoView turns that document into a running Next.js admin you can open in a browser: the human-facing proof that the AutoBE backend actually works. The whole flow is four steps (setup, inspect, generate, run), and with --no-llm no API key is needed at all, because the READ → MAP → render pipeline is fully deterministic.

TL;DR — swagger → localhost in one paste

Where is the swagger? An AutoBE-generated backend always carries its OpenAPI document at a fixed path inside the project folder:

<project>/packages/api/swagger.json

That exact file is AutoView's input. The CLI reads the path you give it and does not scan a folder, so point it straight at packages/api/swagger.json.

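Copy the whole block (no inline comments, so it is paste-safe). Set PROJECT to your AutoBE backend folder and change the cd target to your own AutoView checkout; the absolute paths shown are examples from one machine.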

PROJECT=/Users/yongrean/Downloads/AutoBE.interfaceComplete
SWAGGER="$PROJECT/packages/api/swagger.json"
OUT="$PROJECT-frontend"

cd /Users/yongrean/Downloads/AutoView-Legacy
npm install
npm run build
node lib/cli/main.js "$SWAGGER" --out "$OUT" --no-llm
cd "$OUT"
npm install
PORT=3000 NEXT_PUBLIC_API_SIMULATE=true npm run dev

Open the exact URL the dev server prints — the Local: http://localhost:PORT line. Do not assume 3000: if that port is already taken, Next.js silently bumps to 3001, 3002, … and you would otherwise be staring at a different (possibly stale) app on 3000. If you see ⚠ Port 3000 is in use, trying 3001 instead, a previous npm run dev is still running — either open the new port it picked, or free 3000 first:

lsof -tiTCP:3000 -sTCP:LISTEN | xargs kill   # kill whatever holds 3000, then rerun

NEXT_PUBLIC_API_SIMULATE=true boots with typia-mock data so every screen is walkable without a backend. To run a second app alongside the first (e.g. a -2 project), give it its own port: PORT=3001 …. For real data instead of mock, swap the env for NEXT_PUBLIC_API_HOST=https://your-backend-host.

The numbered steps below break this same flow apart and explain each flag.

0 · One-time setup (in this repo)

npm install
npm run build            # builds lib/cli/main.js — invoked below as `node lib/cli/main.js`

1 · Inspect the swagger first — instant

node lib/cli/main.js ./erp.swagger.json --preview --out ./out
open ./out/preview.html          # endpoint tree + schemas + a READ-layer report

2 · Generate the frontend (no key)

node lib/cli/main.js ./erp.swagger.json --out ./erp-frontend --no-llm \
  --backend https://your-erp-host

--no-llm generates everything deterministically — same swagger, same output, zero tokens. (Drop --no-llm and pass --model/--api-key only if you want the optional LLM pass that repairs a page on the rare chance the deterministic render does not typecheck.)

--backend is the live ERP host baked into connection.ts. Omit it and AutoView auto-extracts servers[0].url from the swagger.
Large ERP swagger? Slice it — --include keeps only matching paths and the component schemas they reference, so even a huge spec generates: --include "erp/admin/**,erp/inventory/**" (or generate the whole thing; Box's 234 operations → 142 pages compiles clean).
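
The slicing idea (keep matching paths, then keep only the schemas those paths reference) can be sketched as below. This is a simplified illustration, not AutoView's implementation: `SliceableDoc`, `globToRegExp`, and `sliceDoc` are hypothetical names, the glob semantics (`**` spans segments, `*` stays within one, leading slash ignored) are an assumption, and only `#/components/schemas/` references are followed.

```typescript
interface SliceableDoc {
  paths: Record<string, unknown>;
  schemas: Record<string, unknown>; // stands in for components.schemas
}

// Minimal glob to RegExp: "**" spans path segments, "*" stays inside one segment.
function globToRegExp(glob: string): RegExp {
  const body = glob
    .replace(/^\/+/, "")
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*\*/g, "\u0000")
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${body}$`);
}

// Collect every schema name referenced via $ref anywhere inside `node`.
function collectRefs(node: unknown, found: Set<string>): void {
  if (Array.isArray(node)) {
    node.forEach((child) => collectRefs(child, found));
  } else if (typeof node === "object" && node !== null) {
    for (const [key, value] of Object.entries(node)) {
      if (key === "$ref" && typeof value === "string") {
        found.add(value.replace("#/components/schemas/", ""));
      } else {
        collectRefs(value, found);
      }
    }
  }
}

function sliceDoc(doc: SliceableDoc, include: string): SliceableDoc {
  const patterns = include.split(",").map((glob) => globToRegExp(glob.trim()));
  const paths: Record<string, unknown> = {};
  for (const [path, item] of Object.entries(doc.paths)) {
    const normalized = path.replace(/^\/+/, "");
    if (patterns.some((pattern) => pattern.test(normalized))) paths[path] = item;
  }
  // Follow references transitively so nested schemas survive the slice.
  const needed = new Set<string>();
  const visit = (node: unknown): void => {
    const refs = new Set<string>();
    collectRefs(node, refs);
    refs.forEach((ref) => {
      if (!needed.has(ref)) {
        needed.add(ref);
        visit(doc.schemas[ref]);
      }
    });
  };
  visit(paths);
  const schemas: Record<string, unknown> = {};
  needed.forEach((ref) => {
    if (ref in doc.schemas) schemas[ref] = doc.schemas[ref];
  });
  return { paths, schemas };
}

const sampleDoc: SliceableDoc = {
  paths: {
    "/erp/admin/users": { get: { schema: { $ref: "#/components/schemas/User" } } },
    "/shop/carts": { get: { schema: { $ref: "#/components/schemas/Cart" } } },
  },
  schemas: {
    User: { type: "object", properties: { role: { $ref: "#/components/schemas/Role" } } },
    Role: { type: "string" },
    Cart: { type: "object" },
  },
};
const sliced = sliceDoc(sampleDoc, "erp/admin/**");
console.log(Object.keys(sliced.paths), Object.keys(sliced.schemas));
```

The transitive walk is the important part: dropping `Role` because no kept path references it directly would leave `User` with a dangling reference.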

3 · Open it in the browser

cd erp-frontend && npm install

# A) Guaranteed populated demo — typia-mock data, no backend needed:
PORT=3000 NEXT_PUBLIC_API_SIMULATE=true npm run dev

# B) Real data — against the live ERP backend (host must be reachable + allow CORS):
PORT=3000 NEXT_PUBLIC_API_HOST=https://your-erp-host npm run dev

Then open the Local: http://localhost:PORT line the dev server prints — not a hard-coded 3000. If 3000 is busy Next.js silently falls back to 3001/3002, and ⚠ Port 3000 is in use means a previous npm run dev is still holding it (lsof -tiTCP:3000 -sTCP:LISTEN | xargs kill frees it).

For a demo where the backend might not be reachable, use (A) — every list, detail, and form renders with mock data so the whole UI is walkable. Switch to (B) the moment the real ERP host is up to show live rows.

Tip for an ERP swagger with authentication: in live mode the app may try to bootstrap a session against the auth endpoints on first load. If that stalls (host down, CORS, no seed account), demo in simulate mode (A) — it never calls the backend, so the full UI is always walkable.

Common knobs

| Goal | Flag |
| --- | --- |
| Slice a huge ERP spec | --include "erp/admin/**" (and --exclude to drop noise) |
| Force mock mode at generate time | --backend="" |
| Inspect without generating | --preview (HTML) or --emit report (navigability verdict) |
| Tune render concurrency | --semaphore 8 |

Quick start

npm install -g @autoview/cli     # installs the `autoview` command — or: npx @autoview/cli ...

A human frontend — deterministic, no key needed with --no-llm:

autoview ./swagger.json --out ./frontend --no-llm
cd frontend && npm install && npm run dev        # open the Local URL it prints

(Drop --no-llm and pass --model/--api-key to add the optional LLM typecheck-recovery pass.)

An agent tool surface (MCP server):

autoview ./swagger.json --emit mcp --out ./mcp
cd mcp && npm install
API_HOST=https://api.example.com API_TOKEN="Bearer …" npm start

Then wire it into Claude Desktop / Cursor / Claude Code (the generated README.md has the exact mcpServers JSON).

A consumability verdict:

autoview ./swagger.json --emit report --out ./out
cat out/consumability-report.md

An RFP QA verdict — does the API implement each requirement? (no key for a structured JSON list; a prose RFP needs --model):

autoview ./swagger.json --emit qa --rfp ./requirements.json --out ./out
cat out/qa-report.md

Verification — proof, not assertion

A typecheck score is not proof a product works. AutoView verifies the emits the way a user (or an agent) actually exercises them.

Frontend workflows — autoview … --verify boots the generated app in a real headless browser and walks the derived user journeys (open the list → it renders rows or an honest empty state → open a record → its fields render), writing wiki/verification.md with per-step pass/fail evidence.
Agent tasks — verifyAgentTasks(document, { client, model, baseUrl, tasks }) drives a real agent through named tasks against the tool surface + live backend and grades each. The model and key are injected by you (e.g. from .env) — which model your API's agents use is your call, not ours.
Structural (LLM-free) — analyzeConsumability(document) proves the tool graph is navigable, with one model shared by the tool surface so the report never claims navigability the tools don't hint. Each path-param input is one of: resolved (a list/search tool produces the id and the surface says which one), nested (the id appears inside a parent read's response — order → goods[].id — navigable but not via a dedicated list), orphan (an entity *_id no endpoint can supply — a genuine gap), caller-supplied key (repository_name, scope — a human-known value, not a chained id), or undetermined (the resource is read but no id could be traced, often an inline schema). A detail read is never counted as its own producer (circular); only true id orphans count as defects.
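
The five categories of the structural check can be read as a small decision procedure. The sketch below is one plausible ordering of those checks, not `analyzeConsumability` itself; `InputFacts` and `classifyInput` are hypothetical, and how each fact is derived from the document is left out.

```typescript
type InputStatus = "resolved" | "nested" | "orphan" | "caller-supplied" | "undetermined";

// Hypothetical summary of what is known about one path-param input.
interface InputFacts {
  param: string;
  isEntityId: boolean;         // false for human-known keys such as a username
  listProducer?: string;       // list/search tool that returns this id
  nestedProducer?: string;     // parent read whose response embeds this id
  readButUntraceable: boolean; // the resource is read, but no id could be traced
}

function classifyInput(facts: InputFacts): InputStatus {
  if (facts.listProducer !== undefined) return "resolved";
  if (facts.nestedProducer !== undefined) return "nested";
  if (!facts.isEntityId) return "caller-supplied";
  if (facts.readButUntraceable) return "undetermined";
  return "orphan"; // an entity id that no endpoint can supply
}

// The three Petstore inputs, with the facts as described in this README.
const petstoreInputs: InputFacts[] = [
  { param: "petId", isEntityId: true, listProducer: "pet.findByStatus.get", readButUntraceable: false },
  { param: "orderId", isEntityId: true, readButUntraceable: true },
  { param: "username", isEntityId: false, readButUntraceable: false },
];
const petstoreStatuses = petstoreInputs.map((facts) => `${facts.param}: ${classifyInput(facts)}`);
console.log(petstoreStatuses.join(", "));
// petId: resolved, orderId: undetermined, username: caller-supplied
```

Only the final branch counts as a defect, which matches the rule that undetermined and caller-supplied inputs are reported but never flagged.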

That find-and-fix loop is the product: run it, and you get the surface plus the evidence it works — or exactly which step/endpoint breaks.

Run it yourself

examples/verify-agent.ts verifies a swagger as an agent tool surface both ways — structurally (always) and behaviorally (when a model is configured):

npm run verify:agent                       # bundled petstore → public backend
# or: npm run verify:agent ./your.json https://api.example.com

── structural (deterministic) ──
   18 tools — 8 read, 10 write
   3 id inputs resolved · 0 orphan · 100% navigable

── behavioral (openai/gpt-4o-mini → https://petstore3.swagger.io/api/v3) ──
   PASS  list
         tools: pet.findByStatus.get
   PASS  producer→consumer chain
         tools: pet.findByStatus.get → pet.getByPetid
         answer: The pet's ID is 105484548 and its name is "modi".

   2/2 tasks passed end-to-end (real agent, live backend).

(The behavioral transcript was recorded 2026-06-08; petstore3 is a public demo server and is sometimes down — on a bad day the structural pass still runs, and the agent honestly reports the failing calls. The 3/2/3 structural split: petId ×3 resolved from the pet list, orderId ×2 untraceable — the API has no order list to produce it — and username ×3 caller-supplied.)

The chain task is the point: the agent lists pets, takes an id from the response, and calls the detail endpoint with it — the producer→consumer link the tool surface declares, exercised against a live server. The structural pass runs with no key; the behavioral pass needs AUTOVIEW_API_KEY + AUTOVIEW_MODEL (the model is your choice).

The measured wedge — where the hints actually matter

We A/B'd the shaped surface against a naive METHOD /path tool dump — same model, same tools, same deterministic backend, blind arms, temperature 0, five graded tasks × 5 runs each (suite · full results):

| task | naive dump | shaped surface | what it measures |
| --- | --- | --- | --- |
| flat | 5/5 | 5/5 | control: list → read a field |
| nested | 5/5 | 5/5 | read a value buried in units[].stocks[] |
| write | 5/5 | 5/5 | source one id, POST, report the response |
| chained | 0/5 | 5/5 | a discovered id must FEED a follow-up call |
| boundary | 5/5 | 5/5 | the id only appears under a renamed field |

The verdict is narrower and sharper than "agents fail on nested data": reads take care of themselves — a capable model lifts values (even renamed ids) straight out of response bodies. What it cannot do is know where an id comes from the moment that id becomes an input to the next call: the naive arm fabricated stock ids and 404'd on every single chained run. The producer chain in the tool descriptions eliminates exactly that failure, deterministically. And ids don't only travel as path params: a cart's commodities.create has no path params at all — its sale_id / stocks[].unit_id / stocks[].stock_id live in the request body. The surface wires those too (bodyProducers): the producing tool is annotated on the body schema property itself, in the description, and in the error channel, with the consumer's own namespace excluded (the cart's index can't bootstrap the cart's first create). Reproduce it with your own model:

AUTOVIEW_SELFTEST=1 npm run ab:suite     # LLM-free wiring check first
npm run ab:suite -- 5                    # needs AUTOVIEW_API_KEY + AUTOVIEW_MODEL
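
The body-producer wiring described above, including the rule that a consumer's own namespace cannot produce its inputs, might look roughly like this. Everything here is a hypothetical sketch: the tool names, the `ProducerCandidate` shape, and the hint wording are illustrative and are not the strings AutoView emits.

```typescript
interface ProducerCandidate {
  tool: string;  // tool that returns the id, e.g. "sales.index"
  field: string; // where the id sits in that tool's response
}

// "carts.commodities.create" -> "carts.commodities"
function namespaceOf(tool: string): string {
  const cut = tool.lastIndexOf(".");
  return cut === -1 ? tool : tool.slice(0, cut);
}

// Drop producers from the consumer's own namespace: a resource's index
// cannot supply the ids needed to create that resource's first record.
function pickBodyProducers(consumer: string, candidates: ProducerCandidate[]): ProducerCandidate[] {
  const own = namespaceOf(consumer);
  return candidates.filter((candidate) => namespaceOf(candidate.tool) !== own);
}

function describeInput(property: string, producers: ProducerCandidate[]): string {
  if (producers.length === 0) return `${property}: no producing tool known`;
  const hints = producers.map((p) => `${p.tool} at ${p.field}`).join(" or ");
  return `${property} <- ${hints}`;
}

const saleIdProducers = pickBodyProducers("carts.commodities.create", [
  { tool: "sales.index", field: "data[].id" },
  { tool: "carts.commodities.index", field: "data[].sale_id" },
]);
const saleIdHint = describeInput("sale_id", saleIdProducers);
console.log(saleIdHint); // sale_id <- sales.index at data[].id
```

The same hint string could be attached to the body schema property, the tool description, and the error message, so the agent sees it wherever it looks.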

And it scales: with 110 distractor tools from the real shopping swagger on the table (AUTOVIEW_DISTRACTORS=examples/shopping.swagger.json), chained stays 0/5 vs 5/5 — and the naive arm additionally collapses on plain tool selection (1/5 on a task it solved at 6 tools, blind-firing destructive calls while wandering), while the shaped surface stays 5/5 throughout.

Reproducibility note (2026-06-12). Every number above was re-run a day after it was recorded. The headline rows reproduce exactly: chained 0/5 vs 5/5 at 6 tools and again at 116 tools, ab:nested 0/5 vs 5/5, and the naive selection collapse under noise (boundary 0–1/5). The non-headline 116-tool control rows drifted with the provider snapshot (shaped flat 5/5 → 0/5 — identical traces at two different code revisions, so provider-side, not ours), and live-backend runs of the real-surface tasks swung between same-day runs (temperature: 0 does not pin OpenRouter outputs). Two standing conclusions: treat live/small-N behavioral runs as smoke and defect discovery, never headline metrics — and the drift itself exposed a real failure mode worth knowing: when the first tool selection lands in the wrong distractor cluster, hint-following keeps the agent coherently wrong, while a hint-less arm recovers by name-scanning. The full reproduction audit is in the results file.

RFP QA — check the API against a spec

--emit qa re-aims the consumability check from "is this navigable" to "does this satisfy the requirements". Give it a requirement list and it rules on each: satisfiable (the endpoints that implement it exist AND every id they need is obtainable), unreachable (implemented, but an id has no producer), missing (a named endpoint is absent — an API gap), or unmapped (could not be tied to any endpoint). It never silently passes a requirement it could not place.
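
The four verdicts can be sketched as a short rule chain. This is an illustrative reading of the paragraph above, not the code behind --emit qa; `RequirementFacts`, `ruleOn`, and the endpoint accessors in the example are hypothetical.

```typescript
type QaVerdict = "satisfiable" | "unreachable" | "missing" | "unmapped";

interface RequirementFacts {
  id: string;
  mappedEndpoints: string[]; // endpoints the requirement was tied to (may be empty)
}

function ruleOn(
  requirement: RequirementFacts,
  existingEndpoints: Set<string>,
  orphanInputs: Record<string, string[]>, // endpoint -> ids it needs that nothing produces
): QaVerdict {
  if (requirement.mappedEndpoints.length === 0) return "unmapped";
  if (requirement.mappedEndpoints.some((e) => !existingEndpoints.has(e))) return "missing";
  if (requirement.mappedEndpoints.some((e) => (orphanInputs[e] ?? []).length > 0)) return "unreachable";
  return "satisfiable";
}

const apiEndpoints = new Set(["invoices.index", "journals.create"]);
const sampleRequirements: RequirementFacts[] = [
  { id: "FR-1", mappedEndpoints: ["invoices.index"] },
  { id: "FR-2", mappedEndpoints: ["journals.create"] },
  { id: "FR-3", mappedEndpoints: [] },
];
const sampleVerdicts = sampleRequirements.map((r) => ruleOn(r, apiEndpoints, {}));
const satisfiableCount = sampleVerdicts.filter((v) => v === "satisfiable").length;
const satisfiablePercent = Math.round((100 * satisfiableCount) / sampleVerdicts.length);
console.log(sampleVerdicts.join(", "), `${satisfiablePercent}%`);
// satisfiable, satisfiable, unmapped 67%
```

Checking "unmapped" first is what guarantees a requirement that could not be placed is never counted as passing.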

A structured JSON requirement list runs with no key:

cat > rfp.json <<'EOF'
{ "requirements": [
  { "id": "FR-1", "statement": "An employee can browse invoices." },
  { "id": "FR-2", "statement": "An employee can create a journal entry." },
  { "id": "FR-3", "statement": "A manager can export the tax filing as PDF." }
] }
EOF
autoview ./erp.swagger.json --emit qa --rfp ./rfp.json --out ./out

- 3 requirements · 67% satisfiable end-to-end.
- ✅ 2 satisfiable — backing endpoints exist and are reachable.
- ❔ 1 unmapped — could not be tied to any endpoint.

## 🔴 Functional capabilities with no endpoint
- FR-3  A manager can export the tax filing as PDF.   ← the API never implemented it

A prose RFP (.md / .txt) works too — the model splits it into atomic requirements and maps each to real endpoints (using the document's own AutoBE annotations as context), then the verdict above is computed deterministically. That parse is the only model-gated step; pass --model / --api-key:

autoview ./erp.swagger.json --emit qa --rfp ./requirements.md \
  --model gpt-4.1-mini --api-key "$OPENAI_API_KEY" --out ./out

To go one step further and actually run each satisfiable requirement as a real agent against a live backend (a live pass/fail beside the static verdict), see examples/qa-live-shopping.ts.

How it reads the API (the IR)

swagger.json
   │
   ▼  READ      fromSwagger() → toEndpoints()         (deterministic, LLM-free)
   │            • upgrade Swagger 2 / OpenAPI 3.0/3.1 → one normalized document
   │            • resolve $ref parameters (incl. cross-path JSON pointers),
   │              flatten allOf composition, drop unroutable (`#`-fragment) paths
   │            • every operation: method · path · accessor · path-params ·
   │              QUERY · requestBody · responseBody
   │
   ▼  MAP       classifyEndpoints → resourcePlan       (deterministic, LLM-free)
   │            • CRUD role per endpoint (read vs write)
   │            • resource hierarchy / chain from the path
   │            • producer→consumer links (which id comes from where)
   │
   ▼  EMIT      frontend page-gen · MCP tools · report  (deterministic)
   │            + optional LLM typecheck-recovery pass on the frontend only
   ▼
  surfaces + verification

The READ and MAP layers — the IR — are 100% deterministic and reused by every emit. Robustness was hardened against real third-party specs: $ref parameter pointers (DigitalOcean), allOf composition and entries-style collection wrappers (Box), numeric ids, and #-fragment paths.
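
For intuition about the MAP step, here is a deliberately naive classifier that derives a CRUD role and a resource chain from nothing but the HTTP method and the path. It is a sketch under strong assumptions (REST-style paths, no special cases) and is much simpler than what a real classifier has to handle; `classifyRole` and `resourceChain` are hypothetical names.

```typescript
type CrudRole = "list" | "detail" | "create" | "update" | "delete" | "action";

function pathSegments(path: string): string[] {
  return path.split("/").filter((segment) => segment.length > 0);
}

// Naive heuristic: the role follows from the method and whether the path ends in a {param}.
function classifyRole(method: string, path: string): CrudRole {
  const segments = pathSegments(path);
  const last = segments[segments.length - 1] ?? "";
  const endsWithParam = last.startsWith("{") && last.endsWith("}");
  switch (method.toUpperCase()) {
    case "GET":
      return endsWithParam ? "detail" : "list";
    case "POST":
      return endsWithParam ? "action" : "create";
    case "PUT":
    case "PATCH":
      return "update";
    case "DELETE":
      return "delete";
    default:
      return "action";
  }
}

// The resource hierarchy is the path with its {param} segments removed.
function resourceChain(path: string): string[] {
  return pathSegments(path).filter((segment) => !segment.startsWith("{"));
}

const commentsPath = "/sales/{saleId}/questions/{questionId}/comments";
console.log(classifyRole("GET", commentsPath), resourceChain(commentsPath).join(" → "));
// list sales → questions → comments
```

Both functions are pure, which is the property that lets every emit reuse the same result for the same document.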

CLI

autoview <swagger.json> [options]

  --out <dir>           Output directory (default: ./frontend)
  --emit mcp            Generate a runnable MCP server (agent tool surface). LLM-free.
  --emit report         Write consumability-report.md (navigable graph + orphans). LLM-free.
  --emit qa --rfp <f>   Write qa-report.md — does the API implement & drive each
                        requirement in <f>? <f> is structured JSON
                        ({ "requirements": [{ id, statement, endpoints? }] }, LLM-free)
                        or a prose RFP (.md/.txt → needs --model to structure it).
  --preview             Write preview.html (endpoint tree + schemas). LLM-free.
  --verify              Verify the frontend in a real headless browser (user workflows).
  --backend <url>       Live backend for the frontend's connection.ts / the report's hints.
                        Auto-extracted from the swagger's servers[] when omitted.
  --include <globs>     Comma-separated path globs to KEEP — slices the whole
                        document (paths AND the component schemas they reference),
                        so a large swagger fits the model and actually generates.
  --exclude <globs>     Comma-separated path globs to DROP (applied after include).
  --model <name>        LLM model (only for the optional frontend typecheck-recovery
                        pass and for structuring a prose RFP).
  --api-key <key>       Vendor key (or AUTOVIEW_API_KEY / OPENAI_API_KEY env).
  --base-url <url>      OpenAI-compatible endpoint (or AUTOVIEW_BASE_URL env).

--emit mcp, --emit report, and --preview need no model or key; they are deterministic. --emit qa needs none either when the RFP is structured JSON.

Install & programmatic API

npm install @autoview/cli

import {
  fromSwagger,
  buildToolSurface,           // IR → agent tools (read/write annotated, producer hints)
  emitMcpServer,              // IR → a runnable MCP server project (files map)
  analyzeConsumability,       // IR → navigable / orphan / undetermined breakdown
  emitConsumabilityReport,    // → markdown report
  analyzeRequirementCoverage, // requirements + IR → per-requirement QA verdict
  emitQaReport,               // → markdown RFP QA report
  parseRfp,                   // prose RFP + IR → structured requirements (LLM)
  runRequirementChecks,       // drive each requirement live against a backend
  verifyAgentTasks,           // run an agent through tasks against the tool surface
  AutoViewAgent,              // the frontend generator (optional LLM typecheck recovery)
} from "@autoview/cli";
} from "@autoview/cli";

const document = fromSwagger(require("./swagger.json"));

// e.g. RFP QA, no LLM:
const qa = emitQaReport(
  [{ id: "FR-1", statement: "Browse invoices", endpoints: ["invoices.get"] }],
  document,
);

// agent tool surface + structural proof — no LLM, no network
const tools = buildToolSurface(document);
const coverage = analyzeConsumability(document);
console.log(`${tools.length} tools, ${coverage.orphan.length} orphan inputs`);

// a runnable MCP server, written to disk by the caller
const files = emitMcpServer(document, { backend: "https://api.example.com" });
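
Since the caller writes the emitted project to disk, a helper along these lines may be useful. It assumes the files map is a plain object from relative path to file content, which the source does not confirm; `writeFilesMap` is a hypothetical helper, and the demo writes a stand-in map to a temporary directory instead of calling emitMcpServer.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Assumption: the files map has the shape { "relative/path": "file content" }.
function writeFilesMap(files: Record<string, string>, outDir: string): string[] {
  const written: string[] = [];
  for (const [relativePath, content] of Object.entries(files)) {
    const target = join(outDir, relativePath);
    mkdirSync(dirname(target), { recursive: true }); // create nested folders as needed
    writeFileSync(target, content, "utf8");
    written.push(target);
  }
  return written;
}

// Stand-in map for the demo; a real run would pass the emitter's result instead.
const demoDir = mkdtempSync(join(tmpdir(), "mcp-demo-"));
const writtenPaths = writeFilesMap(
  { "package.json": "{}\n", "src/index.ts": "// server entry\n" },
  demoDir,
);
console.log(`${writtenPaths.length} files written to ${demoDir}`);
```

Keeping the emitter pure and leaving the write to the caller makes it easy to inspect or diff the output before anything touches the filesystem.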

Verified on

Real third-party OpenAPI documents, end-to-end:

| Swagger | Domain | Ops | Frontend typecheck | Navigability (report) |
| --- | --- | --- | --- | --- |
| @samchon/shopping | e-commerce | 206 | 0 errors | 100% (203 resolved + 11 nested, 0 orphan) |
| DigitalOcean | cloud infra | 287 | 0 errors (106 pages, whole) | 99% (146 resolved, 1 orphan; 47 undetermined, 52 caller-keys) |
| Box | file platform | 234 | 0 errors (142 pages, whole) | 97% (77 resolved, 2 orphan; 84 undetermined, 19 caller-keys) |
| Petstore | (canonical) | 18 | 0 errors (11 pages) | 100% (3 resolved, 0 orphan) |
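
The navigability percentages in the table are consistent with a simple ratio: navigable inputs (resolved plus nested) over navigable plus orphan, with undetermined and caller-supplied inputs left out. That formula is inferred from the numbers and is not stated by the report itself, so treat the sketch below as a plausibility check only.

```typescript
// Inferred formula (an assumption): navigable / (navigable + orphan), rounded to a whole percent.
function navigabilityPercent(navigable: number, orphan: number): number {
  const total = navigable + orphan;
  return total === 0 ? 100 : Math.round((100 * navigable) / total);
}

const tableRows = [
  { spec: "@samchon/shopping", navigable: 203 + 11, orphan: 0 },
  { spec: "DigitalOcean", navigable: 146, orphan: 1 },
  { spec: "Box", navigable: 77, orphan: 2 },
  { spec: "Petstore", navigable: 3, orphan: 0 },
];
for (const row of tableRows) {
  console.log(`${row.spec}: ${navigabilityPercent(row.navigable, row.orphan)}%`);
}
// @samchon/shopping: 100%
// DigitalOcean: 99%
// Box: 97%
// Petstore: 100%
```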

Frontend workflow verification (real browser): shopping 13/13 journeys pass, DigitalOcean 7/7. The emitted MCP server was driven end-to-end by a real MCP client; agent tasks pass against the tool surface with a configurable model.

Honest limits

Tool shaping helps exactly where the chain is hard to guess. On a flat 1-hop chain (list → detail), a capable model reconstructs it from names alone, so a naive surface chains about as well as the shaped one — shaping is not magic there. But on a nested chain — an id buried in a parent's response array (a sale's units[].stocks[].id) — the model cannot guess where the id lives, and a naive surface fails. A blind A/B (same model, same backend; examples/ab-nested-chain.ts, npm run ab:nested) is decisive: naive 0/5, shaped 5/5 — the naive agent blindly calls the deep endpoint with ids it doesn't have; the shaped agent is told "stockId ← sale detail at units[].stocks[].id" and gets it in two calls. So the durable value is the read/write safety annotations, the deterministic navigability proof, AND the nested producer hints that close chains a model can't infer.
Inline response schemas are partly handled. The producer→consumer analysis now introspects inline object responses and resource-named collection wrappers ({ databases: [...] }) including inline array items — lifting DigitalOcean's navigability from 13% to 86%. Inputs it still cannot trace are reported as undetermined, never as a defect.
Frontend RENDER is deterministic by default. The pages are generated from the schema (one column per field, typed SDK wiring), so no LLM is needed for a working app. An LLM is used only for the optional typecheck-recovery pass.

License

See LICENSE.