yo-url-yo-json
v0.2.0
Extract validated JSON from webpages using JSON Schema, CloakBrowser, llm-scraper, and Codex CLI.
🪀✨ yo url yo json
URL + Schema -> JSON
yo-url-yo-json is a Bun + TypeScript CLI for extracting validated JSON from a webpage using llm-scraper, CloakBrowser, and ai-sdk-provider-codex-cli.
how to use it
- describe what you want to parse, via the CLI or the agent skill.
- get a generated JSON Schema.
- use the URL + schema to generate reusable parser code.
- rerun that parser as many times as you want.
usage examples
install
- Bun
- Docker
- Codex auth: run codex login, or set OPENAI_API_KEY
- Project deps:

```sh
bun install
```
usage
agent skill
Project-local skill: skills/yo-url-yo-json/SKILL.md.
cli
Generate a JSON Schema with Codex:
```sh
yo-url-yo-json generate-schema --out ./schemas/product.schema.json
```

Parse a page with Codex-powered extraction:

```sh
yo-url-yo-json parse --url "https://example.com" --schema ./schemas/product.schema.json
```

parse and generate-schema are subcommands of the single yo-url-yo-json executable; they are not separate npm bin aliases.
Useful options for yo-url-yo-json parse:
```sh
--cache-dir .yo-url-yo-json/scripts
--model gpt-5.5
--force-regenerate
--headed
--verbose
```

package.json
From another TypeScript project:
```sh
bun add -d yo-url-yo-json
```

```json
{
  "scripts": {
    "extract": "yo-url-yo-json parse --url https://example.com --schema ./schemas/product.schema.json",
    "schema": "yo-url-yo-json generate-schema --out ./schemas/product.schema.json"
  }
}
```

Stdout is the parsed JSON; diagnostics go to stderr.
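Since stdout is the parsed JSON and diagnostics go to stderr, calling the CLI from TypeScript code comes down to spawning it and parsing stdout. A minimal sketch, assuming Node's node:child_process API (which Bun also implements). ParseOptions, buildParseArgs and runJsonCommand are hypothetical helpers, not exports of the package; they only emit the parse flags documented above, and the ./node_modules/.bin path in the usage comment assumes a local install.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical option names; each maps to a documented `yo-url-yo-json parse` flag.
interface ParseOptions {
  url: string;
  schema: string; // must end in .json
  cacheDir?: string;
  model?: string;
  forceRegenerate?: boolean;
  headed?: boolean;
  verbose?: boolean;
}

// Builds argv for the parse subcommand: value flags when set, switches when true.
function buildParseArgs(opts: ParseOptions): string[] {
  const args = ["parse", "--url", opts.url, "--schema", opts.schema];
  if (opts.cacheDir !== undefined) args.push("--cache-dir", opts.cacheDir);
  if (opts.model !== undefined) args.push("--model", opts.model);
  if (opts.forceRegenerate) args.push("--force-regenerate");
  if (opts.headed) args.push("--headed");
  if (opts.verbose) args.push("--verbose");
  return args;
}

// Runs a command whose stdout is JSON, forwards its stderr diagnostics, and
// returns the parsed value. Throws if it cannot start or exits non-zero.
function runJsonCommand(command: string, args: string[]): unknown {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. ENOENT when the binary is not found
  if (result.stderr) process.stderr.write(result.stderr);
  if (result.status !== 0) {
    throw new Error(`${command} exited with status ${result.status}`);
  }
  return JSON.parse(result.stdout);
}

// Usage, assuming a local install and a running Docker daemon:
// const product = runJsonCommand(
//   "./node_modules/.bin/yo-url-yo-json",
//   buildParseArgs({ url: "https://example.com", schema: "./schemas/product.schema.json" }),
// );
```

Keeping argv construction separate from process handling makes the flag mapping easy to check without launching a browser.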
runtime flow
```mermaid
flowchart TD
  A["URL + JSON Schema"] --> B["Start CloakBrowser via Docker"]
  B --> C{"Cached extractor valid?"}
  C -- "yes" --> D["Run cached Playwright extractor"]
  C -- "no" --> E["Generate extractor with llm-scraper + Codex CLI"]
  D --> F["Validate JSON Schema"]
  E --> F
  F --> G["Print JSON"]
```

Generated extractors are saved under .yo-url-yo-json/scripts/.
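The flowchart can be read as ordinary control flow. The sketch below is a simplification under stated assumptions, not the package's implementation: extractors are modeled as in-memory functions instead of saved Playwright scripts, "valid" is reduced to "present in the cache", the cache key is a hypothetical hash of URL and schema, and the forceRegenerate parameter stands in for --force-regenerate. Extractor, FlowDeps, cacheKey and extract are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Simplification: an extractor is a function from URL to data; the real tool
// saves generated Playwright scripts under .yo-url-yo-json/scripts/.
type Extractor = (url: string) => unknown;

interface FlowDeps {
  cache: Map<string, Extractor>; // stands in for the scripts directory
  generate: (url: string, schema: object) => Extractor; // llm-scraper + Codex CLI step
  validate: (data: unknown, schema: object) => boolean; // JSON Schema check
}

// Hypothetical cache key: a short SHA-256 of the URL and the serialized schema.
function cacheKey(url: string, schema: object): string {
  return createHash("sha256")
    .update(url)
    .update("\0") // separator between the two inputs
    .update(JSON.stringify(schema))
    .digest("hex")
    .slice(0, 16);
}

function extract(url: string, schema: object, deps: FlowDeps, forceRegenerate = false): unknown {
  const key = cacheKey(url, schema);
  // "Cached extractor valid?" is reduced here to "present in the cache".
  const cached = forceRegenerate ? undefined : deps.cache.get(key);
  const extractor = cached ?? deps.generate(url, schema); // expensive path only on a miss
  const data = extractor(url);
  if (!deps.validate(data, schema)) {
    throw new Error("extracted data does not match the schema");
  }
  // Only store extractors whose output passed validation.
  if (!cached) deps.cache.set(key, extractor);
  return data; // the CLI would print this as JSON on stdout
}
```

Caching only after validation keeps a bad generation from being reused; whether the real tool makes the same choice is not stated here.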
schemas
The project uses JSON Schema. Schema paths must end in .json.
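To make the validation step concrete, here is a minimal validator for a small JSON Schema subset (type, properties, required, items), plus a check for the .json path rule. It is illustrative only: this README does not say which validator the project uses, and a full validator handles many more keywords. assertSchemaPath, validate and productSchema are hypothetical names.

```typescript
// Hypothetical helper enforcing the documented rule that schema paths end in .json.
function assertSchemaPath(path: string): void {
  if (!path.endsWith(".json")) {
    throw new Error(`schema path must end in .json: ${path}`);
  }
}

// The subset this sketch understands; every other keyword is ignored.
type JsonType = "object" | "array" | "string" | "number" | "integer" | "boolean" | "null";
interface Schema {
  type?: JsonType;
  properties?: Record<string, Schema>;
  required?: string[];
  items?: Schema;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function matchesType(value: unknown, type: JsonType): boolean {
  switch (type) {
    case "object": return isPlainObject(value);
    case "array": return Array.isArray(value);
    case "integer": return Number.isInteger(value);
    case "number": return typeof value === "number";
    case "null": return value === null;
    default: return typeof value === type; // "string" or "boolean"
  }
}

// Returns error messages with a JSONPath-like location; an empty array means valid.
function validate(value: unknown, schema: Schema, path = "$"): string[] {
  if (schema.type !== undefined && !matchesType(value, schema.type)) {
    return [`${path}: expected ${schema.type}`];
  }
  const errors: string[] = [];
  if (isPlainObject(value)) {
    for (const key of schema.required ?? []) {
      if (!Object.prototype.hasOwnProperty.call(value, key)) errors.push(`${path}.${key}: required`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        errors.push(...validate(value[key], sub, `${path}.${key}`));
      }
    }
  }
  const items = schema.items;
  if (Array.isArray(value) && items !== undefined) {
    value.forEach((item, i) => errors.push(...validate(item, items, `${path}[${i}]`)));
  }
  return errors;
}

// Hypothetical schema for a product page.
const productSchema: Schema = {
  type: "object",
  required: ["name", "price"],
  properties: { name: { type: "string" }, price: { type: "number" } },
};

assertSchemaPath("./schemas/product.schema.json");
validate({ name: "Widget", price: "9.99" }, productSchema); // → ["$.price: expected number"]
```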
cloakbrowser docker runtime
We use CloakBrowser via Docker.
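Because parsing depends on Docker, a script that calls the CLI may want to fail fast when Docker is unavailable. A minimal sketch, assuming docker info as the probe (it exits non-zero when the daemon cannot be reached). isDockerReachable is a hypothetical helper, not something the package provides, and it does not check whether the CloakBrowser image has been pulled.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical preflight: `docker info` exits 0 only when the Docker CLI is
// installed and can reach the daemon.
function isDockerReachable(dockerBin = "docker", timeoutMs = 10_000): boolean {
  const result = spawnSync(dockerBin, ["info"], { stdio: "ignore", timeout: timeoutMs });
  // result.error is set if the binary cannot be started (ENOENT) or the timeout fires.
  return !result.error && result.status === 0;
}

if (!isDockerReachable()) {
  console.error("Docker is not reachable; start it before running yo-url-yo-json parse.");
}
```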
Useful commands:
```sh
bun run docker:pull
bun run docker:cleanup
yo-url-yo-json parse --url "https://example.com" --schema ./examples/product.schema.json
```

development
```sh
bun run typecheck
bun test

# publish
bun run build
npm pack --dry-run
npm publish
```