@cafitac/ai-crawler

v0.1.2

npm delivery wrapper for the ai-crawler Python CLI

ai-crawler

AI-driven network-first crawler compiler for authorized workflows.

ai-crawler turns captured network evidence into reusable crawler recipes. The browser is used as a short-lived probe for API discovery, not as the crawling engine. Bulk collection runs through deterministic HTTP replay with curl-cffi.

Browser is not the crawler. Browser is the probe.
AI is not the request loop. AI is the planner/debugger/recipe author.

What it is

ai-crawler is an early-stage Python OSS library and CLI for building crawler recipes from network evidence.

It focuses on:

  • Network-first API discovery and replay
  • Recipe generation, testing, repair, and deterministic execution
  • Simple CLI defaults for humans and AI harnesses
  • Python SDK facade for application integrations
  • stdio MCP server for Hermes, Claude Code, Codex, and other agents
  • Local-first tests with fake transports and fixture sites
  • Security boundaries: redaction, challenge detection, and no CAPTCHA/MFA/bot-challenge bypass logic

Install for local development

git clone https://github.com/cafitac/ai-crawler.git
cd ai-crawler
uv sync --extra dev --extra http --extra mcp

If you are already inside a local checkout:

uv sync --extra dev --extra http --extra mcp

npm wrapper

For npm-first onboarding, the repo also ships a thin Node wrapper that delegates to the Python core:

npx @cafitac/ai-crawler --help
npx @cafitac/ai-crawler auto evidence.json --json
npx @cafitac/ai-crawler mcp

Wrapper behavior:

  • inside the repo checkout: runs the local Python core with uv run --project <repo> ai-crawler ...
  • outside the repo checkout: runs the published Python core via a git-pinned uvx spec when the wrapper package includes gitHead, otherwise falls back to uvx --from "git+https://github.com/cafitac/ai-crawler.git[all]" ai-crawler ...
  • override the published Python package spec with AI_CRAWLER_PYTHON_SPEC
  • override the uvx Python version with AI_CRAWLER_UVX_PYTHON

Quick start

The one-command path from URL to crawler artifacts is:

uv sync --extra browser --extra http
uv run --extra browser --extra http ai-crawler compile https://example.com/products --goal "collect products" --json

compile opens the page briefly, records normalized network response events into evidence.json, generates a recipe, tests it, repairs extraction when possible, retests, and writes final JSONL output. The browser is only used for discovery; the generated recipe and final crawl use deterministic HTTP replay. By default, probe evidence keeps replay-friendly fetch/xhr 2xx/3xx responses and drops static assets, failed responses, and other browser noise.

If you want to inspect or edit evidence before compiling, split the flow:

uv run --extra browser ai-crawler probe https://example.com/products --goal "collect products"
uv run --extra browser ai-crawler probe https://example.com/products --goal "collect products" --wait-ms 2500 --max-events 50 --include-resource-type fetch,xhr,document
uv run --extra http ai-crawler auto evidence.json --json

If you already have an evidence file, the main AI-harness command is:

ai-crawler auto evidence.json --json

With a local checkout:

uv run --extra http ai-crawler auto evidence.json --json

This writes default artifacts:

evidence.json            # browser probe evidence, if generated by probe
recipe.yaml              # initial generated recipe
repaired.recipe.yaml     # repaired/final recipe
test.jsonl               # initial diagnostic crawl output
crawl.jsonl              # final crawl output
auto.report.json         # stable machine-readable report

The JSON report includes:

  • final success/failure status
  • command_type (compile or auto)
  • failure_phase for quick triage (probe, generate, final_test, or empty on success)
  • ordered phase_diagnostics for probe -> generate -> initial_test -> repair -> final_test
  • recipe/output paths
  • initial and final crawl results
  • bounded/redacted diagnostic samples
  • failure classifications such as success, extraction_failed, http_error, no_response, challenge_detected, probe_failed, and no_endpoint_candidates

In --json mode, stdout is reserved for one machine-readable JSON object. Human-readable failures are written to stderr. Exit code 2 still writes auto.report.json so agents can inspect the failure.
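
An agent harness might consume the report roughly as follows. This is a minimal sketch: it relies only on the command_type and failure_phase fields named above, the helper names triage and run_auto are hypothetical, and it assumes auto.report.json lands in the working directory, which the text implies but does not state.

```python
import json
import subprocess
from pathlib import Path


def triage(report: dict) -> str:
    """Summarize a parsed report using only fields named in the README."""
    command = report.get("command_type", "unknown")
    # failure_phase is described as empty on success; treat None the same way.
    phase = report.get("failure_phase") or ""
    if not phase:
        return f"{command}: ok"
    return f"{command}: failed during {phase}"


def run_auto(evidence_path: str, workdir: str = ".") -> dict:
    """Run the CLI in --json mode, then load the report it leaves on disk."""
    subprocess.run(
        ["ai-crawler", "auto", evidence_path, "--json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # a failing run is still expected to write a report
    )
    # ASSUMPTION: auto.report.json is written to the working directory.
    report_path = Path(workdir) / "auto.report.json"
    return json.loads(report_path.read_text(encoding="utf-8"))
```

Reading the file rather than stdout keeps the sketch valid on failing runs, since the report is documented to be written even when the exit code is 2.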

Evidence format

Create evidence with a short browser probe:

uv run --extra browser ai-crawler probe https://example.com/products --goal "collect products" --output evidence.json

The probe tuning options are available on both probe and compile:

  • --wait-ms: browser settle time after network idle (default: 1000)
  • --max-events: maximum replay candidates retained after filtering (default: 200)
  • --include-resource-type: comma-separated Playwright resource types to retain (default: fetch,xhr)
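
To make the effect of these options concrete, here is a small sketch of the kind of filtering the defaults describe: keep fetch/xhr events with 2xx/3xx status codes and cap the result. It is not the library's implementation; filter_events is a hypothetical name, and keeping the first max_events matches in order is an assumption.

```python
from typing import Iterable, Sequence


def filter_events(
    events: Iterable[dict],
    include_resource_types: Sequence[str] = ("fetch", "xhr"),
    max_events: int = 200,
) -> list[dict]:
    """Keep replay-friendly events, mirroring the documented defaults."""
    kept: list[dict] = []
    for event in events:
        if len(kept) >= max_events:
            break  # ASSUMPTION: the cap keeps the earliest matching events
        if event.get("resource_type") not in include_resource_types:
            continue  # drops static assets and other browser noise
        status = event.get("status_code", 0)
        if not 200 <= status < 400:
            continue  # drops failed responses
        kept.append(event)
    return kept
```

Passing include_resource_types=("fetch", "xhr", "document") corresponds to the --include-resource-type fetch,xhr,document example shown earlier.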

Minimal evidence JSON:

{
  "target_url": "https://example.com/products",
  "goal": "collect products",
  "events": [
    {
      "method": "GET",
      "url": "https://example.com/api/products?page=1",
      "status_code": 200,
      "resource_type": "fetch"
    }
  ]
}
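
If you build evidence by hand or from another capture tool, a small helper can keep the file in the shape shown above. This is a sketch under assumptions: write_evidence is a hypothetical name, and treating the four event keys from the minimal example as required is a guess, since the README does not publish a schema.

```python
import json
from pathlib import Path

# ASSUMPTION: the keys in the minimal example are the required ones.
REQUIRED_EVENT_KEYS = {"method", "url", "status_code", "resource_type"}


def write_evidence(path: str, target_url: str, goal: str, events: list[dict]) -> dict:
    """Validate events against the minimal shape and write evidence JSON."""
    for index, event in enumerate(events):
        missing = REQUIRED_EVENT_KEYS - event.keys()
        if missing:
            raise ValueError(f"event {index} is missing {sorted(missing)}")
    document = {"target_url": target_url, "goal": goal, "events": list(events)}
    Path(path).write_text(json.dumps(document, indent=2) + "\n", encoding="utf-8")
    return document
```

The resulting file can then be passed to ai-crawler auto evidence.json --json as in the quick start.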

Generate and run everything in one command:

uv run --extra browser --extra http ai-crawler compile https://example.com/products --goal "collect products" --json

Or run each artifact step yourself:

uv run --extra http ai-crawler generate-recipe evidence.json
uv run --extra http ai-crawler test-recipe recipe.yaml
uv run --extra http ai-crawler repair-recipe recipe.yaml
uv run --extra http ai-crawler test-recipe repaired.recipe.yaml --output crawl.jsonl

MCP usage

Generate client config snippets for local uv-project usage. For copy-paste examples across CLI/MCP/SDK flows, also see docs/harness-examples.md.

uv run ai-crawler mcp-config --client hermes --project /path/to/ai-crawler
uv run ai-crawler mcp-config --client claude-code --project /path/to/ai-crawler
uv run ai-crawler mcp-config --client codex --project /path/to/ai-crawler

Generate npm-first snippets for the published wrapper:

uv run ai-crawler mcp-config --client hermes --launcher npm

Run as a stdio MCP server:

uv run --extra mcp --extra http ai-crawler mcp

Exposed tools:

  • compile_url
  • auto_compile
  • generate_recipe
  • test_recipe
  • repair_recipe

If you prefer npm-first installation for agent tooling, the wrapper can also launch the MCP server:

npx @cafitac/ai-crawler mcp

Hermes development snippet shape:

mcp_servers:
  ai-crawler:
    command: "uv"
    args: ["run", "--project", "/path/to/ai-crawler", "--extra", "mcp", "--extra", "http", "ai-crawler", "mcp"]
    timeout: 300
    connect_timeout: 60

Hermes npm-first snippet shape:

mcp_servers:
  ai-crawler:
    command: "npx"
    args: ["-y", "@cafitac/ai-crawler", "mcp"]
    timeout: 300
    connect_timeout: 60

Python SDK

The Python SDK remains the stable embedded/programmatic surface. The npm package is only a launcher wrapper around this Python core. See docs/harness-examples.md for copy-paste SDK, MCP, and published-wrapper examples.

from ai_crawler import AICrawler

crawler = AICrawler()
result = crawler.auto("evidence.json")
print(result.ok)
print(result.exit_code)
print(result.report)

compile_result = crawler.compile_url("https://example.com/products", goal="collect products")
print(compile_result.report["command_type"])

For tests or embedded usage, inject a fake fetcher:

crawler = AICrawler(fetcher=my_fake_fetcher)

npm publishing

npm publishing is automated with .github/workflows/npm-publish.yml.

  • push a tag matching the package version, for example npm-v0.1.2
  • or run the workflow manually with workflow_dispatch
  • the workflow validates that package.json, pyproject.toml, and src/ai_crawler/__init__.py agree on the release version before publish
  • tag-triggered publishes also validate that the pushed tag matches npm-v<package.json version>
  • use docs/release-runbook.md for the full version bump, tagging, and post-publish smoke checklist

Example tag flow:

git tag npm-v0.1.2
git push origin npm-v0.1.2

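The release workflow's version-agreement check can be approximated locally before tagging. The sketch below is not the workflow's script: read_versions and check_release are hypothetical names, and it assumes pyproject.toml declares a static version = "..." line and that src/ai_crawler/__init__.py defines __version__.

```python
import json
import re
from typing import Optional


def read_versions(package_json: str, pyproject_toml: str, init_py: str) -> dict[str, str]:
    """Extract the version string from the text of each of the three files."""
    # ASSUMPTION: pyproject.toml has a static 'version = "..."' line.
    pyproject_match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_toml, re.MULTILINE)
    # ASSUMPTION: __init__.py assigns __version__ as a string literal.
    init_match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', init_py, re.MULTILINE)
    if pyproject_match is None or init_match is None:
        raise ValueError("could not find a version string")
    return {
        "package.json": json.loads(package_json)["version"],
        "pyproject.toml": pyproject_match.group(1),
        "src/ai_crawler/__init__.py": init_match.group(1),
    }


def check_release(versions: dict[str, str], tag: Optional[str] = None) -> list[str]:
    """Return a list of problems; an empty list means the release looks consistent."""
    problems = []
    if len(set(versions.values())) != 1:
        problems.append(f"version mismatch: {versions}")
    expected_tag = f"npm-v{versions['package.json']}"
    if tag is not None and tag != expected_tag:
        problems.append(f"tag {tag!r} does not match {expected_tag!r}")
    return problems
```

The functions take file contents rather than paths so the check can be exercised without a checkout.
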
Verification

Fast local lint/type checks while iterating:

bash scripts/check-python.sh

Full project verification:

bash scripts/verify-ai-harness.sh

MCP auto_compile fixture smoke test:

uv run --extra http python scripts/smoke-mcp-auto-compile.py

This starts a local fixture HTTP site and verifies generate -> test -> repair -> retest without external internet, a real browser, or a real LLM.

Security and compliance boundary

ai-crawler is intended for authorized crawling, internal QA/testing, research, owned or allowed web property monitoring, and data portability workflows.

It does not implement:

  • CAPTCHA solving
  • MFA bypass
  • Cloudflare/bot-challenge bypass
  • stealth fingerprint manipulation
  • evasion proxy rotation

Challenge-like responses are classified and surfaced as requiring human/manual handoff where appropriate.

Sensitive values in diagnostic reports are redacted, including common bearer tokens, cookies, session IDs, API keys, and JSON-embedded token fields.
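
As an illustration of what this kind of redaction can look like, here is a regex-based sketch covering bearer tokens, cookie headers, and a few JSON token fields. It is not the library's redactor: the redact function, the patterns, the field-name list, and the [REDACTED] placeholder are all assumptions chosen for the example.

```python
import re

# Illustrative patterns only; real coverage would need to be broader.
_PATTERNS = [
    # "Bearer <token>" in headers or logs.
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # Whole value of Cookie / Set-Cookie header lines.
    (re.compile(r"(?i)\b((?:set-)?cookie\s*:\s*)[^\r\n]+"), r"\1[REDACTED]"),
    # JSON string fields with token-like names (hypothetical name list).
    (
        re.compile(r'(?i)("(?:access_token|refresh_token|token|api_key|session_id)"\s*:\s*)"[^"]*"'),
        r'\1"[REDACTED]"',
    ),
]


def redact(text: str) -> str:
    """Apply each pattern in turn and return the redacted text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Applying the patterns to serialized diagnostics before they are written means a report can be shared with an agent without first auditing every captured header.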

Documentation

Development docs live under .dev/:

  • .dev/README.md
  • .dev/03-ai/auto-harness-contract.md
  • .dev/04-mcp/server.md
  • .dev/08-operations/security-and-compliance.md
  • .dev/08-operations/challenge-handling-policy.md

Status

Alpha. The deterministic recipe compiler, one-command compile flow, browser probe, CLI, SDK facade, MCP server, redaction, failure classification, and fixture smoke tests are implemented. Real LLM provider integrations are intentionally optional/future layers behind adapter boundaries.

License

MIT