npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@aiacta-org/crawl-manifest-client

v1.0.14

Published

Client for the AIACTA Crawl Manifest API. Query AI providers to see which of your pages were crawled, when, and for what purpose (AIACTA Proposal 1, §2.2)

Readme

@aiacta-org/crawl-manifest-client

Query any AIACTA-compliant AI provider's Crawl Manifest API. See exactly which of your pages were crawled, when, and for what purpose (AIACTA Proposal 1, §2.2).

npm version License AIACTA Spec

Available in Node.js and Python.


What is this?

The Crawl Manifest API is what AI providers expose so publishers can audit their crawl history — the AI equivalent of Google Search Console. This client handles the API query, automatic pagination, rate-limit backoff, and local caching.


Install

Node.js

npm install @aiacta-org/crawl-manifest-client

Python

pip install crawl-manifest-client

Quick Start

Node.js

const { CrawlManifestClient } = require('@aiacta-org/crawl-manifest-client');

const client = new CrawlManifestClient({
  provider: 'anthropic',
  apiKey:   process.env.ANTHROPIC_PUBLISHER_KEY,
});

// Fetch all crawl events for your domain — pagination handled automatically
for await (const entry of client.fetchAll({
  domain: 'yourdomain.com',
  from:   '2026-03-01T00:00:00Z',
  to:     '2026-03-31T23:59:59Z',
})) {
  console.log(entry.url);
  console.log('Last crawled:', entry.last_crawled);
  console.log('Purpose:',     entry.purpose);           // ['rag', 'index', ...]
  console.log('HTTP status:', entry.http_status_at_crawl);
  console.log('Content hash:', entry.content_hash);     // sha256:...
}

Python

from crawl_manifest_client import CrawlManifestClient

client = CrawlManifestClient(
    provider='anthropic',
    api_key=os.environ['ANTHROPIC_PUBLISHER_KEY']
)

for entry in client.fetch_all(
    domain='yourdomain.com',
    from_date='2026-03-01T00:00:00Z',
    to_date='2026-03-31T23:59:59Z'
):
    print(entry['url'], entry['purpose'])

Filter by purpose

// See only pages crawled for model training
for await (const entry of client.fetchAll({
  domain:  'yourdomain.com',
  from:    '2026-01-01T00:00:00Z',
  to:      '2026-03-31T00:00:00Z',
  purpose: ['training'],
})) {
  console.log('Trained on:', entry.url);
}

Valid purpose values: training · rag · index · quality-eval


Each result contains

| Field | Type | Description | |-------|------|-------------| | url | string | The full URL that was crawled | | last_crawled | ISO 8601 | When most recently crawled | | crawl_count_30d | integer | How many times in the last 30 days | | purpose | string[] | What the AI used the content for | | http_status_at_crawl | integer | HTTP status received (200, 404, etc.) | | content_hash | string | SHA-256 of the page content at crawl time |


Date range limits

The API allows a maximum of 90 days per request (§2.2). For longer periods, split your query:

// Query 6 months across two requests
const ranges = [
  { from: '2025-10-01T00:00:00Z', to: '2025-12-30T00:00:00Z' },
  { from: '2026-01-01T00:00:00Z', to: '2026-03-31T00:00:00Z' },
];
for (const range of ranges) {
  for await (const entry of client.fetchAll({ domain: 'yourdomain.com', ...range })) {
    // process entry
  }
}

Requesting more than 90 days throws a RangeError before making any API calls.


Rate limits & caching

  • The spec allows 60 requests/hour per domain (§2.2)
  • The client warns when you approach the limit
  • Automatically waits on 429 responses, respecting X-RateLimit-Reset
  • Caches responses in memory for 1 hour — repeated queries return instantly

API Reference

new CrawlManifestClient({ provider, apiKey, baseUrl? })

| Option | Type | Required | Description | |--------|------|----------|-------------| | provider | string | Yes | Provider identifier, e.g. 'anthropic', 'openai', 'google' | | apiKey | string | Yes | Bearer token from the provider's Publisher Portal | | baseUrl | string | No | Override the API base URL (for testing) |

client.fetchAll({ domain, from, to, purpose? })

Returns an async generator yielding CrawlManifestUrl objects. Handles all pagination automatically.

| Option | Type | Required | Description | |--------|------|----------|-------------| | domain | string | Yes | Your domain, e.g. 'yourdomain.com' | | from | ISO 8601 | Yes | Start of query window | | to | ISO 8601 | Yes | End of query window (max 90 days from from) | | purpose | string[] | No | Filter to specific purpose values |


Verify content integrity

const { computeContentHash } = require('@aiacta-org/crawl-manifest-client/src/node/content-hash');

const yourPageHtml = fs.readFileSync('./article.html', 'utf-8');
const yourHash     = computeContentHash(yourPageHtml);

for await (const entry of client.fetchAll({ domain: 'yourdomain.com', from, to })) {
  if (entry.content_hash !== yourHash) {
    console.warn('Content mismatch on', entry.url, '— AI may have crawled an older version');
  }
}

Related packages

| Package | Purpose | |---------|---------| | @aiacta-org/ai-attribution-lint | Validate your ai-attribution.txt | | @aiacta-org/ai-citation-sdk | Receive citation webhook events |


License & Copyright

Copyright © 2026 Eric Michel, PhD. Licensed under the Apache License 2.0.

Part of the AIACTA open standard.