
convex-firecrawl-scrape v0.1.2

A Convex component for scraping web pages using the Firecrawl API with durable caching and reactive queries.

Convex Firecrawl Scrape Component

Scrape any URL and get clean markdown, HTML, screenshots, or structured JSON - with durable caching and reactive queries.

const { jobId } = await scrape({ url: "https://example.com" });
// Status updates reactively as the scrape completes
const status = useQuery(api.firecrawl.getStatus, { id: jobId });
  • Durable caching with configurable TTL (default 30 days)
  • Reactive status updates via Convex subscriptions
  • Multiple output formats: markdown, HTML, raw HTML, screenshots, links, images, AI summaries
  • JSON extraction via schema-based LLM processing
  • Built-in SSRF protection blocks private IPs and localhost
  • Secure by default with required auth wrapper

Live Demo | Example Code

Play with the example:

git clone https://github.com/gitmaxd/convex-firecrawl-scrape.git
cd convex-firecrawl-scrape
npm install
npm run dev

Prerequisite: Convex

You'll need an existing Convex project. Convex is a hosted backend platform with a database, serverless functions, and more. Learn more here.

Run npm create convex or follow any of the quickstarts to set one up.

Installation

npm install convex-firecrawl-scrape

Install the component in your convex/convex.config.ts:

// convex/convex.config.ts
import { defineApp } from "convex/server";
import firecrawlScrape from "convex-firecrawl-scrape/convex.config.js";

const app = defineApp();
app.use(firecrawlScrape);
export default app;

Set your Firecrawl API key:

npx convex env set FIRECRAWL_API_KEY your_api_key_here

Get your API key at firecrawl.dev.

Usage

Always use exposeApi() to expose component functionality. This wrapper enforces authentication and controls API key access.

// convex/firecrawl.ts
import { exposeApi } from "convex-firecrawl-scrape";
import { components } from "./_generated/api";

export const { scrape, getCached, getStatus, getContent, invalidate } =
  exposeApi(components.firecrawlScrape, {
    auth: async (ctx, operation) => {
      const identity = await ctx.auth.getUserIdentity();
      if (!identity) throw new Error("Unauthorized");
      return process.env.FIRECRAWL_API_KEY!;
    },
  });

React Integration

import { useMutation, useQuery } from "convex/react";
import { api } from "../convex/_generated/api";
import { useState } from "react";

function ScrapeButton({ url }: { url: string }) {
  const [jobId, setJobId] = useState<string | null>(null);
  const scrape = useMutation(api.firecrawl.scrape);
  const status = useQuery(
    api.firecrawl.getStatus,
    jobId ? { id: jobId } : "skip",
  );
  const content = useQuery(
    api.firecrawl.getContent,
    jobId && status?.status === "completed" ? { id: jobId } : "skip",
  );

  return (
    <div>
      <button
        onClick={async () => setJobId((await scrape({ url })).jobId)}
        disabled={status?.status === "scraping"}
      >
        {status?.status === "scraping" ? "Scraping..." : "Scrape"}
      </button>
      {status?.status === "completed" && <pre>{content?.markdown}</pre>}
      {status?.status === "failed" && <p>Error: {status.error}</p>}
    </div>
  );
}

Output Formats

const { jobId } = await scrape({
  url: "https://example.com",
  options: {
    formats: ["markdown", "html", "links", "images", "screenshot"],
    storeScreenshot: true,
  },
});

| Format     | Description                                           |
| ---------- | ----------------------------------------------------- |
| markdown   | Clean markdown content (default)                      |
| html       | Cleaned HTML                                          |
| rawHtml    | Original HTML source                                  |
| links      | URLs found on the page                                |
| images     | Image URLs found on the page                          |
| summary    | AI-generated page summary                             |
| screenshot | Screenshot URL (use storeScreenshot: true to persist) |

JSON Extraction

Extract structured data using a JSON schema:

const { jobId } = await scrape({
  url: "https://example.com/product",
  options: {
    extractionSchema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
      },
      required: ["name", "price"],
    },
  },
});

const content = await getContent({ id: jobId });
console.log(content.extractedJson); // { name: "Widget", price: 99.99 }
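
Since the component returns plain JSON, it can help to mirror the schema as a TypeScript type on the caller's side. Illustrative only: the Product type and cast below are not part of the component's API and are not validated at runtime.

// Mirror the extraction schema for safer property access
type Product = { name: string; price: number };

const product = content.extractedJson as Product;
console.log(`${product.name}: $${product.price}`);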

Cache Management

Cached results use superset matching: a cache entry with ["markdown", "screenshot"] satisfies a request for ["markdown"].

// Check cache
const cached = await getCached({ url: "https://example.com" });

// Force refresh
const { jobId } = await scrape({ url, options: { force: true } });

// Invalidate cache
await invalidate({ url: "https://example.com" });
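
As an illustration of the superset behavior described above, a narrower follow-up request is served from an existing broader entry (a sketch):

// First scrape caches both formats in a single entry
await scrape({
  url: "https://example.com",
  options: { formats: ["markdown", "screenshot"], storeScreenshot: true },
});

// A later request for markdown alone matches that entry,
// so no new Firecrawl request is made
const { jobId } = await scrape({
  url: "https://example.com",
  options: { formats: ["markdown"] },
});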

Proxy Options

For anti-bot protected sites:

const { jobId } = await scrape({
  url: "https://protected-site.com",
  options: {
    proxy: "stealth", // Residential proxy
    waitFor: 3000, // Wait for dynamic content
  },
});

Security

Always use exposeApi(); never expose component functions directly to clients. Server-side code can call component internals directly, but doing so bypasses authentication. The exposeApi() wrapper ensures:

  • Authentication before any operation
  • API key controlled by your callback, not callers
  • Operation-specific authorization support
// ❌ DANGEROUS - bypasses auth
export const scrape = components.firecrawlScrape.lib.startScrape;

// ✅ SAFE - auth enforced
export const { scrape } = exposeApi(components.firecrawlScrape, { auth: ... });

SSRF Protection: Built-in validation blocks localhost, private IPs, and non-HTTP schemes.
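
For example, calls like these should be rejected before any Firecrawl request is made (the blocking behavior is documented above; the exact error shape is not, so treat these as behavioral sketches):

await scrape({ url: "http://localhost:3000" });  // blocked: localhost
await scrape({ url: "http://10.0.0.5/admin" });  // blocked: private IP
await scrape({ url: "file:///etc/passwd" });     // blocked: non-HTTP scheme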

For domain allowlists, rate limiting, and detailed security guidance, see docs/SECURITY.md.

Error Handling

const status = await getStatus({ id: jobId });
if (status?.status === "failed") {
  console.error(status.error, status.errorCode);
  // errorCode is the HTTP status from Firecrawl (e.g., 402, 403, 429, 500)
}
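
Because errorCode surfaces Firecrawl's HTTP status, callers can branch on it. A minimal sketch (the retry policy here is your own choice, not something the component provides):

if (status?.status === "failed") {
  if (status.errorCode === 429) {
    // Rate limited: retry later, bypassing the failed attempt
    await scrape({ url, options: { force: true } });
  } else if (status.errorCode === 402) {
    // Quota exhausted: surface a billing message instead of retrying
    console.warn("Firecrawl plan limit reached");
  }
}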

Found a bug? Feature request? File it here.

Advanced Usage

For configuration options, the FirecrawlScrape class API, and URL utilities, see docs/ADVANCED.md.

Development

npm install
npm run dev

License

Apache-2.0