@sholajegede/convex-bright-data-datasets

v0.1.3

A convex bright data datasets component for Convex.

A Convex component that wraps Bright Data's Datasets API with reactive storage. Trigger async collections for LinkedIn profiles, Amazon products, Instagram posts, job listings, Airbnb, Zillow, Google Maps, and 120+ other datasets — receive results via webhook, and subscribe to structured records in real time via useQuery. No polling, no custom webhook infrastructure, no storage layer to build.

Found a bug? Feature request? File it here.

How it works

Without this component, getting fresh LinkedIn company data, Amazon product data, or job postings into a Convex app means building the whole pipeline yourself: trigger the snapshot, handle the webhook, parse NDJSON, store the records, expose queries. This component does all of that in one install.

You call brightDatasets.trigger() from a Convex action. The component stores the snapshot metadata immediately, mounts a webhook handler that receives the results when Bright Data is done, parses and stores every record in component-owned tables, and updates the snapshot status to ready. Your frontend subscribes via useQuery and updates the moment data lands.

App calls trigger()
        ↓
Bright Data collection job starts
        ↓
Component stores snapshot as "pending"
        ↓
Bright Data POSTs results to webhook handler
        ↓
Component parses NDJSON, stores records, marks snapshot "ready"
        ↓
All useQuery subscribers notified automatically
        ↓
UI updates in real time
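
The parsing step in the flow above can be sketched as follows. This is a minimal illustration of turning an NDJSON webhook body into records, not the component's actual parser; `parseNdjson`, `JsonRecord`, and `ParsedNdjson` are hypothetical names introduced here.

```typescript
// Hypothetical helper for illustration; the component's real parser may differ.
type JsonRecord = Record<string, unknown>;

interface ParsedNdjson {
  records: JsonRecord[];
  // 1-based line numbers that were not valid JSON objects
  badLines: number[];
}

function parseNdjson(body: string): ParsedNdjson {
  const records: JsonRecord[] = [];
  const badLines: number[] = [];
  const lines = body.split(/\r?\n/);
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // skip blank lines, including a trailing newline
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value === "object" && value !== null && !Array.isArray(value)) {
        records.push(value as JsonRecord);
      } else {
        badLines.push(index + 1); // valid JSON, but not an object record
      }
    } catch (err) {
      badLines.push(index + 1); // not valid JSON at all
    }
  });
  return { records, badLines };
}

const sampleBody = '{"name":"Ada"}\n{"name":"Grace"}\nnot json\n';
const parsedBody = parseNdjson(sampleBody);
console.log(parsedBody.records.length, parsedBody.badLines);
```

Collecting bad line numbers instead of throwing keeps one malformed line from discarding an otherwise usable batch; whether the component does the same is not stated in this README.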

Features

  • Async dataset collections — trigger any Bright Data dataset (LinkedIn, Amazon, Instagram, job postings, and 120+ more) from a Convex action
  • Webhook receiver — mount a single HTTP route and the component handles the rest: parsing, storage, status updates
  • Reactive records — subscribe to records via useQuery and get live updates as the webhook delivers batches
  • Snapshot tracking — every job is stored with status (pending → collecting → digesting → ready), record count, and timing
  • Synchronous scrape — for small single-URL jobs, get results immediately without a webhook
  • Progress polling — poll Bright Data for status updates and sync to Convex reactively
  • Cancel support — cancel a running collection and update snapshot status instantly
  • Delivery logs — every webhook event is logged per snapshot for debugging
  • Discovery mode — trigger keyword, category, or URL-based discovery collections
  • Custom output fields — filter which fields Bright Data returns

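Prerequisites

  • A Convex project
  • A Bright Data account with an API token (used as BRIGHTDATA_API_TOKEN in the setup steps)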

Installation

npm install @sholajegede/convex-bright-data-datasets

Add the component to your convex/convex.config.ts:

import { defineApp } from "convex/server";
import convexBrightDataDatasets from "@sholajegede/convex-bright-data-datasets/convex.config.js";

const app = defineApp();
app.use(convexBrightDataDatasets);

export default app;

Setup

1. Instantiate the client in your Convex functions:

// convex/brightDatasets.ts
import { components } from "./_generated/api.js";
import { BrightDatasets } from "@sholajegede/convex-bright-data-datasets";

export const brightDatasets = new BrightDatasets(components.convexBrightDataDatasets, {
  BRIGHTDATA_API_TOKEN: process.env.BRIGHTDATA_API_TOKEN!,
});

2. Mount the webhook handler in convex/http.ts:

import { httpRouter } from "convex/server";
import { components } from "./_generated/api.js";
import { createWebhookHandler } from "@sholajegede/convex-bright-data-datasets";

const http = httpRouter();

http.route({
  path: "/webhooks/brightdata",
  method: "POST",
  handler: createWebhookHandler(components.convexBrightDataDatasets),
});

export default http;

3. Set your Convex environment variable:

npx convex env set BRIGHTDATA_API_TOKEN your_token_here

Your Convex HTTP actions URL (the webhook endpoint to register in Bright Data) is:

https://<your-deployment>.convex.site/webhooks/brightdata

You can find this by running npx convex dev and looking for VITE_CONVEX_SITE_URL in your .env.local.
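
The usage examples build the webhook URL by concatenating the site URL with the route path. If the environment variable is unset, plain concatenation yields a string starting with "undefined". A small guard, sketched here under the assumption that you pass the value in yourself, avoids that; `buildWebhookUrl` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper; not exported by the component.
function buildWebhookUrl(
  siteUrl: string | undefined,
  path: string = "/webhooks/brightdata",
): string {
  if (siteUrl === undefined || siteUrl.trim() === "") {
    throw new Error("Site URL is not set; cannot build the webhook URL");
  }
  // Drop trailing slashes so the result never contains "//" before the path
  const base = siteUrl.trim().replace(/\/+$/, "");
  const suffix = path.charAt(0) === "/" ? path : "/" + path;
  return base + suffix;
}

const exampleWebhookUrl = buildWebhookUrl("https://example-deployment.convex.site/");
console.log(exampleWebhookUrl);
```

Inside a Convex action you would typically call it as `buildWebhookUrl(process.env.CONVEX_SITE_URL)`.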

Usage

Trigger an async collection

// convex/myFunctions.ts
import { action, query } from "./_generated/server.js";
import { components } from "./_generated/api.js";
import { brightDatasets } from "./brightDatasets.js";
import { v } from "convex/values";

// Trigger a LinkedIn profile collection
export const collectProfiles = action({
  args: { urls: v.array(v.string()) },
  handler: async (ctx, args) => {
    return await brightDatasets.trigger(ctx, {
      datasetId: "gd_l1viktl72bvl7bjuj0", // LinkedIn profiles dataset
      inputs: args.urls.map((url) => ({ url })),
      webhookUrl: process.env.CONVEX_SITE_URL + "/webhooks/brightdata",
    });
    // Returns: { snapshotId: "s_...", status: "pending" }
  },
});

// Reactive query — subscribe to snapshot status from the frontend
export const getSnapshot = query({
  args: { snapshotId: v.string() },
  handler: async (ctx, args) => {
    return await ctx.runQuery(components.convexBrightDataDatasets.lib.getSnapshot, {
      snapshotId: args.snapshotId,
    });
  },
});

// Reactive query — subscribe to records as they arrive
export const getRecords = query({
  args: { snapshotId: v.string() },
  handler: async (ctx, args) => {
    return await ctx.runQuery(components.convexBrightDataDatasets.lib.getRecords, {
      snapshotId: args.snapshotId,
    });
  },
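});

// React component file (separate from convex/myFunctions.ts)
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api"; // adjust the relative path to your project layout

// Subscribes reactively; re-renders when status or records update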
const snapshot = useQuery(api.myFunctions.getSnapshot, { snapshotId });
// snapshot.status   — "pending" | "collecting" | "digesting" | "ready" | "failed" | "canceled"
// snapshot.recordCount — number of records received so far

const records = useQuery(api.myFunctions.getRecords, { snapshotId });
// records — array of structured records from Bright Data, parsed from NDJSON

Synchronous scrape (small jobs)

export const scrapeProfile = action({
  args: { url: v.string() },
  handler: async (ctx, args) => {
    return await brightDatasets.scrape(ctx, {
      datasetId: "gd_l1viktl72bvl7bjuj0",
      inputs: [{ url: args.url }],
    });
    // Returns: { records: [...], status: "ready" }
    // If job exceeds 1 min: { records: [], snapshotId: "s_...", status: "running" }
  },
});
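
Because scrape() can return either finished records or a still-running snapshot, callers need to branch on the result. The sketch below assumes the two result shapes shown in the comments above; the `ScrapeResult` and `ScrapeOutcome` types and the `interpretScrape` function are illustrative names, not exports of the package.

```typescript
// Result shapes assumed from the comments in the scrape example; not taken from the package's types.
type ScrapeResult =
  | { status: "ready"; records: unknown[] }
  | { status: "running"; records: unknown[]; snapshotId: string };

type ScrapeOutcome =
  | { kind: "done"; records: unknown[] }
  | { kind: "pending"; snapshotId: string };

// Normalize the two shapes so callers handle exactly one of two cases.
function interpretScrape(result: ScrapeResult): ScrapeOutcome {
  if (result.status === "ready") {
    return { kind: "done", records: result.records };
  }
  // The job outlived the synchronous window: keep the snapshot ID for later polling
  return { kind: "pending", snapshotId: result.snapshotId };
}

const finishedOutcome = interpretScrape({ status: "ready", records: [{ url: "https://example.com" }] });
const pendingOutcome = interpretScrape({ status: "running", records: [], snapshotId: "s_example" });
console.log(finishedOutcome.kind, pendingOutcome.kind);
```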

Poll for status

export const checkStatus = action({
  args: { snapshotId: v.string() },
  handler: async (ctx, args) => {
    return await brightDatasets.pollStatus(ctx, args.snapshotId);
    // Fetches from Bright Data, updates snapshot in Convex, returns current status
  },
});
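
This README does not say how often to poll. One common choice is capped exponential backoff, sketched below. The base and maximum delays are arbitrary assumptions, and `pollDelayMs` is a hypothetical helper you would use to schedule repeated calls to an action like checkStatus.

```typescript
// Hypothetical helper: delay before the next poll, doubling each attempt up to a cap.
// The defaults (2 s base, 30 s cap) are illustrative, not values from Bright Data or the component.
function pollDelayMs(attempt: number, baseMs: number = 2000, maxMs: number = 30000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, safeAttempt));
}

const delaySchedule = [0, 1, 2, 3, 4, 5].map((attempt) => pollDelayMs(attempt));
console.log(delaySchedule);
```

When a webhook is configured, polling is mainly a fallback, so a slow schedule is usually enough.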

Cancel a collection

export const cancelJob = action({
  args: { snapshotId: v.string() },
  handler: async (ctx, args) => {
    return await brightDatasets.cancel(ctx, args.snapshotId);
  },
});

List all snapshots

export const listJobs = query({
  args: {},
  handler: async (ctx) => {
    return await ctx.runQuery(components.convexBrightDataDatasets.lib.listSnapshots, {
      limit: 20,
    });
  },
});

Discovery mode

// Discover Amazon products by keyword
export const discoverProducts = action({
  args: { keywords: v.array(v.string()) },
  handler: async (ctx, args) => {
    return await brightDatasets.trigger(ctx, {
      datasetId: "gd_l7q7dkf244hwjntr0",
      inputs: args.keywords.map((keyword) => ({ keyword })),
      discoveryMode: "discover_new",
      discoverBy: "keyword",
      limitPerInput: 10,
      webhookUrl: process.env.CONVEX_SITE_URL + "/webhooks/brightdata",
    });
  },
});

API

BrightDatasets class

| Method | Description |
|--------|-------------|
| trigger(ctx, opts) | Trigger an async Bright Data dataset collection. Returns { snapshotId, status } immediately. |
| scrape(ctx, opts) | Synchronous scrape for small single-URL jobs. Falls back to snapshot polling if job exceeds 1 minute. |
| pollStatus(ctx, snapshotId) | Poll Bright Data for snapshot status and sync to Convex. |
| cancel(ctx, snapshotId) | Cancel a running collection. |
| getSnapshot(ctx, snapshotId) | Get snapshot metadata. Reactive via useQuery. |
| listSnapshots(ctx, opts?) | List snapshots, optionally filtered by datasetId, status, or limit. Reactive. |
| getRecords(ctx, snapshotId, limit?) | Get stored records for a snapshot. Reactive; updates as the webhook delivers data. |
| getDeliveryLogs(ctx, snapshotId) | Get webhook delivery events for debugging. Reactive. |

createWebhookHandler(component)

Creates the HTTP action handler for receiving Bright Data webhook deliveries. Mount in convex/http.ts.

trigger options

| Option | Type | Description |
|--------|------|-------------|
| datasetId | string | Bright Data dataset ID (e.g. gd_l1viktl72bvl7bjuj0) |
| inputs | object[] | Array of input objects (e.g. [{ url: "..." }]) |
| format | string? | Output format: "json" \| "ndjson" \| "csv" (default: "json") |
| webhookUrl | string? | Webhook URL where Bright Data delivers results |
| notifyUrl | string? | Notification URL called on completion with snapshot_id and status |
| discoveryMode | string? | Set to "discover_new" to enable discovery |
| discoverBy | string? | Discovery method: "keyword" \| "category_url" \| "best_sellers_url" \| "location" |
| limitPerInput | number? | Max results per input (discovery mode) |
| totalLimit | number? | Max total results |
| customOutputFields | string? | Pipe-separated fields to return (e.g. "url\|name\|price") |
| includeErrors | boolean? | Include error records in results (default: true) |
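
Since customOutputFields is a single pipe-separated string, it can be convenient to build it from an array. The helper below is a sketch under that assumption; `toCustomOutputFields` is a hypothetical name, and the field names are examples only.

```typescript
// Hypothetical helper: build the pipe-separated customOutputFields string from a list.
function toCustomOutputFields(fields: string[]): string {
  const cleaned = fields
    .map((field) => field.trim())
    .filter((field) => field !== "");
  for (const field of cleaned) {
    if (field.indexOf("|") !== -1) {
      // A pipe inside a field name would be read as a separator
      throw new Error('Field name must not contain "|": ' + field);
    }
  }
  // Drop duplicates while keeping the first occurrence's position
  const unique = cleaned.filter((field, index) => cleaned.indexOf(field) === index);
  return unique.join("|");
}

const outputFields = toCustomOutputFields(["url", " name ", "price", "url"]);
console.log(outputFields);
```

The result can then be passed as the customOutputFields option to trigger().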

Snapshot status lifecycle

pending → collecting → digesting → ready
                                 → failed
                                 → canceled
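
On the client it often helps to know whether a snapshot can still change. The sketch below treats ready, failed, and canceled as terminal, based on the lifecycle above; the `SnapshotStatus` type mirrors the status values listed in this README, while `isTerminalStatus` and `isInProgress` are hypothetical helpers.

```typescript
// Status values as listed in this README; the helpers below are illustrative only.
type SnapshotStatus =
  | "pending"
  | "collecting"
  | "digesting"
  | "ready"
  | "failed"
  | "canceled";

const TERMINAL_STATUSES: SnapshotStatus[] = ["ready", "failed", "canceled"];

// True once the snapshot will not change state again.
function isTerminalStatus(value: SnapshotStatus): boolean {
  return TERMINAL_STATUSES.indexOf(value) !== -1;
}

// True while the job is still moving through the pipeline.
function isInProgress(value: SnapshotStatus): boolean {
  return !isTerminalStatus(value);
}

const allStatuses: SnapshotStatus[] = ["pending", "collecting", "digesting", "ready", "failed", "canceled"];
const inProgressStatuses = allStatuses.filter(isInProgress);
console.log(inProgressStatuses);
```

A UI might show a spinner while `isInProgress(snapshot.status)` holds and stop any fallback polling once the status is terminal.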

Reactive queries (call via ctx.runQuery)

| Function | Args | Returns |
|----------|------|---------|
| components.convexBrightDataDatasets.lib.getSnapshot | { snapshotId } | Snapshot or null |
| components.convexBrightDataDatasets.lib.listSnapshots | { datasetId?, status?, limit? } | Array of snapshots |
| components.convexBrightDataDatasets.lib.getRecords | { snapshotId, limit? } | Array of records |
| components.convexBrightDataDatasets.lib.getDeliveryLogs | { snapshotId } | Array of delivery events |

Example app

See example/ for a working Vite + React demo showing async dataset triggering, live snapshot status tracking, and reactive record display.

Development

npm i
npm run dev

License

Apache-2.0