npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@bsbofmusic/cdper-runtime-reddit

v1.1.3

Published

Reddit research runtime for cdper: collect, analyze, and export structured Reddit data

Readme

@bsbofmusic/cdper-runtime-reddit

Reddit research toolkit for authenticated CDP browser sessions.

Version: 8.4.0 | Runtime: Node.js ≥18 | Browser: Puppeteer-compatible CDP endpoint


Release posture

This package is usable for maintainer-operated deployments, but it is not a fully generalized product package yet.

What is verified:

  • Live collection through a reachable authenticated CDP browser session
  • Database writes through docker exec -i <container> psql ...
  • Queue-based scheduling and npm bin packaging presence
  • A real smoke run on 2026-03-23 for project piercing_pillow

What is not fully verified:

  • Automated tests are not shipped
  • Cross-environment install validation beyond the maintainer environment
  • Consistent run_id workflows across every downstream report/export/dedup path
  • Watcher / cron behavior as a polished end-user product feature

Do not market this release as “production-ready for all environments”.


Overview

@bsbofmusic/cdp-reddit collects Reddit posts/comments through an authenticated CDP browser session, stores results in PostgreSQL, and supports a research workflow of collection → annotation → export.

Current scope in this package:

  • Collection with nested comment traversal and Reddit-native depth
  • Queue-based scheduling with configurable CDP concurrency
  • Rule-based / semi-automatic JTBD annotation helpers
  • CSV / JSON export and data-quality utilities

Important current limitations:

  • run_id is written by collector / scheduler and schema support exists, but downstream tooling is still not fully normalized around run_id
  • Some utilities already share cdp-reddit-config.js; others still use older inline defaults and are not fully productized
  • Watcher / cron examples are operational recipes, not auto-installed package features

Installation

npm install @bsbofmusic/cdp-reddit puppeteer

This package expects:

  • a reachable CDP browser WebSocket endpoint
  • a PostgreSQL instance accessible via docker exec -i <container> psql ...
  • an existing schema compatible with the shipped migration files

CLI usage

All binaries are exposed through npm bin entries and can also be run with node inside the package directory.

# Register a new project
cdp-reddit-collect \
  --register \
  --project=my_project \
  --label="Dog toy JTBD research" \
  --keywords="dog toy,best chew toy" \
  --sub=dogtoys \
  --threshold=15

# Collect with stored project config
cdp-reddit-collect --project=my_project

# Collect with explicit version / run id
cdp-reddit-collect \
  --project=my_project \
  --version=v1_research \
  --run-id=my_project_20260323_181500_a7f2 \
  --keywords='["dog toy","chew toy"]' \
  --threshold=15

# Enqueue through the scheduler
cdp-reddit-scheduler --enqueue --project=my_project --version=v1

# Queue status
cdp-reddit-scheduler --status

# JTBD tagging
cdp-reddit-jtbd --project=my_project --min-score=15
cdp-reddit-jtbd-autonomous --project=my_project --batch-size=30

# Export
cdp-reddit-export --project=my_project --format=csv --min-score=10
cdp-reddit-export --project=my_project --format=json --jtbd-only

# Quality / maintenance
cdp-reddit-freshness --project=my_project --threshold=10
cdp-reddit-backfill-post --project=my_project
cdp-reddit-dedup --project=my_project
cdp-reddit-report --project=my_project --stats-only

If you prefer direct node execution, run the corresponding *.js file in this directory.


Configuration

Environment variables

| Variable | Default | Description | |----------|---------|-------------| | CDP_REDDIT_CDP_WS | none | CDP browser WebSocket endpoint; must be set for real collection | | CDP_REDDIT_DB_CONTAINER | memos-postgres | PostgreSQL Docker container name | | CDP_REDDIT_DB_USER | memos | DB username | | CDP_REDDIT_DB_NAME | memos | Database name | | CDP_REDDIT_DEFAULT_SUBREDDIT | piercing | Default subreddit when project config is absent | | CDP_REDDIT_MAX_CDP_SESSIONS | 2 | Scheduler concurrency cap | | CDP_REDDIT_WORKDIR | package dir | Scheduler working directory | | CDP_REDDIT_REPORT_FILE | cdp-reddit-report.md | Report append target | | CDP_REDDIT_WATCHER_STATE_FILE | /tmp/.cdp_watcher_state | Watcher state file | | CDP_REDDIT_WATCHER_MAX_IDLE_ROUNDS | 2 | Watcher completion threshold | | CDP_REDDIT_WATCHER_MINUTES | 30 | Watcher idle alert threshold | | CDP_REDDIT_WATCHER_PROJECTS | empty | Optional comma-separated project list for watcher summary output |

Notes:

  • cdp-reddit-collect.js, cdp-reddit-scheduler.js, and cdp-reddit-report-gen.js already consume shared config from cdp-reddit-config.js
  • Other scripts still contain inline defaults and should be reviewed before broader packaging
  • Any built-in default CDP endpoint in source should be overridden in real deployments

Database connection

Current implementation shells out to:

docker exec -i <container> psql -U <user> -d <db>

This package does not yet provide a direct TCP PostgreSQL client path for all scripts.


Data model

Semantic layers

| Field | Scope | Description | |-------|-------|-------------| | project | Research boundary | Data isolation key | | version | Batch label | Human-readable strategy / keyword-set label per run | | data_source | Compatibility layer | Currently equals version; stored in DB data_source column | | run_id | Single execution instance | Unique execution id written by collector / scheduler |

Current run_id status

  • reddit_crawl_log.run_id exists and is written by the collector
  • reddit_comments.run_id exists in schema and is written by the collector
  • reddit_posts.run_id exists in schema and is written by the collector
  • ⚠️ Report / export / dedup paths have --run-id support, but maintainers should still verify the exact output path they rely on before external release claims

Key tables

| Table | Status | |-------|--------| | reddit_comments | Main collected comments table | | reddit_posts | Post metadata table | | crawl_projects | Registered projects and defaults | | reddit_crawl_log | Per-run execution log; authoritative source for current run_id metrics |

See CHANGELOG_PHASE1.md and migrations/2026-03-23-add-run-id.sql for the current schema transition notes.


Included scripts

| Script | Purpose | |--------|---------| | cdp-reddit-collect.js | Main collector | | cdp-reddit-scheduler.js | Queue / concurrency scheduler | | cdp-reddit-watcher.sh | Optional watchdog script for local ops use | | cdp-reddit-keyword.js | Keyword pre-analysis | | cdp-reddit-jtbd.js | Rule-based JTBD tagging | | cdp-reddit-jtbd-autonomous.js | Batch JTBD tagging | | cdp-reddit-export.js | CSV / JSON export | | cdp-reddit-freshness.js | Freshness detection | | cdp-reddit-dedup.js | Deduplication / version cleanup | | cdp-reddit-report-gen.js | Data quality report generator | | cdp-reddit-backfill-post.js | Historical post metadata backfill |


Smoke checklist

Use this before publishing or tagging a release.

1) Install/package smoke

npm install
npm run smoke:install

Confirms the declared npm bin targets exist.

2) Environment smoke

Prepare .env from .env.example and set a real CDP endpoint.

Minimum required values:

  • CDP_REDDIT_CDP_WS
  • CDP_REDDIT_DB_CONTAINER
  • CDP_REDDIT_DB_USER
  • CDP_REDDIT_DB_NAME

3) Live collector smoke

cd /path/to/cdp-reddit
set -a
source ./.env
set +a

node cdp-reddit-collect.js \
  --project=smoke_project \
  --version=smoke_v1 \
  --run-id=smoke_project_$(date +%Y%m%d_%H%M%S) \
  --keywords='["test keyword"]' \
  --threshold=1

Verify all of the following manually:

  • CDP bridge is reachable
  • browser is actually ready, not just the bridge process
  • collector exits successfully
  • reddit_crawl_log contains the expected run_id
  • inserted rows exist in reddit_comments and/or reddit_posts

4) Scheduler/report smoke

node cdp-reddit-scheduler.js --status
node cdp-reddit-report-gen.js --project=smoke_project --stats-only

5) Optional watcher smoke

CDP_REDDIT_WATCHER_PROJECTS="smoke_project" bash cdp-reddit-watcher.sh

Treat watcher output as ops assistance, not as release-grade verification by itself.


Release steps

  1. Confirm package metadata in package.json
  2. Confirm .env.example matches the current env contract
  3. Run the smoke checklist above in the target environment
  4. Confirm required migrations are present and applied
  5. Review CHANGELOG_PHASE1.md so release notes match current facts
  6. Publish only after repository/homepage fields are filled with real values

Example publish flow:

npm pack
npm publish

Only publish after maintainer review of metadata, access, and release notes.


Operational notes

Watcher

cdp-reddit-watcher.sh is included as an ops helper script. It is shipped as a CLI entry, but it should still be treated as environment-specific operations tooling rather than a polished general-purpose watcher.

Current behavior to know before publishing:

  • it reads whole-table counts from reddit_comments
  • it can optionally print per-project summaries via CDP_REDDIT_WATCHER_PROJECTS="proj_a,proj_b"
  • when no watcher project list is provided, it falls back to whole-table totals only
  • it does not prove cron is installed or that a deployment is self-managing

Example:

CDP_REDDIT_WATCHER_PROJECTS="my_project,backup_project" bash cdp-reddit-watcher.sh

Cron metadata

package.json contains cron metadata as suggestions only. They are not proof that cron jobs are installed in the current environment.

Live smoke result

Validated on 2026-03-23 with a real piercing_pillow smoke run:

  • CDP bridge reachable but browser initially not ready
  • Browser successfully woken via /control/start?mode=advanced&profile=Default
  • Collector completed with run_id = smoke_piercing_pillow_20260323_f6
  • reddit_crawl_log.comments_new = 3
  • reddit_comments inserted rows for that run = 3

Productization gaps

Before promoting this as a broadly reusable npm package, validate:

  • repository / homepage metadata
  • whether the default CDP endpoint should be removed from source defaults
  • whether watcher behavior should stay shipped, be generalized, or be documented as example ops tooling only

Known limitations

  • No automated test suite is shipped yet
  • Some scripts still rely on environment-specific assumptions or legacy defaults
  • Direct PostgreSQL connectivity is not standardized across scripts
  • Release claims should stay limited to maintainer-verified workflows unless broader validation is completed

Files

  • README.md — package overview, smoke checklist, and release posture
  • package.json — npm metadata and CLI entry points
  • .env.example — deploy-time environment template
  • CHANGELOG_PHASE1.md — current implementation status for Phase 1
  • migrations/2026-03-23-add-run-id.sql — current shipped migration