reddit-blog-scout
v1.1.0
Mine Reddit discussions for blog topic ideas: keyword -> real posts + subreddits as Markdown. Headless-Chrome cookie harvest beats Reddit's 403 anti-bot, then plain fetch. Feeds /generate-blog.
reddit-blog-scout
Mine Reddit discussions for blog topic ideas. Give it a keyword and get the real posts and
subreddits back as Markdown, ready to feed into SEO-focused blog topic discovery. It is a
pure-JavaScript npm package. It gets past Reddit's 403 anti-bot wall by harvesting a guest cookie
with headless Chrome once, then making fast plain `fetch` requests with that cookie.
Two ways to use it:
- `/generate-blog` (Claude Code command, the main path): end-to-end and interactive. Reddit research → topic pick → full SEO blog + images. No OpenAI (Claude does query + topic generation). Needs `GEMINI_API_KEY` for images.
- `reddit-scout` (standalone CLI / library): just scrapes Reddit and writes the raw posts + subreddits to `.previous/<keyword>.md`. No AI, no keys.
Install
```bash
npm i -D reddit-blog-scout   # installs puppeteer + stealth + bundled Chromium
```

Requires Node >= 20.12: `.env` is loaded with `process.loadEnvFile`, which was added in Node 20.12.0 (and 21.7.0).
CLI
```bash
npx reddit-scout "instagram dm automation"   # writes .previous/<keyword>.md
npx reddit-scout "bitcoin trading" --limit 15
npx reddit-scout "bitcoin trading" --deep 3  # also pull the 3 top posts' bodies + comments
```

Flags:
| Flag | Description | Default |
|-------------|--------------------------------------------------------------------------|---------|
| --limit N | Number of posts / subreddits to fetch. | 10 |
| --deep N | Deep mode: also fetch the body (selftext) + top comments of the N highest-scoring posts. 0 = surface only (titles/scores). | 0 |
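How the two flags combine with their defaults can be pinned down with a small argument parser. This is a hypothetical sketch: `parseScoutArgs` is not part of the package and the real CLI may parse differently. It assumes the positional words form the keyword and that both flags take a non-negative integer.

```javascript
// Hypothetical sketch of parsing reddit-scout arguments; not the package's real parser.
// Positional words form the keyword; --limit and --deep take a non-negative integer.
function parseScoutArgs(argv) {
  const opts = { keyword: "", limit: 10, deep: 0 }; // defaults from the flags table
  const words = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--limit" || arg === "--deep") {
      const value = Number(argv[++i]); // consume the flag's value (NaN if missing)
      if (!Number.isInteger(value) || value < 0) {
        throw new Error(`${arg} expects a non-negative integer`);
      }
      opts[arg.slice(2)] = value; // "--limit" -> "limit", "--deep" -> "deep"
    } else {
      words.push(arg);
    }
  }
  opts.keyword = words.join(" ");
  if (!opts.keyword) {
    throw new Error('usage: reddit-scout "<keyword>" [--limit N] [--deep N]');
  }
  return opts;
}

// Example: node parse.js "bitcoin trading" --deep 3
// console.log(parseScoutArgs(process.argv.slice(2)));
```

With `--deep` omitted, `deep` stays `0`, which matches the surface-only default in the table.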
Output (`.previous/<keyword>.md`):

```markdown
# Reddit Research: <keyword>
_Query: `<query>` — N posts across M subreddits. No AI applied; topic ideas are generated downstream._
## Posts
- [title](url) — r/subreddit (score pts)
...
## Subreddits
- r/name (subscribers subs) — description
...
```

With `--deep N`, a Deep Dive section is appended below (the sections above are unchanged):
```markdown
## Deep Dive (post bodies + top comments)
### <title> — r/<subreddit> (<score> pts)
<url>
<post body / selftext>
**Top comments:**
- (<score>) <comment body>
- ...
```

Deep mode is best-effort: if a single thread is blocked, that post falls back to an empty body / no comments instead of failing the whole run.
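The best-effort rule amounts to catching per-thread failures instead of letting one rejection abort the run. A minimal sketch, assuming a hypothetical `fetchThread(post)` that resolves to `{ selftext, comments }` (the package's internals may differ): it picks the N highest-scoring posts and fetches them one at a time, which is gentler on Reddit's rate limits than firing every request at once.

```javascript
// Hypothetical sketch of deep mode's best-effort behaviour; not the package's source.
// `fetchThread(post)` is an assumed helper resolving to { selftext, comments }.
async function collectThreads(posts, deep, fetchThread) {
  if (deep <= 0) return []; // --deep 0: surface only
  const top = [...posts].sort((a, b) => b.score - a.score).slice(0, deep);
  const threads = [];
  for (const post of top) {
    let detail = { selftext: "", comments: [] }; // fallback if this thread is blocked
    try {
      detail = await fetchThread(post); // one request at a time
    } catch {
      // Best-effort: keep the empty fallback and continue with the next post.
    }
    threads.push({
      title: post.title,
      subreddit: post.subreddit,
      score: post.score,
      url: post.url,
      selftext: detail.selftext,
      comments: detail.comments,
    });
  }
  return threads;
}
```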
Programmatic use (devDependency)
```js
// Top-level await needs an ES module (.mjs file or "type": "module" in package.json).
import { search } from "reddit-blog-scout";

const { subreddits, posts, threads } = await search("instagram dm automation", 10, 3);
// posts:      [{ title, subreddit, score, url, numComments, upvoteRatio, permalink }, ...]
// subreddits: [{ name, subscribers, description }, ...]
// threads:    [{ title, subreddit, score, url, selftext, comments: [{ body, score }] }, ...]
// (the 3rd arg is `deep`; 0/omitted -> threads is [])
```

On the first call the guest cookie is harvested with headless Chrome and cached; later calls reuse the cached cookie over plain `fetch` (no browser launch).
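The returned arrays are plain objects, so post-processing them is ordinary JavaScript. As a hypothetical example (the `shortlist` helper and its ranking rule are illustrative, not exported by the package), this picks the highest-scoring posts that have some comment activity as candidate topic seeds, using only the `posts` fields listed above.

```javascript
// Hypothetical helper, not exported by reddit-blog-scout.
// Ranks posts by score (ties broken by comment count) and formats the top ones.
function shortlist(posts, { top = 5, minComments = 0 } = {}) {
  return posts
    .filter((p) => p.numComments >= minComments) // drop threads nobody discussed
    .sort((a, b) => b.score - a.score || b.numComments - a.numComments)
    .slice(0, top)
    .map((p) => `${p.title} (r/${p.subreddit}, ${p.score} pts, ${p.numComments} comments)`);
}

// Usage with the result of search():
// const { posts } = await search("instagram dm automation", 10);
// console.log(shortlist(posts, { top: 3, minComments: 5 }));
```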
.env variables
| Variable | Description | Required | Default |
|------------------------|--------------------------------------------------|----------------------|------------------|
| GEMINI_API_KEY | Image generation — /generate-blog image step | Yes for images | — |
| IMG_ASPECT | Image aspect ratio | Optional | 16:9 |
| IMG_WIDTH | Image width (px, via macOS sips) | Optional | 1200 |
| IMG_HEIGHT | Image height (px, via macOS sips) | Optional | 630 |
| REDDIT_COOKIE_TTL_MS | Guest-cookie cache lifetime (ms) | Optional | 21600000 (6h) |
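A hypothetical sketch of reading these variables with the defaults from the table (`readConfig` and its return shape are illustrative, not the package's code). It reads a plain object, so it works on `process.env` after `.env` has been loaded and is easy to test with a literal.

```javascript
// Hypothetical config reader mirroring the .env table above; not the package's code.
const DEFAULTS = {
  IMG_ASPECT: "16:9",
  IMG_WIDTH: 1200,
  IMG_HEIGHT: 630,
  REDDIT_COOKIE_TTL_MS: 21600000, // 6 h = 6 * 60 * 60 * 1000 ms
};

function readConfig(env = process.env) {
  // Parse a positive integer variable, falling back to its default when unset.
  const positiveInt = (name) => {
    const raw = env[name];
    if (raw === undefined || raw === "") return DEFAULTS[name];
    const n = Number(raw);
    if (!Number.isInteger(n) || n <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return n;
  };
  return {
    geminiApiKey: env.GEMINI_API_KEY || null, // required only for image generation
    imgAspect: env.IMG_ASPECT || DEFAULTS.IMG_ASPECT,
    imgWidth: positiveInt("IMG_WIDTH"),
    imgHeight: positiveInt("IMG_HEIGHT"),
    cookieTtlMs: positiveInt("REDDIT_COOKIE_TTL_MS"),
  };
}
```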
The /generate-blog command (main path)
/generate-blog is a Claude Code command; it is NOT shipped inside the npm package
(npm i only pulls the CLI + library). To use it, copy the command file from this repo into
your own project:
```bash
# 1) Install the CLI as a devDependency (ships reddit-scout + blog-image)
npm i -D reddit-blog-scout
# 2) Drop the command file into your project's .claude/commands/
mkdir -p .claude/commands
# -f: fail on HTTP errors instead of saving an error page; -L: follow redirects
curl -fL -o .claude/commands/generate-blog.md \
  https://raw.githubusercontent.com/akifkadioglu/reddit-ai-scout/main/.claude/commands/generate-blog.md
# (or copy .claude/commands/generate-blog.md from this repo by hand)
# 3) Key for images
echo "GEMINI_API_KEY=..." >> .env
```

Then fill in the `──── CONFIG ────` block at the top of that file (brand, author pool,
category whitelist, image style, tone, paths); everything below it is generic and reads
those values. Now, inside Claude Code:
```text
/generate-blog <locale> <keyword>   # locale optional, defaults to en
```

- Research: Claude turns the keyword into a Reddit query and pulls posts via `npx reddit-scout "<query>"`.
- Pick a topic: it offers 4 blog topics from the real discussions; pick with arrow keys or type your own.
- Additions: it asks "anything to add?" (angle, audience, tone, length…).
- Generate: it writes the full SEO blog post to `content/blog/<locale>/<slug>.md`.
- Images: Claude generates the cover + in-content images directly via `npx blog-image` (`GEMINI_API_KEY`).
How it reaches Reddit
Reddit blocks plain HTTP requests with a 403 on most IPs (both Python's stdlib `urllib` and Node's `fetch`).
This tool gets past that in two steps:
- Cookie harvest (headless Chrome). Once, headless Chrome (Puppeteer + stealth) visits reddit.com and harvests the guest cookie via `page.cookies()` (including httpOnly cookies). The cookie is cached in the OS cache dir (`~/.cache/reddit-blog-scout/cookie.json`, `%LOCALAPPDATA%` on Windows), so the consumer's repo stays clean.
- Requests (plain fetch). Every later search is a plain `fetch` with the cookie header and the same Chrome UA; there is no browser launch, so it's fast. Once the cached cookie is older than `REDDIT_COOKIE_TTL_MS` (default 6h), it is re-harvested automatically.
Note: Reddit blocks the `HeadlessChrome` UA, so both the harvest and the fetch use the same normal Chrome UA (a UA mismatch is itself a 403 trigger).
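The harvest-once, reuse-until-stale cycle can be sketched as a small cookie provider. This is a minimal sketch under stated assumptions, not the package's `src/cookie.js`: the `{ savedAt, cookies }` cache entry and the injected `load`, `save`, and `harvest` functions are hypothetical. In practice `harvest` would launch headless Chrome and return `page.cookies()`, and `load`/`save` would read and write the JSON file in the OS cache dir. Puppeteer cookie objects carry `name` and `value` fields, which is all the header builder uses.

```javascript
// Hypothetical sketch of the guest-cookie cache; not the package's src/cookie.js.
// Cache entry shape (assumed): { savedAt: <ms epoch>, cookies: [{ name, value, ... }] }

// Build a Cookie request header from Puppeteer-style cookie objects.
function toCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

// True while the cached entry is younger than the TTL.
function isFresh(entry, ttlMs, now) {
  return Boolean(entry) && typeof entry.savedAt === "number" && now - entry.savedAt < ttlMs;
}

// load() -> entry | null, save(entry), harvest() -> cookies; all injected (hypothetical).
function makeCookieProvider({ load, save, harvest, ttlMs, now = Date.now }) {
  return async function getCookie({ force = false } = {}) {
    const cached = await load();
    if (!force && isFresh(cached, ttlMs, now())) {
      return toCookieHeader(cached.cookies); // cache hit: no browser launch
    }
    const cookies = await harvest(); // e.g. headless Chrome + page.cookies()
    await save({ savedAt: now(), cookies });
    return toCookieHeader(cookies);
  };
}
```

Passing `ttlMs` from `REDDIT_COOKIE_TTL_MS` gives the automatic re-harvest, and `getCookie({ force: true })` covers the forced refresh used when a request comes back blocked.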
If a request is blocked (an HTML wall instead of JSON), the cookie is force-refreshed and the request is retried once. On IPs flagged by anti-bot systems (datacenter / VPN), the guest cookie may still not be enough; in that case you get a clear error. Try a clean network (there is no interactive login fallback).
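The block detection and single forced refresh can be sketched as follows. All names are hypothetical and this is not the package's `src/reddit.js`: `getCookie({ force })` is assumed to return a `Cookie` header string (re-harvesting when `force` is true), `userAgent` is the same normal Chrome UA used during the harvest, and a response counts as blocked when it is a 403 or is not JSON (for example an HTML page). `fetchImpl` defaults to Node's global `fetch` so tests can swap it out.

```javascript
// Hypothetical sketch of the blocked-request retry; not the package's src/reddit.js.
async function redditJson(url, { getCookie, userAgent, fetchImpl = fetch }) {
  for (let attempt = 0; attempt < 2; attempt++) {
    // First attempt uses the cached cookie; the retry forces a fresh harvest.
    const cookie = await getCookie({ force: attempt > 0 });
    const res = await fetchImpl(url, {
      headers: { cookie, "user-agent": userAgent, accept: "application/json" },
    });
    const body = await res.text();
    const type = res.headers.get("content-type") || "";
    // Treat a 403, a non-JSON content type, or an HTML-looking body as the anti-bot wall.
    const blocked =
      res.status === 403 || !type.includes("json") || body.trimStart().startsWith("<");
    if (!blocked) return JSON.parse(body);
  }
  throw new Error(
    "Reddit blocked the request even after refreshing the guest cookie; try a clean network (not a datacenter/VPN IP)."
  );
}
```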
Image CLI (blog-image)
/generate-blog calls this for every image; you can also run it by hand:
```bash
PROMPT='bright modern home office, no text, no logos' \
OUT='public/images/blogs/my-post/cover.jpg' \
npx blog-image
```

It generates with Gemini `gemini-2.5-flash-image` (Nano Banana) and skips if `OUT` already
exists (idempotent). On macOS it resizes to exact pixels via `sips`; elsewhere it stays at
the API aspect ratio.
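The idempotent skip and the macOS-only resize boil down to a small decision. A hypothetical sketch: `planImage` is illustrative, not the package's `bin/blog-image.js`, and it leaves out the Gemini request itself. It takes whether `OUT` already exists and returns the `sips` argv to run after the image is written, if any. `sips -z` takes the height before the width.

```javascript
// Hypothetical sketch of blog-image's skip/resize decision; the Gemini call is omitted.
function planImage({ out, exists, platform = process.platform, width = 1200, height = 630 }) {
  if (exists) return { skip: true, resize: null }; // OUT already there: do nothing (idempotent)
  // sips ships with macOS only; elsewhere the API output is kept as-is.
  const resize =
    platform === "darwin" ? ["sips", "-z", String(height), String(width), out] : null;
  return { skip: false, resize };
}

// Usage (Node):
// const plan = planImage({ out, exists: fs.existsSync(out) });
// if (!plan.skip) {
//   /* generate with Gemini and write `out` */
//   if (plan.resize) child_process.execFileSync(plan.resize[0], plan.resize.slice(1));
// }
```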
Layout
```text
reddit-blog-scout/
├── .claude/commands/
│   └── generate-blog.md   # /generate-blog command (CONFIG + rules)
├── bin/
│   ├── reddit-scout.js    # CLI: research
│   └── blog-image.js      # CLI: Gemini image generation
├── src/
│   ├── cookie.js          # Guest-cookie harvest (headless) + OS cache
│   ├── reddit.js          # Reddit search (plain fetch + cookie)
│   └── markdown.js        # Markdown render + .previous/ output
├── .previous/             # reddit-scout output (<keyword>.md)
├── package.json
└── .env.example
```

Notes
- The Reddit cookie is harvested once with a headless browser (Puppeteer), then requests are plain `fetch`; no OAuth/API key. `npm i` also downloads Chromium.
- `/generate-blog` needs no OpenAI; Claude does query + topic generation.
- Image generation needs `GEMINI_API_KEY`; without it, `blog-image` warns.
- The cookie cache lives in the OS cache dir, not the consumer's repo; `.env` is git-ignored.
License
MIT © Akif
