mcp-content

v0.10.1

Published 10 days ago

Thin MCP client for the mcp-content pipeline: researches a keyword and writes one ICP-aligned article. Scrapes locally through MCP Scraper; all generation (prompts + models) runs on a private access-gated server.

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: vilovieta


A standalone TypeScript MCP server that researches a keyword and writes one ICP-aligned article through a fully sequential, atomic-step pipeline.

No REST API of its own. No front end. No embeddings, no Python, no Modal, no clustering.

This package is a thin client. It scrapes locally through MCP Scraper and orchestrates the run, but every generation step (note-taking, ICP inference, headings, writing) is delegated to a private access-gated server that holds the prompts, the model strategy, and the OpenRouter key. The client ships no prompts and no model logic.

How it works

The orchestrating agent (Claude or Codex) calls the Step Tools in order. Each tool does one atomic thing against a Session Folder on disk and returns only a small summary. Bulk scraped pages, transcripts, and the roughly 30 OpenRouter transform calls (run at most MCP_CONTENT_MAX_PARALLEL at a time, default 8) never pass through the agent's context.
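A minimal sketch of that "write to disk, return a summary" contract. It assumes a hypothetical in-memory stand-in for the Session Folder and a hypothetical ScrapedPage shape; the real tools write files under sessions/<keyword>-<id>/ and MCP Scraper's responses look different.

```typescript
// Hypothetical stand-in for a Session Folder: relative path -> file contents.
type SessionFolder = Map<string, string>;

// Hypothetical scraped-page shape (not MCP Scraper's real response).
interface ScrapedPage {
  sourceId: string;
  text: string;
}

// The small summary a step tool hands back to the agent.
interface StepSummary {
  step: string;
  sourcesWritten: number;
  charsWritten: number;
  sourceIds: string[];
}

// Writes each page to raw/<sourceId>.txt and returns only metadata,
// so page bodies never enter the agent's context.
function ingestPages(folder: SessionFolder, step: string, pages: ScrapedPage[]): StepSummary {
  let charsWritten = 0;
  for (const page of pages) {
    folder.set(`raw/${page.sourceId}.txt`, page.text);
    charsWritten += page.text.length;
  }
  return {
    step,
    sourcesWritten: pages.length,
    charsWritten,
    sourceIds: pages.map((p) => p.sourceId),
  };
}
```

The agent sees a few dozen bytes per step no matter how large the scraped pages are; later steps read the bulk text back from the folder themselves.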

start_run(keyword)                 → sessionId
  ingest_paa(sessionId)            → 50 PAA questions               [MCP Scraper: harvest_paa]
  ingest_serp(sessionId)           → AI overview + ~20 organic (2 pages) + YT urls, scrape each page
                                                                    [MCP Scraper: search_serp pages=2, extract_url]
  ingest_youtube(sessionId)        → transcribe SERP-surfaced videos [MCP Scraper: youtube_transcribe]   (optional)
  ingest_reddit(sessionId)         → browser-agent thread search + scrape [MCP Scraper: browser_*, extract_url] (optional)
  ingest_research(sessionId)       → agent-packet deep research → Brief + Evidence   [MCP Scraper: workflow_run] (optional)
  research_stats(sessionId)        → stat-targeted harvest → sourced stats bank      [MCP Scraper + server: /api/extract-stats] (optional)
  transform_sources(sessionId)     → Note Versions, up to 8 parallel server calls         [server: /api/transform]
  run_textrazor(sessionId)         → entities for top 5 SERP sources                      [server: /api/textrazor] (optional)
  infer_icp(sessionId)             → icp.md / icp.json                                     [server: /api/icp]
  propose_headings(sessionId)      → question-formatted H1–H3, sources mapped per section  [server: /api/headings]
  write_sections(sessionId)        → one section per server call, intro skipped           [server: /api/write-section]
  assemble_article(sessionId)      → article.md
  improve(sessionId, focus?)       → critique → more research → re-craft (optional)      [server: /api/critique]
  add_images(sessionId)            → 5-stage visual plan → fal.ai photos + baked text overlays (opt)  [server: /api/image-plan, /api/image, /api/compose]
  add_graphics(sessionId)          → infographic GATE → SVG data graphics, gated & inserted (opt)     [server: /api/infographic-plan, /api/graphic]
  publish_preview(sessionId)       → ephemeral 24h Vercel URL (Article + Analysis tabs)  [server: /api/publish]
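The diagram fixes a call order. A minimal sketch of how an orchestrator might check a planned sequence against it, assuming the step names above and treating "each call must come later in the diagram than the previous one" as the only rule; checkOrder is hypothetical, and the real tools may enforce more or less.

```typescript
// Step order as listed in the pipeline diagram.
const PIPELINE = [
  "start_run", "ingest_paa", "ingest_serp", "ingest_youtube", "ingest_reddit",
  "ingest_research", "research_stats", "transform_sources", "run_textrazor",
  "infer_icp", "propose_headings", "write_sections", "assemble_article",
  "improve", "add_images", "add_graphics", "publish_preview",
] as const;

type Step = (typeof PIPELINE)[number];

// Hypothetical check: returns null if `calls` follows the documented order,
// otherwise a message naming the first out-of-order call.
// Skipping steps is allowed; repeating or reordering them is not.
function checkOrder(calls: Step[]): string | null {
  let last = -1;
  for (const call of calls) {
    const idx = PIPELINE.indexOf(call);
    if (idx <= last) {
      return `${call} is out of order after ${PIPELINE[last]}`;
    }
    last = idx;
  }
  return null;
}
```

Under this rule, calling improve after add_images is rejected, which matches the requirement that improve runs before the image steps.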

Run improve before the image steps: it regenerates the article from its sections, so images or graphics inserted earlier would not survive. Image and data-graphic system:

add_images: a 5-stage art-director plan (analyze → expand → finalize by visual-communication principles → text-overlay design → prompt) that curates ~3-5 photos (one hero), generated with fal.ai nano-banana-pro via its async queue. Overlay text is composed as crisp SVG and baked into the photo with sharp (/api/compose). Image models render text unreliably, so text lives in a vector layer instead.
research_stats: runs stat-targeted searches via MCP Scraper, scrapes the results, and extracts a deduplicated, source-attributed stats bank (value · claim · source · year). The bank feeds the narrative gate so data visualizations stay grounded and honest (NN/g's "informational honesty").
add_graphics: a narrative gate (/api/visual-plan) in which an art-director step finds the article's core story, then designs ONE composite lead infographic (a tall multi-panel SVG: header → hero stat → icon flow → comparison → source) plus 1-2 supporting graphics, grounding every number in the stats bank. The full data-viz toolkit is all real SVG: lead-infographic, statement-card, stat-card, diagram (flow/timeline), breakdown, comparison, bar-chart, trend, before-after, pictograph, ranked-bars, gauge, funnel, quadrant, scatter, stacked-100, quote-card, plus an icon library and NN/g rules (one focal point, honest scale, limited palette, hierarchy). Graphics are never forced: the gate can decide an article needs none.
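A minimal sketch of deduplicating a stats bank and checking that a graphic's numbers are grounded in it. The StatEntry shape mirrors the four documented fields, but the matching rules are assumptions: two entries count as duplicates when their normalized value and claim match (the newer year wins), and a number is "grounded" if some bank value contains it. The server's actual extraction and gating logic is not published.

```typescript
// Hypothetical entry shape based on value · claim · source · year.
interface StatEntry {
  value: string;  // e.g. "42%"
  claim: string;  // what the number measures
  source: string; // URL or publisher
  year: number;
}

const NUM = /\d+(?:\.\d+)?/g;

// Lowercase and collapse whitespace so trivial variants compare equal.
const norm = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

// Keeps one entry per (value, claim) pair, preferring the most recent year.
function dedupeStats(entries: StatEntry[]): StatEntry[] {
  const byKey = new Map<string, StatEntry>();
  for (const e of entries) {
    const key = `${norm(e.value)}|${norm(e.claim)}`;
    const prev = byKey.get(key);
    if (!prev || e.year > prev.year) byKey.set(key, e);
  }
  return Array.from(byKey.values());
}

// Returns numbers in a graphic's text that no stats-bank value contains.
function ungroundedNumbers(graphicText: string, bank: StatEntry[]): string[] {
  const numbers: string[] = graphicText.match(NUM) ?? [];
  const grounded = new Set(bank.flatMap((e): string[] => e.value.match(NUM) ?? []));
  return numbers.filter((n) => !grounded.has(n));
}
```

A gate built this way would reject or revise any graphic for which ungroundedNumbers returns a non-empty list.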

The optional YouTube and Reddit steps run by default. Pass includeYoutube: false or includeReddit: false to start_run to skip them.
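In MCP terms, the agent invokes start_run with a JSON-RPC 2.0 tools/call request. A sketch of that request with both optional sources turned off, assuming start_run's arguments are exactly keyword, includeYoutube, and includeReddit as named above; startRunRequest is a hypothetical helper.

```typescript
// Assumed argument shape for start_run; omitted flags keep the default (step runs).
type StartRunArgs = {
  keyword: string;
  includeYoutube?: boolean;
  includeReddit?: boolean;
};

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps start_run arguments in a tools/call request.
function startRunRequest(id: number, args: StartRunArgs): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "start_run", arguments: args },
  };
}

const req = startRunRequest(1, { keyword: "crm for startups", includeYoutube: false, includeReddit: false });
console.log(JSON.stringify(req));
```

In practice the host (Claude or Codex) builds this request itself; the sketch only shows where the two flags go.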

Session Folder layout

sessions/<keyword>-<id>/
├── manifest.json           run options, per-step status, source registry
├── raw/                    paa.json, serp.json, <sourceId>.txt, textrazor-entities.json
├── notes/                  <sourceId>.json  (Note Versions)
├── icp.md / icp.json
├── headings.json
├── sections/NN.md
└── article.md              final output
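A sketch of how those paths could be derived. The slug rule for <keyword> (lowercase, runs of non-alphanumerics collapsed to hyphens) and the two-digit zero padding for NN are assumptions; the package's actual naming may differ. slugify and sessionPaths are hypothetical.

```typescript
// Hypothetical slug rule for the <keyword> part of the folder name.
function slugify(keyword: string): string {
  return keyword
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper mapping the layout above to concrete paths.
function sessionPaths(sessionsDir: string, keyword: string, id: string) {
  const root = `${sessionsDir.replace(/\/+$/, "")}/${slugify(keyword)}-${id}`;
  return {
    root,
    manifest: `${root}/manifest.json`,
    rawSource: (sourceId: string) => `${root}/raw/${sourceId}.txt`,
    note: (sourceId: string) => `${root}/notes/${sourceId}.json`,
    // NN.md: section index padded to two digits (01, 02, ...).
    section: (n: number) => `${root}/sections/${String(n).padStart(2, "0")}.md`,
    article: `${root}/article.md`,
  };
}
```

For example, sessionPaths("./sessions", "Best CRM for Startups!", "a1b2").section(3) yields ./sessions/best-crm-for-startups-a1b2/sections/03.md.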

Generation server

All generation is delegated to a private server (the mcp-content-server Vercel project). The server holds the prompts, the model profiles (transform → Qwen 3.6; icp/headings/writer → GPT-OSS-120B Nitro, with Qwen fallbacks), the OpenRouter key, and the TextRazor key. It validates MCP_CONTENT_API_KEY on every request. The client never sees prompts or model ids — only the tool results.
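A sketch of what one client-to-server call might look like, building the request without sending it. The README says only that the server validates MCP_CONTENT_API_KEY on every request; the POST method, JSON body, and Bearer Authorization header below are assumptions, and buildServerRequest is hypothetical.

```typescript
interface ServerRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: header name, method, and payload shape are assumptions.
function buildServerRequest(
  baseUrl: string,
  endpoint: string, // e.g. "/api/icp"
  apiKey: string | undefined,
  payload: unknown,
): ServerRequest {
  if (!apiKey) {
    // Fail locally instead of sending a request the server would reject with 401.
    throw new Error("MCP_CONTENT_API_KEY is not set");
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}${endpoint}`,
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(payload),
  };
}
```

The returned object can be handed to fetch(url, { method, headers, body }). Only the request travels to the server; prompts and model ids never come back, just the tool results.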

Setup

npm install
npm run build

Set environment (see .env.example):

MCP_CONTENT_API_KEY — required. Your issued access key. The server returns 401 without it.
MCP_CONTENT_SERVER_URL — optional. Defaults to the hosted server; override for your own deployment or vercel dev.
MCP_SCRAPER_CMD / MCP_SCRAPER_ARGS — how to launch MCP Scraper (defaults to ~/Desktop/mcp-scraper/dist/bin/mcp-scraper.js).
MCP_CONTENT_MAX_PARALLEL — transform fan-out cap (default 8).
MCP_CONTENT_SESSIONS_DIR — where Session Folders live (default ./sessions).
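MCP_CONTENT_MAX_PARALLEL caps how many transform calls are in flight at once. A minimal sketch of such a cap, assuming one async worker call per source; mapWithLimit is a hypothetical helper, not the package's code.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until none remain.
  // Claiming is safe because JavaScript runs this loop on one thread.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With the default cap of 8 and, say, 30 sources, at most 8 calls run at once and the remaining sources wait for a free runner.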

OpenRouter and TextRazor keys are not set on the client — they live on the server.
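A sketch of loading the client settings above with their documented defaults. It takes the environment as a plain record (pass process.env in Node) so it stays testable. The hosted server URL is not given here, so an unset MCP_CONTENT_SERVER_URL is left undefined for the caller to resolve. loadConfig is hypothetical.

```typescript
type Env = Record<string, string | undefined>;

interface ClientConfig {
  apiKey: string;
  serverUrl: string | undefined; // undefined: use the hosted default
  maxParallel: number;
  sessionsDir: string;
}

// Hypothetical loader mirroring the documented variables and defaults.
function loadConfig(env: Env): ClientConfig {
  const apiKey = env.MCP_CONTENT_API_KEY;
  if (!apiKey) throw new Error("MCP_CONTENT_API_KEY is required");

  const rawParallel = env.MCP_CONTENT_MAX_PARALLEL;
  const parsed = rawParallel === undefined ? 8 : Number.parseInt(rawParallel, 10);
  // Fall back to the documented default of 8 on invalid or non-positive input.
  const maxParallel = Number.isInteger(parsed) && parsed > 0 ? parsed : 8;

  return {
    apiKey,
    serverUrl: env.MCP_CONTENT_SERVER_URL || undefined,
    maxParallel,
    sessionsDir: env.MCP_CONTENT_SESSIONS_DIR || "./sessions",
  };
}
```

Failing fast on a missing key surfaces the problem at startup rather than as a 401 from the first generation step.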

Register with an MCP client

{
  "mcpServers": {
    "mcp-content": {
      "command": "npx",
      "args": ["-y", "mcp-content"],
      "env": { "MCP_CONTENT_API_KEY": "..." }
    }
  }
}

The host running mcp-content must also have MCP Scraper installed, because mcp-content launches it as a child process and talks to it over stdio (configure the launch with MCP_SCRAPER_CMD / MCP_SCRAPER_ARGS).

MCP Scraper

mcp-content depends on the MCP Scraper server for all external scraping. Its tool responses are parsed defensively (src/scraper/parse.ts), so minor differences in tool names or response shapes degrade gracefully instead of crashing the run.
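A minimal sketch of that defensive style. The text-content unwrapping follows the MCP tool-result format (a content array of { type: "text", text } blocks), but it assumes MCP Scraper puts JSON in those blocks, and the candidate keys (organic, organic_results, results, url, link) are illustrative guesses, not MCP Scraper's documented shape. Unknown shapes yield an empty list rather than an exception.

```typescript
// Narrow an unknown value to a plain object.
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// MCP tool results carry content blocks; return the first text block that parses as JSON.
function unwrapToolResult(result: unknown): unknown {
  if (!isRecord(result)) return result;
  const content = result.content;
  if (!Array.isArray(content)) return result;
  for (const block of content) {
    if (isRecord(block) && block.type === "text" && typeof block.text === "string") {
      try {
        return JSON.parse(block.text);
      } catch {
        // Not JSON; try the next block.
      }
    }
  }
  return result;
}

// Pulls organic result URLs from a SERP payload of uncertain shape.
// The candidate keys below are assumptions for illustration.
function extractOrganicUrls(payload: unknown): string[] {
  if (!isRecord(payload)) return [];
  const list = payload.organic ?? payload.organic_results ?? payload.results;
  if (!Array.isArray(list)) return [];
  const urls: string[] = [];
  for (const item of list) {
    if (!isRecord(item)) continue; // skip malformed entries instead of throwing
    const url = item.url ?? item.link;
    if (typeof url === "string" && url.startsWith("http")) urls.push(url);
  }
  return urls;
}
```

A run that gets an unexpected payload then ingests fewer sources instead of aborting, which is the graceful degradation described above.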

Develop

npm run dev        # run the server from source over stdio
npm run typecheck
npm test
