
@llm-newsletter-kit/core

v1.3.7


An extensible framework to automate your entire newsletter workflow. Handles data collection, LLM-based content analysis, and email generation, letting you focus on your unique domain logic.


LLM Newsletter Kit

Automate domain-expert newsletters powered by AI


Important: Code of Conduct · Security Policy · Contributing

What is this?

A type‑first, extensible toolkit that automates LLM‑based newsletter creation end‑to‑end. It orchestrates Crawling → Analysis → Content Generation → Save (with optional preview email), and every stage is swappable via DI‑capable Provider interfaces so you can plug in your own crawlers/LLMs/DB/logging. Built‑in operational features (retries, chain options) help control cost and improve reliability.

  • Type-first design (TypeScript, ESM) with strong contracts
  • Flexible dependency injection: easily swap Crawling/Analysis/ContentGenerate/Task/Logging/Email
  • Operational features built-in: retries, chain options, preview email sending, etc.
  • Rollup build (ESM+CJS+d.ts), Vitest 100% coverage, GitHub Actions CI included

Project Background

This project originated from a Korean cultural heritage newsletter service called “Research Radar.”

It was architected by Kim Hongyeon, an archaeologist turned software engineer. Driven by a question he had held for over a decade—"Why must research be such grueling manual labor?"—he combined his domain expertise with 10+ years of engineering experience to solve this problem.

After he completed an academic research project, A Study on Archaeological Informatization Using Large Language Models (LLMs), a personal automation script he had written to keep up with academic trends evolved into a service with a high engagement rate (15% CTR) and near-zero maintenance cost.

Real-world production metrics:

  • LLM API cost: $0.2-1 USD per issue with optimized model usage
  • Operational overhead: Truly hands-off automation—runs 24/7 without human intervention; the only ongoing work is occasional code maintenance
  • Time investment: Set it up once, let it run indefinitely; it operates while you sleep

Kim extracted the generic, high-performance core engine from that service to create this toolkit, allowing other developers to build their own AI-driven media pipelines without starting from scratch.

His design philosophy: "Logic in code, reasoning in AI, connections in architecture." This principle guides every aspect of the kit—deterministic workflows are implemented in type-safe code, intelligent analysis is delegated to LLMs, and the entire system is glued together through clean, swappable interfaces.

  • Core (This Repository): A domain-agnostic, type-safe engine. It orchestrates the full lifecycle (Crawling → Analysis → Content Generation → Save) via DI-capable Providers.
  • Research Radar (Reference Implementation): A real-world application built with this Core. It serves as a live demo and a "preset" for how to implement the providers.

Quick Links

  • Research Radar (Live Service): https://heripo.com/research-radar/subscribe
  • Source Code (Usage Example): https://github.com/heripo-lab/heripo-research-radar

Why Code-Based?

Newsletter automation generally falls into two approaches: no-code and code-based. This kit takes the code-based approach because it produces significantly better output quality.

Key advantages:

  • Advanced AI workflows: Implement sophisticated techniques like self-reflection, chain-of-thought reasoning, and multi-step verification—impossible or prohibitively expensive in no-code platforms
  • Cost control: Use different models per stage, cap tokens, control retries, and prevent runaway costs with granular configuration
  • Full customization: Swap any component (crawlers, LLMs, databases, email) via Provider interfaces without vendor lock-in
  • Production-grade: Type-safe contracts, 100% test coverage, CI/CD integration, and observability built-in

Real-world output example: See the quality for yourself—an actual newsletter generated by this kit: https://heripo.com/research-radar-newsletter-example.html

Trade-off: Higher initial setup complexity vs. no-code. This kit mitigates it with strong types, IDE autocompletion, sensible defaults, and a complete reference implementation.

Installation

npm i @llm-newsletter-kit/core

  • Node.js >= 22 (CI verified with 24.x)

Quick Start

import { GenerateNewsletter } from '@llm-newsletter-kit/core';
import type { GenerateNewsletterConfig } from '@llm-newsletter-kit/core';
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });

const config: GenerateNewsletterConfig<string> = {
  contentOptions: {
    outputLanguage: 'English',
    expertField: ['Technology', 'AI'],
    // freeFormIntro: true,       // Optional: free-form briefing intro
    // titleContext: 'AI Weekly',  // Optional: keyword to include in title
  },
  dateService: {
    getCurrentISODateString: () => new Date().toISOString().split('T')[0],
    getDisplayDateString: () => new Date().toLocaleDateString('en-US'),
  },
  taskService: {
    start: async () => `task-${Date.now()}`,
    end: async () => {},
  },
  crawlingProvider: {
    // customFetch: myProxyFetch,  // Optional: custom fetch for proxy support
    crawlingTargetGroups: [/* ... */],
    fetchExistingArticlesByUrls: async (urls) => [/* ... */],
    saveCrawledArticles: async (articles, context) => articles.length,
  },
  analysisProvider: {
    // Configure LLM models for analysis
    classifyTagOptions: { model: openai('gpt-5-mini') },
    analyzeImagesOptions: { model: openai('gpt-5.1') },
    determineScoreOptions: { model: openai('gpt-5.1') },
    fetchUnscoredArticles: async () => [/* ... */],
    fetchTags: async () => [/* ... */],
    update: async (article) => {},
  },
  contentGenerateProvider: {
    // Configure content generation
    model: openai('gpt-5.1'),
    issueOrder: 1,
    newsletterBrandName: 'Tech Insight Weekly',
    publicationCriteria: { minimumArticleCountForIssue: 5 },
    fetchArticleCandidates: async () => [/* ... */],
    htmlTemplate: ({ content }) => `<html>...</html>`,
    saveNewsletter: async ({ newsletter }) => ({ id: 1 }),
  },
};

const generator = new GenerateNewsletter(config);
const newsletterId = await generator.generate();

⚠️ This is a minimal example showing the structure. For a complete, production-ready implementation with:

  • Real database integration (Prisma/Drizzle)
  • Actual crawling targets and parsing logic
  • HTML email templates
  • Preview email configuration

👉 See the reference implementation: https://github.com/heripo-lab/heripo-research-radar

Public API Overview

Entry point: src/index.ts

  • Class
    • GenerateNewsletter
      • constructor(config: GenerateNewsletterConfig)
      • generate(): Promise<string | number | null>
  • Main Types
    • TaskService { start(): Promise<string>; end(): Promise<void> }
    • CrawlingProvider { crawlingTargetGroups, customFetch?, fetchExistingArticlesByUrls, saveCrawledArticles, ... }
    • AnalysisProvider { classifyTagOptions.model, analyzeImagesOptions.model, determineScoreOptions(model, minimumImportanceScoreRules), fetchUnscoredArticles, fetchTags, update }
    • ContentGenerateProvider { model and generation options, issueOrder, publicationCriteria, subscribePageUrl, newsletterBrandName, fetchArticleCandidates, htmlTemplate, saveNewsletter }
    • GenerateNewsletterOptions { logger, llm, chain, previewNewsletter(emailService/emailMessage/fetchNewsletterForPreview) }
    • Domain models: DateService, EmailService, Newsletter, etc.

For detailed field descriptions, see src/generate-newsletter/models/interfaces.ts and type definitions under src/models/*.

Architecture & Flow

  1. CrawlingChain: Collect/parse/save articles from targets
  2. AnalysisChain: Tagging/image analysis/importance scoring and update
  3. ContentGenerateChain: Select candidates → generate Markdown via LLM → apply template (HTML) → save → return id
  4. If previewNewsletter option is present, send a preview email

All chains are composed as a single pipeline using @langchain/core/runnables sequence.
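To make the composition concrete, here is a dependency-free TypeScript sketch that models the same sequential flow with plain async functions. The stage names and data shapes are illustrative only; the kit itself composes its chains with @langchain/core/runnables, not this helper:

```typescript
// Illustrative only: models the Crawling → Analysis → ContentGenerate sequence
// as plain async functions chained left to right.

type Stage<In, Out> = (input: In) => Promise<Out>;

// Compose two stages so the first stage's output feeds the second.
const pipe =
  <A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> =>
  async (input) => second(await first(input));

// Hypothetical stages mirroring the chains described above.
const crawl: Stage<void, string[]> = async () => ['article-1', 'article-2'];
const analyze: Stage<string[], { url: string; score: number }[]> = async (urls) =>
  urls.map((url, i) => ({ url, score: i + 1 }));
const generate: Stage<{ url: string; score: number }[], number> = async (scored) =>
  scored.length; // stand-in for "save newsletter, return its id"

const pipeline = pipe(pipe(crawl, analyze), generate);

pipeline().then((id) => console.log('newsletter id:', id)); // → newsletter id: 2
```

The real chains carry richer context (retries, chain options, logging), but the control flow is the same: each stage's output is the next stage's input.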

(See diagram.png in the repository for a visual overview of the pipeline.)

Crawling & Parsing Philosophy: "Bring Your Own Scraper"

This kit prioritizes flexibility over rigid tooling. Instead of locking you into a specific scraper (like Puppeteer, Playwright, or Cheerio), we define a strict interface for the pipeline. We handle the flow; you handle the logic.

  • Total Freedom: You can use lightweight HTTP requests for static sites or full headless browsers for complex SPAs. As long as you satisfy the CrawlingProvider interface, anything works.
  • Asynchronous Injection: Parsing logic is injected asynchronously, allowing you to integrate third-party APIs or AI-based parsers effortlessly.
  • Recommendation: While the kit supports LLM-based parsing (HTML-to-JSON), we generally recommend rule-based parsing (e.g., CSS selectors) for production environments to ensure speed, cost-efficiency, and stability.
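As a small illustration of the rule-based approach, the sketch below extracts (url, title) pairs from an HTML listing fragment with a regular expression. The function name and return shape are hypothetical, not part of the kit's API; a production parser would more likely use CSS selectors via a library such as cheerio, but the principle is the same: deterministic rules, no LLM call.

```typescript
// Illustrative rule-based parser: pull (url, title) pairs out of anchor tags.
interface ParsedArticle {
  url: string;
  title: string;
}

function parseArticleLinks(html: string): ParsedArticle[] {
  const results: ParsedArticle[] = [];
  // Match <a ... href="...">text</a>; backtracking lets href appear
  // after other attributes such as class.
  const anchor = /<a\s+[^>]*href="([^"]+)"[^>]*>([^<]+)<\/a>/g;
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    results.push({ url: match[1], title: match[2].trim() });
  }
  return results;
}

// Example listing-page fragment:
const html = `
  <ul>
    <li><a class="item" href="/articles/1">First article</a></li>
    <li><a class="item" href="/articles/2">Second article</a></li>
  </ul>`;

console.log(parseArticleLinks(html));
```

A function like this would sit behind your CrawlingProvider implementation; because parsing is injected, swapping it for a selector library or an LLM-based parser later requires no changes to the pipeline itself.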

Playground

Playground scripts let you run individual LLM query classes in isolation — no full pipeline needed. Useful for prompt tuning, testing new options, or debugging output quality.

Setup

  1. Install playground dependencies:

    npm install -D tsx @ai-sdk/openai
  2. Copy example data files and customize:

    mkdir -p playground/data
    cp playground/data-examples/config.example.json playground/data/config.json
    cp playground/data-examples/articles.example.json playground/data/articles.json
    cp playground/data-examples/template.example.html playground/data/template.html
  3. Edit playground/data/config.json with your OpenAI API key and options.

  4. Edit playground/data/articles.json with your target articles.

  5. (Optional) Replace playground/data/template.html with your actual email template.

Run

npm run playground:generate-newsletter

Output

Results are saved to playground/output/ (git-ignored):

  • newsletter.md — Generated markdown with title in frontmatter
  • newsletter.html — Rendered HTML with CSS inlined (juice)

Data Management

| Directory | Git | Purpose |
|---|---|---|
| playground/data-examples/ | Tracked | Format reference files (.example.*) |
| playground/data/ | Ignored | Your actual config, articles, templates |
| playground/output/ | Ignored | Generated results |

Development / Build / Test / CI

For the full developer guide (environment, scripts, testing/coverage, and CI), see CONTRIBUTING.md.

Contributing & Policies

Please refer to CONTRIBUTING.md for all contribution guidelines and project policies, including:

  • Issue labels and triage
  • Branch strategy and PR process
  • Versioning and release policy
  • CI workflow and coverage requirements

Citation & Attribution

If you use this project in your research, service, or derivative works, please include the following attribution:

Powered by LLM Newsletter Kit

This acknowledgment helps support the open source project and gives credit to its contributors.

BibTeX Citation

For academic papers or research documentation, you may use the following BibTeX entry:

@software{llm_newsletter_kit,
  author = {Kim, Hongyeon},
  title = {LLM Newsletter Kit: Type-First Extensible Toolkit for Automating LLM-Based Newsletter Creation},
  year = {2025},
  url = {https://github.com/heripo-lab/llm-newsletter-kit-core},
  note = {Apache License 2.0}
}

License

Apache-2.0 © 2025-present kimhongyeon. See LICENSE and NOTICE for details.