@heripo/research-radar v3.1.1

AI-driven intelligence for Korean cultural heritage. This package serves as both a ready-to-use newsletter service and a practical implementation example for the LLM-Newsletter-Kit.
heripo Research Radar
English | 한국어
Code of Conduct • Security Policy • Contributing
What is this?
An AI-powered newsletter service for Korean cultural heritage. Built on @llm-newsletter-kit/core, it's both a production service (live at heripo.com) and a reference implementation showing how to build automated newsletters with LLMs.
Production metrics:
- Cost: $0.20–$1 USD per issue
- Operation: Fully autonomous 24/7 (no human intervention)
- Engagement: 15% CTR
Technical highlights:
- Type-safe TypeScript with strict interfaces
- Provider pattern for swapping components (Crawling/Analysis/Content/Email)
- 66 crawling targets across heritage agencies, museums, academic societies
- Multiple LLM providers: OpenAI GPT-5 (analysis) + selectable content generation (OpenAI / Anthropic / Google)
- Built-in retries, chain options, preview emails
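The provider pattern mentioned above can be sketched as follows. The interface and class names here are illustrative assumptions for demonstration only; the real interfaces live in @llm-newsletter-kit/core and may differ:

```typescript
// Illustrative sketch of the provider pattern — names are assumptions,
// not the actual @llm-newsletter-kit/core API.
interface Article {
  title: string;
  url: string;
}

interface CrawlingProvider {
  fetchArticles(targetUrl: string): Promise<Article[]>;
}

// A stub implementation: in production this would be an HTTP crawler;
// in tests or previews it can be swapped for canned data.
class StubCrawlingProvider implements CrawlingProvider {
  async fetchArticles(targetUrl: string): Promise<Article[]> {
    return [{ title: 'Example excavation report', url: `${targetUrl}/articles/1` }];
  }
}

// The pipeline depends only on the interface, so any component can be swapped.
async function crawlStep(provider: CrawlingProvider, target: string): Promise<Article[]> {
  return provider.fetchArticles(target);
}
```

Because each stage (Crawling/Analysis/Content/Email) is addressed through an interface like this, one component can be replaced without touching the rest of the pipeline.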
Links: Live service • Newsletter example • Core engine
Background
Created by archaeologist-turned-engineer Hongyeon Kim to answer: "Why must research rely on labor-intensive manual work?"
A personal script evolved into a production service after completing research on Archaeological Informatization Using LLMs. This repository open-sources the running service so developers can build domain-specific newsletters without starting from scratch.
License
Apache License 2.0 — see LICENSE and NOTICE for details.
Citation & Attribution
If you fork this project to build your own newsletter service or use this code in your research, please include the following attribution:
```
Powered by LLM Newsletter Kit
```

We recommend adding this notice to your newsletter template footer or service documentation. This attribution helps support the project and its continued development.
BibTeX Citation
For academic publications:
```bibtex
@software{heripo_research_radar,
  author = {Kim, Hongyeon},
  title = {heripo research radar},
  year = {2025},
  url = {https://github.com/heripo-lab/heripo-research-radar},
  note = {Apache License 2.0}
}
```

Installation
```bash
npm install @heripo/research-radar @llm-newsletter-kit/core
```

Requirements: Node.js >= 24, an OpenAI API key, and a content generation API key (OpenAI / Anthropic / Google).
Note: @llm-newsletter-kit/core is a peer dependency and must be installed separately.
Quick Start
```typescript
import { generateNewsletter } from '@heripo/research-radar';

const newsletterId = await generateNewsletter({
  openAIApiKey: process.env.OPENAI_API_KEY,
  contentGeneration: {
    provider: 'anthropic', // 'openai' | 'anthropic' | 'google'
    apiKey: process.env.ANTHROPIC_API_KEY,
    // model: 'claude-sonnet-4-6', // optional, uses a sensible default
  },

  // Implement these repository interfaces (see src/types/dependencies.ts)
  taskRepository: {
    createTask: async () => db.tasks.create({ status: 'running' }),
    completeTask: async (id) => db.tasks.update(id, { status: 'completed' }),
  },
  articleRepository: {
    findByUrls: async (urls) => db.articles.findByUrls(urls),
    saveCrawledArticles: async (articles, ctx) => db.articles.save(articles, ctx),
    findUnscoredArticles: async () => db.articles.findUnscored(),
    updateAnalysis: async (article) => db.articles.updateAnalysis(article),
    findCandidatesForNewsletter: async () => db.articles.findCandidates(),
  },
  tagRepository: {
    findAllTags: async () => db.tags.findAll(),
  },
  newsletterRepository: {
    getNextIssueOrder: async () => db.newsletters.getNextOrder(),
    saveNewsletter: async (data) => db.newsletters.save(data),
  },

  // Optional parameters:
  logger: console,
  publishDate: '2026-02-20', // Override publication date (ISO format)
  templateOptions: { /* ... */ }, // Newsletter template customization
  customFetch: proxyFetch, // Custom fetch for proxy-based crawling
  previewNewsletter: {
    fetchNewsletterForPreview: async () => db.newsletters.latest(),
    emailService: resendEmailService,
    emailMessage: { from: '[email protected]', to: '[email protected]' },
  },
});
```

Repository interfaces are defined in src/types/dependencies.ts. Each method signature includes JSDoc with expected input/output types.
Architecture
Pipeline: Crawling → Analysis → Content Generation → Save
- Crawling: Fetch articles from target websites
- Analysis: LLM tags and scores articles
- Generation: Create newsletter from top-scoring articles
- Save: Store and optionally send preview email
Uses the Provider-Service pattern from @llm-newsletter-kit/core. See core docs for flow diagrams.
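As a rough sketch, the four stages above compose sequentially. All names here are illustrative assumptions; the actual orchestration lives in @llm-newsletter-kit/core:

```typescript
// Hedged sketch of the four-stage pipeline (Crawling → Analysis → Generation → Save).
// Types and function names are illustrative, not the package's actual API.
type Article = { url: string; title: string; score?: number };

async function runPipeline(deps: {
  crawl: () => Promise<Article[]>;
  analyze: (a: Article) => Promise<number>;
  generate: (top: Article[]) => Promise<string>;
  save: (html: string) => Promise<string>;
}): Promise<string> {
  // 1. Crawling: fetch candidate articles from target sites
  const articles = await deps.crawl();
  // 2. Analysis: the LLM assigns a relevance score to each article
  for (const a of articles) a.score = await deps.analyze(a);
  // 3. Generation: build the newsletter from the top-scoring articles
  const top = [...articles]
    .sort((x, y) => (y.score ?? 0) - (x.score ?? 0))
    .slice(0, 5);
  const html = await deps.generate(top);
  // 4. Save: persist the newsletter and return its id
  return deps.save(html);
}
```

In the real package each `deps` entry corresponds to a swappable provider or repository, which is what makes the stages independently replaceable.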
Components
Config (src/config/): Brand, language, LLM settings
Targets (src/config/crawling-targets.ts): 66 sources (News 52, Business 4, Employment 10)
Parsers (src/parsers/): Custom extractors per organization
Templates (src/templates/): newsletter-html.ts (responsive email with light/dark mode), welcome-html.ts (generateWelcomeHTML()), shared.ts (shared HTML components)
Development commands
```bash
# build
npm run build          # clean dist/ and build with Rollup (CJS + ESM + types)

# type-check & lint
npm run lint           # lint source files
npm run lint:fix       # lint with autofix
npm run typecheck      # TypeScript type-check

# formatting
npm run format         # Prettier formatting
```

Crawler Debugger
A web-based tool for testing crawling parsers during development. Built with Express.js and vanilla HTML/CSS/JS to minimize dependencies.
```bash
npm run dev:crawler        # Start at http://localhost:3333
npm run dev:crawler:proxy  # Start with proxy support (uses .env)
```

Features:
- Test parseList() and parseDetail() parsers via the web UI
- View raw HTML source for debugging
- Copy parsed results as JSON
- 5-minute response cache (with skip/clear options)
- Timing info for fetch and parse operations
Newsletter Preview
Preview rendered newsletter HTML with sample content.
```bash
npm run dev:newsletter-preview  # Start at http://localhost:3334
```

Query params: ?kras=true (KRAS mode), ?krasNews=true (KRAS news section), ?heripolabNews=true (heripo lab news section)
Welcome Email Preview
Preview rendered welcome email HTML.
```bash
npm run dev:welcome-preview  # Start at http://localhost:3335
```

Query params: ?kras=true (KRAS mode), ?name=홍길동 (subscriber name)
Parser Health-Check
CLI tool that validates all active crawling parsers against live websites. Detects silent failures (empty results, broken selectors) caused by upstream website redesigns.
```bash
npm run health-check        # Run health-check
npm run health-check:proxy  # Run with proxy support (uses .env)
```

What it checks per target:
- parseList(): returns a non-empty array with valid title, date, and detailUrl
- parseDetail(): returns non-empty detailContent (20+ chars)
Output: Console table summary + compact text summary for CI integrations.
CI: Daily automated run via GitHub Actions (.github/workflows/parser-health-check.yml) with Slack notifications on pass/fail.
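The health-check contract above can be illustrated with a minimal parser sketch. The HTML shapes and regexes here are hypothetical; real parsers in src/parsers/ target each organization's actual markup, typically with a proper HTML parser rather than regexes:

```typescript
// Illustrative parser shapes implied by the health-check contract.
// The actual interface in src/parsers/ may differ.
type ListItem = { title: string; date: string; detailUrl: string };

// parseList: extract article entries from a listing page's HTML.
function parseList(html: string): ListItem[] {
  const items: ListItem[] = [];
  // Hypothetical markup: <a href="..." data-date="...">title</a>
  const re = /<a href="([^"]+)" data-date="([^"]+)">([^<]+)<\/a>/g;
  for (const m of html.matchAll(re)) {
    items.push({ detailUrl: m[1], date: m[2], title: m[3] });
  }
  return items;
}

// parseDetail: extract the article body (health-check requires 20+ chars).
function parseDetail(html: string): { detailContent: string } {
  const m = html.match(/<article>([\s\S]*?)<\/article>/);
  return { detailContent: (m?.[1] ?? '').trim() };
}
```

A health-check run then simply calls both functions against live pages and asserts the contract: non-empty list with valid fields, and a detail body long enough to rule out a silently broken selector.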
🤝 Contributing
You can use this project in two ways:
- Contribute directly to Heripo Research Radar: bug fixes, improvements, new crawl targets
- Build your own newsletter: fork this repo and adapt it to your domain
See CONTRIBUTING.md for contribution workflow, dev setup, and PR guidelines.
Forking for Your Domain
To build your own newsletter, update these files:
1. Template (src/templates/newsletter-html.ts):
- Logo URLs, brand colors (#D2691E, #E59866), contact info
- Platform intro and footer text
- Unsubscribe link format (currently Resend's {{{RESEND_UNSUBSCRIBE_URL}}})

2. Config (src/config/index.ts):

```typescript
brandName: 'Your Newsletter Name',
subscribeUrl: 'https://yourdomain.com/subscribe',
```

3. Crawling targets (src/config/crawling-targets.ts):
- Replace Korean heritage sites with your domain sources
- Implement parsers in src/parsers/
4. Switch content generation LLM provider (optional):
Content generation supports 3 built-in providers — just change contentGeneration.provider:
```typescript
contentGeneration: {
  provider: 'google', // 'openai' | 'anthropic' | 'google'
  apiKey: process.env.GOOGLE_API_KEY,
  model: 'gemini-3.1-pro-preview', // optional, each provider has a default
}
```

Default models: openai=gpt-5.1, anthropic=claude-sonnet-4-6, google=gemini-3.1-pro-preview
Analysis provider (OpenAI) can be changed by modifying src/providers/analysis.provider.ts. Any Vercel AI SDK provider works.
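For illustration, the default-model fallback described above might look like this. This is a hypothetical helper; the real resolution logic is internal to the package:

```typescript
// Hypothetical helper mirroring the default-model table above.
type ContentProvider = 'openai' | 'anthropic' | 'google';

const DEFAULT_MODELS: Record<ContentProvider, string> = {
  openai: 'gpt-5.1',
  anthropic: 'claude-sonnet-4-6',
  google: 'gemini-3.1-pro-preview',
};

function resolveModel(cfg: { provider: ContentProvider; model?: string }): string {
  // An explicit model wins; otherwise fall back to the provider's default.
  return cfg.model ?? DEFAULT_MODELS[cfg.provider];
}
```

This is why changing only `contentGeneration.provider` is enough to switch vendors: the model is filled in for you unless you override it.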
Why Code-Based?
Code-based automation delivers superior output quality through advanced AI techniques:
- No-code platforms: generic content, limited to built-in features
- This kit: self-reflection, chain-of-thought, and multi-step verification workflows
Key advantages:
- Quality: Sophisticated prompting strategies, custom validation pipelines
- Cost control: Different models per step, token limits, retry logic
- Flexibility: Swap any component (Crawling/Analysis/Content/Email) via Provider interfaces
- Operations: Built-in retries, preview emails, integrates with CI/CD
- No lock-in: OSS, self-hostable, any LLM provider
Design philosophy:
- Logic in code (orchestration, deduplication)
- Reasoning in AI (analysis, scoring, content generation)
- Connections in architecture (swappable Providers)
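The "logic in code, reasoning in AI" split above can be sketched with a simple self-reflection loop: the orchestration (looping, acceptance check) is plain code, while drafting and critique are delegated to the LLM. All names and the acceptance check here are illustrative assumptions, not the package's actual workflow:

```typescript
// Hedged sketch of a self-reflection loop: code orchestrates, the LLM reasons.
// The `llm` callback stands in for any provider's text-generation call.
async function generateWithReflection(
  llm: (prompt: string) => Promise<string>,
  topic: string,
  maxRevisions = 2,
): Promise<string> {
  // Step 1: initial draft
  let draft = await llm(`Write a newsletter section about: ${topic}`);
  for (let i = 0; i < maxRevisions; i++) {
    // Step 2: ask the model to critique its own draft
    const critique = await llm(`Critique this draft:\n${draft}`);
    if (critique.includes('OK')) break; // naive acceptance check (illustrative)
    // Step 3: revise the draft using the critique as feedback
    draft = await llm(`Revise the draft using this critique:\n${critique}\n\nDraft:\n${draft}`);
  }
  return draft;
}
```

A loop like this is exactly what no-code platforms make hard to express, and it is where per-step model choice and retry logic naturally attach.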
Related Projects
- @llm-newsletter-kit/core — Domain-agnostic newsletter engine
- Archaeological Informatization Using LLMs — Academic research (Korean)
