@growth-labs/seo
v0.5.1
Astro integration for complete SEO infrastructure on Cloudflare. Handles JSON-LD structured data, meta tags, sitemaps, RSS/podcast/Apple News feeds, AEO (Answer Engine Optimization) with crawler-class dispatch, multilingual support, robots.txt, llms.txt / llms-full.txt, and build-time validation.
Quick start — prerendered content site (WarFronts pattern)
For immutable-after-publish content (articles, docs, archived posts). Zero bindings required.
```javascript
// astro.config.mjs
import { defineConfig } from 'astro/config'
import seo from '@growth-labs/seo'

export default defineConfig({
  integrations: [
    seo({
      site: 'https://warfronts.channel',
      organization: {
        name: 'WarFronts',
        logo: 'https://media.warfronts.channel/logos/header.png',
      },
      aeoTwins: true, // → { mode: 'static' }
      llmsTxt: true,
      rss: true,
      // Module specifier (not function). Required for the Astro Cloudflare adapter's
      // prerender Worker, which doesn't re-execute astro.config.mjs so a captured
      // function reference never reaches it. The module must default-export the
      // ContentProvider.
      contentProviderModule: '/src/lib/content-provider.mjs',
    }),
  ],
})
```

```javascript
// src/lib/content-provider.mjs
import { getCollection } from 'astro:content'

export default async function contentProvider({ type, slugs }) {
  const entries = await getCollection(type === 'articles' ? 'articles' : 'pages')
  return entries
    .filter((e) => !slugs || slugs.includes(e.slug))
    .map((e) => ({ url: `https://example.com/${e.slug}`, title: e.data.title, /* ... */ }))
}
```

`mode: 'static'` emits `.md` twin files at build time (via an injected prerender route) to `dist/client/article/<slug>.md`. Cloudflare Assets (or any static host) serves them directly.
Premium publisher (fronts.co pattern) — SSR, gated content, Flexible Sampling
For paid publications where members see the full body and verified Googlebot gets the paywall-marked full body under Google's Flexible Sampling policy.
```javascript
seo({
  site: { envVar: 'SITE_URL' },
  organization: { name: 'Fronts', logo: 'https://fronts.co/logo.png' },
  aeoTwins: {
    mode: 'middleware',
    onDemandRevalidation: true,
    revalidateToken: import.meta.env.SEO_REVALIDATE_TOKEN, // ≥32 random bytes
    freshLayer: { bindingName: 'AEO_TWINS', type: 'r2' },
  },
  flexibleSampling: { enabled: true, sampleMode: 'lead-in', leadInParagraphs: 2 },
  llmsTxt: true,
  llmsFullTxt: true,
  contentProviderModule: '/src/lib/content-provider.mjs',
})
```

Gated articles need `prerender: false` on their route file. This is enforced at build time via the `prerender-gated-content` guard.
Site URL Resolution
site accepts three forms:
```javascript
seo({ site: 'https://fronts.co', ... })
seo({ site: () => 'https://fronts.co', ... })
seo({ site: { envVar: 'SITE_URL' }, ... })
```

String values and resolver function return values are validated as URLs when Astro config is parsed. `{ envVar }` is resolved by runtime routes and head components from the standard Cloudflare Workers env binding (`cloudflare:workers`), then validated as a URL. `process.env` is only a Node/test fallback for local tooling; consumers do not need `nodejs_compat`, `process.env`, or a Fronts-local runtime shim for site URL resolution.
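To make the three forms concrete, here is a minimal sketch of how a resolver could handle them. It is not the package's implementation: `resolveSiteUrl` is a hypothetical name, `env` stands in for the Worker env bindings object, and returning the URL origin is a choice made for this example.

```javascript
// Hypothetical sketch of resolving the three `site` forms; not the package's code.
// `env` stands in for the Worker env bindings object.
function resolveSiteUrl(site, env = {}) {
  let raw
  if (typeof site === 'string') raw = site
  else if (typeof site === 'function') raw = site()
  else if (site && typeof site.envVar === 'string') raw = env[site.envVar]
  else throw new TypeError('site must be a string, a function, or { envVar }')

  if (typeof raw !== 'string' || raw === '') throw new Error('site URL is not set')
  // new URL() throws a TypeError on malformed input, which doubles as validation.
  return new URL(raw).origin
}

console.log(resolveSiteUrl('https://fronts.co'))
console.log(resolveSiteUrl(() => 'https://fronts.co'))
console.log(resolveSiteUrl({ envVar: 'SITE_URL' }, { SITE_URL: 'https://fronts.co' }))
```

All three calls resolve to the same origin. The `{ envVar }` form is the only one that depends on the second argument, which mirrors the request-time lookup described above.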
wrangler.toml (required for middleware/both modes)
```toml
# Fresh-twin storage (R2 preferred, KV acceptable)
[[r2_buckets]]
binding = "AEO_TWINS"
bucket_name = "my-site-aeo-twins"

# Revalidation Coordinator: rate limit + per-slug lock + idempotency
[[durable_objects.bindings]]
name = "AEO_REVALIDATION_COORD"
class_name = "AeoRevalidationCoordinator"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["AeoRevalidationCoordinator"]

# Version metadata (R2 key prefixing for rollback safety)
[version_metadata]
binding = "CF_VERSION_METADATA"

# Daily prune cron: deletes old-version R2 entries
[triggers]
crons = ["0 3 * * *"]

# Assets binding: MUST set not_found_handling to "none" or middleware's
# env.ASSETS.fetch() for missing twins will return the SPA fallback.
[assets]
binding = "ASSETS"
directory = "./dist/client"
not_found_handling = "none"
```

Re-export the DO class + scheduled handler from your Worker entrypoint:
```javascript
export { AeoRevalidationCoordinator } from '@growth-labs/seo/durable-objects'
export { pruneAeoR2 } from '@growth-labs/seo/cron'

export default {
  async scheduled(event, env, ctx) {
    await pruneAeoR2({ env })
  },
}
```

What it injects
Middleware (order: post):
- Classifies every request: verified search crawler (BM fast path + FCrDNS fallback), LLM training crawler, user-directed LLM agent, anonymous.
- Sets `Astro.locals.crawlerClass` + `effectiveAuthSegment` for consumer cache-key builders.
- 403s LLM training crawlers on `access: 'members'` items.
- Adds `Content-Signal` header on every response.
- Adds `Vary: Accept, User-Agent, Cookie, CF-Connecting-IP` on every SSR response.
- Adds `Link: rel="alternate"; type="text/markdown"` on HTML responses (suppressed for members items).
- Serves `.md` twins via `Accept: text/markdown` content-negotiation in middleware/both modes (R2 → Assets → 503 stub + background render fallthrough).
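The twin-serving fallthrough in the last bullet (R2, then Assets, then a 503 stub with a background render) can be sketched as follows. This is an illustration only, not the package's middleware: `serveTwin`, `r2Get`, `assetsGet`, and `scheduleRender` are hypothetical names, and the lookups are injected so the example runs without Cloudflare bindings.

```javascript
// Hypothetical sketch of the R2 -> Assets -> 503 fallthrough; not the package's code.
// `r2Get` and `assetsGet` are stand-ins that resolve to a markdown string or null;
// `scheduleRender` stands in for queuing a background render (e.g. via waitUntil).
async function serveTwin(slug, { r2Get, assetsGet, scheduleRender }) {
  // 1. Fresh layer: a twin re-rendered after publish.
  const fresh = await r2Get(slug)
  if (fresh !== null) return { status: 200, source: 'r2', body: fresh }

  // 2. Build-time twin shipped with the static assets.
  const built = await assetsGet(slug)
  if (built !== null) return { status: 200, source: 'assets', body: built }

  // 3. Nothing available yet: answer with a stub and render in the background.
  scheduleRender(slug)
  return { status: 503, source: 'stub', body: '' }
}
```

Checking the fresh layer first means a revalidated twin always wins over the copy produced at build time.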
Routes:
- `/sitemap-index.xml` + `sitemap-articles.xml`, `sitemap-pages.xml`, `sitemap-videos.xml`, `sitemap-products.xml`
- `/sitemap-markdown.xml`: twin URL sitemap (static/both modes only)
- `/robots.txt`
- `/llms.txt`, `/llms-full.txt`
- `/feed.xml` (RSS)
- `/apple-news.xml` (Apple News Publisher RSS, if enabled)
- `/podcast.xml`, `/listen.xml`
- `POST /_seo/revalidate`: CMS webhook target (when `onDemandRevalidation: true`)

Consumers that own an injected sitemap path can disable only that route with `injectedRoutes`, for example `injectedRoutes: { sitemapVideos: false }`. The generated sitemap index omits disabled child sitemaps.
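The effect of disabling a child sitemap on the generated index can be sketched like this. It is a hypothetical illustration rather than the package's generator: only the `sitemapVideos` key is confirmed by the text above, and the other keys in `CHILD_SITEMAPS` as well as the `buildSitemapIndex` name are assumptions.

```javascript
// Hypothetical sketch; only the `sitemapVideos` key is confirmed by the README.
const CHILD_SITEMAPS = {
  sitemapArticles: '/sitemap-articles.xml',
  sitemapPages: '/sitemap-pages.xml',
  sitemapVideos: '/sitemap-videos.xml',
  sitemapProducts: '/sitemap-products.xml',
}

function buildSitemapIndex(site, injectedRoutes = {}) {
  const entries = Object.entries(CHILD_SITEMAPS)
    // A route is dropped only when it is explicitly set to false.
    .filter(([key]) => injectedRoutes[key] !== false)
    .map(([, path]) => `  <sitemap><loc>${new URL(path, site).href}</loc></sitemap>`)
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n')
}

console.log(buildSitemapIndex('https://fronts.co', { sitemapVideos: false }))
```

With `sitemapVideos: false`, the printed index lists the articles, pages, and products sitemaps and leaves out the videos one, so crawlers are never pointed at a route the consumer serves itself.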
Head-tag components:
- `<SeoHead />` emits `<title>`, description, canonical, robots with `max-image-preview:large`, hreflang, OG/Twitter fields, Apple News discovery, markdown twin links, and JSON-LD. Use it in layouts instead of local meta shims.
- JSON-LD includes WebSite for non-content pages, Article/NewsArticle or Product for content pages, and VideoObject / AudioObject when `ContentItem.video` or `ContentItem.audio` is present.
- For `@growth-labs/opengraph`, pass `getOgImageUrl()` output as `item.image` or `defaults.defaultImage`; SeoHead uses that URL for `og:image` and `twitter:image`.
- `<AeoHead />` remains available standalone when a site only wants Apple News discovery and markdown twin links.

Runtime behavior
@growth-labs/seo self-seeds config through virtual:growth-labs/seo/config. Runtime entrypoints resolve bindings from the standard Cloudflare surfaces:
- `cloudflare:workers` for env bindings
- `Astro.locals.cfContext.waitUntil()` for background tasks

For site: { envVar: 'SITE_URL' }, the package reads SITE_URL from the Worker env binding at request time. The Node process.env fallback exists only for tests and build tooling.
Build-time:
- Emits `.md` twins + summary twins for public items (static/both modes) under `dist/client/`.
- Validates hreflang reciprocity.
- Validates no prerendered route serves a members-gated item (when Flexible Sampling is enabled).
- Per-page HTML validation (title length, meta description, canonical, H1, hero image, Article JSON-LD on article routes, and `max-image-preview:large`). Hard validation errors fail the build; warnings remain informational.
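As an illustration of what hreflang reciprocity means, the sketch below flags every alternate link whose target page does not link back. It is not the package's validator: `findNonReciprocalHreflang` and the `pages` input shape are assumptions made for this example.

```javascript
// Hypothetical sketch of a reciprocity check; not the package's validator.
// `pages` maps each page URL to its declared alternates: { [lang]: url }.
function findNonReciprocalHreflang(pages) {
  const problems = []
  for (const [url, alternates] of Object.entries(pages)) {
    for (const [lang, altUrl] of Object.entries(alternates)) {
      if (altUrl === url) continue // a self-reference needs no return link
      const back = pages[altUrl]
      const linksBack = back !== undefined && Object.values(back).includes(url)
      if (!linksBack) problems.push({ from: url, to: altUrl, lang })
    }
  }
  return problems
}

const pages = {
  'https://example.com/en/a': { en: 'https://example.com/en/a', de: 'https://example.com/de/a' },
  'https://example.com/de/a': { de: 'https://example.com/de/a' }, // missing the link back to /en/a
}
console.log(findNonReciprocalHreflang(pages))
```

Here the German page never points back at the English one, so the English page's `de` alternate is reported.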
Crawler classes
| Class | What they see | Notes |
|---|---|---|
| verifiedSearchCrawler | Full body (+ paywall JSON-LD on gated items under Flexible Sampling) | FCrDNS-verified Googlebot/Bingbot/Applebot. Cloudflare BM fast path when available. |
| llmTrainingCrawler | 403 on members items, public body on public items | GPTBot, ClaudeBot, CCBot, PerplexityBot, Applebot-Extended, etc. |
| userDirectedLlmAgent | Anonymous body only, regardless of cookies | ChatGPT-User, Claude-User, PerplexityBot-User, Google-NotebookLM. Load-bearing override prevents cookie-based leakage. |
| anonymous | Public body or gate | Everything else. |
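The dispatch order implied by the table can be sketched with plain user-agent matching. This is a deliberately simplified, hypothetical illustration and not the package's `classifyRequest`: it does no Bot Management or FCrDNS verification itself, so verification is represented by an injected `isVerified` callback, and the token lists are copied from the table.

```javascript
// Hypothetical, simplified sketch: user-agent matching only.
// Real verification of search crawlers (BM / FCrDNS) is represented by the
// injected `isVerified` callback, which defaults to "not verified".
const USER_DIRECTED = ['ChatGPT-User', 'Claude-User', 'PerplexityBot-User', 'Google-NotebookLM']
const LLM_TRAINING = ['GPTBot', 'ClaudeBot', 'CCBot', 'PerplexityBot', 'Applebot-Extended']
const SEARCH = ['Googlebot', 'Bingbot', 'Applebot']

function classifyUserAgent(userAgent, { isVerified = () => false } = {}) {
  const ua = userAgent.toLowerCase()
  const has = (tokens) => tokens.some((t) => ua.includes(t.toLowerCase()))
  // Order matters: "PerplexityBot-User" contains "PerplexityBot", and
  // "Applebot-Extended" contains "Applebot", so the more specific lists go first.
  if (has(USER_DIRECTED)) return 'userDirectedLlmAgent'
  if (has(LLM_TRAINING)) return 'llmTrainingCrawler'
  if (has(SEARCH) && isVerified(userAgent)) return 'verifiedSearchCrawler'
  return 'anonymous'
}
```

A request that merely claims to be Googlebot falls through to `anonymous` unless verification succeeds, which is why a spoofed user agent gains nothing.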
Standalone utilities
All pure-function utilities are available without the integration:
```javascript
import { generateArticleJsonLd, generateProductJsonLd } from '@growth-labs/seo/utils'
import { generateMeta } from '@growth-labs/seo/utils/meta'
import { generateAppleNewsRss, generateAppleNewsAnf } from '@growth-labs/seo/utils'
import { classifyRequest, createFcrdnsVerifier } from '@growth-labs/seo/utils'
import { computeEffectiveAuthSegment } from '@growth-labs/seo/utils'
```

JSON-LD generators: Article, NewsArticle, BlogPosting, FAQPage, VideoObject, AudioObject, Person, HowTo, Product, BreadcrumbList, Organization, WebSite, ItemList, SpeakableSpecification
Feed generators: RSS, Apple News Publisher RSS, Apple News Format (ANF) JSON, podcast RSS, listen feed, llms.txt, llms-full.txt
Utilities: OG + Twitter Card meta, sitemap XML, markdown sitemap, hreflang, robots.txt, AEO markdown generator with RAG chunk markers, summary twin generator, content-hash staleness.
Non-Cloudflare hosts
aeoTwins: { mode: 'static' } works on any host that serves static files (Vercel, Netlify, GitHub Pages, S3+CloudFront). Other modes ('middleware', 'both') and onDemandRevalidation require Cloudflare Workers + R2/KV + Durable Objects. See packages-seo-SPEC-v2.md "Deployment Targets" for details.
Key patterns
- Virtual module: `virtual:growth-labs/seo/config`
- Runtime routes and middleware resolve bindings from standard Cloudflare surfaces
- AI crawler blocking: enforced at `robots.txt` AND per-request 403 on members items
- `.md` twin canonical: emitted with `X-Robots-Tag: noindex` + `Link: <html-url>; rel="canonical"`, which prevents Google from clustering the `.md` as a duplicate of the HTML
- Summary twins: `.summary.md` companion emitted when `summaryTwin: true` (default), with a 4-tier fallback (item.summary → bullets → first-sentence-per-section → description-only)
- Build-time validation: fails on required structural SEO defects and gated content on prerendered routes
Full spec
See packages-seo-SPEC-v2.md for the complete 2,400-line specification including the test matrix, architectural rationale, and worked examples for every code path.
