@pumpspotting/content
v0.2.0
Published
Filesystem + frontmatter utilities for Pumpspotting markdown content repos. File discovery, parsing, validation, sync.
Readme
@pumpspotting/content
Filesystem + frontmatter utilities for markdown content repos. Discover, parse, validate, and sync post/ and node/ markdown trees. Powers the Pumpspotting content pipeline; generic enough to drop into any repo with the same shape.
install
npm install @pumpspotting/content
# or
pnpm add @pumpspotting/contentEvery API takes an explicit root (absolute path to the content repo) — the package itself has no notion of where content lives. Pick the root in your caller (typically process.cwd()).
programmatic API
import {
collectAllContentFiles,
parseFile,
saveFile,
validateFiles,
reportValidationIssues,
} from '@pumpspotting/content';
const root = process.cwd();
// 1. discover
const files = collectAllContentFiles(root); // absolute paths to every .md under post/ and node/
const relFiles = collectAllContentFiles(root, { relative: true });
// 2. parse
const parsed = parseFile(files[0], root);
// → { slug, title, body, date, lede, tags, images, isPublished, metadata, redirects }
// 3. validate
const issues = validateFiles(files, root);
reportValidationIssues(issues, files.length); // prints and exits 1 on failure
// 4. save (round-trip through frontmatter + prettier formatting)
parsed.title = parsed.title.toLowerCase();
await saveFile(files[0], parsed, root);exports
| function | purpose |
| ------------------------ | -------------------------------------------------------------------- |
| collectAllContentFiles | walk post/ + node/ under root, return all .md paths |
| collectMarkdownFiles | recursive .md walk of any directory |
| parseFile | read a .md file → ParsedContent (frontmatter + body + path tags) |
| parseRow | parseFile + attach externalId for API upserts |
| saveFile | write a ParsedContent back as .md (frontmatter + prettier body) |
| readMatter | low-level YAML frontmatter + body split |
| validateFile | check one file's frontmatter + body images |
| validateFiles | validate many files + check for duplicate slugs |
| checkDuplicateSlugs | duplicate-slug check across a file set |
| reportValidationIssues | print + exit 1 on issues, success message otherwise |
| extractBodyImages | pull  images from a markdown body |
| rewriteBodyImagePaths | rewrite image urls in a body via AST roundtrip |
| tagsFromPath | derive tags from a file's directory segments |
Types: ParsedContent, ContentRow, Image, ImageRole, Issue.
Constants: CONTENT_DIRS (['post', 'node']), SLUG_PATTERN, FILES_BASE_URL.
CLIs
Installed as bins — invoke from your content repo root:
content-validate # validate all post/ and node/ files
content-validate post/ # validate only a subtree
content-parse <path> # print a file's parsed JSON
content-format # canonical formatter (parseFile/saveFile roundtrip)
content-format --check # check without writing (exit 1 if dirty)
content-sync # diff HEAD~1..HEAD, validate, POST to content API
content-sync --full # send every file (first run / recovery)
content-sync --dry-run # show what would be sent
content-test-sync # send one file (or --full) without using git diff
content-test-sync post/blog/2016-06-01-welcome.mdcontent-sync and content-test-sync require:
CONTENT_API_URL=https://yourhost.example
CONTENT_API_TOKEN=... # bearer token
CONTENT_SOURCE_ID=github:your-org/your-content # identifies the source in upserts
# optional, set by github actions:
BEFORE_SHA=<sha>
AFTER_SHA=<sha>Both bins auto-load .env from the working directory if present, so local
runs don't need a node --env-file=... wrapper.
expected repo layout
your-content-repo/
├── post/ ← timestamped content, named YYYY-MM-DD-slug.md
└── node/ ← static pages, named slug.mdSubdirectories under each are organizational (post/blog/, post/podcast/, node/legal/) and contribute tags to every file beneath them. tagsFromPath('post/blog/2024-01-01-foo.md', root) → ['post', 'blog'].
frontmatter
Every file needs at minimum:
---
title: Your Title
slug: your-slug
---
Body.slug— lowercase, hyphenated, no slashes (/^[a-z0-9]+(?:-[a-z0-9]+)*$/). Unique across all files.title— non-empty string.date— ISO date (optional).lede— short summary (optional).tags— array of strings. Merged with path-derived tags.images— array of{ role: 'featured'|'primary'|'body', path?, key?, id?, alt? }. Singleimage: ...is sugar for onefeaturedimage.isPublished— boolean (defaulttrue). Legacystatus: draftis still accepted.redirects— array of old slugs (optional).
Body images () become { role: 'body', path, key?, alt } entries on the parsed result.
sync API contract
content-sync POSTs to ${CONTENT_API_URL}/api/v2/content:
{
"actorId": "[email protected]",
"eventId": "<commit-sha-or-INIT>",
"sourceId": "<CONTENT_SOURCE_ID>",
"rows": [
{
"externalId": "post/blog/2024-01-01-foo.md",
"slug": "...",
"title": "...",
"body": "...",
"tags": [...],
"images": [...],
"isPublished": true,
"metadata": {...}
}
]
}Renames PATCH the same endpoint with { sourceId, renames: [{ oldExternalId, newExternalId }] }. Deletes DELETE with ?sourceId=...&externalId=....
The server is responsible for tag resolution, image mapping, and database writes — this package only produces the payloads.
license
MIT
