@pumpspotting/content

v0.3.0

Published

2 months ago

Filesystem + frontmatter utilities for Pumpspotting markdown content repos. File discovery, parsing, validation, sync.


@pumpspotting/content

Filesystem + frontmatter utilities for markdown content repos. Discover, parse, validate, and sync post/ and node/ markdown trees. Powers the Pumpspotting content pipeline; generic enough to drop into any repo with the same shape.

install

npm install @pumpspotting/content
# or
pnpm add @pumpspotting/content

Every API takes an explicit root (absolute path to the content repo) — the package itself has no notion of where content lives. Pick the root in your caller (typically process.cwd()).

programmatic API

import {
	collectAllContentFiles,
	parseFile,
	saveFile,
	validateFiles,
	reportValidationIssues,
} from '@pumpspotting/content';

const root = process.cwd();

// 1. discover
const files = collectAllContentFiles(root); // absolute paths to every .md under post/ and node/
const relFiles = collectAllContentFiles(root, { relative: true });

// 2. parse
const parsed = parseFile(files[0], root);
//  → { slug, title, body, date, lede, tags, images, isPublished, metadata, redirects }

// 3. validate
const issues = validateFiles(files, root);
reportValidationIssues(issues, files.length); // prints and exits 1 on failure

// 4. save (round-trip through frontmatter + prettier formatting)
parsed.title = parsed.title.toLowerCase();
await saveFile(files[0], parsed, root);

exports

| function | purpose |
| --- | --- |
| collectAllContentFiles | walk post/ + node/ under root, return all .md paths |
| collectMarkdownFiles | recursive .md walk of any directory |
| parseFile | read a .md file → ParsedContent (frontmatter + body + path tags) |
| parseRow | parseFile + attach externalId for API upserts |
| saveFile | write a ParsedContent back as .md (frontmatter + prettier body) |
| readMatter | low-level YAML frontmatter + body split |
| validateFile | check one file's frontmatter + body images |
| validateFiles | validate many files + check for duplicate slugs |
| checkDuplicateSlugs | duplicate-slug check across a file set |
| reportValidationIssues | print + exit 1 on issues, success message otherwise |
| extractBodyImages | pull ![alt](url) images from a markdown body |
| rewriteBodyImagePaths | rewrite image urls in a body via AST roundtrip |
| tagsFromPath | derive tags from a file's directory segments |

Types: ParsedContent, ContentRow, Image, ImageRole, Issue. Constants: CONTENT_DIRS (['post', 'node']), SLUG_PATTERN, FILES_BASE_URL.

CLIs

Installed as bins — invoke from your content repo root:

content-validate            # validate all post/ and node/ files
content-validate post/      # validate only a subtree

content-parse <path>        # print a file's parsed JSON

content-format              # canonical formatter (parseFile/saveFile roundtrip)
content-format --check      # check without writing (exit 1 if dirty)

content-sync                # diff HEAD~1..HEAD, validate, POST to content API
content-sync --full         # send every file (first run / recovery)
content-sync --dry-run      # show what would be sent

content-test-sync           # send one file (or --full) without using git diff
content-test-sync post/blog/2016-06-01-welcome.md

content-sync and content-test-sync require:

CONTENT_API_URL=https://yourhost.example
CONTENT_API_TOKEN=...                              # bearer token
CONTENT_SOURCE_ID=github:your-org/your-content     # identifies the source in upserts

Both bins auto-load .env from the working directory if present, so local runs don't need a node --env-file=... wrapper.
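If you call the sync code from your own script, a small guard for the three variables above can fail early with a readable message. This is a sketch only: readSyncEnv is a hypothetical helper, not an export of this package, and treating an empty string as missing is an assumption.

```typescript
// Hypothetical helper, not part of @pumpspotting/content.
const REQUIRED_VARS = ["CONTENT_API_URL", "CONTENT_API_TOKEN", "CONTENT_SOURCE_ID"] as const;

type SyncEnv = Record<(typeof REQUIRED_VARS)[number], string>;

function readSyncEnv(env: Record<string, string | undefined>): SyncEnv {
	// Assumption: an empty string counts as "not set".
	const missing = REQUIRED_VARS.filter((key) => !env[key]);
	if (missing.length > 0) {
		throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
	}
	const result = {} as SyncEnv;
	for (const key of REQUIRED_VARS) {
		result[key] = env[key] as string;
	}
	return result;
}
```

In a Node script you would typically pass process.env as the argument.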

commit range

content-sync resolves which files to send from a git commit range:

| BEFORE_SHA | AFTER_SHA | range used |
| --- | --- | --- |
| set | set | BEFORE..AFTER |
| unset, or AFTER_SHA unset | any | HEAD~1..HEAD (the latest commit only) |

BEFORE_SHA is exclusive, AFTER_SHA is inclusive. In a GitHub Action triggered by push, wire these from github.event.before / github.event.after. To replay a missed window, dispatch the workflow manually with both inputs set — passing only before_sha silently falls back to the one-commit default.
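The fallback rule can be sketched as a pure function. resolveCommitRange is a hypothetical name used for illustration; it mirrors the documented behavior but is not the package's actual implementation.

```typescript
// Hypothetical sketch of the documented range rule; not the package's own code.
function resolveCommitRange(env: Record<string, string | undefined>): string {
	const before = env.BEFORE_SHA?.trim();
	const after = env.AFTER_SHA?.trim();
	// Both must be set; otherwise fall back to the latest commit only.
	if (before && after) {
		return `${before}..${after}`;
	}
	return "HEAD~1..HEAD";
}

const fullRange = resolveCommitRange({ BEFORE_SHA: "abc123", AFTER_SHA: "def456" });
const onlyBefore = resolveCommitRange({ BEFORE_SHA: "abc123" });
```

Here fullRange is "abc123..def456", while onlyBefore falls back to "HEAD~1..HEAD", which is the silent fallback described above.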

--full ignores the range entirely and resyncs every file in post/ and node/. Safe to run any time; the server upsert is idempotent on (sourceId, externalId).

expected repo layout

your-content-repo/
├── post/                  ← timestamped content, named YYYY-MM-DD-slug.md
└── node/                  ← static pages, named slug.md

Subdirectories under each are organizational (post/blog/, post/podcast/, node/legal/) and contribute tags to every file beneath them. tagsFromPath('post/blog/2024-01-01-foo.md', root) → ['post', 'blog'].
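To make the path-to-tags rule concrete, here is a minimal re-implementation under stated assumptions: the file path is either relative to root or absolute under it, and every directory segment becomes one tag. tagsFromPathSketch is a hypothetical stand-in; the real tagsFromPath may handle more cases.

```typescript
// Hypothetical stand-in for tagsFromPath; illustrative only.
function tagsFromPathSketch(file: string, contentRoot: string): string[] {
	// Normalize separators and drop trailing slashes.
	const norm = (p: string) => p.replace(/\\/g, "/").replace(/\/+$/, "");
	const rootNorm = norm(contentRoot);
	let rel = norm(file);
	// Strip the root prefix when an absolute path was given.
	if (rel.startsWith(rootNorm + "/")) {
		rel = rel.slice(rootNorm.length + 1);
	}
	const segments = rel.split("/").filter((segment) => segment.length > 0);
	// Everything except the filename is a directory segment, and so a tag.
	return segments.slice(0, -1);
}

const relTags = tagsFromPathSketch("post/blog/2024-01-01-foo.md", "/srv/content");
const absTags = tagsFromPathSketch("/srv/content/node/legal/terms.md", "/srv/content");
```

With these inputs relTags is ['post', 'blog'] and absTags is ['node', 'legal'].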

frontmatter

Every file needs at minimum:

---
title: Your Title
slug: your-slug
---

Body.

slug — lowercase, hyphenated, no slashes (/^[a-z0-9]+(?:-[a-z0-9]+)*$/). Unique across all files.
title — non-empty string.
date — ISO date (optional).
lede — short summary (optional).
tags — array of strings. Merged with path-derived tags.
images — array of { role: 'featured'|'primary'|'body', path?, key?, id?, alt? }. Single image: ... is sugar for one featured image.
isPublished — boolean (default true). Legacy status: draft is still accepted.
redirects — array of old slugs (optional).
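The slug rules above can be checked with the documented pattern plus a set-based duplicate scan. The names below (SLUG_RE, isValidSlug, findDuplicateSlugs) are hypothetical and stand in for the package's own SLUG_PATTERN and checkDuplicateSlugs, whose exact signatures are not shown here.

```typescript
// Pattern copied from the slug rule above; helper names are hypothetical.
const SLUG_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidSlug(slug: string): boolean {
	return SLUG_RE.test(slug);
}

function findDuplicateSlugs(slugs: string[]): string[] {
	const seen = new Set<string>();
	const dupes = new Set<string>();
	for (const slug of slugs) {
		if (seen.has(slug)) {
			dupes.add(slug);
		}
		seen.add(slug);
	}
	// Each duplicated slug is reported once, in first-repeat order.
	return Array.from(dupes);
}

const slugChecks = ["your-slug", "Your-Slug", "a--b", "blog/post", "2024-recap"].map(isValidSlug);
const repeated = findDuplicateSlugs(["welcome", "about", "welcome", "about", "faq"]);
```

Only "your-slug" and "2024-recap" pass the pattern; uppercase letters, doubled hyphens, and slashes are rejected.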

Body images (![alt](url)) become { role: 'body', path, key?, alt } entries on the parsed result.
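As a rough illustration of that mapping, the sketch below pulls images out of a body with a regular expression. This is an approximation: the package may well use a markdown AST instead (rewriteBodyImagePaths does), and a regex will miss edge cases such as URLs containing parentheses. extractBodyImagesSketch and BodyImage are hypothetical names.

```typescript
// Hypothetical, regex-based approximation; not the package's extractBodyImages.
interface BodyImage {
	role: "body";
	path: string;
	alt: string;
}

function extractBodyImagesSketch(body: string): BodyImage[] {
	// ![alt](url) with an optional trailing title, e.g. ![alt](url "title").
	const pattern = /!\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g;
	const found: BodyImage[] = [];
	let match: RegExpExecArray | null;
	while ((match = pattern.exec(body)) !== null) {
		found.push({ role: "body", path: match[2], alt: match[1] });
	}
	return found;
}

const sampleBody = 'Intro ![A cat](img/cat.png) and ![](https://yourhost.example/d.jpg "title").';
const bodyImages = extractBodyImagesSketch(sampleBody);
```

For sampleBody this yields two entries, one with path img/cat.png and alt "A cat", and one with an empty alt.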

sync API contract

content-sync POSTs to ${CONTENT_API_URL}/api/v2/content:

{
	"actorId": "<actor-email>",
	"eventId": "<commit-sha-or-INIT>",
	"sourceId": "<CONTENT_SOURCE_ID>",
	"rows": [
		{
			"externalId": "post/blog/2024-01-01-foo.md",
			"slug": "...",
			"title": "...",
			"body": "...",
			"tags": [...],
			"images": [...],
			"isPublished": true,
			"metadata": {...}
		}
	]
}
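If you need to produce the same request from your own tooling, the payload can be typed and assembled roughly as below. The interfaces mirror the JSON shape above and are hypothetical, not the package's exported types. Sending the token as an Authorization: Bearer header and using a JSON content type are assumptions based on the "bearer token" note.

```typescript
// Hypothetical types mirroring the JSON above; not the package's exported types.
interface SyncRow {
	externalId: string;
	slug: string;
	title: string;
	body: string;
	tags: string[];
	images: unknown[];
	isPublished: boolean;
	metadata: Record<string, unknown>;
}

interface SyncPayload {
	actorId: string;
	eventId: string;
	sourceId: string;
	rows: SyncRow[];
}

interface PostRequest {
	url: string;
	method: "POST";
	headers: Record<string, string>;
	body: string;
}

function buildPostRequest(apiUrl: string, token: string, payload: SyncPayload): PostRequest {
	const base = apiUrl.replace(/\/+$/, ""); // tolerate a trailing slash
	return {
		url: `${base}/api/v2/content`,
		method: "POST",
		headers: {
			// Assumption: the bearer token travels in a standard Authorization header.
			Authorization: `Bearer ${token}`,
			"Content-Type": "application/json",
		},
		body: JSON.stringify(payload),
	};
}

const examplePayload: SyncPayload = {
	actorId: "<actor-email>",
	eventId: "INIT",
	sourceId: "github:your-org/your-content",
	rows: [
		{
			externalId: "post/blog/2024-01-01-foo.md",
			slug: "foo",
			title: "Foo",
			body: "Body.",
			tags: ["post", "blog"],
			images: [],
			isPublished: true,
			metadata: {},
		},
	],
};

const exampleRequest = buildPostRequest("https://yourhost.example/", "secret-token", examplePayload);
```

The resulting object can be handed to fetch(exampleRequest.url, { method, headers, body }) or any other HTTP client.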

Renames PATCH the same endpoint with { sourceId, renames: [{ oldExternalId, newExternalId }] }. Deletes DELETE with ?sourceId=...&externalId=....
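The rename and delete calls can be sketched the same way. The helper names are hypothetical, and percent-encoding the query values with encodeURIComponent is an assumption about what the server expects; the source only gives the parameter names.

```typescript
// Hypothetical request builders for the rename and delete calls described above.
interface Rename {
	oldExternalId: string;
	newExternalId: string;
}

interface OutgoingRequest {
	method: "PATCH" | "DELETE";
	url: string;
	body?: string;
}

function endpointFor(apiUrl: string): string {
	return `${apiUrl.replace(/\/+$/, "")}/api/v2/content`;
}

function buildRenameRequest(apiUrl: string, sourceId: string, renames: Rename[]): OutgoingRequest {
	return {
		method: "PATCH",
		url: endpointFor(apiUrl),
		body: JSON.stringify({ sourceId, renames }),
	};
}

function buildDeleteRequest(apiUrl: string, sourceId: string, externalId: string): OutgoingRequest {
	// Assumption: query values are percent-encoded.
	const query = `sourceId=${encodeURIComponent(sourceId)}&externalId=${encodeURIComponent(externalId)}`;
	return { method: "DELETE", url: `${endpointFor(apiUrl)}?${query}` };
}

const renameRequest = buildRenameRequest("https://yourhost.example", "github:your-org/your-content", [
	{ oldExternalId: "post/blog/old.md", newExternalId: "post/blog/new.md" },
]);
const deleteRequest = buildDeleteRequest("https://yourhost.example", "github:your-org/your-content", "post/blog/old.md");
```

Because externalId values are file paths, the slashes in them become %2F in the delete URL under this encoding.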

The server is responsible for tag resolution, image mapping, and database writes — this package only produces the payloads.

license

MIT
