contentbase
v0.2.0
**An ORM for your Markdown.**
Contentbase
Contentbase treats a folder of Markdown and MDX files as a typed, queryable database. Define models with Zod schemas, extract structured data from headings and lists, traverse parent/child relationships across documents, validate everything, and query it all with a fluent API.
```typescript
import { Collection, defineModel, section, hasMany, z, toString } from "contentbase";

const Story = defineModel("Story", {
  meta: z.object({
    status: z.enum(["draft", "ready", "shipped"]).default("draft"),
    points: z.number().optional(),
  }),
  sections: {
    acceptanceCriteria: section("Acceptance Criteria", {
      extract: (q) => q.selectAll("listItem").map((n) => toString(n)),
      schema: z.array(z.string()).min(1),
    }),
  },
});

const collection = new Collection({ rootPath: "./content" });
await collection.load();

const stories = await collection
  .query(Story)
  .where("meta.status", "ready")
  .fetchAll();

stories[0].meta.status; // "ready" (typed!)
stories[0].sections.acceptanceCriteria; // string[] (typed!)
```

No database. No build step. Your content is the source of truth.
Why
You already organize knowledge in Markdown: specs, stories, docs, runbooks, design decisions. But the moment you need to query across files, validate frontmatter, or extract structured data from a heading, you're writing brittle scripts.
Contentbase gives you the primitives to treat that content like a real data layer:
- Schema-validated frontmatter via Zod. Typos in your `status` field get caught, not shipped.
- Sections as typed data. A heading called "Acceptance Criteria" containing a bullet list becomes `string[]` on the model instance, validated and cached.
- Relationships derived from document structure. An Epic's `## Stories` heading with `### Story Name` sub-headings automatically yields a `hasMany` relationship. No join tables. No IDs to manage.
- Full TypeScript inference. `defineModel()` infers all five generic parameters from your config object. You never write a type annotation.
Install
```shell
bun add contentbase
```

Contentbase is ESM-only and requires Node 18+ or Bun.
Core Concepts
Documents
Every .md or .mdx file in your content directory becomes a Document. Documents have an id (the file path relative to the content directory, without the extension), a lazily parsed AST, frontmatter metadata, and a rich set of section operations.
```text
content/
  epics/
    authentication.mdx          -> id: "epics/authentication"
  stories/
    authentication/
      user-can-register.mdx     -> id: "stories/authentication/user-can-register"
```

Models
A model is a config object that describes one type of document. It declares:
- description -- human-readable summary (auto-generated from the schema if omitted)
- prefix -- the path prefix that selects which files belong to the model (auto-pluralized from the name if omitted)
- meta -- a Zod schema for frontmatter
- sections -- named extractions from heading-based sections
- relationships -- `hasMany`/`belongsTo` links between models
- computed -- derived values calculated from instance data
- defaults -- static default values for frontmatter fields
- pattern -- Express-style path patterns for inferring meta from file paths
```typescript
const Epic = defineModel("Epic", {
  prefix: "epics",
  description: "A project epic that groups related user stories.",
  meta: z.object({
    priority: z.enum(["low", "medium", "high"]).optional(),
    status: z.enum(["created", "in-progress", "complete"]).default("created"),
  }),
  relationships: {
    stories: hasMany(() => Story, { heading: "Stories" }),
  },
  computed: {
    isComplete: (self) => self.meta.status === "complete",
  },
  defaults: {
    status: "created",
  },
});
```

If description is omitted, one is generated on first access from the model's schema: "An Epic has metadata (priority, status), relationship (stories → Story), and computed property (isComplete)."
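The auto-generated description quoted above can be approximated from the model config alone. The sketch below is hypothetical: `describeModel` and the `ModelSummaryInput` shape are illustrative, and switching between singular and plural labels by count is an assumption, not documented behavior of the library's `generateDescription()`.

```typescript
// Hypothetical, simplified view of a model config: only the names we need.
interface ModelSummaryInput {
  name: string;
  metaFields: string[];
  relationships: Record<string, string>; // relationship name -> target model name
  computed: string[];
}

// Assumption: labels switch between singular and plural by count.
function label(count: number, singular: string, plural: string): string {
  return count === 1 ? singular : plural;
}

function describeModel(m: ModelSummaryInput): string {
  const parts: string[] = [];
  if (m.metaFields.length > 0) {
    parts.push(`metadata (${m.metaFields.join(", ")})`);
  }
  const rels = Object.entries(m.relationships);
  if (rels.length > 0) {
    const list = rels.map(([rel, target]) => `${rel} → ${target}`).join(", ");
    parts.push(`${label(rels.length, "relationship", "relationships")} (${list})`);
  }
  if (m.computed.length > 0) {
    const noun = label(m.computed.length, "computed property", "computed properties");
    parts.push(`${noun} (${m.computed.join(", ")})`);
  }
  const article = /^[AEIOU]/i.test(m.name) ? "An" : "A";
  // Hypothetical fallback when the model declares nothing.
  if (parts.length === 0) return `${article} ${m.name} document.`;
  // Join as "x", "x and y", or "x, y, and z".
  const joined =
    parts.length === 1
      ? parts[0]
      : parts.length === 2
        ? `${parts[0]} and ${parts[1]}`
        : `${parts.slice(0, -1).join(", ")}, and ${parts[parts.length - 1]}`;
  return `${article} ${m.name} has ${joined}.`;
}
```

Called with the Epic above (`metaFields: ["priority", "status"]`, `relationships: { stories: "Story" }`, `computed: ["isComplete"]`), this reproduces the quoted sentence.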
The prefix determines which files match this model. Files whose path starts with "epics" are Epics. If omitted, the prefix is auto-pluralized from the name ("Epic" -> "epics").
Explicit Model Assignment with _model
Documents at the root of a collection (not in a subfolder) can't be matched by prefix. You can explicitly assign a model by adding _model to the frontmatter:
```markdown
---
_model: Epic
title: Platform Migration
status: created
---
```

The _model key takes priority over prefix matching, so this works even if the file lives outside the model's prefix folder.
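The matching rules described so far (an explicit `_model` wins, otherwise the path prefix decides, and the built-in Base model covered in the next section catches the rest) can be sketched as a small resolver. `pathToId`, `defaultPrefix`, and `resolveModelName` are hypothetical helpers; the pluralization rules and the literal `startsWith` check are simplifying assumptions, not the library's implementation.

```typescript
// Hypothetical view of a registered model.
interface ModelInfo {
  name: string;
  prefix?: string; // explicit prefix; derived from the name when omitted
}

// "epics/authentication.mdx" -> "epics/authentication"
function pathToId(relativePath: string): string {
  return relativePath.replace(/\.mdx?$/i, "");
}

// Assumption: naive English pluralization.
function defaultPrefix(modelName: string): string {
  const lower = modelName.toLowerCase();
  if (/[^aeiou]y$/.test(lower)) return lower.slice(0, -1) + "ies"; // story -> stories
  if (/(s|x|ch|sh)$/.test(lower)) return lower + "es"; // box -> boxes
  return lower + "s"; // epic -> epics
}

function resolveModelName(
  id: string,
  frontmatter: Record<string, unknown>,
  models: ModelInfo[]
): string {
  // 1. An explicit _model key wins over prefix matching.
  const explicit = frontmatter["_model"];
  if (typeof explicit === "string") return explicit;
  // 2. Otherwise, the first model whose prefix the id starts with.
  for (const m of models) {
    const prefix = m.prefix ?? defaultPrefix(m.name);
    if (id.startsWith(prefix)) return m.name;
  }
  // 3. Nothing matched: fall back to the catch-all Base model.
  return "Base";
}
```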
The Base Model
Every collection automatically registers a built-in Base model as a catch-all. Documents that don't match any other model (by _model or prefix) are assigned to Base. You can query these unmatched documents:
```typescript
import { Base } from "contentbase";

const misc = await collection.query(Base).fetchAll();
```

If you want to customize the Base model (e.g. add a meta schema), register your own before calling load():
```typescript
const MyBase = defineModel("Base", {
  meta: z.object({ tags: z.array(z.string()).optional() }),
});

collection.register(MyBase);
await collection.load(); // won't auto-register the built-in Base
```

Path Patterns
Models can declare Express-style path patterns to automatically infer meta values from the document's file path:
```typescript
const Story = defineModel("Story", {
  prefix: "stories",
  pattern: "stories/:epic/:slug",
  meta: z.object({
    epic: z.string(),
    slug: z.string(),
  }),
});
```

A file at stories/authentication/user-can-register.mdx will automatically have { epic: "authentication", slug: "user-can-register" } inferred into its meta. Explicit frontmatter values always take precedence over pattern-inferred values. You can also supply an array of patterns -- the first match wins.
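Express-style matching is simple enough to sketch directly. `matchPathPattern`, `matchFirst`, and `inferMeta` below are hypothetical stand-ins, not the exported `matchPattern()`/`matchPatterns()`, whose signatures aren't shown here. They assume each `:name` segment captures exactly one path segment and that segment counts must line up.

```typescript
type Params = Record<string, string>;

// Hypothetical: match one pattern like "stories/:epic/:slug" against a path id.
function matchPathPattern(pattern: string, id: string): Params | null {
  const patternParts = pattern.split("/");
  const idParts = id.split("/");
  if (patternParts.length !== idParts.length) return null;
  const params: Params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    const segment = idParts[i];
    if (part.startsWith(":")) {
      params[part.slice(1)] = segment; // capture the named segment
    } else if (part !== segment) {
      return null; // literal segment mismatch
    }
  }
  return params;
}

// With an array of patterns, the first one that matches wins.
function matchFirst(patterns: string[], id: string): Params | null {
  for (const pattern of patterns) {
    const params = matchPathPattern(pattern, id);
    if (params) return params;
  }
  return null;
}

// Explicit frontmatter is spread last, so it overrides inferred values.
function inferMeta(
  patterns: string | string[],
  id: string,
  frontmatter: Record<string, unknown>
): Record<string, unknown> {
  const list = Array.isArray(patterns) ? patterns : [patterns];
  return { ...(matchFirst(list, id) ?? {}), ...frontmatter };
}
```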
Collections
A Collection loads a directory tree and gives you access to documents and typed model instances.
```typescript
const collection = new Collection({
  rootPath: "./content",
  extensions: ["mdx", "md"], // default
  autoDiscover: true, // auto-load models.ts if no models are registered
});

// Register models for prefix-based matching (before load(), so
// autoDiscover does not kick in)
collection.register(Epic);
collection.register(Story);

await collection.load();

// Get a typed instance
const epic = collection.getModel("epics/authentication", Epic);
epic.meta.priority; // "high" | "medium" | "low" | undefined
```

Sections
Sections let you extract typed, structured data from the content beneath a heading.
Given this Markdown:
```markdown
## Acceptance Criteria

- Users can sign up with email and password
- Validation errors are shown inline
- Confirmation email is sent
```

Define a section to extract the list items:
```typescript
import { section, toString } from "contentbase";

const Story = defineModel("Story", {
  sections: {
    acceptanceCriteria: section("Acceptance Criteria", {
      extract: (query) =>
        query.selectAll("listItem").map((node) => toString(node)),
      schema: z.array(z.string()),
      alternatives: ["Requirements"], // fallback heading names
    }),
  },
});
```

The extract function receives an AstQuery scoped to the content under that heading. The schema is optional and used during validation. The alternatives array provides fallback heading names -- if "Acceptance Criteria" isn't found, it tries "Requirements" next.
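To make the heading scoping and fallback order concrete, here is a dependency-free sketch that works on raw Markdown lines instead of an mdast tree. `extractListUnder` is hypothetical; it assumes ATX headings and `-`/`*` bullets, and it ends a section at the next heading of the same or higher level, which is an assumption about how "content under that heading" is bounded.

```typescript
// Hypothetical: find the first heading matching one of the candidate names
// (primary name first, then alternatives) and return the bullets beneath it.
function extractListUnder(markdown: string, names: string[]): string[] | undefined {
  const lines = markdown.split("\n");
  for (const name of names) {
    const start = lines.findIndex((line) => {
      const m = /^(#{1,6})\s+(.*?)\s*$/.exec(line);
      return m !== null && m[2] === name;
    });
    if (start === -1) continue; // try the next alternative heading name
    const depth = /^(#+)/.exec(lines[start])![1].length;
    const items: string[] = [];
    for (const line of lines.slice(start + 1)) {
      const heading = /^(#{1,6})\s/.exec(line);
      if (heading && heading[1].length <= depth) break; // section ends here
      const bullet = /^\s*[-*]\s+(.*)$/.exec(line); // includes nested bullets
      if (bullet) items.push(bullet[1]);
    }
    return items;
  }
  return undefined; // neither the heading nor any alternative was found
}
```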
Section data is lazily computed and cached -- the extract function only runs the first time you access the property.
```typescript
instance.sections.acceptanceCriteria;
// ["Users can sign up with email and password", "Validation errors are shown inline", ...]
```

Relationships
hasMany
A hasMany relationship extracts child models from sub-headings. Given an Epic document:
```markdown
# Authentication

## Stories

### User can register

As a user I want to register...

### User can login

As a user I want to login...
```

Defining the relationship:
```typescript
const Epic = defineModel("Epic", {
  relationships: {
    stories: hasMany(() => Story, { heading: "Stories" }),
  },
});
```

Contentbase finds the ## Stories heading, extracts each ### sub-heading as a child document, and creates typed model instances:
```typescript
const epic = collection.getModel("epics/authentication", Epic);
const stories = epic.relationships.stories.fetchAll();

stories.length; // 2
stories[0].title; // "User can register"

const first = epic.relationships.stories.first();
const last = epic.relationships.stories.last();
```

belongsTo
A belongsTo relationship resolves a parent via a foreign key in frontmatter.
```yaml
# stories/authentication/user-can-register.mdx
---
status: created
epic: authentication
---
```

```typescript
const Story = defineModel("Story", {
  meta: z.object({
    status: z.enum(["created", "in-progress", "complete"]).default("created"),
    epic: z.string().optional(),
  }),
  relationships: {
    epic: belongsTo(() => Epic, {
      foreignKey: (doc) => doc.meta.epic as string,
    }),
  },
});

const story = collection.getModel(
  "stories/authentication/user-can-register",
  Story
);
const epic = story.relationships.epic.fetch();
epic.title; // "Authentication"
```

Relationship targets use thunks (() => Epic) so you can define circular references without import ordering issues.
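Both relationship kinds, and the reason thunks help, can be illustrated without the library. Everything below is a hypothetical sketch: `childrenUnder` splits the `## Stories` section into one child per `###` sub-heading (the behavior described for `hasMany`), `resolveParent` finds a parent from a frontmatter foreign key (as `belongsTo` does), and mapping the key `authentication` to the id `epics/authentication` is an assumption based on the example paths.

```typescript
// Thunks: each model refers to the other lazily, so declaration order does not matter.
interface ModelRef {
  name: string;
  related: () => ModelRef;
}
const EpicRef: ModelRef = { name: "Epic", related: () => StoryRef };
const StoryRef: ModelRef = { name: "Story", related: () => EpicRef };

// hasMany-style (hypothetical): one child per "###" sub-heading under "## <heading>".
interface Child {
  title: string;
  body: string;
}

function childrenUnder(markdown: string, heading: string): Child[] {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
  if (start === -1) return [];
  const children: Child[] = [];
  let current: Child | null = null;
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,2}\s/.test(line)) break; // the next h1/h2 closes the section
    const sub = /^###\s+(.*)$/.exec(line);
    if (sub) {
      current = { title: sub[1].trim(), body: "" };
      children.push(current);
    } else if (current) {
      current.body = `${current.body}\n${line}`.trim();
    }
  }
  return children;
}

// belongsTo-style (hypothetical): parent id is assumed to be "<prefix>/<foreign key>".
interface Doc {
  id: string;
  meta: Record<string, unknown>;
}

function resolveParent(
  child: Doc,
  docs: Doc[],
  parentPrefix: string,
  foreignKey: (doc: Doc) => string | undefined
): Doc | undefined {
  const key = foreignKey(child);
  return key === undefined ? undefined : docs.find((d) => d.id === `${parentPrefix}/${key}`);
}
```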
Querying
The query API filters typed model instances with a fluent builder:
```typescript
// Simple equality
const epics = await collection
  .query(Epic)
  .where("meta.priority", "high")
  .fetchAll();

// Object shorthand
const drafts = await collection
  .query(Story)
  .where({ "meta.status": "created" })
  .fetchAll();

// Comparison operators
const urgent = await collection
  .query(Story)
  .where("meta.points", "gte", 5)
  .fetchAll();

// Chainable methods
const results = await collection
  .query(Story)
  .whereIn("meta.status", ["created", "in-progress"])
  .whereExists("meta.epic")
  .sort("meta.status", "asc")
  .limit(10)
  .offset(0)
  .fetchAll();

// Convenience accessors
const first = await collection.query(Epic).first();
const count = await collection.query(Epic).count();
```

Available operators: eq, neq, in, notIn, gt, lt, gte, lte, contains, startsWith, endsWith, regex, exists.
Queries filter by model type before creating instances, so you only pay the parsing cost for matching documents.
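The operator list maps naturally onto a predicate table. This hypothetical sketch evaluates dotted paths such as `meta.status` against plain objects and ANDs the conditions together, as chained `.where()` calls suggest; details such as `contains` accepting both strings and arrays, and `exists` meaning "not undefined", are assumptions.

```typescript
type Operator =
  | "eq" | "neq" | "in" | "notIn" | "gt" | "lt" | "gte" | "lte"
  | "contains" | "startsWith" | "endsWith" | "regex" | "exists";

// Resolve a dotted path such as "meta.status" against a nested object.
function getPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (acc, key) =>
      acc !== null && typeof acc === "object" ? (acc as Record<string, unknown>)[key] : undefined,
    obj
  );
}

// Hypothetical operator semantics (see lead-in for the assumptions).
function evaluate(op: Operator, actual: unknown, expected: unknown): boolean {
  switch (op) {
    case "eq": return actual === expected;
    case "neq": return actual !== expected;
    case "in": return Array.isArray(expected) && expected.includes(actual);
    case "notIn": return Array.isArray(expected) && !expected.includes(actual);
    case "gt": return typeof actual === "number" && actual > (expected as number);
    case "lt": return typeof actual === "number" && actual < (expected as number);
    case "gte": return typeof actual === "number" && actual >= (expected as number);
    case "lte": return typeof actual === "number" && actual <= (expected as number);
    case "contains":
      if (typeof actual === "string") return actual.includes(String(expected));
      return Array.isArray(actual) && actual.includes(expected);
    case "startsWith": return typeof actual === "string" && actual.startsWith(String(expected));
    case "endsWith": return typeof actual === "string" && actual.endsWith(String(expected));
    case "regex": return typeof actual === "string" && new RegExp(String(expected)).test(actual);
    case "exists": return (actual !== undefined) === (expected !== false);
  }
}

interface Condition {
  path: string;
  op: Operator;
  value: unknown;
}

// Every condition must hold (AND).
function filterItems<T>(items: T[], conditions: Condition[]): T[] {
  return items.filter((item) =>
    conditions.every((c) => evaluate(c.op, getPath(item, c.path), c.value))
  );
}
```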
JSON Query DSL
For querying over the wire (REST API, MCP server) without executing arbitrary code, Contentbase provides a MongoDB-style JSON query DSL:
```json
{
  "model": "Epic",
  "where": {
    "meta.status": "created",
    "meta.priority": { "$gt": 3 }
  },
  "sort": { "meta.priority": "desc" },
  "select": ["id", "title", "meta.status"],
  "limit": 10,
  "offset": 0
}
```

Where value shortcuts:
| Value type | Interpretation | Example |
|------------|---------------|---------|
| Literal (string, number, boolean, null) | Implicit $eq | "meta.status": "active" |
| Array | Implicit $in | "meta.tags": ["a", "b"] |
| Operator object | Explicit operator | "meta.priority": { "$gt": 5 } |
| Multiple operators | AND on same field | "meta.priority": { "$gte": 3, "$lte": 8 } |
Available operators: $eq, $neq, $in, $notIn, $gt, $lt, $gte, $lte, $contains, $startsWith, $endsWith, $regex, $exists
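The value shortcuts in the table reduce to a normalization step: each field becomes one or more `{ path, op, value }` conditions that are ANDed together. `normalizeWhere` is a hypothetical sketch of that step; it is not the exported `parseWhereClause()`, whose output format isn't documented here.

```typescript
interface WhereCondition {
  path: string;
  op: string; // operator name without the "$" prefix
  value: unknown;
}

// Hypothetical normalization of a MongoDB-style where object.
function normalizeWhere(where: Record<string, unknown>): WhereCondition[] {
  const conditions: WhereCondition[] = [];
  for (const [path, value] of Object.entries(where)) {
    if (Array.isArray(value)) {
      conditions.push({ path, op: "in", value }); // array -> implicit $in
    } else if (
      value !== null &&
      typeof value === "object" &&
      Object.keys(value).every((k) => k.startsWith("$"))
    ) {
      // Operator object; several operators on one field are ANDed.
      for (const [op, operand] of Object.entries(value)) {
        conditions.push({ path, op: op.slice(1), value: operand });
      }
    } else {
      conditions.push({ path, op: "eq", value }); // literal -> implicit $eq
    }
  }
  return conditions;
}
```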
Options:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| model | string | (required) | Model name or prefix |
| where | object | -- | MongoDB-style filter conditions |
| sort | object or array | -- | { "field": "asc"\|"desc" } or [{ "path": "...", "direction": "asc" }] |
| select | string[] | -- | Fields to include in output |
| limit | number | -- | Maximum results to return |
| offset | number | -- | Number of results to skip |
| method | string | "fetchAll" | "fetchAll", "first", "last", or "count" |
Use it via POST /api/query with a JSON body, or through the MCP query tool. The DSL parser is also available programmatically:
```typescript
import { executeQueryDSL, queryDSLSchema } from "contentbase";

const result = await executeQueryDSL(collection, {
  model: "Epic",
  where: { "meta.status": "created" },
  sort: { title: "asc" },
  limit: 5,
});
```

Semantic Search
Contentbase includes built-in semantic search that combines vector embeddings, BM25 keyword ranking, and your content models' metadata — so you can ask natural-language questions and scope results to specific models and frontmatter values in the same query.
Setup
Generate embeddings for your collection:
```shell
# Uses OpenAI embeddings by default (requires OPENAI_API_KEY)
cnotes embed

# Or use local embeddings (no API key needed — auto-installs node-llama-cpp)
cnotes embed --local

# Check index health
cnotes embed --status
```

The index is stored in .contentbase/search.sqlite. Documents are split into chunks at H2 section boundaries and each chunk is embedded independently, so search results can point to specific sections within a document. Only changed documents are re-embedded on subsequent runs (tracked via content hashes).
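Chunking at H2 boundaries and skipping unchanged documents can be sketched with Node's built-in `node:crypto`. The helper names, the use of SHA-256, and keeping any text before the first `##` as its own chunk are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  heading: string | null; // null for text before the first H2 (assumption)
  text: string;
}

// Hypothetical: start a new chunk at every "## " heading.
function chunkByH2(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: null, text: "" };
  for (const line of markdown.split("\n")) {
    const h2 = /^##\s+(.*)$/.exec(line);
    if (h2) {
      if (current.text.trim()) chunks.push(current);
      current = { heading: h2[1].trim(), text: line };
    } else {
      current.text += (current.text ? "\n" : "") + line;
    }
  }
  if (current.text.trim()) chunks.push(current);
  return chunks;
}

function contentHash(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Only documents whose hash differs from the stored one need re-embedding.
function needsEmbedding(id: string, content: string, stored: Map<string, string>): boolean {
  return stored.get(id) !== contentHash(content);
}
```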
Searching
```shell
# Hybrid search (default) — combines keyword + vector for best results
cnotes search "authentication patterns"

# Keyword-only (BM25 ranking) — fast, good for exact terms
cnotes search "deploymentConfig" --mode keyword

# Vector-only (semantic similarity) — understands meaning, not just words
cnotes search "how do deployments work" --mode vector
```

Combining Search with Model Metadata
The real power is combining semantic understanding with your content models' structured metadata. The --model flag scopes results to a specific model, and --where filters on frontmatter fields — so your Zod schemas double as search facets:
```shell
# Find approved plans related to infrastructure
cnotes search "infrastructure" --model Plan --where "status=approved"

# Search only within epics that are in progress
cnotes search "user onboarding" --model Epic --where "status=in-progress"

# Combine model filtering with result limits
cnotes search "auth" --model Story -n 5
```

This means the same schema you use for validation and querying also powers search filtering. A model with status, priority, or category in its meta schema automatically becomes filterable in search — no extra configuration.
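Applying `--model`, `--where`, and `-n` to a ranked result list can be sketched as plain filtering. `parseWhereFlag` and `scopeHits` are hypothetical; splitting on the first `=` and comparing values as strings are assumptions, and a real implementation may filter before ranking rather than after.

```typescript
interface SearchHit {
  id: string;
  model: string;
  meta: Record<string, unknown>;
  score: number;
}

// Hypothetical: parse "status=approved" by splitting on the first "=".
function parseWhereFlag(flag: string): { key: string; value: string } {
  const eq = flag.indexOf("=");
  if (eq <= 0) throw new Error(`Expected key=value, got "${flag}"`);
  return { key: flag.slice(0, eq).trim(), value: flag.slice(eq + 1).trim() };
}

// Hypothetical: apply --model, --where and -n (default 10) to ranked hits.
function scopeHits(
  hits: SearchHit[],
  opts: { model?: string; where?: string; limit?: number }
): SearchHit[] {
  const filter = opts.where ? parseWhereFlag(opts.where) : undefined;
  return hits
    .filter((h) => !opts.model || h.model === opts.model)
    .filter((h) => !filter || String(h.meta[filter.key]) === filter.value)
    .slice(0, opts.limit ?? 10);
}
```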
Search Modes
| Mode | Algorithm | Best for |
|------|-----------|----------|
| hybrid (default) | BM25 + vector cosine similarity | General-purpose queries |
| keyword | BM25 full-text ranking | Exact terms, code identifiers, names |
| vector | Embedding cosine similarity | Conceptual queries, "how does X work" |
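The README does not say how hybrid mode merges the two rankings. As an assumption, the sketch below uses reciprocal rank fusion (RRF), a common way to combine a keyword ranking with a vector ranking; `cosine`, `rankByVector`, and `fuseRankings` are hypothetical helpers, and `k = 60` is the customary RRF constant, not a documented contentbase setting.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vector ranking: document ids sorted by similarity to the query embedding.
function rankByVector(query: number[], docs: Map<string, number[]>): string[] {
  return [...docs.entries()]
    .sort((x, y) => cosine(query, y[1]) - cosine(query, x[1]))
    .map(([id]) => id);
}

// Assumption: RRF, where each ranking adds 1 / (k + rank) to a document's score.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()].sort((x, y) => y[1] - x[1]).map(([id]) => id);
}
```

RRF only needs ranks, so the BM25 and cosine scores never have to be put on a common scale.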
CLI Reference
```text
cnotes search <query> [options]
  --mode hybrid|keyword|vector   Search mode (default: hybrid)
  --model <name>                 Filter by content model
  --where <filter>               Metadata filter, e.g. "status=approved"
  -n <number>                    Max results (default: 10)
  --json                         Output as JSON
  --full                         Include full document content in output
  --bootstrap                    Build index if missing, then search

cnotes embed [options]
  --force                        Re-embed everything (ignore content hashes)
  --provider openai|local        Embedding provider (default: openai)
  --status                       Show index health without embedding
  --local                        Use local embeddings (auto-installs if needed)
  --install-local                Only install node-llama-cpp, then exit
```

REST API
When running cnotes serve, search is also available over HTTP:
```text
GET  /api/search?q=<query>&mode=hybrid&model=Epic&limit=10
POST /api/search  { "query": "...", "mode": "hybrid", "model": "Epic", "where": { "status": "approved" } }
```

Validation
Every model instance can be validated against its Zod schemas:
```typescript
const instance = collection.getModel("epics/authentication", Epic);
const result = await instance.validate();

result.valid; // true
result.errors; // ZodIssue[]
```

Validation checks:
- Meta against the model's Zod schema (with defaults applied)
- Sections against any section-level schemas
```typescript
if (instance.hasErrors) {
  for (const [path, issue] of instance.errors) {
    console.log(`${path}: ${issue.message}`);
  }
}
```

The standalone validateDocument function is also available for lower-level use.
Serialization
```typescript
const json = instance.toJSON();
// { id, title, meta }

const full = instance.toJSON({
  sections: ["acceptanceCriteria"],
  computed: ["isComplete"],
  related: ["stories"],
});
// { id, title, meta, acceptanceCriteria: [...], isComplete: false, stories: [...] }
```

Export an entire collection:
```typescript
const data = await collection.export();
```

Document API
Documents expose a powerful AST manipulation layer built on the unified/remark ecosystem.
```typescript
const doc = collection.document("epics/authentication");

// Read
doc.title; // "Authentication"
doc.slug; // "authentication"
doc.meta; // { priority: "high", status: "created" }
doc.content; // raw markdown (without frontmatter)
doc.rawContent; // full file content with frontmatter

// AST querying
const headings = doc.astQuery.selectAll("heading");
const h2s = doc.astQuery.headingsAtDepth(2);
const storiesHeading = doc.astQuery.findHeadingByText("Stories");

// Node shortcuts
doc.nodes.headings; // all headings
doc.nodes.links; // all links
doc.nodes.tables; // all table nodes
doc.nodes.tablesAsData; // tables as { headers, rows } objects
doc.nodes.codeBlocks; // all code blocks

// Section operations (immutable by default)
const newMarkdown = "### Replacement Story\n\nDetails...";
const trimmed = doc.removeSection("Stories"); // new Document
const updated = doc.replaceSectionContent("Stories", newMarkdown);
const expanded = doc.appendToSection("Stories", "### New Story\n\nDetails...");

// Mutable when you need it
doc.removeSection("Stories", { mutate: true });

// Persistence
await doc.save();
await doc.reload();
```

Standalone Parsing
The parse() function gives you a queryable document from a file path or raw markdown string, without needing a Collection:
```typescript
import { parse } from "contentbase";

const doc = await parse("./content/my-post.mdx");

doc.title; // first heading text
doc.meta; // frontmatter
doc.astQuery.selectAll("heading"); // AST querying
doc.nodes.links; // node shortcuts
doc.querySection("Introduction").selectAll("paragraph");

// Also works with raw markdown
const doc2 = await parse("# Hello\n\nWorld");
```

Extracting Sections Across Documents
extractSections() pulls named sections from multiple documents into a single combined document, with heading depths adjusted automatically:
```typescript
import { extractSections } from "contentbase";

// doc1 and doc2 can be any mix of Document and ParsedDocument instances
const combined = extractSections(
  [
    { source: doc1, sections: "Acceptance Criteria" },
    { source: doc2, sections: ["Acceptance Criteria", "Mockups"] },
  ],
  {
    title: "All Acceptance Criteria",
  }
);
```

This produces:
```markdown
# All Acceptance Criteria

## Authentication

### Acceptance Criteria

- Users can sign up with email and password
- ...

## Searching And Browsing

### Acceptance Criteria

- Users can search by category
- ...
```

Modes
Grouped (default) -- each source document gets a heading (its title), with extracted sections nested underneath:

```typescript
extractSections(entries, { mode: "grouped" });
```

Flat -- sections are placed sequentially with no source grouping:

```typescript
extractSections(entries, { mode: "flat" });
// ## Acceptance Criteria   <- from doc1
// - ...
// ## Acceptance Criteria   <- from doc2
// - ...
```

Options
| Option | Default | Description |
| --- | --- | --- |
| title | -- | Optional h1 title for the combined document |
| mode | "grouped" | "grouped" nests under source titles, "flat" places sections sequentially |
| onMissing | "skip" | "skip" silently omits missing sections, "throw" raises an error |
The return value is a ParsedDocument -- fully queryable with astQuery, nodes, extractSection(), querySection(), and stringify().
Sources can be any mix of Document and ParsedDocument instances.
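The heading-depth adjustment in grouped mode amounts to re-leveling each extracted section so it sits one level below its source title. `shiftHeadings` and `combineGrouped` are hypothetical and work on Markdown strings rather than `ParsedDocument`; they reproduce the heading layout of the example output above, not the library's internals.

```typescript
// Hypothetical input: each section string starts with its own heading.
interface Extracted {
  sourceTitle: string;
  sections: string[];
}

// Re-level all ATX headings so the shallowest one lands at targetDepth.
// Assumption: "#" lines inside fenced code blocks are not special-cased.
function shiftHeadings(markdown: string, targetDepth: number): string {
  const lines = markdown.split("\n");
  const depths = lines
    .map((l) => /^(#{1,6})\s/.exec(l)?.[1].length)
    .filter((d): d is number => d !== undefined);
  if (depths.length === 0) return markdown;
  const delta = targetDepth - Math.min(...depths);
  return lines
    .map((l) =>
      l.replace(/^(#{1,6})(?=\s)/, (hashes) =>
        "#".repeat(Math.min(6, Math.max(1, hashes.length + delta)))
      )
    )
    .join("\n");
}

// Grouped mode: optional h1 title, then each source title with its sections below it.
function combineGrouped(entries: Extracted[], title?: string): string {
  const base = title ? 2 : 1; // source titles become h2 under a title, else h1
  const parts: string[] = title ? [`# ${title}`] : [];
  for (const entry of entries) {
    parts.push(`${"#".repeat(base)} ${entry.sourceTitle}`);
    for (const section of entry.sections) parts.push(shiftHeadings(section, base + 1));
  }
  return parts.join("\n\n");
}
```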
Table of Contents Generation
Generate a markdown table of contents for a collection with links that work on GitHub:
```typescript
const toc = collection.tableOfContents({ title: "Project Docs" });
```

Output:

```markdown
# Project Docs

## Epic

- [Authentication](./epics/authentication.mdx)
- [Searching And Browsing](./epics/searching-and-browsing.mdx)

## Story

- [A User should be able to register.](./stories/authentication/a-user-should-be-able-to-register.mdx)
```

If models are registered, documents are grouped by model. Without models, a flat list is produced. Use basePath to control the link prefix:

```typescript
collection.tableOfContents({ basePath: "./content" });
// links become: ./content/epics/authentication.mdx
```

File Tree
Render an ASCII file tree of all documents in the collection:
```typescript
const tree = collection.renderFileTree();
```

```text
epics/
├── authentication.mdx
└── searching-and-browsing.mdx
stories/
└── authentication/
    └── a-user-should-be-able-to-register.mdx
```

Model Summary
Generate comprehensive documentation of all registered models, including schema fields, sections, relationships, and defaults:
```typescript
const summary = await collection.generateModelSummary();
// Returns markdown documenting each model's schema, sections, relationships
```

Computed Properties
Derived values that are lazily evaluated from instance data:
```typescript
const Epic = defineModel("Epic", {
  meta: z.object({
    status: z.enum(["created", "in-progress", "complete"]).default("created"),
  }),
  relationships: {
    stories: hasMany(() => Story, { heading: "Stories" }),
  },
  computed: {
    isComplete: (self) => self.meta.status === "complete",
    // relies on the `stories` relationship declared above
    storyCount: (self) => self.relationships.stories.fetchAll().length,
  },
});

const epic = collection.getModel("epics/authentication", Epic);
epic.computed.isComplete; // false
epic.computed.storyCount; // 2
```

Plugins and Actions
```typescript
// Register named actions on the collection
collection.action("publish", async (coll, instance, opts) => {
  // your publish logic
});

// `instance` is a model instance, e.g. from collection.getModel(...)
await instance.runAction("publish", { target: "production" });

// Plugin system: a plugin is a function that receives the collection and options
function timestampPlugin(collection, options) {
  collection.action("touch", async (coll, instance) => {
    // update timestamps
  });
}

collection.use(timestampPlugin, { format: "iso" });
```

CLI
Contentbase ships with a CLI available as both cnotes and contentbase. See CLI.md for the full reference with examples for every command.
```shell
bun add contentbase

# Then use it via bunx, or in package.json scripts
bunx cnotes inspect
```

Commands
```text
cnotes init [name]                        # scaffold a new project
cnotes create <Model> --title "..."       # scaffold a new document (uses templates if available)
cnotes inspect                            # show models, sections, relationships, doc counts
cnotes validate [target]                  # validate documents ('all', a model name, or a path ID)
cnotes export                             # export collection as JSON
cnotes extract <glob> --sections "A, B"   # extract specific sections from matching documents
cnotes summary                            # generate MODELS.md and TABLE-OF-CONTENTS.md
cnotes teach                              # output combined documentation for LLM context
cnotes action <name>                      # run a named action
cnotes text-search <pattern>              # search file contents with pattern matching
cnotes serve                              # start HTTP server with REST API and doc serving
cnotes mcp                                # start MCP server for AI agent integration
cnotes console                            # interactive REPL with collection in scope
cnotes help                               # list available commands
```

All commands accept --contentFolder to specify which folder contains your content. It defaults to ./docs. You can also set it in package.json:
```json
{
  "contentbase": {
    "contentFolder": "content"
  }
}
```

serve
Start an HTTP server that exposes a full REST API for the collection. Documents are available as JSON, rendered HTML, or raw markdown.
```shell
# Start on default port 8000
cnotes serve

# Custom port, specific content folder
cnotes serve --port 9000 --contentFolder ./sdlc

# Read-only mode: disables all write endpoints (POST/PUT/PATCH/DELETE)
cnotes serve --read-only
```

Built-in endpoints:
| Path | Description |
|------|-------------|
| GET /api/inspect | Collection overview |
| GET /api/models | All model definitions |
| GET /api/documents | List documents (filter with ?model=) |
| GET/POST/PUT/PATCH/DELETE /api/documents/:pathId | Document CRUD |
| GET /api/query?model=&where=&select= | Query model instances (flat condition format) |
| POST /api/query | Query with JSON DSL body (MongoDB-style) |
| GET /api/search?pattern= | Full-text regex search across documents |
| GET /api/text-search?pattern= | File-level text search (expanded=true for line detail) |
| GET /api/validate?pathId= | Validate against schema |
| GET/POST /api/actions | List or execute actions |
| GET /docs/:path.json\|.md\|.html | Content-negotiated doc serving |
| GET /openapi.json | Auto-generated OpenAPI 3.1 spec |
When --read-only is passed, all mutating endpoints return 403 Forbidden. This includes POST /api/documents, PUT/PATCH/DELETE /api/documents/:pathId, and POST /api/actions. Read endpoints are unaffected.
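The read-only rule can be pictured as a guard in front of the router. This hypothetical sketch returns `403` for the mutating endpoints listed above when read-only mode is on and `null` otherwise; leaving `POST /api/query` allowed because it only reads is an assumption consistent with "Read endpoints are unaffected."

```typescript
type Method = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";

const MUTATING: ReadonlySet<Method> = new Set<Method>(["POST", "PUT", "PATCH", "DELETE"]);

// Hypothetical guard: 403 for writes in read-only mode, null = let the router handle it.
function readOnlyGuard(method: Method, path: string, readOnly: boolean): 403 | null {
  if (!readOnly || !MUTATING.has(method)) return null;
  const isWriteTarget =
    path === "/api/documents" ||
    path.startsWith("/api/documents/") ||
    path === "/api/actions";
  return isWriteTarget ? 403 : null;
}
```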
You can also add your own endpoints by placing files in an endpoints/ directory. See CLI.md for details.
mcp
Start a Model Context Protocol server for AI agent integration. Exposes tools, resources, and prompts for the collection.
```shell
cnotes mcp                                 # stdio transport (for Claude Desktop, etc.)
cnotes mcp --transport http --port 3003    # HTTP transport
```

extract
The extract command outputs document titles, leading content, and only the requested sections -- combined into a single document suitable for creating new content:
```shell
# Extract Acceptance Criteria from all stories
cnotes extract "stories/**/*" --sections "Acceptance Criteria"

# Combine epics into a single document with a title
cnotes extract "epics/*" -s "Stories" --title "All Stories"

# Multiple sections, include frontmatter, raw heading depths
cnotes extract "epics/*" -s "Stories, Notes" --frontmatter --no-normalize-headings
```

Glob patterns are matched against document path IDs using picomatch. Sections that don't exist in a document are silently skipped.
By default, heading depths are normalized so each document's content nests properly in the combined output. When --title is provided, it becomes the h1 and document titles shift to h2. Use --no-normalize-headings to preserve original heading depths.
create
The create command scaffolds new documents with smart defaults:
```shell
cnotes create Story --title "User can logout"
cnotes create Epic --title "Payments" --meta.priority high
```

If a template exists at templates/<model>.md (or .mdx) in your content directory, it's used as the base. Meta values are merged with this priority: Zod defaults < model defaults < template frontmatter < CLI --meta.* flags.
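The merge priority reads naturally as a chain of object spreads, where later sources override earlier ones. `parseMetaFlags` and `mergeMeta` are hypothetical, and keeping every `--meta.*` value as a string is an assumption about how the CLI parses flags.

```typescript
type Meta = Record<string, unknown>;

// Hypothetical: collect "--meta.priority high" style flags into { priority: "high" }.
function parseMetaFlags(argv: string[]): Meta {
  const meta: Meta = {};
  for (let i = 0; i < argv.length; i++) {
    const match = /^--meta\.(.+)$/.exec(argv[i]);
    if (match && i + 1 < argv.length) {
      meta[match[1]] = argv[i + 1]; // assumption: values stay strings
      i++; // skip the consumed value
    }
  }
  return meta;
}

// Zod defaults < model defaults < template frontmatter < CLI --meta.* flags.
function mergeMeta(zodDefaults: Meta, modelDefaults: Meta, template: Meta, cli: Meta): Meta {
  return { ...zodDefaults, ...modelDefaults, ...template, ...cli };
}
```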
Model Discovery
The CLI uses a 3-tier system to find your models:
Tier 1 — index.ts (recommended): If your content directory has an index.ts that exports a Collection with models registered, the CLI uses it directly. This is what contentbase init scaffolds.
```typescript
// docs/index.ts
import { Collection, defineModel, z } from "contentbase";

const Post = defineModel("Post", {
  meta: z.object({ draft: z.boolean().default(false) }),
});

// import.meta.dir is a Bun API: the directory containing this file
export const collection = new Collection({ rootPath: import.meta.dir });
collection.register(Post);
```

Tier 2 — models.ts: If no index.ts exists but a models.ts is found, the CLI imports it, detects model definitions from exports, and auto-registers them on a new Collection.
Tier 3 — Auto-discovery: If neither file exists, the CLI scans top-level subdirectories for markdown files and generates bare models from folder names (epics/ → Epic). These models have no schema validation — useful for quick inspection, but you'll want a models.ts or index.ts for real use.
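The three tiers form a simple fallback chain. In this hypothetical sketch, `discoverModels` receives directory listings instead of touching the filesystem so it is easy to test, and `folderToModelName` uses a naive singularize-and-capitalize rule, which is an assumption about how Tier 3 derives names from folders.

```typescript
type Discovery =
  | { tier: 1; source: "index.ts" }
  | { tier: 2; source: "models.ts" }
  | { tier: 3; models: string[] };

// Assumption: naive singularization, "stories" -> "Story", "epics" -> "Epic".
function folderToModelName(folder: string): string {
  const singular = folder.endsWith("ies")
    ? folder.slice(0, -3) + "y"
    : folder.endsWith("s")
      ? folder.slice(0, -1)
      : folder;
  return singular.charAt(0).toUpperCase() + singular.slice(1);
}

// entries: top-level names in the content directory;
// dirsWithMarkdown: top-level subfolders that contain .md/.mdx files.
function discoverModels(entries: string[], dirsWithMarkdown: string[]): Discovery {
  if (entries.includes("index.ts")) return { tier: 1, source: "index.ts" };
  if (entries.includes("models.ts")) return { tier: 2, source: "models.ts" };
  return { tier: 3, models: dirsWithMarkdown.map(folderToModelName) };
}
```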
API Reference
Top-level exports
| Export | Description |
| --- | --- |
| Collection | Loads and manages a directory of documents |
| Document | A single Markdown/MDX file with AST operations |
| defineModel() | Create a typed model definition (accepts optional description, auto-generated if omitted) |
| generateDescription() | Generate a human-readable model description from its schema |
| section() | Declare a section extraction |
| hasMany() | Declare a one-to-many relationship |
| belongsTo() | Declare a many-to-one relationship |
| parse() | Parse a file path or markdown string into a queryable ParsedDocument |
| extractSections() | Combine sections from multiple documents into one |
| CollectionQuery | Fluent query builder for model instances |
| queryDSLSchema | Zod schema for validating JSON query DSL input |
| parseWhereClause() | Parse MongoDB-style where object into internal conditions |
| executeQueryDSL() | Execute a JSON query DSL against a collection |
| AstQuery | MDAST query wrapper (select, visit, find) |
| NodeShortcuts | Convenience getters for common AST nodes |
| createModelInstance() | Low-level factory for model instances |
| validateDocument() | Standalone validation function |
| matchPattern() | Express-style path pattern matching (:param syntax) |
| matchPatterns() | Try multiple patterns against a path, first match wins |
| introspectMetaSchema() | Extract field info (name, type, required, default) from a Zod schema |
| z | Re-exported from Zod (no extra dependency needed) |
| toString | Re-exported from mdast-util-to-string |
License
MIT
