@awesome-pages/parser
v1.1.0
A powerful parser for Awesome Lists that converts markdown to structured domain objects and generates various artifacts
Awesome Pages Parser
awesome-pages/parser is a modular TypeScript pipeline that converts Markdown-based awesome lists into structured JSON and other reusable artifacts.
It transforms one or more README.md files into machine-readable formats, ready for:
- Static sites (Jamstack)
- Local or client-side search
- Bookmark import/export
- Feed generation (RSS / JSON Feed)
- SEO tools and sitemaps
Installation
Install the package from NPM:
npm install @awesome-pages/parser
Or using pnpm:
pnpm add @awesome-pages/parser
Or using yarn:
yarn add @awesome-pages/parser
Quick Start
Basic Usage
Parse a markdown file and generate artifacts:
import { parse } from '@awesome-pages/parser';

const results = await parse({
  sources: [
    {
      from: 'path/to/awesome-list.md',
      outputs: [
        {
          artifact: 'domain',
          to: 'output/domain.json',
        },
        {
          artifact: 'bookmarks',
          to: 'output/bookmarks.html',
        },
      ],
    },
  ],
});

console.log(`Generated ${results.length} artifacts`);
Parse from GitHub
Parse directly from a GitHub repository:
import { parse } from '@awesome-pages/parser';

await parse({
  githubToken: process.env.GITHUB_TOKEN, // optional, for higher rate limits
  sources: [
    {
      from: 'github://sindresorhus/awesome@main:README.md',
      outputs: [
        {
          artifact: ['domain', 'index', 'bookmarks'],
          to: 'dist/{repo}.{artifact}.{ext}',
        },
      ],
    },
  ],
});
Generate Artifacts Programmatically
Use individual artifact generators:
import { readFile } from 'fs/promises';
import {
  parse,
  generateBookmarksHtml,
  buildIndex,
} from '@awesome-pages/parser';

// Parse the list and write the domain object to domain.json
await parse({
  sources: [
    {
      from: 'awesome.md',
      outputs: [{ artifact: 'domain', to: 'domain.json' }],
    },
  ],
});

// Load the domain JSON that parse() just wrote
const domain = JSON.parse(await readFile('domain.json', 'utf-8'));

// Generate bookmarks HTML
const bookmarksHtml = generateBookmarksHtml(domain);

// Build search index
const searchIndex = buildIndex(domain);
Generate JSON Schema
Get the TypeScript-generated JSON Schema for the Domain v1 format:
import { generateDomainV1JsonSchema } from '@awesome-pages/parser';
const schema = generateDomainV1JsonSchema();
console.log(JSON.stringify(schema, null, 2));
Overview
The parser reads Markdown and outputs a validated domain model (DomainV1) via Zod.
From that core model, multiple artifacts can be generated — each designed for a different consumer.
README.md
   ↓ parse()
DomainV1 JSON
   ↓ artifacts
   ├── index.json       (inverted index for search)
   ├── bookmarks.html   (browser import)
   ├── sitemap.xml      (SEO discovery)
   ├── rss.json         (modern JSON Feed)
   └── rss.xml          (classic RSS 2.0)
Architecture Diagram
flowchart LR
%% === NODES ===
subgraph Input["Input Sources"]
A1["Local README.md"]
A2["GitHub (via API)"]
A3["HTTP(S) Remote URL"]
end
subgraph Core["Parser Core"]
B1["markdownToAst()"]
B2["extractMetadata()"]
B3["mdastToDomain()"]
B4["validate(DomainV1Schema)"]
end
subgraph Outputs["Output Artifacts"]
C1["domain.json"]
C2["index.json"]
C3["bookmarks.html"]
C4["sitemap.xml"]
C5["rss.json"]
C6["rss.xml"]
C7["data.csv"]
end
%% === FLOW ===
A1 & A2 & A3 --> B1 --> B2 --> B3 --> B4 --> C1 & C2 & C3 & C4 & C5 & C6 & C7
%% === STYLING ===
classDef input fill:#E3F2FD,stroke:#2196F3,stroke-width:2px,color:#0D47A1;
classDef core fill:#E8F5E9,stroke:#4CAF50,stroke-width:2px,color:#1B5E20;
classDef output fill:#FFF8E1,stroke:#FFC107,stroke-width:2px,color:#795548;
class A1,A2,A3 input;
class B1,B2,B3,B4 core;
class C1,C2,C3,C4,C5,C6,C7 output;
Examples
Example README.md files are available in the src/tests/fixtures/readmes/ directory. You can test the parser on them, e.g.:
tsx src/cli.ts src/tests/fixtures/readmes/awesome-click-and-use.md output.json
Available Artifacts
The parser can generate multiple types of output artifacts:
1. domain (JSON)
The complete domain model with all metadata, sections, and items in a structured JSON format.
2. index (JSON)
An inverted index of the content, useful for building search or navigation.
3. bookmarks (HTML)
A browser-compatible bookmarks file in the Netscape Bookmark File Format. Can be imported directly into Chrome, Firefox, Edge, and other modern browsers.
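To make the bookmarks format concrete, here is a minimal sketch of how a Netscape Bookmark File could be rendered. It is not the package's implementation: the `BookmarkFolder` and `BookmarkItem` shapes and the `renderBookmarks` helper are hypothetical, and the real output of `generateBookmarksHtml` may differ in detail.

```typescript
// Hypothetical minimal shapes; the real DomainV1 types may differ.
interface BookmarkItem {
  title: string;
  url: string;
}

interface BookmarkFolder {
  title: string;
  items: BookmarkItem[];
}

// Escape characters that are unsafe inside HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render one <H3> folder per section and one <A> entry per item.
function renderBookmarks(folders: BookmarkFolder[]): string {
  const lines: string[] = [
    '<!DOCTYPE NETSCAPE-Bookmark-file-1>',
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
    '<TITLE>Bookmarks</TITLE>',
    '<H1>Bookmarks</H1>',
    '<DL><p>',
  ];
  for (const folder of folders) {
    lines.push(`  <DT><H3>${escapeHtml(folder.title)}</H3>`);
    lines.push('  <DL><p>');
    for (const item of folder.items) {
      lines.push(
        `    <DT><A HREF="${escapeHtml(item.url)}">${escapeHtml(item.title)}</A>`,
      );
    }
    lines.push('  </DL><p>');
  }
  lines.push('</DL><p>');
  return lines.join('\n');
}

const bookmarksSample = renderBookmarks([
  {
    title: 'Tools',
    items: [{ title: 'Example & Co', url: 'https://example.com/?a=1&b=2' }],
  },
]);
```

Mapping each section to a folder is an assumption made for this sketch; browsers accept flat and nested `<DL>` structures alike.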
4. sitemap (XML)
An XML sitemap following the Sitemap Protocol. Includes all items with valid URLs and can be submitted to search engines like Google and Bing for better indexing.
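As an illustration of the sitemap artifact, the sketch below renders a Sitemap Protocol document from a list of URLs. The `renderSitemap` and `isHttpUrl` helpers are hypothetical, and treating "valid" as "absolute http(s) URL" is an assumption; the package may apply different rules.

```typescript
// Escape the five characters that must be entity-encoded in XML text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Assumption: "valid" means an absolute http(s) URL without whitespace.
function isHttpUrl(value: string): boolean {
  return /^https?:\/\/\S+$/i.test(value);
}

// Emit one <url><loc> entry per distinct valid URL.
function renderSitemap(urls: string[]): string {
  const seen: Record<string, boolean> = {};
  const entries: string[] = [];
  for (const url of urls) {
    if (!isHttpUrl(url) || seen[url]) continue; // skip invalid and duplicate URLs
    seen[url] = true;
    entries.push(`  <url><loc>${escapeXml(url)}</loc></url>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

const sitemapSample = renderSitemap([
  'https://example.com/a?x=1&y=2',
  'not a url',
  'https://example.com/a?x=1&y=2',
]);
```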
5. rss-json (JSON Feed)
A feed in JSON Feed v1.1 format. Modern, JSON-based alternative to RSS/Atom, easier to parse in JavaScript applications. Each item with a URL becomes a feed entry.
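The rule "each item with a URL becomes a feed entry" can be sketched as follows. The `FeedSourceItem` shape and the `toJsonFeed` helper are hypothetical and only cover the fields JSON Feed 1.1 requires; the package's actual feed may carry more metadata.

```typescript
// Hypothetical input shape; the real ItemV1 type may differ.
interface FeedSourceItem {
  title: string;
  url?: string;
  description?: string;
}

interface JsonFeedItem {
  id: string;
  url: string;
  title: string;
  content_text: string;
}

interface JsonFeed {
  version: string;
  title: string;
  items: JsonFeedItem[];
}

function toJsonFeed(feedTitle: string, items: FeedSourceItem[]): JsonFeed {
  const feedItems: JsonFeedItem[] = [];
  for (const item of items) {
    if (!item.url) continue; // only items with a URL become entries
    feedItems.push({
      id: item.url, // the URL doubles as the unique id in this sketch
      url: item.url,
      title: item.title,
      // JSON Feed requires content_text or content_html; fall back to the title
      content_text: item.description || item.title,
    });
  }
  return {
    version: 'https://jsonfeed.org/version/1.1',
    title: feedTitle,
    items: feedItems,
  };
}

const jsonFeedSample = toJsonFeed('Awesome Tools', [
  { title: 'Example', url: 'https://example.com', description: 'A sample entry' },
  { title: 'No link here' },
]);
```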
6. rss-xml (RSS 2.0)
A classic RSS 2.0 XML feed compatible with traditional feed readers like Feedly, Inoreader, and Thunderbird. Each item with a URL becomes a feed entry.
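For comparison, a rough sketch of the same mapping for RSS 2.0 is shown below. The `RssEntry` shape and `renderRss` helper are hypothetical; the sketch emits only the required channel elements (title, link, description) plus a title, link, and guid per item.

```typescript
// Hypothetical input shape; the real ItemV1 type may differ.
interface RssEntry {
  title: string;
  url?: string;
}

// Escape characters that must be entity-encoded in XML text.
function escapeRssText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function renderRss(
  channelTitle: string,
  channelLink: string,
  channelDescription: string,
  entries: RssEntry[],
): string {
  const itemLines: string[] = [];
  for (const entry of entries) {
    if (!entry.url) continue; // only items with a URL become entries
    itemLines.push(
      '    <item>',
      `      <title>${escapeRssText(entry.title)}</title>`,
      `      <link>${escapeRssText(entry.url)}</link>`,
      `      <guid>${escapeRssText(entry.url)}</guid>`,
      '    </item>',
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    '  <channel>',
    `    <title>${escapeRssText(channelTitle)}</title>`,
    `    <link>${escapeRssText(channelLink)}</link>`,
    `    <description>${escapeRssText(channelDescription)}</description>`,
    ...itemLines,
    '  </channel>',
    '</rss>',
  ].join('\n');
}

const rssSample = renderRss('Awesome Tools', 'https://example.com', 'A curated list', [
  { title: 'Fast & small', url: 'https://example.com/tool' },
  { title: 'No link here' },
]);
```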
API Reference
parse(options: ParseOptions): Promise<ParseResultFile[]>
Main entry point for parsing awesome lists and generating artifacts.
Parameters:
- options.sources: Array of source specifications
- options.githubToken: Optional GitHub token for API access
- options.cache: Enable/disable caching (default: true)
- options.cachePath: Custom cache directory
- options.concurrency: Number of concurrent operations
- options.strict: Fail on validation errors
Returns: Array of generated files with metadata
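The source strings and output templates used in the Quick Start examples (`github://owner/repo@ref:path` and `dist/{repo}.{artifact}.{ext}`) can be read as follows. This is a sketch under stated assumptions, not the parser's own logic: `parseGithubSource`, `expandOutputPath`, and the artifact-to-extension table are all hypothetical, with the extensions inferred from the file names shown in the examples.

```typescript
// Hypothetical result shape for a parsed github:// source string.
interface GithubSource {
  owner: string;
  repo: string;
  ref: string;
  path: string;
}

// Split "github://owner/repo@ref:path"; returns null when the string does not match.
function parseGithubSource(from: string): GithubSource | null {
  const match = /^github:\/\/([^\/@:]+)\/([^\/@:]+)@([^:]+):(.+)$/.exec(from);
  if (!match) return null;
  return { owner: match[1], repo: match[2], ref: match[3], path: match[4] };
}

// Assumed artifact-to-extension mapping, inferred from the example file names.
const EXTENSIONS: Record<string, string> = {
  domain: 'json',
  index: 'json',
  bookmarks: 'html',
};

// Replace {repo}, {artifact} and {ext} placeholders in an output path template.
function expandOutputPath(template: string, repo: string, artifact: string): string {
  const values: Record<string, string> = {
    repo,
    artifact,
    ext: EXTENSIONS[artifact] || 'json',
  };
  return template.replace(
    /\{(repo|artifact|ext)\}/g,
    (_whole: string, key: string) => values[key],
  );
}

const parsedSource = parseGithubSource('github://sindresorhus/awesome@main:README.md');
const repoName = parsedSource ? parsedSource.repo : 'unknown';
const outputPaths = ['domain', 'index', 'bookmarks'].map((artifact) =>
  expandOutputPath('dist/{repo}.{artifact}.{ext}', repoName, artifact),
);
// outputPaths: dist/awesome.domain.json, dist/awesome.index.json, dist/awesome.bookmarks.html
```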
generateDomainV1JsonSchema()
Generates the JSON Schema definition for the Domain v1 format.
generateBookmarksHtml(domain: DomainV1): string
Converts a domain object into browser-compatible bookmarks HTML.
buildIndex(domain: DomainV1): SearchIndex
Builds an inverted search index from a domain object.
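For readers unfamiliar with the term, an inverted index maps each token to the items that contain it. The sketch below shows the general technique only; `IndexableItem`, `buildInvertedIndex`, and `lookup` are hypothetical names, and the actual `SearchIndex` structure returned by `buildIndex` may be organized differently.

```typescript
// Hypothetical input shape; the real ItemV1 type may differ.
interface IndexableItem {
  id: string;
  title: string;
  description?: string;
}

// token -> ids of the items that contain it
type InvertedIndex = Record<string, string[]>;

// Lowercase the text and split it on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

function buildInvertedIndex(items: IndexableItem[]): InvertedIndex {
  const index: InvertedIndex = Object.create(null);
  for (const item of items) {
    const tokens = tokenize(`${item.title} ${item.description || ''}`);
    for (const token of tokens) {
      const ids = index[token] || (index[token] = []);
      if (ids.indexOf(item.id) === -1) ids.push(item.id); // record each item once per token
    }
  }
  return index;
}

// Return the ids of items that contain every query token (AND semantics).
function lookup(index: InvertedIndex, query: string): string[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result = (index[tokens[0]] || []).slice();
  for (const token of tokens.slice(1)) {
    const ids = index[token] || [];
    result = result.filter((id) => ids.indexOf(id) !== -1);
  }
  return result;
}

const sampleIndex = buildInvertedIndex([
  { id: 'a', title: 'Vite', description: 'Fast build tool' },
  { id: 'b', title: 'esbuild', description: 'Fast bundler' },
]);
const fastHits = lookup(sampleIndex, 'fast');
const bundlerHits = lookup(sampleIndex, 'fast bundler');
```

Because the index is a plain object of arrays, it serializes directly to JSON, which suits local or client-side search.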
TypeScript Support
The library is written in TypeScript and includes complete type definitions. All types are exported for your convenience:
import type {
  DomainV1,
  SectionV1,
  ItemV1,
  ParseOptions,
  SourceSpec,
  OutputTarget,
  Artifact,
  ParseResultFile,
  SearchIndex,
} from '@awesome-pages/parser';

// Use types in your code
const source: SourceSpec = {
  from: 'awesome.md',
  outputs: [
    {
      artifact: 'domain',
      to: 'output.json',
    },
  ],
};

// Domain model types
const section: SectionV1 = {
  id: 'tools',
  title: 'Tools',
  parentId: null,
  depth: 1,
  order: 0,
  path: 'tools',
  descriptionHtml: null,
};
Advanced Usage Example
import { parse } from '@awesome-pages/parser';

await parse({
  sources: [
    {
      from: ['github://user/repo@main:README.md'],
      outputs: [
        {
          artifact: ['domain', 'index'],
          to: 'dist/{repo}.{artifact}.json',
        },
        {
          artifact: 'bookmarks',
          to: 'dist/{repo}.bookmarks.html',
        },
        {
          artifact: 'sitemap',
          to: 'dist/{repo}.sitemap.xml',
        },
        {
          artifact: 'rss-json',
          to: 'dist/{repo}.rss.json',
        },
        {
          artifact: 'rss-xml',
          to: 'dist/{repo}.rss.xml',
        },
      ],
    },
  ],
});
Development
Local Development
# Clone the repository
git clone https://github.com/awesome-pages/parser.git
cd parser
# Install dependencies
pnpm install
# Run tests
pnpm test
# Run tests in watch mode
pnpm run dev:test
# Build the library
pnpm build
# Lint and format
pnpm run lint
pnpm run format
Scripts
- pnpm test: runs the tests (Vitest)
- pnpm build: builds ESM and CJS bundles with TypeScript declarations
- pnpm run dev:test: runs tests in watch mode
- pnpm parse: runs the CLI (tsx src/cli.ts src/tests/fixtures/readmes/awesome-click-and-use.md readme.domain.json)
- pnpm run lint: lints code with Biome
- pnpm run format: formats code with Biome
Publishing
This package uses semantic-release for automated versioning and publishing. When commits are merged to the main branch:
- Commit messages are analyzed to determine the version bump (major/minor/patch)
- CHANGELOG.md is automatically generated
- Package version is bumped in package.json
- GitHub release is created with release notes
- Package is published to NPM
Commit Message Format:
Follow Conventional Commits:
- "feat:" marks a new feature (minor version bump)
- "fix:" marks a bug fix (patch version bump)
- "feat!:" or "BREAKING CHANGE:" marks a breaking change (major version bump)
- "docs:", "chore:", "style:", "refactor:", "perf:", "test:" cause no version bump
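The mapping from commit prefixes to version bumps listed above can be sketched as a small function. This is an illustration of the rules as this README states them, not semantic-release's actual commit analyzer; `bumpForCommit` and `bumpForRelease` are hypothetical helpers.

```typescript
type Bump = 'major' | 'minor' | 'patch' | 'none';

// Classify a single commit message according to the rules listed in this README.
function bumpForCommit(message: string): Bump {
  // A "!" before the colon or a BREAKING CHANGE footer marks a breaking change.
  if (/^[a-z]+(\([^)]*\))?!:/.test(message) || /^BREAKING CHANGE:/m.test(message)) {
    return 'major';
  }
  if (/^feat(\([^)]*\))?:/.test(message)) return 'minor';
  if (/^fix(\([^)]*\))?:/.test(message)) return 'patch';
  return 'none'; // docs, chore, style, refactor, perf, test
}

// A release takes the highest bump found among its commits.
function bumpForRelease(messages: string[]): Bump {
  const rank: Bump[] = ['none', 'patch', 'minor', 'major'];
  let highest = 0;
  for (const message of messages) {
    highest = Math.max(highest, rank.indexOf(bumpForCommit(message)));
  }
  return rank[highest];
}

const releaseBump = bumpForRelease([
  'fix: handle malformed markdown sections',
  'feat: add support for custom cache strategies',
  'docs: update readme',
]);
```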
Example:
git commit -m "feat: add support for custom cache strategies"
git commit -m "fix: handle malformed markdown sections"
git commit -m "feat!: change parse() API to accept options object"
Part of the Awesome Pages ecosystem
This parser powers the Awesome Pages toolchain:
- awesome-pages/parser: converts Markdown to structured data
- awesome-pages/site: static site generator using parser artifacts
- awesome-pages/schema: publishes JSON Schema definitions for validation and interoperability
License
MIT
