@larkiny/astro-github-loader
v0.11.2
Published
Load content from GitHub repositories into Astro content collections with asset management and content transformations
Maintainers
Readme
Astro GitHub Loader
Load content from GitHub repositories into Astro content collections with flexible pattern-based import, asset management, content transformations, and intelligent change detection.
Features
- 🎯 Pattern-Based Import - Use glob patterns to selectively import content with per-pattern configuration
- 🖼️ Asset Management - Automatically download and transform asset references in markdown files
- 🛠️ Content Transforms - Apply custom transformations to content during import, with pattern-specific transforms
- ⚡ Intelligent Change Detection - Ref-aware commit tracking that only triggers re-imports when your target branch/tag actually changes
- 🔒 Stable Imports - Non-destructive approach that preserves local content collections
- 🚀 Optimized Performance - Smart directory scanning to minimize GitHub API calls
Quick Start
import { defineCollection } from "astro:content";
import { docsLoader } from "@astrojs/starlight/loaders";
import { docsSchema } from "@astrojs/starlight/schema";
import {
githubLoader,
createOctokitFromEnv,
type ImportOptions,
type LoaderContext,
} from "@larkiny/astro-github-loader";
const REMOTE_CONTENT: ImportOptions[] = [
{
name: "Documentation",
owner: "your-org",
repo: "your-docs-repo",
ref: "main",
includes: [
{
pattern: "docs/**/*.md",
basePath: "src/content/docs/imported",
},
],
},
];
// Automatically uses GitHub App or Personal Access Token based on env vars
const octokit = createOctokitFromEnv();
export const collections = {
docs: defineCollection({
loader: {
name: "docs",
load: async (context) => {
await docsLoader().load(context);
for (const config of REMOTE_CONTENT) {
await githubLoader({
octokit,
configs: [config],
clear: config.clear,
dryRun: false, // Set to true for change detection only
}).load(context as LoaderContext);
}
},
},
schema: docsSchema(),
}),
};Authentication
The loader supports two authentication methods with different rate limits:
| Method | Rate Limit | Best For | |--------|-----------|----------| | GitHub App (Recommended) | 15,000 requests/hour | Production, large imports, organizational use | | Personal Access Token | 5,000 requests/hour | Development, small imports |
Option 1: GitHub App Authentication (Recommended - 3x Rate Limit)
Step 1: Create a GitHub App
- Go to GitHub Settings → Developer settings → GitHub Apps → New GitHub App
- Fill in the required fields:
- GitHub App name:
your-org-docs-loader(or any name) - Homepage URL: Your documentation site URL
- Webhook: Uncheck "Active" (not needed)
- GitHub App name:
- Set Repository permissions:
- Contents: Read-only
- Click Create GitHub App
Step 2: Generate Private Key
- In your GitHub App settings, scroll to "Private keys"
- Click Generate a private key
- Save the downloaded
.pemfile securely
Step 3: Install the App
- In your GitHub App settings, click Install App
- Select your organization or personal account
- Choose All repositories or Only select repositories
- Note the Installation ID from the URL:
https://github.com/settings/installations/{installation_id}
Step 4: Configure Environment Variables
# .env
GITHUB_APP_ID=123456
GITHUB_APP_INSTALLATION_ID=12345678
# For the private key, you have two options:
# Option A: Direct PEM content (multiline)
GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA...
...
-----END RSA PRIVATE KEY-----"
# Option B: Base64 encoded (single line - easier for .env files)
# Run: cat your-app.private-key.pem | base64 | tr -d '\n'
GITHUB_APP_PRIVATE_KEY="LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0..."Step 5: Use in Your Config
import { createOctokitFromEnv } from "@larkiny/astro-github-loader";
// Automatically uses GitHub App if env vars are set
const octokit = createOctokitFromEnv();Option 2: Personal Access Token (PAT)
Step 1: Create a Token
- Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
- Click Generate new token (classic)
- Select scopes:
public_repo(for public repositories)repo(for private repositories)
- Generate and copy the token
Step 2: Configure Environment Variable
# .env
GITHUB_TOKEN=ghp_your_token_hereStep 3: Use in Your Config
import { createOctokitFromEnv } from "@larkiny/astro-github-loader";
// Automatically falls back to PAT if GitHub App vars aren't set
const octokit = createOctokitFromEnv();Manual Authentication (Advanced)
For more control, you can manually create the Octokit instance:
import { createAuthenticatedOctokit } from "@larkiny/astro-github-loader";
// GitHub App (explicit)
const octokit = createAuthenticatedOctokit({
appId: process.env.GITHUB_APP_ID!,
privateKey: process.env.GITHUB_APP_PRIVATE_KEY!,
installationId: process.env.GITHUB_APP_INSTALLATION_ID!,
});
// Personal Access Token (explicit)
const octokit = createAuthenticatedOctokit({
token: process.env.GITHUB_TOKEN!,
});Multi-Ref Configuration Example
Track multiple git references from the same repository independently:
import { defineCollection } from "astro:content";
import { docsLoader } from "@astrojs/starlight/loaders";
import { docsSchema } from "@astrojs/starlight/schema";
import {
githubLoader,
createOctokitFromEnv,
type ImportOptions,
} from "@larkiny/astro-github-loader";
const MULTI_REF_CONTENT: ImportOptions[] = [
{
name: "Stable Docs",
owner: "myorg",
repo: "docs",
ref: "v2.0.0", // Immutable tag - never re-imports
includes: [
{
pattern: "docs/**/*.md",
basePath: "src/content/docs/v2",
},
],
},
{
name: "Latest Docs",
owner: "myorg",
repo: "docs",
ref: "main", // Live branch - re-imports only on main commits
includes: [
{
pattern: "docs/**/*.md",
basePath: "src/content/docs/latest",
},
],
},
{
name: "Beta Features",
owner: "myorg",
repo: "docs",
ref: "beta", // Feature branch - ignores main/other branch commits
includes: [
{
pattern: "experimental/**/*.md",
basePath: "src/content/docs/beta",
},
],
},
];
const octokit = createOctokitFromEnv();
export const collections = {
docs: defineCollection({
loader: {
name: "docs",
load: async (context) => {
await docsLoader().load(context);
// Each config is tracked independently by ref
for (const config of MULTI_REF_CONTENT) {
await githubLoader({
octokit,
configs: [config],
dryRun: false,
}).load(context);
}
},
},
schema: docsSchema(),
}),
};In this example:
- Stable docs (v2.0.0 tag): Never re-imports, provides stable reference
- Latest docs (main branch): Only re-imports when main branch changes
- Beta features (beta branch): Only re-imports when beta branch changes
Commits to develop, feature-xyz, or any other branches are completely ignored by all three configs.
Processing Pipeline
The astro-github-loader processes files through a well-defined pipeline with clear order of operations:
To understand more about the content processing flow, see the detailed guide.
Order of Operations
- File Discovery and Collection: Scan repository using include patterns and fetch file contents
- Individual File Processing: For each file:
- Apply asset processing (download and transform asset references)
- Apply path mappings to determine target paths
- Apply content transformations (global transforms, then pattern-specific transforms)
- Global Link Transformation: Process all markdown links across all imported files using
linkMappings - File Storage: Write processed files to Astro content store
Path vs Link Transformations
Understanding when and why to use each type of transformation:
pathMappings: Controls where files are imported to (changes file system paths)- Applied during import process
- Affects the final location of files on disk
- Use when: You need to restructure the imported files differently than they exist in the source repository
- Example:
'docs/capabilities/': 'docs/'moves files from capabilities folder up one level
linkMappings: Controls how markdown links are transformed (changes URLs in content)- Applied after all content is imported
- Affects links within markdown content
- Use when: You have restructured files (with
pathMappings) OR need to handle links to files outside the imported document set - Example: Transform
../cli/index.mdto/reference/algokit-cli/(external reference)
Pattern-Based Import System
The includes system allows you to define multiple import patterns, each with its own destination path and transforms:
const REMOTE_CONTENT: ImportOptions[] = [
{
name: "Multi-Pattern Import",
owner: "your-org",
repo: "your-docs-repo",
includes: [
// Import main documentation with path restructuring
{
pattern: "docs/**/*.md",
basePath: "src/content/docs/guides",
pathMappings: {
// Move files from capabilities subfolder up one level
"docs/capabilities/": "docs/",
// Rename specific files
"docs/README.md": "docs/overview.md",
},
transforms: [addGuideMetadata],
},
// Import API reference to different location
{
pattern: "api-reference/**/*.md",
basePath: "src/content/docs/api",
pathMappings: {
// Flatten API structure
"api-reference/v1/": "api-reference/",
},
transforms: [addApiMetadata, formatApiDocs],
},
// Import specific files
{
pattern: "README.md",
basePath: "src/content/docs",
transforms: [convertReadmeToOverview],
},
],
},
];Pattern Features
- Glob patterns: Use
**/*.md,docs/guides/*.md, specific files, etc. - Per-pattern basePath: Each pattern can target a different local directory
- Per-pattern transforms: Apply different transformations to different content types
- Per-pattern pathMappings: Restructure file paths within each pattern
- Directory structure preservation: Relative paths within patterns are preserved
Path Mappings
Use pathMappings to restructure files during import.
Common use cases:
- Flatten nested folder structures (e.g., move
docs/capabilities/files todocs/) - Rename specific files (e.g.,
README.md→overview.md) - Reorganize content for better site structure
- Remove unwanted path segments from imported files
{
pattern: "docs/**/*.md",
basePath: "src/content/docs/guides",
pathMappings: {
// File mappings (exact paths)
'docs/README.md': 'docs/overview.md',
'docs/getting-started.md': 'docs/quickstart.md',
// Folder mappings (require trailing slash)
'docs/capabilities/': 'docs/', // Move all files up one level
'docs/legacy/guides/': 'docs/archive/', // Move to different folder
},
}Important: Folder mappings require trailing slashes to distinguish from file mappings:
- ✅
'docs/capabilities/': 'docs/'(folder mapping - moves all files) - ❌
'docs/capabilities': 'docs/'(treated as exact file match)
Common Pattern Examples
- **
**/\*.md** - All markdown files in the repository - **
docs/**/\*** - All files in the docs directory and subdirectories guides/*.md- Only markdown files directly in the guides directory- **
api-reference/**/\*.{md,mdx}** - Markdown and MDX files in api-reference README.md- Specific file at repository rootdocs/getting-started.md- Specific file at specific path
Content & Link Transformations
The loader supports both content transformations (modifying file contents) and link transformations (fixing cross-references):
Content Transformations
Apply content transformations globally or per-pattern.
Use content transforms when you need to:
- Add frontmatter (metadata) to imported files
- Convert H1 headings to frontmatter titles
- Add import tracking information
- Modify content structure or formatting
- Add badges, labels, or other metadata specific to your site
import { githubLoader } from "@larkiny/astro-github-loader";
import type { TransformFunction } from "@larkiny/astro-github-loader";
// Global transform functions
const addImportMetadata: TransformFunction = (content, context) => {
return `---
imported_from: ${context.options.owner}/${context.options.repo}
original_path: ${context.path}
imported_at: ${new Date().toISOString()}
---
${content}`;
};
// Pattern-specific transform
const addApiDocsBadge: TransformFunction = (content, context) => {
const lines = content.split("\n");
const frontmatterEnd = lines.findIndex((line, i) => i > 0 && line === "---");
if (frontmatterEnd > 0) {
lines.splice(frontmatterEnd, 0, "sidebar:", ' badge: "API"');
}
return lines.join("\n");
};
// Convert H1 to title frontmatter
const convertH1ToTitle: TransformFunction = (content, context) => {
const h1Match = content.match(/^#\s+(.+)$/m);
if (h1Match) {
const title = h1Match[1];
// Remove the H1 from content
content = content.replace(/^#\s+.+$/m, "").trim();
// Add to frontmatter
const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---/);
if (frontmatterMatch) {
const existingFrontmatter = frontmatterMatch[1];
const newFrontmatter = `---\ntitle: "${title}"\n${existingFrontmatter}\n---`;
content = content.replace(/^---\n[\s\S]*?\n---/, newFrontmatter);
} else {
content = `---\ntitle: "${title}"\n---\n\n${content}`;
}
}
return content;
};Link Transformations
Configure link transformations to handle cross-repository links and restructured file references.
Use link mappings when:
- You've restructured files with
pathMappingsand need to update internal links - Links reference files outside the imported document set (external repositories, different sections)
- Links need to be transformed for your site's URL structure (e.g., Starlight routing)
- You need to handle broken or outdated links in the source content
import { createStarlightLinkMappings } from "./transforms/links.js";
const REMOTE_CONTENT: ImportOptions[] = [
{
name: "Docs with Full Transformations",
owner: "your-org",
repo: "docs-repo",
// Global content transforms applied to all includes
transforms: [addImportMetadata, convertH1ToTitle],
includes: [
{
pattern: "docs/**/*.md",
basePath: "src/content/docs/guides",
pathMappings: {
"docs/capabilities/": "docs/",
"docs/README.md": "docs/overview.md",
},
// Pattern-specific content transforms
transforms: [addGuideFormatting],
},
{
pattern: "api/**/*.md",
basePath: "src/content/docs/api",
transforms: [addApiDocsBadge, formatApiContent],
},
],
// Link transformations (applied after content transforms)
linkTransform: {
stripPrefixes: ["src/content/docs"],
linkMappings: [
// Apply Starlight-specific link transformations
...createStarlightLinkMappings(),
// Custom link mappings for external references
{
pattern: /^\.\.\/cli\/?$/,
replacement: (match: string, anchor: string) => {
return `/reference/algokit-cli`;
},
global: true,
description: "Map CLI reference links to reference section",
},
// Transform README links to introduction
{
pattern: /^\.\.\/\.\.\/README\.md$/,
replacement: (match: string, anchor: string) => {
return `/introduction`;
},
global: true,
description: "Map README links to introduction page",
},
],
},
},
];Link Transformation Utilities
Handle markdown links with anchor fragments using built-in utilities:
import {
createLinkTransform,
extractAnchor,
removeMarkdownExtension,
} from "@larkiny/astro-github-loader";
const linkTransform = createLinkTransform({
baseUrl: "/docs/imported",
pathTransform: (path, context) => {
const { path: cleanPath, anchor } = extractAnchor(path);
// Custom link handling logic
if (cleanPath === "README.md") {
return `/docs/imported/overview${anchor}`;
}
// Use utility to remove .md extension and preserve anchors
return `/docs/imported/${removeMarkdownExtension(path)}`;
},
});Link Transform Utilities
extractAnchor(path)- Returns{path, anchor}separating the anchor fragmentremoveMarkdownExtension(path)- Removes.md/.mdxextensions while preserving anchorscreateLinkTransform(options)- Main transform with custom path handling
Asset Import and Management
Automatically detect, download, and transform asset references:
const REMOTE_CONTENT_WITH_ASSETS: ImportOptions[] = [
{
name: "Docs with Assets",
owner: "your-org",
repo: "docs-repo",
includes: [
{
pattern: "documentation/**/*.md",
basePath: "src/content/docs/imported",
},
],
// Asset configuration
assetsPath: "src/assets/imported",
assetsBaseUrl: "~/assets/imported", // or "/assets/imported"
assetPatterns: [".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"],
},
];Asset Management Features
- Automatic detection: Finds image references in markdown
- Smart downloading: Only downloads assets that have changed
- Path transformation: Updates markdown to use local asset paths
- Multiple formats: Supports various image formats
File Management Strategy
⚠️ Important: Do not use
clear: trueThe
clear: trueoption should not be used with the current implementation due to how Astro content collection syncing works. Mass file deletions can cause Astro to invalidate entire content collections, leading to 404 errors and build instability.Instead: If you need to handle file deletions, renames, or path restructuring from the source repository:
- Manually delete the local import folders (e.g.,
src/content/docs/imported)- Re-run the import process
- Fresh content will be imported with the new structure
This approach ensures your site remains stable while handling structural changes.
Change Detection & Dry-Run Mode
Check for repository changes without importing:
// In your content config
await githubLoader({
octokit,
configs: REMOTE_CONTENT,
clear: false,
dryRun: process.env.IMPORT_DRY_RUN === "true",
}).load(context);Setting up Dry-Run Scripts
Add to your package.json:
{
"scripts": {
"import:check": "IMPORT_DRY_RUN=true astro sync"
}
}Dry-Run Output
npm run import:check
# Output:
📊 Repository Import Status:
✅ Documentation: Up to date
Last imported: 2 hours ago
🔄 API Reference: Needs re-import
Latest commit: Add new endpoints
Committed: 30 minutes ago
Last imported: 1 day agoHow Change Detection Works
The loader uses intelligent, ref-aware change detection:
- Per-ref tracking: Each
owner/repo@refcombination is tracked separately - Branch isolation: Commits to other branches are completely ignored
- Tag immutability: Fixed tags (e.g.,
v1.0.0) never trigger re-imports - Efficient checking: Only the latest commit of your target ref is checked
Examples:
- Config tracking
mainbranch → onlymaincommits trigger re-import - Config tracking
v2.1.0tag → never re-imports (tags are immutable) - Config tracking
feature-branch→ ignores commits tomain,develop, etc. - Multiple configs for same repo with different refs → tracked independently
This means you can safely track multiple refs from the same repository without unnecessary re-imports when unrelated branches change.
Configuration Options
ImportOptions Interface
interface ImportOptions {
/** Display name for this configuration (used in logging) */
name?: string;
/** GitHub repository owner */
owner: string;
/** GitHub repository name */
repo: string;
/** Git reference (branch, tag, or commit SHA) */
ref?: string; // defaults to "main"
/** Whether this configuration is enabled */
enabled?: boolean; // defaults to true
/** Whether to clear content store (recommend: false) */
clear?: boolean; // defaults to false
/** Array of transform functions applied to all includes */
transforms?: TransformFunction[];
/** Pattern-based import configuration */
includes?: IncludePattern[];
/** Asset management options */
assetsPath?: string; // Local directory for downloaded assets
assetsBaseUrl?: string; // Base URL for asset references
assetPatterns?: string[]; // File extensions to treat as assets
/** Link transformation options (applied after all content transforms) */
linkTransform?: ImportLinkTransformOptions;
}
interface ImportLinkTransformOptions {
/** Base paths to strip from final URLs (e.g., ["src/content/docs"]) */
stripPrefixes: string[];
/** Link mappings to transform URLs in markdown links */
linkMappings?: LinkMapping[];
}
interface LinkMapping {
/** Pattern to match (string or regex) */
pattern: string | RegExp;
/** Replacement string or function */
replacement:
| string
| ((match: string, anchor: string, context: any) => string);
/** Apply to all links, not just unresolved internal links (default: false) */
global?: boolean;
/** Description for debugging (optional) */
description?: string;
}
interface IncludePattern {
/** Glob pattern to match files (relative to repository root) */
pattern: string;
/** Local base path where matching files should be imported */
basePath: string;
/** Transforms to apply only to files matching this pattern */
transforms?: TransformFunction[];
/**
* Map of source paths to target paths for controlling where files are imported.
*
* Supports two types of mappings:
* - **File mapping**: `'docs/README.md': 'docs/overview.md'` - moves a specific file to a new path
* - **Folder mapping**: `'docs/capabilities/': 'docs/'` - moves all files from source folder to target folder
*
* **Important**: Folder mappings require trailing slashes to distinguish from file mappings.
* - ✅ `'docs/capabilities/': 'docs/'` (folder mapping - moves all files)
* - ❌ `'docs/capabilities': 'docs/'` (treated as exact file match)
*/
pathMappings?: Record<string, string>;
}Transform Function Interface
interface TransformContext {
/** Generated ID for the content */
id: string;
/** File path within the repository */
path: string;
/** Full configuration options */
options: ImportOptions;
/** Information about which include pattern matched (if any) */
matchedPattern?: MatchedPattern;
}
type TransformFunction = (content: string, context: TransformContext) => string;Performance Optimizations
The loader includes several optimizations:
- Smart directory scanning: Only scans directories that match include patterns
- Efficient API usage: Minimizes GitHub API calls through targeted requests
- Ref-aware change detection: Tracks commit SHA for specific git references (branches/tags) to avoid unnecessary downloads when unrelated branches change
- Concurrent processing: Downloads and processes files in parallel
Installation & Setup
npm install @larkiny/astro-github-loaderSet up your authentication in .env:
# Option 1: GitHub App (recommended - 15,000 requests/hour)
GITHUB_APP_ID=123456
GITHUB_APP_INSTALLATION_ID=12345678
GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----..."
# Option 2: Personal Access Token (5,000 requests/hour)
GITHUB_TOKEN=ghp_your_token_hereSee the Authentication section for detailed setup instructions.
License
MIT - See LICENSE file for details.
