llms-full-unbind
v0.1.2
Unbind llms-full.txt into individual pages.
Unbind llms-full.txt into individual pages programmatically.
A specialized parser designed to extract pages from the monolithic llms-full.txt format.
This library is primarily intended for use in the llms-full-unbind-mcp package, but can be used in other projects as well.
Usage
Installation
npm install llms-full-unbind
Basic Usage
Fetch the text content and unbind it in one go.
import { unbind } from "llms-full-unbind";
// 1. Fetch the remote llms-full.txt
const response = await fetch("https://example.com/llms-full.txt");
const text = await response.text();
// 2. Unbind into pages
const pages = Array.from(unbind(text));
console.log(`Extracted ${pages.length} pages.`);
Streaming Usage (Recommended)
For large llms-full.txt files, use unbindStream to process data chunk-by-chunk directly from the network response. This minimizes memory usage.
import { unbindStream } from "llms-full-unbind";
const response = await fetch("https://example.com/llms-full.txt");
if (!response.body) {
  throw new Error("Response body is empty");
}
// Pipe the Web Stream directly into the parser
for await (const page of unbindStream(response.body)) {
  console.log(`Processed: ${page.title}`);
  // e.g. Save to DB or display immediately
}
Supported Formats
This library automatically detects and parses five common llms-full.txt formats:
<doc> Tag Based Format
This format wraps each page in the <doc> tag with optional metadata attributes.
Generated by the llms_txt2ctx CLI from the llms-txt package.
Used by fastht.ml and similar projects.
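As a rough illustration of what parsing this format involves, the sketch below pulls `<doc>` blocks out of a string with a regular expression. `splitByDocTag` is a hypothetical helper written for this example, not the library's implementation, and mapping the `title` attribute to `Page.title` and the remaining attributes to `metadata` is an assumption. It does not handle nested tags or escaped quotes.

```typescript
// Local stand-in for the Page shape documented in this README.
interface Page {
  title: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper (assumption, not the library's code): yields one Page
// per <doc ...>...</doc> block found in the text.
function* splitByDocTag(text: string): Generator<Page> {
  const docRe = /<doc\b([^>]*)>([\s\S]*?)<\/doc>/g;
  let match: RegExpExecArray | null;
  while ((match = docRe.exec(text)) !== null) {
    // Collect key="value" attributes from the opening tag.
    const attrs: Record<string, string> = {};
    const attrRe = /(\w+)="([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrRe.exec(match[1])) !== null) {
      attrs[attr[1]] = attr[2];
    }
    // Assumed mapping: `title` becomes the page title, the rest is metadata.
    const { title = "", ...rest } = attrs;
    yield { title, content: match[2].trim(), metadata: rest };
  }
}

const docSample =
  '<doc title="Page Title" desc="Optional description">\nContent of the page...\n</doc>';
const docPages = Array.from(splitByDocTag(docSample));
console.log(docPages);
```

The sample input mirrors the example shown in this section; real files may carry other attributes, which this sketch would pass through into `metadata` unchanged.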
<doc title="Page Title" desc="Optional description">
Content of the page...
</doc>
<page> Tag Based Format
This format wraps each page in the <page> tag.
Used by cloudflare.com.
<page>
Content of the page...
</page>
Frontmatter-separated Format
Pages are separated by frontmatter-style metadata blocks. Generated by vitepress-plugin-llms. Used by vuejs.org, vitejs.dev, vitepress.dev and similar VitePress-based projects.
# Page Title {#optional-anchor}
Content of the page...
---
url: /optional/metadata.md
---
# Another Page
More content...
Header and Source URL Format
Each page starts with a markdown header followed by a Source: line indicating the page URL.
Generated by Mintlify.
Used by modelcontextprotocol.io and bun.sh.
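One way to picture the boundary rule for this format: a page starts only where an H1 line is immediately followed by a `Source:` line, so an H1 inside page content does not split the page. The sketch below assumes exactly that rule. `splitBySourceHeader` is a hypothetical helper, not the library's implementation, and storing the URL under `metadata.source` is an assumption.

```typescript
// Local stand-in for the Page shape documented in this README.
interface Page {
  title: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper (assumption, not the library's code): a page begins at
// a "# Title" line that is directly followed by a "Source: <url>" line.
function* splitBySourceHeader(text: string): Generator<Page> {
  const prefix = "Source: ";
  const lines = text.split("\n");
  let current: { title: string; source: string; body: string[] } | null = null;

  for (let i = 0; i < lines.length; i++) {
    const next = lines[i + 1];
    const startsPage =
      lines[i].startsWith("# ") && next !== undefined && next.startsWith(prefix);
    if (startsPage) {
      if (current !== null) {
        yield {
          title: current.title,
          content: current.body.join("\n").trim(),
          metadata: { source: current.source },
        };
      }
      current = {
        title: lines[i].slice(2).trim(),
        source: lines[i + 1].slice(prefix.length).trim(),
        body: [],
      };
      i++; // skip the Source line that was just consumed
    } else if (current !== null) {
      current.body.push(lines[i]);
    }
  }
  if (current !== null) {
    yield {
      title: current.title,
      content: current.body.join("\n").trim(),
      metadata: { source: current.source },
    };
  }
}

const sourceSample = [
  "# Page Title",
  "Source: https://example.com/path/to/page",
  "Content of the page...",
  "# Not a page break",
  "# Another Page",
  "Source: https://example.com/path/to/another",
  "More content...",
].join("\n");
const sourcePages = Array.from(splitBySourceHeader(sourceSample));
console.log(sourcePages.map((p) => p.title));
```

The `# Not a page break` line stays inside the first page's content because no `Source:` line follows it.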
# Page Title
Source: https://example.com/path/to/page
Content of the page...
# Another Page
Source: https://example.com/path/to/another
More content...
H1 Header Based Format
Pages are separated by H1 headers (# Title).
Used by projects like svelte.dev, nuxt.com, and docs.astro.build.
# Page Title
Content of the page...
# Another Page
More content...
API
unbind(content: string): Iterable<Page>
Parses the entire string synchronously and returns an iterable (Generator) of pages. Use Array.from(unbind(content)) to get an array.
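To show what a generator-returning parser looks like from the caller's side, here is a minimal sketch built on a stand-in function. `splitByH1` is hypothetical and is not the library's implementation; it handles only the H1 Header Based Format and would wrongly split on a `# ` line inside a code fence. The point is the calling pattern: materialize everything with `Array.from`, or pull pages lazily and stop early.

```typescript
// Local stand-in for the Page shape documented in this README.
interface Page {
  title: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper (assumption, not the library's code): yields one Page
// per section that starts with a "# " line.
function* splitByH1(text: string): Generator<Page> {
  let title: string | null = null;
  let body: string[] = [];
  for (const line of text.split("\n")) {
    if (line.startsWith("# ")) {
      // A new H1 closes the page collected so far.
      if (title !== null) {
        yield { title, content: body.join("\n").trim() };
      }
      title = line.slice(2).trim();
      body = [];
    } else if (title !== null) {
      body.push(line);
    }
  }
  if (title !== null) {
    yield { title, content: body.join("\n").trim() };
  }
}

const h1Sample =
  "# Page Title\nContent of the page...\n# Another Page\nMore content...";

// Materialize every page at once, in the style of Array.from(unbind(content)).
const allPages = Array.from(splitByH1(h1Sample));

// Or pull lazily: only the first page is parsed here.
const firstPage = splitByH1(h1Sample).next().value;

console.log(allPages.length, firstPage.title);
```

Because a generator does no work until it is iterated, the lazy form avoids parsing the rest of the text when only the first few pages are needed.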
unbindStream(stream: ReadableStream | AsyncIterable): AsyncIterable<Page>
Accepts a standard Web ReadableStream (returned by fetch) or any Async Iterable. Yields pages as soon as they are parsed.
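The idea of yielding pages as soon as they are parsed can be sketched as follows, under some simplifying assumptions: chunks arrive as already-decoded strings (a byte stream would first need decoding, for example with `TextDecoderStream`), and only the H1 Header Based Format is handled. `toLines`, `splitByH1Stream`, and `demoChunks` are hypothetical names for this example, not part of the library.

```typescript
// Local stand-in for the Page shape documented in this README.
interface Page {
  title: string;
  content: string;
}

// Re-chunks arbitrary text chunks into complete lines, carrying any trailing
// partial line over to the next chunk.
async function* toLines(chunks: AsyncIterable<string>): AsyncGenerator<string> {
  let pending = "";
  for await (const chunk of chunks) {
    pending += chunk;
    const lines = pending.split("\n");
    pending = lines.pop() ?? ""; // the last piece may be an incomplete line
    for (const line of lines) {
      yield line;
    }
  }
  if (pending !== "") {
    yield pending;
  }
}

// Hypothetical helper (assumption, not the library's code): emits a Page as
// soon as the next H1 shows that the previous section is complete.
async function* splitByH1Stream(
  chunks: AsyncIterable<string>,
): AsyncGenerator<Page> {
  let title: string | null = null;
  let body: string[] = [];
  for await (const line of toLines(chunks)) {
    if (line.startsWith("# ")) {
      if (title !== null) {
        yield { title, content: body.join("\n").trim() };
      }
      title = line.slice(2).trim();
      body = [];
    } else if (title !== null) {
      body.push(line);
    }
  }
  if (title !== null) {
    yield { title, content: body.join("\n").trim() };
  }
}

// Simulated network chunks that cut through the middle of header lines.
async function* demoChunks(): AsyncGenerator<string> {
  yield "# Page Ti";
  yield "tle\nContent of the page...\n# Anoth";
  yield "er Page\nMore content...";
}

async function runStreamDemo(): Promise<Page[]> {
  const seen: Page[] = [];
  for await (const page of splitByH1Stream(demoChunks())) {
    seen.push(page);
  }
  return seen;
}
```

Only the current page and one partial line are held in memory at any time, which is the property that makes streaming attractive for large files.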
Type Definition: Page
export interface Page {
  title: string;
  content: string; // The extracted text content
  metadata?: Record<string, unknown>;
}
License
MIT
