llms-full-unbind

v0.1.3

Published

16 days ago

Unbind llms-full.txt into individual pages.


ota-meshi

llms-full.txt llms-full

llms-full-unbind

Unbind llms-full.txt into individual pages programmatically.

A specialized parser designed to extract pages from the monolithic llms-full.txt format.

This library is primarily intended for use in the llms-full-unbind-mcp package, but can be used in other projects as well.

Usage

Installation

npm install llms-full-unbind

Basic Usage

Fetch the text content and unbind it in one go.

import { unbind } from "llms-full-unbind";

// 1. Fetch the remote llms-full.txt
const response = await fetch("https://example.com/llms-full.txt");
const text = await response.text();

// 2. Unbind into pages
const pages = Array.from(unbind(text));

console.log(`Extracted ${pages.length} pages.`);
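
Once the pages are in an array, a common next step is to give each one a file name. The following is a minimal sketch of that step, not part of the library: `slugify` and `toFileMap` are hypothetical helpers, and the `Page` interface is copied locally from the type definition in the API section so the snippet stands on its own.

```typescript
// Local copy of the Page shape described in the API section.
interface Page {
  title: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: turn a page title into a filesystem-friendly slug.
function slugify(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return slug || "untitled";
}

// Hypothetical helper: map each page to a unique "<slug>.md" file name.
function toFileMap(pages: Iterable<Page>): Map<string, string> {
  const files = new Map<string, string>();
  for (const page of pages) {
    const base = slugify(page.title);
    let name = `${base}.md`;
    // Append a counter when two pages produce the same slug.
    for (let n = 2; files.has(name); n++) {
      name = `${base}-${n}.md`;
    }
    files.set(name, page.content);
  }
  return files;
}
```

With the `pages` array from the example above, `toFileMap(pages)` would return a map that can be written to disk with whatever file API the project already uses.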

Streaming Usage (Recommended)

For large llms-full.txt files, use unbindStream to process data chunk-by-chunk directly from the network response. This minimizes memory usage.

import { unbindStream } from "llms-full-unbind";

const response = await fetch("https://example.com/llms-full.txt");

if (!response.body) {
  throw new Error("Response body is empty");
}

// Pipe the Web Stream directly into the parser
for await (const page of unbindStream(response.body)) {
  console.log(`Processed: ${page.title}`);
  // e.g. Save to DB or display immediately
}
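
Because pages arrive one at a time, a consumer can stop early when it only needs the first few. The sketch below shows one way to do that with a generic, hypothetical `take` helper that works on any async iterable. Whether stopping early also cancels the underlying network stream depends on the library and is not confirmed here.

```typescript
// Hypothetical helper: yield at most `limit` items, then stop consuming the source.
async function* take<T>(
  source: AsyncIterable<T>,
  limit: number,
): AsyncGenerator<T> {
  if (limit <= 0) return;
  let seen = 0;
  for await (const item of source) {
    yield item;
    seen += 1;
    // Returning from inside for-await calls the source iterator's return(),
    // if it defines one, so the source gets a chance to clean up.
    if (seen >= limit) return;
  }
}
```

Under these assumptions, `for await (const page of take(unbindStream(response.body), 3))` would process only the first three pages.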

Supported Formats

This library automatically detects and parses five common llms-full.txt formats:

`<doc>` Tag Based Format

Each page is wrapped in a <doc> tag with optional metadata attributes. Generated by the llms_txt2ctx CLI from the llms-txt package. Used by fastht.ml and similar projects.

<doc title="Page Title" desc="Optional description">
Content of the page...
</doc>
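
To make the shape of this format concrete, here is a rough sketch of how such blocks could be split with regular expressions. It is an illustration only, not the library's parser: `DocBlock` and `parseDocBlocks` are hypothetical names, and the sketch assumes double-quoted attribute values and no nested <doc> tags.

```typescript
// Hypothetical result shape for this sketch.
interface DocBlock {
  attributes: Record<string, string>;
  content: string;
}

// Illustrative only: extract every <doc ...>...</doc> block from the text.
function parseDocBlocks(text: string): DocBlock[] {
  const blocks: DocBlock[] = [];
  const docPattern = /<doc\b([^>]*)>([\s\S]*?)<\/doc>/g;
  let docMatch: RegExpExecArray | null;
  while ((docMatch = docPattern.exec(text)) !== null) {
    const [, rawAttributes = "", body = ""] = docMatch;
    const attributes: Record<string, string> = {};
    // Collect key="value" pairs from the opening tag.
    const attrPattern = /([\w-]+)="([^"]*)"/g;
    let attrMatch: RegExpExecArray | null;
    while ((attrMatch = attrPattern.exec(rawAttributes)) !== null) {
      const [, key = "", value = ""] = attrMatch;
      attributes[key] = value;
    }
    blocks.push({ attributes, content: body.trim() });
  }
  return blocks;
}
```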

`<page>` Tag Based Format

Each page is wrapped in a <page> tag. Used by cloudflare.com.

<page>
Content of the page...
</page>

Frontmatter-separated Format

Pages are separated by frontmatter-style metadata blocks. Generated by vitepress-plugin-llms. Used by vuejs.org, vitejs.dev, vitepress.dev and similar VitePress-based projects.

# Page Title {#optional-anchor}

Content of the page...

---
url: /optional/metadata.md
---

# Another Page

More content...

Header and Source URL Format

Each page starts with a markdown header followed by a Source: line indicating the page URL. Generated by Mintlify. Used by modelcontextprotocol.io and bun.sh.

# Page Title
Source: https://example.com/path/to/page

Content of the page...

# Another Page
Source: https://example.com/path/to/another

More content...
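
What sets this format apart from plain H1 splitting is that a header only starts a new page when a Source: line follows it directly. The sketch below illustrates that rule with a line-based splitter. It is a simplified illustration under that assumption, not the library's implementation, and `SourcePage` and `parseSourcePages` are hypothetical names.

```typescript
// Hypothetical result shape for this sketch.
interface SourcePage {
  title: string;
  source: string;
  content: string;
}

// Illustrative only: start a new page at "# Title" followed by "Source: <url>".
function parseSourcePages(text: string): SourcePage[] {
  const lines = text.split(/\r?\n/);
  const pages: SourcePage[] = [];
  let current: SourcePage | undefined;
  let body: string[] = [];

  // Finish the page being collected, if any.
  const flush = (): void => {
    if (current) {
      current.content = body.join("\n").trim();
      pages.push(current);
    }
  };

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    const next = lines[i + 1] ?? "";
    const heading = /^# (.+)$/.exec(line);
    const source = /^Source: (\S+)\s*$/.exec(next);
    if (heading && source) {
      flush();
      current = {
        title: (heading[1] ?? "").trim(),
        source: source[1] ?? "",
        content: "",
      };
      body = [];
      i++; // skip the Source: line that was just consumed
    } else if (current) {
      body.push(line); // an H1 without a Source: line stays in the body
    }
  }
  flush();
  return pages;
}
```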

H1 Header Based Format

Pages are separated by H1 headers (# Title). Used by projects like svelte.dev, nuxt.com, and docs.astro.build.

# Page Title

Content of the page...

# Another Page

More content...

API

`unbind(content: string): Iterable<Page>`

Parses the entire string synchronously and returns an iterable (Generator) of pages. Use Array.from(unbind(content)) to get an array.
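
Since the return value is an iterable rather than an array, a caller that needs a single page can loop over it directly and stop at the first match instead of materializing everything. A minimal sketch, using a hypothetical generic helper `findFirst`:

```typescript
// Hypothetical helper: return the first item that satisfies the predicate.
function findFirst<T>(
  items: Iterable<T>,
  predicate: (item: T) => boolean,
): T | undefined {
  for (const item of items) {
    // Returning here ends the iteration, so later items are never requested.
    if (predicate(item)) return item;
  }
  return undefined;
}
```

For instance, `findFirst(unbind(content), (page) => page.title === "Installation")` would return the matching page or `undefined`. The title used here is only an example.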

`unbindStream(stream: ReadableStream | AsyncIterable): AsyncIterable<Page>`

Accepts a standard Web ReadableStream (returned by fetch) or any Async Iterable. Yields pages as soon as they are parsed.

Type Definition: `Page`

export interface Page {
  title: string;
  content: string; // The extracted text content
  metadata?: Record<string, unknown>;
}
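
Because metadata values are typed as unknown, they need to be narrowed before use. The sketch below shows one way to read a string value safely. `getStringMeta` is a hypothetical helper, and the key "url" is only an assumption for illustration, since which keys are populated depends on the source format.

```typescript
// Local copy of the Page shape shown above.
interface Page {
  title: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: return a metadata value only when it is a string.
function getStringMeta(page: Page, key: string): string | undefined {
  const value = page.metadata?.[key];
  return typeof value === "string" ? value : undefined;
}

// Example with hand-written data; "url" and "order" are assumed key names.
const examplePage: Page = {
  title: "Another Page",
  content: "More content...",
  metadata: { url: "/optional/metadata.md", order: 3 },
};
const exampleUrl = getStringMeta(examplePage, "url");
```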

License

MIT
