npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

breadchunks

v0.2.2

Published

Heading-aware, token-budgeted semantic chunker for Markdown — for RAG and embedding pipelines.

Readme

breadchunks

CI npm

Heading-aware, token-budgeted semantic chunker for Markdown — for RAG and embedding pipelines.

Native Node.js addon (N-API). Prebuilt binaries for macOS arm64/x64, Linux glibc arm64/x64, and Windows x64. Alpine/musl and Windows arm64 must build from source (requires cloning the repository) via napi build --release.

Install

npm install breadchunks

Usage

import { chunk } from 'breadchunks'

// Chunk a single document
const [chunks] = await chunk([Buffer.from(markdown)], { minLength: 400, maxLength: 2000 })
for (const c of chunks) {
  console.log(`[${c.breadcrumb}]`, c.text.slice(0, 80))
}

// Chunk multiple documents in one call
const results = await chunk([docA, docB, docC])
// results[0] → Chunk[] for docA, results[1] → Chunk[] for docB, …

TypeScript types are included (index.d.ts).

chunk accepts a batch of Buffer | string inputs and returns a Promise<Chunk[][]> — one Chunk[] per input. Buffer is preferred; it avoids a round-trip UTF-8 re-encode from the JS heap.

How it works

Three-phase pipeline:

  1. Phase 1 — Split: Split at header boundaries. Each section becomes its own chunk tagged with a full heading breadcrumb (H1 > H2 > H3). Backtick-fenced code blocks are protected — # comment inside a backtick fence is never treated as a heading.
  2. Phase 2 — Merge same-breadcrumb: Merge adjacent chunks that share the same breadcrumb and are below minLength.
  3. Phase 3 — Parent absorption (bottom-up, h6→h1): Absorb small child sections into their parent when the combined size stays under maxLength.

Supported Markdown: ATX headers only (# H1 through ###### H6). Backtick-fenced code blocks are protected. Setext headers, tilde fences (~~~), and 4-space-indented code are not recognized as structural boundaries — # inside them is treated as a header.

API

chunk(inputs, options?)

function chunk(
  inputs: Array<Buffer | string>,
  options?: ChunkOptions,
): Promise<Array<Array<Chunk>>>

ChunkOptions

| Field | Type | Default | Description | |---|---|---|---| | minLength | number? | 512 | Target minimum chunk size (chars) | | maxLength | number? | 3072 | Hard maximum chunk size (chars) | | phase | number? | 3 | Stop after this phase (1, 2, or 3) | | title | string? | undefined | Document title — prepended to every breadcrumb |

Chunk

| Field | Type | Description | |---|---|---| | level | number | Heading depth (0 = preface, 1–6 = h1–h6) | | header | string? | Text of the nearest heading | | headers | (string \| undefined \| null)[] | Full 6-slot heading stack (h1–h6) | | breadcrumb | string | Human-readable path: "H1 > H2 > H3" | | text | string | Chunk body (without the heading line or breadcrumb). To get the full string an embedding model sees, prepend breadcrumb + "\n\n" when breadcrumb is non-empty. | | length | number | Character count of the full string (including breadcrumb if present) after whitespace collapse. |

Note on length: it counts breadcrumb + "\n\n" + text (or just text if breadcrumb is empty) after collapsing all whitespace runs to a single space — not text.length.

License

MIT