@exellix/narrix-adapter-docs
v2.0.0
Documents adapter: structured docs to CNI v1.1
Adapter that converts multi-page documents into CNI v1.1 output deterministically. Translator only (no facts or signals). Part of the Narrix adapters ecosystem.
Where This Package Fits
@exellix/narrix-cni Schema & types (CNI v1.1)
▲
│
@exellix/narrix-adapters-core Shared algorithms
▲
│
@exellix/narrix-adapter-docs ◄── YOU ARE HERE
Multi-page docs → CNI v1.1 (byPage, bySection, byLength)
Depends on @exellix/narrix-cni and @exellix/narrix-adapters-core. Golden tests D1–D6 live in this package.
Install
npm install @exellix/narrix-adapter-docs
Registry: This package is published to GitHub Packages. Ensure your .npmrc includes:
@exellix:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=<YOUR_TOKEN>
Features
- Default: one content item per page (byPage)
- Optional: bySection splits on Markdown headings (# ...) and ALL CAPS heading lines
- Optional: byLength chunks long pages with overlap and a global chunk index across all pages
- Truncation: drop whole pages from the end (never mid-page) via maxTotalChars
- Stable IDs: docId/pageId and a deterministic fingerprint when docId is missing
- Reference dedup: URL vs hostname (hostname dropped when contained in a URL span)
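The truncation rule above can be sketched as follows. This is a minimal illustration and not the package's code: the `Page` shape and the `truncatePages` name are hypothetical, and it assumes "chars" means `String.length`. Whether the real adapter keeps at least one page when even the first page exceeds the limit is not stated in this README; the sketch simply returns an empty list in that case.

```typescript
// Hypothetical page shape for illustration; the real DocPage type has more fields.
interface Page {
  pageId?: string;
  text: string;
}

// Drop whole pages from the end until the total character count fits.
// Assumption: "chars" is measured with String.length (UTF-16 code units).
function truncatePages(pages: Page[], maxTotalChars?: number): Page[] {
  const kept = pages.slice(); // never mutate the caller's array
  if (maxTotalChars === undefined) return kept;
  let total = kept.reduce((sum, p) => sum + p.text.length, 0);
  while (kept.length > 0 && total > maxTotalChars) {
    const dropped = kept.pop()!; // pages are removed whole, never cut mid-page
    total -= dropped.text.length;
  }
  return kept;
}

const pages: Page[] = [
  { pageId: "p1", text: "a".repeat(50) },
  { pageId: "p2", text: "b".repeat(50) },
  { pageId: "p3", text: "c".repeat(50) },
];
const kept = truncatePages(pages, 120);
console.log(kept.map((p) => p.pageId)); // [ 'p1', 'p2' ]
```

With three 50-character pages and a limit of 120, only the last page is dropped, because the remaining 100 characters already fit.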
Constraints
- Deterministic: same input + same options ⇒ byte-identical output
- No randomness: no Date.now(), no Math.random()
- No timestamps in IDs or hashes (only in optional metadata, if the caller provides it)
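One way to satisfy these constraints is to derive IDs purely from input fields. Golden test D6 describes the content ID as sha256(docId + "|" + pageId); the sketch below implements that formula under stated assumptions: Node's built-in `crypto` module, UTF-8 input, and full lowercase hex output. The README does not specify the encoding or any prefix, and the `contentIdFor` name is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes only caller-supplied fields, so the result
// never depends on time or randomness.
// Assumptions: UTF-8 input and untruncated lowercase hex output.
function contentIdFor(docId: string, pageId: string): string {
  return createHash("sha256").update(`${docId}|${pageId}`, "utf8").digest("hex");
}

const first = contentIdFor("my-doc-1", "p1");
const second = contentIdFor("my-doc-1", "p1");
console.log(first === second); // true
console.log(first.length); // 64
```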
Usage
import { toCni } from "@exellix/narrix-adapter-docs";

const input = {
  docId: "my-doc-1",
  title: "My Document",
  pages: [
    { pageId: "p1", pageNumber: 1, text: "Page one content." },
    { pageId: "p2", pageNumber: 2, text: "Page two content." },
  ],
};

const result = toCni(input, {
  adapterId: "@exellix/narrix-adapter-docs",
  adapterVersion: "1.0.0",
  kind: "docs",
  docs: { strategy: "byPage" },
});

console.log(result.cni.schema); // "cni.v1.1"
console.log(result.cni.content); // one item per page
console.log(result.diagnostics);
API
- toCni(input: DocInput, options?: DocsAdapterOptions): AdapterResult - main entry point
- adapter: NarrixAdapter<DocInput> - adapter object with kind, adapterId, version, toCni
- DocInput, DocPage, DocsAdapterOptions - exported types
Options
| Option | Default | Description |
|--------|---------|-------------|
| docs.strategy | "byPage" | One of "byPage", "bySection", "byLength" |
| docs.maxTotalChars | — | Truncate by dropping pages from the end until total chars ≤ this |
| docs.enableMarkdownHeadings | true | Split on # Heading when mime === "text/markdown" |
| docs.enableAllCapsHeadings | true | Split on ALL CAPS heading lines (4–80 chars, ≥60% letters) |
| docs.maxChunkChars | 4000 | For byLength: max chars per chunk |
| docs.overlapChars | 200 | For byLength: overlap between chunks |
| docs.maxEntities | — | Cap extracted references per content (uses core default if unset) |
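The ALL CAPS heading rule in the table (4–80 chars, ≥60% letters) can be read as a simple line predicate. The sketch below is one possible interpretation, not the adapter's actual check: the `isAllCapsHeading` name is hypothetical, and it assumes the length is measured on the trimmed line, that "letters" means ASCII A–Z, and that the percentage is letters divided by trimmed length.

```typescript
// Hypothetical predicate for the docs.enableAllCapsHeadings heuristic.
// Assumptions: trimmed length, ASCII letters only, ratio = letters / length.
function isAllCapsHeading(line: string): boolean {
  const trimmed = line.trim();
  if (trimmed.length < 4 || trimmed.length > 80) return false;
  const letterCount = (trimmed.match(/[A-Za-z]/g) ?? []).length;
  if (letterCount / trimmed.length < 0.6) return false;
  // No lowercase letters allowed; digits and punctuation pass through unchanged.
  return trimmed === trimmed.toUpperCase();
}

console.log(isAllCapsHeading("INTRODUCTION")); // true
console.log(isAllCapsHeading("SECTION 1: OVERVIEW")); // true (15 of 19 chars are letters)
console.log(isAllCapsHeading("Introduction")); // false (contains lowercase)
console.log(isAllCapsHeading("ABC")); // false (shorter than 4 chars)
console.log(isAllCapsHeading("2024-01-01 12:00")); // false (no letters)
```

The letter-ratio check is what keeps lines such as timestamps or numeric table rows from being treated as headings, even though they contain no lowercase characters.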
Input types
- DocInput: docId?, title?, pages: DocPage[], sourceMeta?, meta?
- DocPage: pageId?, pageNumber?, index?, text, mime? ("text/plain" | "text/markdown"), meta?
Output
- CNI v1.1: schema, subject, content, references?, facts: [], signals: [], meta
- subject.type: "document"
- sourceRef: page:X, page:X:section:Y, or page:X:chunk:Y (with globalChunkIndex for byLength)
- diagnostics.stats: contentItems, totalChars, chunks, entitiesExtracted
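To make the page:X:chunk:Y form and globalChunkIndex concrete, here is a rough sketch of byLength-style chunking with overlap. It is an illustration under several assumptions that this README does not confirm: X is the pageId, chunk indices start at 0, short pages keep the plain page:X form and do not consume a global index, and overlap is taken from the end of the previous chunk. The `Page`, `Chunk`, and `chunkPages` names are hypothetical.

```typescript
interface Page {
  pageId: string;
  text: string;
}

interface Chunk {
  sourceRef: string;
  globalChunkIndex?: number; // only set for chunked pages (assumption)
  text: string;
}

// Defaults mirror docs.maxChunkChars and docs.overlapChars from the options table.
function chunkPages(pages: Page[], maxChunkChars = 4000, overlapChars = 200): Chunk[] {
  if (overlapChars < 0 || overlapChars >= maxChunkChars) {
    throw new Error("overlapChars must be >= 0 and smaller than maxChunkChars");
  }
  const step = maxChunkChars - overlapChars;
  const out: Chunk[] = [];
  let globalChunkIndex = 0; // keeps counting across pages
  for (const page of pages) {
    if (page.text.length <= maxChunkChars) {
      out.push({ sourceRef: `page:${page.pageId}`, text: page.text });
      continue;
    }
    let localIndex = 0;
    for (let start = 0; start < page.text.length; start += step) {
      const end = Math.min(start + maxChunkChars, page.text.length);
      out.push({
        sourceRef: `page:${page.pageId}:chunk:${localIndex}`,
        globalChunkIndex,
        text: page.text.slice(start, end),
      });
      localIndex += 1;
      globalChunkIndex += 1;
      if (end === page.text.length) break; // last chunk reached the end of the page
    }
  }
  return out;
}

const longText = "abcdefghijklmnopqrstuvwxy"; // 25 chars
const chunks = chunkPages(
  [
    { pageId: "p1", text: "short" },
    { pageId: "p2", text: longText },
    { pageId: "p3", text: longText },
  ],
  20, // maxChunkChars
  5, // overlapChars
);
console.log(chunks.map((c) => c.sourceRef));
// [ 'page:p1', 'page:p2:chunk:0', 'page:p2:chunk:1', 'page:p3:chunk:0', 'page:p3:chunk:1' ]
console.log(chunks.map((c) => c.globalChunkIndex)); // [ undefined, 0, 1, 2, 3 ]
```

In this run the per-page index restarts at 0 for p3 while globalChunkIndex continues at 2, which is the distinction the sourceRef bullet draws.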
Scripts
npm run build # compile TypeScript to dist/
npm test # build + run golden tests (D1–D6)
npm run generate-expected # regenerate expected fixtures (after changing adapter)
Golden tests
Each golden test runs input → toCni() → deepStrictEqual(expected) and requires exact JSON equality, with no tolerance.
| Case | Description |
|------|-------------|
| D1 | Simple 2-page doc; CVE, IP, URL; hostname deduped when nested in URL |
| D2 | No docId → hash-based subject.id + DOCS_NO_DOC_ID warning |
| D3 | Empty pages → content: [] + DOCS_EMPTY_PAGES warning |
| D4 | bySection → Markdown headings split into 3 sections |
| D5 | byLength → short page stays page, long page chunked into 2 |
| D6 | pageId → stable contentId via sha256(docId + "\|" + pageId) |
Fixtures: test/fixtures/case-Dn.input.json, case-Dn.options.json (optional), case-Dn.expected.json
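The fixture layout above suggests a small runner along these lines. This is a sketch, not the package's test harness: `runGoldenCase` and the `Convert` type are hypothetical names, and the JSON round-trip of the result is an assumption made so that `undefined` fields compare the same way they would in a stored fixture. So that the block runs on its own, the demo writes a throwaway fixture to a temp directory and uses a stand-in converter instead of toCni.

```typescript
import { deepStrictEqual } from "node:assert";
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical converter signature; in this package it would be toCni.
type Convert = (input: unknown, options?: unknown) => unknown;

function readJson(path: string): unknown {
  return JSON.parse(readFileSync(path, "utf8"));
}

// Loads <name>.input.json, optional <name>.options.json, and <name>.expected.json,
// then requires exact equality. Throws an AssertionError on any difference.
function runGoldenCase(fixturesDir: string, name: string, convert: Convert): void {
  const input = readJson(join(fixturesDir, `${name}.input.json`));
  const optionsPath = join(fixturesDir, `${name}.options.json`);
  const options = existsSync(optionsPath) ? readJson(optionsPath) : undefined;
  const expected = readJson(join(fixturesDir, `${name}.expected.json`));
  // Assumption: compare the JSON form of the result, since fixtures are JSON files.
  const actual: unknown = JSON.parse(JSON.stringify(convert(input, options)));
  deepStrictEqual(actual, expected);
}

// Standalone demo with a throwaway fixture and a stand-in converter.
const demoDir = mkdtempSync(join(tmpdir(), "golden-"));
writeFileSync(join(demoDir, "case-X.input.json"), JSON.stringify({ pages: ["a", "b"] }));
writeFileSync(join(demoDir, "case-X.expected.json"), JSON.stringify({ count: 2 }));
const countPages: Convert = (input) => ({
  count: (input as { pages: unknown[] }).pages.length,
});
runGoldenCase(demoDir, "case-X", countPages); // passes silently
```

Against the real fixtures, the call would presumably look like `runGoldenCase("test/fixtures", "case-D1", toCni)`, repeated for D1 through D6.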
Publishing (private npm via GitHub Packages)
- Create a .npmrc in the project root with @exellix:registry and _authToken. .npmrc is gitignored.
- Run npm run build, then npm publish. The package is published to GitHub Packages with access: "restricted".
License
UNLICENSED (private).
