npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

ai-file-router

v0.2.2

Published

Auto-detect file type and route through optimal parsing pipeline

Readme

ai-file-router

Auto-detect file type and route through the optimal parsing pipeline.

npm version npm downloads license node TypeScript

Accepts any file -- PDF, DOCX, HTML, CSV, JSON, YAML, Markdown, source code, images, and dozens more -- detects the format via magic bytes, MIME type, and file extension, selects the best parser, and returns clean text or markdown ready for AI consumption. Zero runtime dependencies. Built for RAG pipelines, document processing APIs, and knowledge base construction.

Installation

npm install ai-file-router

Quick Start

import { route } from "ai-file-router";

// Route a file by path
const result = await route("./report.csv");
console.log(result.content);
// | Name | Age | City |
// | --- | --- | --- |
// | Alice | 30 | NYC |

console.log(result.format);       // "markdown"
console.log(result.sourceFormat);  // "csv"
console.log(result.durationMs);   // 2

Route inline content or Buffers just as easily:

// Inline string with a format hint
const json = await route('{"key": "value"}', { format: "json" });
console.log(json.content);
// ```json
// {
//   "key": "value"
// }
// ```

// Buffer with a fileName hint
const buf = Buffer.from("Name,Age\nAlice,30");
const csv = await route(buf, { mimeType: "text/csv" });
console.log(csv.sourceFormat); // "csv"

Features

  • Automatic format detection -- magic bytes, MIME types, file extensions, and content heuristics, scored by confidence.
  • 40+ file formats -- text, markdown, code (35+ languages), JSON, YAML, TOML, CSV, TSV, HTML, XML, SVG, and binary formats (PDF, DOCX, PPTX, XLSX, images).
  • AI-ready output -- every format converts to clean markdown or plain text, ready to drop into an LLM context window.
  • Batch and directory processing -- routeBatch and routeDirectory process many files with configurable concurrency.
  • Extensible parser registry -- register custom parsers that take priority over built-ins.
  • Zero runtime dependencies -- only Node.js built-in modules.
  • Full TypeScript -- strict mode, exported types for every interface.

API Reference

route(source, options?)

Route a single file through the parsing pipeline.

Signature:

function route(source: string | Buffer, options?: RouteOptions): Promise<RouteResult>

Parameters:

| Parameter | Type | Description | | --- | --- | --- | | source | string \| Buffer | A file path, inline string content, or a Buffer. When a string is provided that exists on disk (or looks like a path), it is read as a file. Otherwise it is treated as inline content. | | options | RouteOptions | Optional configuration (see below). |

RouteOptions:

| Field | Type | Default | Description | | --- | --- | --- | --- | | format | string | auto-detect | Override auto-detected format (e.g. "json", "csv", "code"). Sets confidence to 1.0. | | outputFormat | "markdown" \| "text" | parser default | Preferred output format. When set to "text" on a markdown file, markdown syntax is stripped. | | mimeType | string | undefined | MIME type hint for detection (confidence 0.8). | | maxSize | number | 0 (unlimited) | Maximum output size in characters. Content exceeding this limit is truncated and a warning is added. | | fileName | string | undefined | File name hint used for extension-based detection when input is a Buffer or inline string. | | codeOptions.includeHeader | boolean | false | When true, prepends a ## File: <name> (<language>) header to code output. |

Returns: Promise<RouteResult>

interface RouteResult {
  content: string;                // Extracted text or markdown
  format: "markdown" | "text";   // Output format
  sourceFormat: string;           // Detected source format (e.g. "csv", "json", "code", "image")
  mimeType?: string;              // MIME type of the source, if known
  metadata: {
    filePath?: string;            // Original file path
    fileSize?: number;            // File size in bytes
    wordCount?: number;           // Approximate word count of output
    fileName?: string;            // Original file name
  };
  warnings: string[];             // Warnings generated during parsing
  durationMs: number;             // Processing duration in milliseconds
}

Example:

import { route } from "ai-file-router";

const result = await route("./src/index.ts", {
  codeOptions: { includeHeader: true },
});

console.log(result.content);
// ## File: index.ts (typescript)
//
// ```typescript
// import { readFileSync } from 'fs';
// ...
// ```

console.log(result.metadata.wordCount); // 42

routeBatch(sources, options?)

Route multiple files through the parsing pipeline with configurable concurrency.

Signature:

function routeBatch(
  sources: Array<string | Buffer>,
  options?: BatchOptions
): Promise<BatchRouteResult[]>

Parameters:

| Parameter | Type | Description | | --- | --- | --- | | sources | Array<string \| Buffer> | Array of file paths, inline strings, or Buffers. | | options | BatchOptions | All RouteOptions fields plus concurrency. |

BatchOptions (extends RouteOptions):

| Field | Type | Default | Description | | --- | --- | --- | --- | | concurrency | number | 5 | Maximum number of files processed simultaneously. | | include | string[] | all files | Glob patterns to include (used by routeDirectory). | | exclude | string[] | none | Glob patterns to exclude (used by routeDirectory). | | recursive | boolean | true | Whether to scan directories recursively (used by routeDirectory). |

Returns: Promise<BatchRouteResult[]>

interface BatchRouteResult {
  source: string;          // Source path or "(buffer)" for Buffer inputs
  result?: RouteResult;    // Present on success
  error?: string;          // Present on failure
}

Example:

import { routeBatch } from "ai-file-router";

const results = await routeBatch(
  ["./data.csv", "./config.json", "./README.md"],
  { concurrency: 3 }
);

for (const r of results) {
  if (r.result) {
    console.log(`${r.source}: ${r.result.sourceFormat} (${r.result.metadata.wordCount} words)`);
  } else {
    console.error(`${r.source}: ${r.error}`);
  }
}

routeDirectory(dirPath, options?)

Recursively scan a directory and route all files through the parsing pipeline.

Signature:

function routeDirectory(dirPath: string, options?: BatchOptions): Promise<BatchRouteResult[]>

Automatically skips node_modules, .git, __pycache__, .next, dist, and build directories.

Parameters:

| Parameter | Type | Description | | --- | --- | --- | | dirPath | string | Directory path to scan. | | options | BatchOptions | All batch options apply. include/exclude filter file names with glob patterns (* and ** supported). |

Example:

import { routeDirectory } from "ai-file-router";

const results = await routeDirectory("./docs", {
  recursive: true,
  include: ["*.md", "*.txt"],
  exclude: ["*.log"],
  concurrency: 10,
});

console.log(`Processed ${results.length} files`);

detectFormat(options)

Detect a file's format without parsing it. Useful for pre-filtering or routing decisions.

Signature:

function detectFormat(options: {
  content?: Buffer | string;
  filePath?: string;
  mimeType?: string;
  format?: string;
  fileName?: string;
}): FormatInfo

Returns: FormatInfo

interface FormatInfo {
  format: string;              // Format identifier (e.g. "pdf", "csv", "code", "image", "text")
  confidence: number;          // Confidence score from 0 to 1
  method: DetectionMethod;     // "explicit" | "magic-bytes" | "mime-type" | "extension" | "content-heuristic"
  mimeType?: string;           // MIME type if known
  extension?: string;          // File extension with leading dot (e.g. ".csv")
  language?: string;           // Programming language for code files (e.g. "typescript", "python")
  subtype?: string;            // Image subtype (e.g. "png", "jpeg", "svg")
}

Example:

import { detectFormat } from "ai-file-router";

// Detect from file extension
detectFormat({ filePath: "report.pdf" });
// { format: "pdf", confidence: 0.6, method: "extension", mimeType: "application/pdf", extension: ".pdf" }

// Detect from magic bytes (highest confidence for binary formats)
detectFormat({ content: Buffer.from("%PDF-1.7...") });
// { format: "pdf", confidence: 1.0, method: "magic-bytes", mimeType: "application/pdf" }

// Detect from MIME type
detectFormat({ mimeType: "text/csv" });
// { format: "csv", confidence: 0.8, method: "mime-type", mimeType: "text/csv" }

// Detect from content heuristics
detectFormat({ content: '{"key": "value"}' });
// { format: "json", confidence: 0.5, method: "content-heuristic" }

registerParser(parser)

Register a custom parser in the global registry. Custom parsers take priority over all built-in parsers, so you can override default behavior or add support for new formats.

Signature:

function registerParser(parser: Parser): void

Parser interface:

interface Parser {
  name: string;                                              // Human-readable name
  formats: string[];                                         // Format identifiers this parser handles
  canParse(input: ParserInput): boolean;                     // Return true if this parser can handle the input
  parse(input: ParserInput, options?: RouteOptions): Promise<ParserOutput>;  // Parse and return content
}

interface ParserInput {
  content: Buffer | string;     // File content
  formatInfo: FormatInfo;       // Detected format info
  filePath?: string;            // Original file path
  fileName?: string;            // File name
}

interface ParserOutput {
  content: string;              // Parsed content
  format: "markdown" | "text";  // Output format
  warnings?: string[];          // Optional warnings
}

Example:

import { registerParser, route } from "ai-file-router";

registerParser({
  name: "ini-parser",
  formats: ["ini"],
  canParse: (input) => input.formatInfo.format === "ini",
  parse: async (input) => {
    const text = typeof input.content === "string"
      ? input.content
      : input.content.toString("utf-8");

    const lines = text.split("\n");
    const md: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
        md.push(`## ${trimmed.slice(1, -1)}`);
      } else if (trimmed.includes("=")) {
        const [key, ...rest] = trimmed.split("=");
        md.push(`- **${key.trim()}**: ${rest.join("=").trim()}`);
      }
    }
    return { content: md.join("\n"), format: "markdown" };
  },
});

const result = await route("[db]\nhost=localhost\nport=5432", { format: "ini" });
console.log(result.content);
// ## db
// - **host**: localhost
// - **port**: 5432

getRegistry()

Get the global ParserRegistry instance. Useful for inspecting registered parsers and supported formats.

Signature:

function getRegistry(): ParserRegistry

ParserRegistry methods:

| Method | Returns | Description | | --- | --- | --- | | register(parser) | void | Register a custom parser. | | findParser(input) | Parser \| null | Find the best parser for a given input. Checks custom parsers first, then built-ins. | | getAllParsers() | Parser[] | Get all registered parsers (custom first, then built-in). | | getSupportedFormats() | string[] | Get all supported format identifiers. |

Example:

import { getRegistry, ParserRegistry } from "ai-file-router";

// Inspect the global registry
const registry = getRegistry();
console.log(registry.getSupportedFormats());
// ["text", "code", "markdown", "html", "xml", "json", "csv", "tsv", "yaml", "toml", "pdf", "docx", ...]

// Or create a standalone registry
const custom = new ParserRegistry();
custom.register(myParser);

Built-in Parsers

The following parsers are exported for advanced use (e.g. composing custom parsers or testing):

import {
  textParser,
  codeParser,
  markdownParser,
  htmlParser,
  jsonParser,
  csvParser,
  yamlParser,
  binaryParser,
  imageParser,
} from "ai-file-router";

Each implements the Parser interface. See the Supported Formats table for which formats each parser handles.

Supported Formats

| Category | Formats | Output | | --- | --- | --- | | Text | .txt, .log | Plain text (passthrough with whitespace normalization) | | Markdown | .md, .markdown | Markdown (passthrough with cleanup; optional strip to plain text via outputFormat: "text") | | Code | .js, .mjs, .cjs, .ts, .mts, .cts, .jsx, .tsx, .py, .rs, .go, .java, .c, .h, .cpp, .cc, .cxx, .hpp, .hxx, .cs, .rb, .php, .swift, .kt, .kts, .scala, .sh, .bash, .zsh, .sql, .r, .lua, .pl, .pm, .ex, .exs, .erl, .hs, .dart, .vue, .svelte, .css, .scss, .sass, .less, .graphql, .gql, .proto, .tf, Dockerfile, Makefile | Markdown (fenced code block with language tag) | | Data | .json | Markdown (pretty-printed in code fence; large files get a structure summary) | | Data | .yaml, .yml, .toml | Markdown (in code fence) | | Tabular | .csv, .tsv | Markdown (GFM table with RFC 4180 parsing) | | Web | .html, .htm | Markdown (headings, lists, tables, links, bold/italic, blockquotes preserved; scripts and styles removed) | | Web | .xml | Markdown (in code fence) | | Documents | .pdf, .docx, .pptx, .xlsx, .doc, .ppt, .xls, .odt, .rtf | Descriptive message with suggested external packages | | Images | .png, .jpg, .jpeg, .gif, .webp, .bmp, .tiff, .tif | Descriptive message with OCR/multimodal LLM suggestions | | Images | .svg | Markdown (in code fence) | | Archives | .zip, .gzip | Descriptive message with suggested packages |

Binary formats (PDF, DOCX, PPTX, XLSX, images) return a helpful message with suggested external packages. Register a custom parser to add full binary content extraction -- for example, use docling-node-ts for document conversion.

Configuration

Format Detection Priority

Detection uses five signals in strict priority order:

| Priority | Signal | Confidence | Example | | --- | --- | --- | --- | | 1 | Explicit options.format | 1.0 | { format: "json" } | | 2 | Magic bytes (file signature) | 0.9 -- 1.0 | %PDF, PNG header, ZIP header | | 3 | MIME type (options.mimeType) | 0.8 | "text/csv" | | 4 | File extension | 0.6 | .json, .py, .csv | | 5 | Content heuristic | 0.3 -- 0.5 | JSON braces, HTML tags, YAML --- |

The application/octet-stream MIME type is ignored as uninformative.

ZIP magic bytes are disambiguated using the file extension: .docx, .pptx, .xlsx, and .odt are recognized as their respective Office formats rather than generic ZIP.

Output Truncation

Set maxSize to limit output length. Content exceeding the limit is truncated and a warning is added to warnings:

const result = await route(largeContent, { format: "text", maxSize: 5000 });
// result.warnings: ["Output truncated to 5000 characters"]

Directory Scanning

routeDirectory skips these directories by default:

  • node_modules
  • .git
  • __pycache__
  • .next
  • dist
  • build

Error Handling

route never throws. On failure, it returns a RouteResult with an empty content string and error details in warnings:

// Non-existent file
const result = await route("/path/to/missing-file.txt");
console.log(result.content);    // ""
console.log(result.warnings);   // ["Failed to read file: ENOENT: no such file or directory..."]

// Undetectable format (binary without magic bytes or hints)
const result2 = await route(Buffer.from([0x00, 0x01, 0x02]));
console.log(result2.warnings);  // ["Unable to detect file format. Provide a format hint..."]

// No parser for a detected format
const result3 = await route("content", { format: "unknown_format" });
console.log(result3.warnings);  // ["No parser registered for format: unknown_format"]

routeBatch isolates errors per file. Each BatchRouteResult has either a result or an error:

const results = await routeBatch(["./valid.txt", "./missing.txt"]);
for (const r of results) {
  if (r.error) {
    console.error(`Failed: ${r.source} -- ${r.error}`);
  }
}

Invalid JSON content is still returned inside a code fence, with a parse warning:

const result = await route("{broken json}", { format: "json" });
// result.content contains the raw text in a ```json fence
// result.warnings: ["Invalid JSON: ..."]

Advanced Usage

Overriding a Built-in Parser

Custom parsers registered via registerParser take priority over built-ins. To override how JSON is handled, for example:

import { registerParser, route } from "ai-file-router";

registerParser({
  name: "compact-json",
  formats: ["json"],
  canParse: (input) => input.formatInfo.format === "json",
  parse: async (input) => {
    const text = typeof input.content === "string"
      ? input.content
      : input.content.toString("utf-8");
    const parsed = JSON.parse(text);
    const keys = Object.keys(parsed);
    return {
      content: `JSON object with ${keys.length} keys: ${keys.join(", ")}`,
      format: "text",
    };
  },
});

const result = await route('{"a":1,"b":2,"c":3}', { format: "json" });
// result.content: "JSON object with 3 keys: a, b, c"

Pre-filtering with detectFormat

Use detectFormat to inspect files before committing to a full parse:

import { detectFormat, route } from "ai-file-router";

const files = ["report.pdf", "data.csv", "image.png", "notes.md"];

for (const file of files) {
  const info = detectFormat({ filePath: file });
  if (info.format === "image" || info.confidence < 0.5) {
    console.log(`Skipping ${file} (${info.format}, confidence ${info.confidence})`);
    continue;
  }
  const result = await route(file);
  console.log(`${file}: ${result.metadata.wordCount} words`);
}

Processing Buffers from HTTP Responses

import { route } from "ai-file-router";

const response = await fetch("https://example.com/api/report.csv");
const buffer = Buffer.from(await response.arrayBuffer());

const result = await route(buffer, {
  mimeType: response.headers.get("content-type") || undefined,
  fileName: "report.csv",
});

console.log(result.sourceFormat); // "csv"
console.log(result.content);     // Markdown table

Using a Standalone ParserRegistry

For isolated environments (tests, plugins), create a separate registry:

import { ParserRegistry } from "ai-file-router";

const registry = new ParserRegistry();
console.log(registry.getSupportedFormats());
// All built-in formats are available

registry.register(myCustomParser);
const parser = registry.findParser(input);

TypeScript

All types are exported from the package entry point:

import type {
  RouteResult,
  RouteOptions,
  OutputFormat,
  FormatInfo,
  DetectionMethod,
  FileMetadata,
  Parser,
  ParserInput,
  ParserOutput,
  CodeOptions,
  BatchRouteResult,
  BatchOptions,
} from "ai-file-router";

The package is compiled with strict: true and ships declaration files (.d.ts) and declaration maps.

License

MIT