@ambicuity/any-to-markdown
v1.0.0
Convert common documents, web pages, archives, media metadata, and text formats to Markdown.
any-to-markdown
@ambicuity/any-to-markdown is a TypeScript and Node.js package for converting common files, web content, archives, and selected media metadata into Markdown for indexing, retrieval, text analysis, and LLM workflows.
The package exposes both a command-line tool and a strict TypeScript API.
Install
npm install @ambicuity/any-to-markdown
pnpm add @ambicuity/any-to-markdown
yarn add @ambicuity/any-to-markdown
Command Line
any-to-markdown path-to-file.pdf > document.md
any-to-markdown path-to-file.docx -o document.md
cat path-to-file.html | any-to-markdown --extension html
Useful options:
- --output <file> writes Markdown to a file.
- --extension <extension> provides a file-extension hint for stdin.
- --mime-type <mimeType> provides a MIME-type hint.
- --charset <charset> provides a text-decoding hint.
- --keep-data-uris preserves full data URIs in HTML-derived Markdown.
- --llm-caption-images enables LLM captioning of embedded images in DOCX, PPTX, XLSX, and EPUB files (requires a programmatic llmClient).
- --llm-pdf-pages sends the PDF buffer to the configured LLM when the extracted text is sparse.
- --llm-audio-model <model> sets the audio transcription model ID (e.g. whisper-1).
- --mcp runs the package as a Model Context Protocol stdio server.
- --version prints the package version.
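The --keep-data-uris flag implies that data URIs are shortened by default in HTML-derived Markdown. As a sketch of what that shortening might look like, the hypothetical truncateDataUris() helper below drops the payload of a data URI used as a Markdown link or image target. The exact shortened form the package emits is an assumption here, not documented behavior.

```typescript
// Hypothetical helper, not the package's code: shortens data URIs used as
// Markdown link or image targets, e.g. ![alt](data:image/png;base64,AAAA...).
// Only the simple data:<mime>[;base64],<payload> form is matched; anything
// else (including data URIs with extra parameters) is left unchanged.
const DATA_URI_TARGET = /\(data:([^;,)\s]+)(;base64)?,[^)\s]*\)/g;

function truncateDataUris(markdown: string): string {
  // ASSUMPTION: the shortened form keeps the MIME type and drops the payload.
  return markdown.replace(
    DATA_URI_TARGET,
    (_match: string, mime: string, base64?: string) => `(data:${mime}${base64 ?? ""}...)`,
  );
}

console.log(truncateDataUris("![logo](data:image/png;base64,iVBORw0KGgoAAAANSUhEUg==)"));
// prints ![logo](data:image/png;base64...)
```

Keeping the MIME type in the shortened form lets a reader (or an LLM) still see that an image was present without paying for the encoded bytes.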
TypeScript API
import { AnyToMarkdown } from "@ambicuity/any-to-markdown";
const converter = new AnyToMarkdown();
const result = await converter.convert("report.docx");
console.log(result.markdown);
JavaScript API
import { AnyToMarkdown } from "@ambicuity/any-to-markdown";
const converter = new AnyToMarkdown();
const result = await converter.convert("report.xlsx");
console.log(result.textContent);
Supported Inputs
The built-in converter set includes support for:
- Plain text, Markdown, JSON, XML, YAML, and similar text formats
- CSV tables
- HTML and XHTML
- DOCX
- XLSX and XLS
- PPTX
- PDF text extraction
- EPUB
- ZIP archives
- RTF
- Image and audio metadata when ExifTool is available
- Optional LLM image captioning through a caller-provided compatible client
- Optional LLM audio transcription, embedded-image captioning, and PDF augmentation
- Optional OCR helper converters through a caller-provided OCR service
- Optional MCP server with convert_uri, convert_local, convert_stream, and list_converters tools
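To illustrate what converting tabular input to Markdown involves, here is a minimal sketch that renders already-parsed CSV rows as a pipe table. The rowsToMarkdownTable() helper is hypothetical and is not the package's CSV converter; it assumes the first row is the header and performs no CSV parsing or quote handling of its own.

```typescript
// Hypothetical helper: render parsed CSV rows as a GitHub-style pipe table.
// Assumes rows[0] is the header; short rows are padded with empty cells.
function rowsToMarkdownTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((r) => r.length));
  // Escape pipes and flatten line breaks so each record stays on one table row.
  const cell = (value: string | undefined): string =>
    (value ?? "").replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
  const line = (r: string[]): string =>
    "| " + Array.from({ length: width }, (_, i) => cell(r[i])).join(" | ") + " |";
  const [header, ...body] = rows;
  const separator = "| " + Array(width).fill("---").join(" | ") + " |";
  return [line(header), separator, ...body.map(line)].join("\n");
}

console.log(rowsToMarkdownTable([["name", "qty"], ["apple", "3"], ["pear"]]));
```

Padding ragged rows to a fixed width keeps the table valid Markdown even when the source CSV is inconsistent.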
Output fidelity for some formats depends on the underlying Node.js ecosystem libraries. Where the exact output layout differs from other implementations, the goal is stable Markdown with equivalent content and behavior.
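When started with --mcp, the package acts as a Model Context Protocol server over stdio, where a client writes newline-delimited JSON-RPC 2.0 messages to the server's stdin. The sketch below only builds those messages for a convert_uri call. The client name, the protocol version string, and the assumption that convert_uri takes a single uri argument are illustrative; the server's tools/list response describes the real input schema.

```typescript
// Sketch of the newline-delimited JSON-RPC 2.0 messages an MCP client writes
// to a stdio server. It only builds the messages: a real client must wait for
// the initialize response before sending notifications/initialized, and read
// each response from the server's stdout.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number; // omitted for notifications
  method: string;
  params?: Record<string, unknown>;
};

function frame(message: JsonRpcMessage): string {
  // Compact JSON.stringify output has no raw newlines, as stdio framing requires.
  return JSON.stringify(message) + "\n";
}

function buildSession(uri: string): string {
  const messages: JsonRpcMessage[] = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05",
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical client
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
    {
      jsonrpc: "2.0",
      id: 3,
      method: "tools/call",
      // ASSUMPTION: convert_uri takes a single "uri" argument.
      params: { name: "convert_uri", arguments: { uri } },
    },
  ];
  return messages.map(frame).join("");
}

console.log(buildSession("data:text/plain,hello").trimEnd());
```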
API Overview
import {
AnyToMarkdown,
StreamInfo,
type DocumentConverter
} from "@ambicuity/any-to-markdown";
const engine = new AnyToMarkdown();
await engine.convert("file.pdf");
await engine.convertLocal("file.pdf");
await engine.convertUri("data:text/plain,hello");
await engine.convertStream(Buffer.from("hello"), {
streamInfo: new StreamInfo({ extension: ".txt" })
});
const customConverter: DocumentConverter = {
accepts: (_input, info) => info.extension === ".custom",
convert: () => ({ markdown: "custom markdown", textContent: "custom markdown", toString: () => "custom markdown" })
};
engine.registerConverter(customConverter);
Security Considerations
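The custom converter above is an accepts/convert pair. The following self-contained sketch shows how such converters could be dispatched: each registered converter is asked whether it accepts the input, and the first match performs the conversion. The Registry class and the minimal Info, Result, and Converter types are hypothetical stand-ins for the package's own DocumentConverter and StreamInfo types, and the newest-first precedence is an assumption.

```typescript
// Minimal, hypothetical mirror of the converter shape used above; the real
// DocumentConverter and StreamInfo types in the package may carry more fields.
interface Info { extension?: string }
interface Result { markdown: string; textContent: string; toString(): string }
interface Converter {
  accepts(input: Uint8Array, info: Info): boolean;
  convert(input: Uint8Array, info: Info): Result;
}

class Registry {
  private converters: Converter[] = [];

  register(converter: Converter): void {
    // ASSUMPTION: newest registrations are tried first, so a user converter
    // can override an earlier one for the same input.
    this.converters.unshift(converter);
  }

  convert(input: Uint8Array, info: Info): Result {
    const match = this.converters.find((c) => c.accepts(input, info));
    if (!match) throw new Error(`No converter accepts extension ${info.extension ?? "(none)"}`);
    return match.convert(input, info);
  }
}

function textResult(markdown: string): Result {
  return { markdown, textContent: markdown, toString: () => markdown };
}

const registry = new Registry();
registry.register({
  accepts: (_input, info) => info.extension === ".txt",
  convert: (input) => textResult(new TextDecoder().decode(input)),
});
registry.register({
  accepts: (_input, info) => info.extension === ".custom",
  convert: () => textResult("custom markdown"),
});

console.log(registry.convert(new TextEncoder().encode("hello"), { extension: ".txt" }).markdown);
```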
any-to-markdown performs I/O with the privileges of the current process. Validate untrusted file paths and URLs before converting them. Prefer the narrowest API that fits your workflow: use convertLocal for local files, convertStream for already-opened content, and convertUri only when URI fetching is intended.
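As one way to apply this advice, the sketch below rejects file paths that resolve outside an allowed directory and URIs whose scheme is not on an allowlist, before anything is handed to convertLocal or convertUri. The helper names resolveInsideRoot() and checkUri(), the https:/data: allowlist, and the example root directory are illustrative policy, not part of the package.

```typescript
import * as path from "node:path";

// Hypothetical pre-flight checks; the allowlists below are example policy.

// Resolve an untrusted path and refuse anything that is not inside `root`
// (the root directory itself is rejected too, since it is not a file).
// Note: this check is purely lexical and does not follow symlinks, so pair it
// with fs.realpath if the directory may contain links.
function resolveInsideRoot(root: string, candidate: string): string {
  const base = path.resolve(root);
  const target = path.resolve(base, candidate);
  const rel = path.relative(base, target);
  const outside =
    rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (outside) {
    throw new Error(`Path is not a file inside the allowed root: ${candidate}`);
  }
  return target;
}

// Accept only URI schemes the caller actually intends to fetch.
// Host allowlisting (for example, blocking internal addresses) is not covered here.
const ALLOWED_SCHEMES = new Set(["https:", "data:"]);

function checkUri(raw: string): URL {
  const url = new URL(raw); // throws a TypeError on malformed input
  if (!ALLOWED_SCHEMES.has(url.protocol)) {
    throw new Error(`URI scheme not allowed: ${url.protocol}`);
  }
  return url;
}

console.log(resolveInsideRoot("/srv/docs", "reports/q1.pdf"));
console.log(checkUri("https://example.com/report.pdf").href);
```

Only the values returned by these checks would then be passed on to convertLocal or convertUri.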
Package Metadata
- Package: @ambicuity/any-to-markdown
- Version: 1.0.0
- Author: Ritesh Rana
- Email: [email protected]
License
MIT
