docx-file-parser

v1.0.1

Published 6 days ago

Simple DOCX/DOC document parser powered by LlamaParse v2

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: hexapode

Keywords: docx, doc, word, microsoft-word, parser, document, llamaparse, ocr, markdown, converter

docx-file-parser

Parse DOCX and DOC (Microsoft Word) files to markdown or text. Powered by LlamaParse v2. Zero dependencies; requires Node 18+.

Handles .docx, .doc, .dotx, .dot, .docm, .dotm, .rtf, and .odt files out of the box, and also accepts .pdf.

Install

npm install docx-file-parser

Quick Start

Set your API key:

export LLAMA_CLOUD_API_KEY=llx-...

Parse a DOCX file:

import { parse } from "docx-file-parser";

// Top-level await requires an ES module (a .mjs file, or "type": "module" in package.json)
const result = await parse("./document.docx");
console.log(result.markdown);

Advanced Usage

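import { readFileSync } from "node:fs";
// Assumption: the class is exported as DocxFileParser. The API section names it
// `docx-file-parser`, which is not a valid JavaScript identifier; check the
// package's exports for the actual name.
import { DocxFileParser } from "docx-file-parser";

const parser = new DocxFileParser({ apiKey: "llx-..." });

// Parse a DOCX file with options
const report = await parser.parse("./report.docx", {
  tier: "agentic",
  processing_options: { language: "fr" },
});

// Parse a buffer (e.g. from an upload); fileName is a filename hint for the format
const buffer = readFileSync("./contract.docx");
const contract = await parser.parse(buffer, {
  fileName: "contract.docx",
});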
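To parse many files without starting every upload at once, a small worker pool can cap how many `parse` calls run concurrently. This is a minimal sketch, not part of the package: `parseAll` and its `concurrency` parameter are hypothetical, and the parse function is passed in (for example `(file) => parser.parse(file)`) so the pool logic can be tried without an API key.

```javascript
// Hypothetical helper (not part of docx-file-parser): runs parseFn over many
// inputs with at most `concurrency` calls in flight, keeping results in input order.
async function parseAll(inputs, parseFn, concurrency = 3) {
  const results = new Array(inputs.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker() {
    while (next < inputs.length) {
      const i = next++;
      try {
        results[i] = { input: inputs[i], ok: true, value: await parseFn(inputs[i]) };
      } catch (error) {
        // Record the failure instead of aborting the whole batch.
        results[i] = { input: inputs[i], ok: false, error };
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, inputs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Usage: `const results = await parseAll(["./a.docx", "./b.doc"], (f) => parser.parse(f), 2);`. Each entry reports `ok` so one corrupt file does not lose the results of the others.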

Supported Formats

| Extension | Format |
|-----------|--------|
| .docx | Microsoft Word (OpenXML) |
| .doc | Microsoft Word (Legacy) |
| .dotx | Microsoft Word Template (OpenXML) |
| .dot | Microsoft Word Template (Legacy) |
| .docm | Microsoft Word Macro-Enabled |
| .dotm | Microsoft Word Macro-Enabled Template |
| .rtf | Rich Text Format |
| .odt | OpenDocument Text |
| .pdf | PDF |
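When parsing a buffer, the `mimeType` option can be set explicitly instead of relying on auto-detection. A minimal sketch of an extension-to-MIME lookup for the formats above, using the commonly registered MIME types; `mimeTypeFor` is a hypothetical helper, not an export of this package.

```javascript
// Hypothetical helper (not exported by docx-file-parser): maps the supported
// extensions to their commonly registered MIME types.
const MIME_TYPES = {
  ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  ".doc": "application/msword",
  ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
  ".dot": "application/msword",
  ".docm": "application/vnd.ms-word.document.macroEnabled.12",
  ".dotm": "application/vnd.ms-word.template.macroEnabled.12",
  ".rtf": "application/rtf",
  ".odt": "application/vnd.oasis.opendocument.text",
  ".pdf": "application/pdf",
};

// Returns the MIME type for a file name, or undefined if the extension is unsupported.
function mimeTypeFor(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return undefined;
  return MIME_TYPES[fileName.slice(dot).toLowerCase()];
}
```

For example: `parser.parse(buffer, { fileName: "contract.docx", mimeType: mimeTypeFor("contract.docx") })`.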

API

`parse(input, options?)`

Uploads a document, waits for parsing to complete, and returns the result.
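The README does not document how `parse` waits internally. Conceptually, waiting for parsing to complete is a poll loop governed by `pollIntervalMs`, `timeoutMs`, and `signal`. The sketch below illustrates such a loop under stated assumptions: `pollUntilDone`, `sleep`, and the `getStatus` callback with its `{ done, error }` shape are hypothetical stand-ins, not the package's or LlamaParse's actual internals.

```javascript
// Illustrative sketch only: polls a hypothetical getStatus() until it reports
// completion, honoring pollIntervalMs, timeoutMs and an optional AbortSignal.
async function pollUntilDone(getStatus, { pollIntervalMs = 1000, timeoutMs = 300000, signal } = {}) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    signal?.throwIfAborted();
    const status = await getStatus(); // assumed shape: { done: boolean, error?: string }
    if (status.error) throw new Error(`Parsing failed: ${status.error}`);
    if (status.done) return status;
    // Stop if the next poll would land past the deadline.
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    await sleep(pollIntervalMs, signal);
  }
}

// Resolves after ms milliseconds, or rejects early if the signal aborts.
function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) return reject(signal.reason);
    const onAbort = () => {
      clearTimeout(timer);
      reject(signal.reason);
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

Passing `AbortSignal.timeout(60_000)` or an `AbortController`'s signal as `signal` lets a caller cancel independently of `timeoutMs`.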

Input: file path (string) or file contents (Buffer | Uint8Array)
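Because Node's `Buffer` is a subclass of `Uint8Array`, a single `instanceof Uint8Array` check covers both byte inputs. A minimal sketch of how such input might be normalized before upload; `toUploadPart` and its `readFile` parameter are hypothetical (in Node you could pass `fs.promises.readFile`), and the fallback name mirrors the `fileName` default from the options table.

```javascript
// Hypothetical sketch: turns a path or raw bytes into a Blob plus a file name,
// the two pieces a multipart upload needs. Not the package's actual code.
async function toUploadPart(input, { fileName, mimeType } = {}, readFile) {
  if (typeof input === "string") {
    // File path: read the bytes and take the name from the last path segment.
    const bytes = await readFile(input);
    const name = fileName ?? input.split(/[\\/]/).pop();
    return { blob: new Blob([bytes], { type: mimeType ?? "" }), fileName: name };
  }
  if (input instanceof Uint8Array) {
    // Buffer extends Uint8Array, so this branch handles both.
    return {
      blob: new Blob([input], { type: mimeType ?? "" }),
      fileName: fileName ?? "document.docx", // default from the options table
    };
  }
  throw new TypeError("input must be a file path (string), Buffer, or Uint8Array");
}
```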

Options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| tier | string | "fast" | Parsing tier: fast, cost_effective, agentic, agentic_plus |
| version | string | "latest" | API version |
| apiKey | string | env var | Override API key |
| expand | string[] | ["markdown_full", "text_full"] | Fields to expand |
| pollIntervalMs | number | 1000 | Polling interval in ms |
| timeoutMs | number | 300000 | Max wait time in ms |
| fileName | string | "document.docx" | Filename hint for buffer input |
| mimeType | string | auto-detected | MIME type for buffer input |
| signal | AbortSignal | — | Cancellation signal |
| processing_options | object | — | LlamaParse processing options (language, disable_ocr, etc.) |
| agentic_options | object | — | Agentic options (custom_prompt) |
| page_ranges | object | — | Page range options (max_pages, target_pages) |
| disable_cache | boolean | — | Disable document caching |
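The defaults above can be read as a base object that caller options override. A minimal sketch of that merge with a client-side check of `tier`; `withDefaults`, `DEFAULTS`, and `TIERS` are hypothetical names, and the sketch covers only options with a listed default (the API-key fallback is a separate concern).

```javascript
// Hypothetical sketch: merges caller options over the documented defaults and
// rejects an unknown tier before any network call is made.
const TIERS = ["fast", "cost_effective", "agentic", "agentic_plus"];

const DEFAULTS = {
  tier: "fast",
  version: "latest",
  expand: ["markdown_full", "text_full"],
  pollIntervalMs: 1000,
  timeoutMs: 300000,
  fileName: "document.docx",
};

function withDefaults(options = {}) {
  // Ignore keys explicitly set to undefined so they don't erase a default.
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined),
  );
  // Copy the default expand array so callers can't mutate DEFAULTS through the result.
  const merged = { ...DEFAULTS, expand: [...DEFAULTS.expand], ...provided };

  if (!TIERS.includes(merged.tier)) {
    throw new RangeError(`Unknown tier "${merged.tier}"; expected one of: ${TIERS.join(", ")}`);
  }
  for (const key of ["pollIntervalMs", "timeoutMs"]) {
    if (!(typeof merged[key] === "number" && merged[key] > 0)) {
      throw new RangeError(`${key} must be a positive number`);
    }
  }
  return merged;
}
```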

Returns: ParseResult

interface ParseResult {
  markdown: string;     // Full markdown output
  text: string;         // Full text output
  job: JobResponse;     // Job metadata (id, status, etc.)
  _raw: object;         // Raw API response
}
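`result.markdown` is a single string. For downstream uses such as chunking, it can be split at ATX headings (`#` to `######`). This is a sketch assuming the output uses ATX-style headings; `splitMarkdownSections` is a hypothetical helper, and its fence handling is simplistic (any line starting with ``` or ~~~ toggles code mode) so `#` comments inside code blocks are not mistaken for headings.

```javascript
// Hypothetical helper: splits markdown into { heading, level, body } sections.
// Text before the first heading is returned with heading null and level 0.
function splitMarkdownSections(markdown) {
  const sections = [];
  let current = { heading: null, level: 0, lines: [] };
  let inFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    // Toggle on ``` or ~~~ fence markers so headings inside code are ignored.
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    // A closing run of #s counts only when preceded by whitespace ("# C#" keeps its #).
    const match = !inFence && /^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$/.exec(line);
    if (match) {
      sections.push(current);
      current = { heading: match[2], level: match[1].length, lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  sections.push(current);

  return sections
    .map(({ heading, level, lines }) => ({ heading, level, body: lines.join("\n").trim() }))
    // Drop an empty preamble when the document starts with a heading.
    .filter((s) => s.heading !== null || s.body !== "");
}
```

Usage: `splitMarkdownSections(result.markdown).map((s) => s.body)` yields one chunk per section.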

`new docx-file-parser(config?)`

Create an instance with explicit configuration.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | LLAMA_CLOUD_API_KEY | API key |
| baseUrl | string | https://api.cloud.llamaindex.ai | API base URL |
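The table implies a precedence for the API key: an explicit `apiKey` wins, otherwise the `LLAMA_CLOUD_API_KEY` environment variable is used. A hypothetical `resolveConfig` helper that applies that precedence and fails fast with a clear message when neither is set; the `env` parameter is an assumption added only so the logic can be exercised without touching `process.env`.

```javascript
// Hypothetical sketch of the documented precedence: explicit config, then environment.
const DEFAULT_BASE_URL = "https://api.cloud.llamaindex.ai";

function resolveConfig(config = {}, env = process.env) {
  const apiKey = config.apiKey ?? env.LLAMA_CLOUD_API_KEY;
  if (!apiKey) {
    throw new Error("No API key: pass { apiKey } or set LLAMA_CLOUD_API_KEY");
  }
  // Strip trailing slashes so paths can be appended with a single "/".
  const baseUrl = (config.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```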

License

MIT
