docx-file-parser
v1.0.1
Simple DOCX/DOC document parser powered by LlamaParse v2
Parse DOCX and DOC (Microsoft Word) files to markdown or text. Powered by LlamaParse v2. Zero dependencies, Node 18+.
Handles .docx, .doc, .dotx, .dot, .docm, .dotm, .rtf, and .odt files out of the box.
Install
npm install docx-file-parser
Quick Start
Set your API key:
export LLAMA_CLOUD_API_KEY=llx-...
Parse a DOCX file:
import { parse } from "docx-file-parser";
const result = await parse("./document.docx");
console.log(result.markdown);
Advanced Usage
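import fs from "node:fs";
// NOTE: this README writes the class as `docx-file-parser`, which is not a valid
// JavaScript identifier. `DocxFileParser` is an assumed export name; check the
// package's actual exports before copying this.
import { DocxFileParser } from "docx-file-parser";
const parser = new DocxFileParser({ apiKey: "llx-..." });
// Parse a DOCX file with options
const result = await parser.parse("./report.docx", {
  tier: "agentic",
  processing_options: { language: "fr" },
});
// Parse a buffer (e.g. from an upload)
const buffer = fs.readFileSync("./contract.docx");
const bufferResult = await parser.parse(buffer, {
  fileName: "contract.docx",
});
Supported Formats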
| Extension | Format |
|-----------|--------|
| .docx | Microsoft Word (OpenXML) |
| .doc | Microsoft Word (Legacy) |
| .dotx | Microsoft Word Template (OpenXML) |
| .dot | Microsoft Word Template (Legacy) |
| .docm | Microsoft Word Macro-Enabled |
| .dotm | Microsoft Word Macro-Enabled Template |
| .rtf | Rich Text Format |
| .odt | OpenDocument Text |
| .pdf | PDF |
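Checking a file's extension before uploading can avoid a wasted API call. The following is a minimal sketch, assuming the extensions in the table above are the complete list; `isSupportedFile` is a hypothetical helper and not part of docx-file-parser.

```typescript
// Extensions copied from the Supported Formats table.
const SUPPORTED_EXTENSIONS: readonly string[] = [
  ".docx", ".doc", ".dotx", ".dot", ".docm", ".dotm", ".rtf", ".odt", ".pdf",
];

// Hypothetical helper (not exported by docx-file-parser).
// Returns true when the file name ends in one of the listed extensions.
function isSupportedFile(filePath: string): boolean {
  // Take the last path segment, accepting both "/" and "\" separators.
  const segments = filePath.split(/[\\/]/);
  const name = segments[segments.length - 1];
  const dot = name.lastIndexOf(".");
  // No dot, or a leading dot only (e.g. ".docx" as a dotfile): no extension.
  if (dot <= 0) return false;
  const ext = name.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(ext) !== -1;
}
```

The comparison is case-insensitive, so `REPORT.DOCX` is accepted as well as `report.docx`.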
API
parse(input, options?)
Uploads a document, waits for parsing to complete, and returns the result.
Input: file path (string) or file contents (Buffer | Uint8Array)
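The two input kinds can be told apart at runtime. The sketch below is illustrative only: `describeInput` and `UploadDescriptor` are hypothetical names, and the package's real internals are not shown in this README. It assumes the file name is taken from the path for string input and from the `fileName` option (default `"document.docx"`) for byte input.

```typescript
// Buffer is a Uint8Array subclass in Node, so one check covers both.
type ParseInput = string | Uint8Array;

interface UploadDescriptor {
  kind: "path" | "bytes";
  fileName: string;
}

// Hypothetical helper (not exported by docx-file-parser).
function describeInput(input: ParseInput, fileName?: string): UploadDescriptor {
  if (typeof input === "string") {
    // String input is treated as a file path; use its last segment as the name.
    const segments = input.split(/[\\/]/);
    const base = segments[segments.length - 1] || input;
    return { kind: "path", fileName: base };
  }
  // Byte input carries no name, so fall back to the hint or the documented default.
  return { kind: "bytes", fileName: fileName !== undefined ? fileName : "document.docx" };
}
```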
Options:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| tier | string | "fast" | Parsing tier: fast, cost_effective, agentic, agentic_plus |
| version | string | "latest" | API version |
| apiKey | string | env var | Override API key |
| expand | string[] | ["markdown_full", "text_full"] | Fields to expand |
| pollIntervalMs | number | 1000 | Polling interval in ms |
| timeoutMs | number | 300000 | Max wait time in ms |
| fileName | string | "document.docx" | Filename hint for buffer input |
| mimeType | string | auto-detected | MIME type for buffer input |
| signal | AbortSignal | — | Cancellation signal |
| processing_options | object | — | LlamaParse processing options (language, disable_ocr, etc.) |
| agentic_options | object | — | Agentic options (custom_prompt) |
| page_ranges | object | — | Page range options (max_pages, target_pages) |
| disable_cache | boolean | — | Disable document caching |
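The `pollIntervalMs`, `timeoutMs`, and `signal` options suggest a poll-until-done loop. The sketch below shows one way such a loop could work; it is not the package's implementation. `pollUntilDone` and its `check` callback are hypothetical, and only the default values come from the table above.

```typescript
interface PollOptions {
  pollIntervalMs?: number; // default 1000, as in the options table
  timeoutMs?: number;      // default 300000, as in the options table
  signal?: AbortSignal;    // optional cancellation signal
}

// Hypothetical helper: calls `check` until it resolves to a non-null value.
async function pollUntilDone<T>(
  check: () => Promise<T | null>,
  options: PollOptions = {},
): Promise<T> {
  const pollIntervalMs = options.pollIntervalMs !== undefined ? options.pollIntervalMs : 1000;
  const timeoutMs = options.timeoutMs !== undefined ? options.timeoutMs : 300000;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Cancellation is checked once per iteration, before each request.
    if (options.signal && options.signal.aborted) {
      throw new Error("Parsing was cancelled");
    }
    const value = await check();
    if (value !== null) return value;
    // Give up if the next wait would run past the deadline.
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error("Timed out after " + timeoutMs + " ms");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```

In this sketch an abort raised during the wait is only noticed at the start of the next iteration; a production loop would likely also listen for the signal's `abort` event.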
Returns: ParseResult
interface ParseResult {
  markdown: string; // Full markdown output
  text: string; // Full text output
  job: JobResponse; // Job metadata (id, status, etc.)
  _raw: object; // Raw API response
}
new docx-file-parser(config?)
Create an instance with explicit configuration.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | LLAMA_CLOUD_API_KEY | API key |
| baseUrl | string | https://api.cloud.llamaindex.ai | API base URL |
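The defaults in this table imply a resolution order: an explicit `apiKey` wins, otherwise the `LLAMA_CLOUD_API_KEY` environment variable is used. The following is a rough sketch of that logic, not the package's code; `resolveConfig` and `ParserConfig` are hypothetical names, and throwing when no key is found is an assumption.

```typescript
interface ParserConfig {
  apiKey?: string;
  baseUrl?: string;
}

// Default base URL, taken from the configuration table.
const DEFAULT_BASE_URL = "https://api.cloud.llamaindex.ai";

// Hypothetical helper (not exported by docx-file-parser).
// `env` is passed in explicitly so the function is easy to test;
// in Node you would typically pass `process.env`.
function resolveConfig(
  config: ParserConfig,
  env: Record<string, string | undefined>,
): Required<ParserConfig> {
  const apiKey = config.apiKey !== undefined ? config.apiKey : env["LLAMA_CLOUD_API_KEY"];
  if (!apiKey) {
    // Assumption: a missing key is treated as an error up front.
    throw new Error("Missing API key: pass apiKey or set LLAMA_CLOUD_API_KEY");
  }
  const baseUrl = config.baseUrl !== undefined ? config.baseUrl : DEFAULT_BASE_URL;
  return { apiKey, baseUrl };
}
```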
License
MIT
