tinyweb-office-words
v1.0.0
Lightweight, free, open-source document converter: convert DOCX, DOC, and RTF to Markdown, plain text, and PDF
A lightweight, open-source Node.js library for converting DOCX, DOC, RTF, TXT, and MD files to Markdown, plain text, and PDF without requiring Microsoft Word. DOCX, TXT, and MD input is supported; the DOC and RTF readers are planned.
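DOCX, DOC, and RTF files can usually be told apart by their first bytes, independent of the file extension. The sketch below illustrates that idea only; it is not this library's detection logic, and `detectInputFormat` and the format labels are hypothetical names. The signatures themselves are standard: DOCX is a ZIP package (`PK\x03\x04`), Word 97-2003 `.doc` is an OLE2 Compound File (`D0 CF 11 E0 A1 B1 1A E1`), and RTF begins with `{\rtf`.

```typescript
// Sketch: guess an input format from a file's leading bytes ("magic numbers").
// detectInputFormat and the InputFormat labels are hypothetical, not library API.
type InputFormat = "docx" | "doc" | "rtf" | "text";

const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04": ZIP package (DOCX)
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // OLE2 Compound File (.doc)
const RTF_MAGIC = [0x7b, 0x5c, 0x72, 0x74, 0x66]; // ASCII "{\rtf"

// True when `bytes` begins with every byte of `magic`.
function startsWithBytes(bytes: Uint8Array, magic: number[]): boolean {
  if (bytes.length < magic.length) return false;
  return magic.every((value, i) => bytes[i] === value);
}

function detectInputFormat(bytes: Uint8Array): InputFormat {
  if (startsWithBytes(bytes, ZIP_MAGIC)) return "docx"; // note: XLSX/PPTX are ZIP packages too
  if (startsWithBytes(bytes, OLE2_MAGIC)) return "doc";
  if (startsWithBytes(bytes, RTF_MAGIC)) return "rtf";
  return "text"; // no known signature: treat as .txt / .md
}
```

Because any ZIP archive matches the first check, a fuller reader would also confirm that the archive contains `word/document.xml` before treating it as DOCX.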
Features
- DOCX Support: Pure Node.js reader using `adm-zip` and `fast-xml-parser`
- DOC Support: Word 97-2003 binary format reader (planned)
- RTF Support: Rich Text Format reader (planned, delegates to DOC)
- Plain Text & Markdown Input: Read `.txt` and `.md` files
- Markdown Export: Rich formatting, including headings, bold/italic/strikethrough/underline, ordered and unordered lists (including nested), tables, block quotes, code blocks, and hyperlinks
- PDF Export: Generate PDF output via `pdfkit`
- Plain Text Export: Extract document text content
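The Markdown export features above map Word formatting onto Markdown syntax. As a rough sketch, assuming a hypothetical `Run` shape (text plus bold/italic/strike flags, not this library's internal model), inline formatting and a simple table could be rendered like this, using GFM `~~` for strikethrough and a GFM pipe table:

```typescript
// Sketch of Markdown export for formatted runs and a simple table.
// Run is a hypothetical shape, not this library's internal document model.
interface Run {
  text: string;
  bold?: boolean;
  italic?: boolean;
  strike?: boolean;
}

// Emphasis markers must hug the text ("** x**" does not render as bold),
// so leading/trailing whitespace is kept outside the markers.
function runToMarkdown(run: Run): string {
  const m = /^(\s*)([\s\S]*?)(\s*)$/.exec(run.text);
  const lead = m ? m[1] : "";
  let core = m ? m[2] : run.text;
  const trail = m ? m[3] : "";
  if (core === "") return run.text; // whitespace-only run: nothing to wrap
  if (run.strike) core = `~~${core}~~`; // GFM strikethrough
  if (run.italic) core = `*${core}*`;
  if (run.bold) core = `**${core}**`;
  return lead + core + trail;
}

// A paragraph is the concatenation of its rendered runs.
function paragraphToMarkdown(runs: Run[]): string {
  return runs.map(runToMarkdown).join("");
}

// Render rows (first row = header) as a GFM pipe table, escaping "|" in cells.
function tableToMarkdown(rows: string[][]): string {
  if (rows.length === 0) return "";
  const esc = (cell: string) => cell.replace(/\|/g, "\\|");
  const line = (cells: string[]) => "| " + cells.map(esc).join(" | ") + " |";
  const header = rows[0];
  const delimiter = "| " + header.map(() => "---").join(" | ") + " |";
  return [line(header), delimiter].concat(rows.slice(1).map(line)).join("\n");
}
```

For example, `paragraphToMarkdown([{ text: "Hello " }, { text: "world", italic: true }])` yields `Hello *world*`. Underline is left out of the sketch because neither CommonMark nor GFM defines an underline syntax.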
Installation
```bash
npm install tinyweb-office-words
```

Quick Start
Convert a document to Markdown
```typescript
import { Document, SaveFormat } from 'tinyweb-office-words';

// .txt and .md input also work; the .doc and .rtf readers are planned
const doc = new Document('input.docx');
doc.save('output.md', SaveFormat.MARKDOWN);
```

Export to PDF
```typescript
import { Document, SaveFormat } from 'tinyweb-office-words';

const doc = new Document('input.docx');
doc.save('output.pdf', SaveFormat.PDF);
```

Extract plain text
```typescript
import { Document } from 'tinyweb-office-words';

const doc = new Document('input.docx');
const text = doc.getText();
```

Save with options
```typescript
import { Document } from 'tinyweb-office-words';
import { MarkdownSaveOptions, PdfSaveOptions } from 'tinyweb-office-words/saving';

const doc = new Document('input.docx');

// Markdown: enable export of underline formatting
const mdOpts = new MarkdownSaveOptions();
mdOpts.export_underline_formatting = true;
doc.save('output.md', mdOpts);

// PDF: save with default options
const pdfOpts = new PdfSaveOptions();
doc.save('output.pdf', pdfOpts);
```

Requirements
- Node.js >= 18.0.0
- Dependencies: `adm-zip`, `fast-xml-parser`, `pdfkit`, `zod`
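To fail fast on an unsupported runtime, a script could compare the running Node.js version against the minimum above. This is a hypothetical helper (`meetsNodeRequirement` is not part of the library); it only compares the major version, which is sufficient for a `>= 18.0.0` bound.

```typescript
// Sketch: check a Node.js version string against the minimum (Node.js >= 18.0.0).
// meetsNodeRequirement is a hypothetical helper, not part of tinyweb-office-words.
const MIN_NODE_MAJOR = 18;

function meetsNodeRequirement(version: string, minMajor = MIN_NODE_MAJOR): boolean {
  // Accepts "20.11.1" (process.versions.node) or "v20.11.1" (process.version).
  const match = /^v?(\d+)(?:\.|$)/.exec(version.trim());
  if (!match) return false; // unparseable version: treat as unsupported
  return parseInt(match[1], 10) >= minMajor;
}
```

At runtime, pass `process.versions.node`, for example `if (!meetsNodeRequirement(process.versions.node)) { /* exit with an error */ }`.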
Development
```bash
# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test
```

License
MIT License - see the LICENSE file for details.
