DocFlow

A developer-friendly transformation engine built on SuperDoc

Transform documents programmatically with a clean, chainable API. Built on top of SuperDoc Editor, DocFlow makes it easy to load, query, transform, and export documents in multiple formats.

Features

  • 🔄 Format Conversion: DOCX ↔ Markdown ↔ HTML ↔ JSON ↔ Plain Text
  • 📊 Table Support: DOCX tables export cleanly to Markdown with formatting preserved
  • 🔍 Powerful Queries: CSS-like selectors and predicate functions
  • ⚡ Batch Processing: Process multiple documents concurrently
  • 🎯 Type-Safe: Full TypeScript support
  • 🔗 Chainable API: Fluent, readable transformations
  • 🚫 Error Handling: Multiple error handling modes
  • 📦 Headless: Runs in Node.js without a browser

Installation

npm install @paulmeller/docflow

Quick Start

import DocFlow from '@paulmeller/docflow';

// Simple transformation
await new DocFlow()
  .load('document.docx')
  .transform('heading[level=2]', node => ({
    ...node,
    text: node.text.toUpperCase()
  }))
  .save('output.docx');

⚠️ Important: load() vs loadContent()

DO NOT confuse these two methods:

| Method | Purpose | First Parameter | When to Use |
|--------|---------|-----------------|-------------|
| load() | Load from file path or buffer | File path string or Buffer | Reading files from disk |
| loadContent() | Load from content string | Content string (markdown/HTML/JSON) | When you have content in memory |

Common Mistake ❌

// ❌ WRONG - This treats JSON string as a file path!
const jsonString = JSON.stringify(docJson);
await doc.load(jsonString, { format: 'json' });  
// Error: ENAMETOOLONG or "file not found"

// ❌ WRONG - This treats markdown as a file path!
const markdown = '# Hello\n\nWorld';
await doc.load(markdown, { format: 'markdown' });

Correct Usage ✅

// ✅ CORRECT - Use load() for file paths
await doc.load('document.docx');
await doc.load('data.json', { format: 'json' });

// ✅ CORRECT - Use loadContent() for content strings
const jsonString = JSON.stringify(docJson);
await doc.loadContent(jsonString);

const markdown = '# Hello\n\nWorld';
await doc.loadContent(markdown);

Core Concepts

1. Load Documents

Load from file, buffer, or content:

const doc = new DocFlow();

// From file path
await doc.load('document.docx');

// From buffer (binary data)
const buffer = fs.readFileSync('document.docx');
await doc.load(buffer);

// From JSON buffer with explicit format
const jsonBuffer = Buffer.from(jsonString, 'utf-8');
await doc.load(jsonBuffer, { format: 'json' });

// From content string (auto-detects markdown, HTML, JSON, or text)
await doc.loadContent('<h1>Title</h1><p>Content</p>');
await doc.loadContent('# Title\n\nContent');
await doc.loadContent(JSON.stringify(docJson));

2. Query Structure

Find nodes using selectors:

// CSS-like selectors
const headings = doc.query('heading[level=2]');
const paragraphs = doc.query('paragraph');

// Predicate functions
const longParagraphs = doc.query(node => 
  node.type === 'paragraph' && 
  node.content?.length > 100
);

// Work with results
console.log(headings.count);        // Number of matches
console.log(headings.text());       // Concatenated text
console.log(headings.first());      // First match
console.log(headings.last());       // Last match

3. Transform Content

Modify document structure:

import { Transforms } from '@paulmeller/docflow';

// Transform matching nodes
await doc.transform('heading[level=2]', node => ({
  ...node,
  text: Transforms.toTitleCase(node.text)
}));

// Transform with built-in helpers
await doc.transform('text', Transforms.replace(/foo/g, 'bar'));
await doc.transform('text', Transforms.addPrefix('> '));

// Transform entire document
await doc.transformDocument(json => {
  // Modify entire document structure
  return modifiedJSON;
});

4. Export & Save

Export to various formats:

// Export to buffer/string
const docx = await doc.export('docx');
const html = await doc.export('html');
const markdown = await doc.export('markdown');
const json = await doc.export('json');
const text = await doc.export('text');

// Save directly to file
await doc.save('output.docx');
await doc.save('output.md', 'markdown');
await doc.save('output.html', 'html');
await doc.save('output.json', 'json');
await doc.save('output.txt', 'text');

Examples

Example 1: Format Conversion

import DocFlow from '@paulmeller/docflow';

// DOCX to Markdown
await new DocFlow()
  .load('document.docx')
  .save('document.md', 'markdown');

// Markdown to DOCX
await new DocFlow()
  .load('document.md')
  .save('document.docx');

// DOCX with tables to Markdown (tables preserved with formatting)
await new DocFlow()
  .load('report-with-tables.docx')
  .save('report.md', 'markdown');

// Extract plain text for analysis
const plainText = await new DocFlow()
  .load('document.docx')
  .export('text');
console.log(plainText); // All text without formatting

Example 2: JSON Import/Export

import DocFlow from '@paulmeller/docflow';

// Export DOCX to JSON (for storage or transmission)
const doc1 = new DocFlow();
await doc1.load('report.docx');
const jsonData = await doc1.export('json');
await fs.writeFile('report.json', JSON.stringify(jsonData, null, 2));

// Import JSON and convert to Markdown
const doc2 = new DocFlow();
await doc2.load('report.json');  // Auto-detects .json extension
const markdown = await doc2.export('markdown');

// Load JSON from buffer (e.g., from API or virtual filesystem)
const jsonString = JSON.stringify(jsonData);
const buffer = Buffer.from(jsonString, 'utf-8');
const doc3 = new DocFlow();
await doc3.load(buffer, { format: 'json' });  // Explicit format for buffers
const html = await doc3.export('html');

// Load JSON from content string (when you have it in memory)
const doc4 = new DocFlow();
await doc4.loadContent(jsonString);  // Auto-detects JSON
const docx = await doc4.export('docx');

Example 3: Heading Standardization

import { DocFlow, Transforms } from '@paulmeller/docflow';

const doc = new DocFlow();
await doc
  .load('document.docx')
  .transform('heading[level=1]', node => ({
    ...node,
    text: Transforms.toTitleCase(node.text)
  }))
  .transform('heading[level=2]', node => ({
    ...node,
    text: Transforms.toSentenceCase(node.text)
  }))
  .save('standardized.docx');

Example 4: Template Filling

const doc = new DocFlow();
await doc
  .load('template.docx')
  .transform('text', Transforms.replace(/\{name\}/g, 'John Doe'))
  .transform('text', Transforms.replace(/\{date\}/g, new Date().toLocaleDateString()))
  .transform('text', Transforms.replace(/\{company\}/g, 'Acme Corp'))
  .save('filled-document.docx');

Example 5: Content Analysis

const doc = new DocFlow();
await doc.load('document.docx');

// Find all headings
const headings = doc.query('heading');
console.log(`Document has ${headings.count} headings`);

// Find specific content
const importantSections = doc.query(node =>
  node.type === 'heading' &&
  node.text?.toLowerCase().includes('important')
);

console.log('Important sections:');
importantSections.map(h => h.text).forEach(text => console.log(`- ${text}`));

Example 6: Batch Processing

import { BatchProcessor } from '@paulmeller/docflow';

const batch = new BatchProcessor({
  concurrency: 5
});

const results = await batch.process(
  ['doc1.docx', 'doc2.docx', 'doc3.docx'],
  async (doc, file) => {
    await doc
      .transform('heading[level=1]', node => ({
        ...node,
        attrs: { ...node.attrs, level: 2 }
      }))
      .save(file.replace('.docx', '-updated.docx'));

    return { processed: true };
  }
);

console.log(`✓ Processed ${results.successful} files`);
console.log(`✗ Failed ${results.failed} files`);

Example 7: Complex Pipeline

const doc = new DocFlow();

await doc
  .load('input.docx')
  // Step 1: Standardize heading levels
  .transform('heading[level=1]', node => ({
    ...node,
    text: node.text.toUpperCase()
  }))
  // Step 2: Remove empty paragraphs
  .transformDocument(json => {
    json.content = json.content.filter(node =>
      node.type !== 'paragraph' || node.content?.length > 0
    );
    return json;
  })
  // Step 3: Add metadata
  .transformDocument(json => ({
    ...json,
    attrs: {
      ...json.attrs,
      processed: true,
      timestamp: Date.now()
    }
  }))
  .save('processed.docx');

Example 8: Document Structure Report

const doc = new DocFlow();
await doc.load('document.docx');

const json = doc.toJSON();

// Analyze structure
const stats = {
  headings: doc.query('heading').count,
  paragraphs: doc.query('paragraph').count,
  h1: doc.query('heading[level=1]').count,
  h2: doc.query('heading[level=2]').count,
  h3: doc.query('heading[level=3]').count
};

console.log('Document Structure:');
console.log(JSON.stringify(stats, null, 2));

// Table of contents
const toc = doc.query('heading')
  .map(h => `${'  '.repeat(h.attrs.level - 1)}- ${h.text}`)
  .join('\n');

console.log('\nTable of Contents:');
console.log(toc);

Example 9: Error Handling

// Mode 1: Throw on error (default)
try {
  await new DocFlow()
    .load('missing.docx')
    .save('output.docx');
} catch (error) {
  console.error('Failed:', error.message);
}

// Mode 2: Collect errors
const doc = new DocFlow({ errorMode: 'collect' });
await doc
  .load('input.docx')
  .transform('invalid-selector', node => node)
  .save('output.docx');

const errors = doc.getErrors();
if (errors.length > 0) {
  console.error('Errors occurred:', errors);
}

// Mode 3: Silent (ignore errors)
const doc2 = new DocFlow({ errorMode: 'silent' });
await doc2.load('maybe-exists.docx').save('output.docx');

Example 10: Custom Transformations

// Define custom transformation
function addTimestamp(node) {
  if (node.type === 'paragraph') {
    return {
      ...node,
      attrs: {
        ...node.attrs,
        timestamp: new Date().toISOString()
      }
    };
  }
  return node;
}

// Apply it
await doc
  .load('document.docx')
  .transform('paragraph', addTimestamp)
  .save('timestamped.docx');

Example 11: Conditional Processing

const doc = new DocFlow();
await doc.load('document.docx');

// Only process if conditions are met
const validation = doc.validate();

if (validation.valid) {
  const headings = doc.query('heading');

  if (headings.count > 0) {
    await doc
      .transform('heading', node => ({
        ...node,
        text: `📌 ${node.text}`
      }))
      .save('processed.docx');
  }
}

Format Auto-Detection

DocFlow automatically detects formats to make your code simpler and more intuitive.

load() behavior:

  • Auto-detects format from file extension: .docx, .md, .html, .json, .txt
  • Use format option to override: await doc.load('file.txt', { format: 'markdown' })
  • For Buffers, format defaults to 'docx' unless you specify otherwise

// Auto-detected from extension
await doc.load('document.docx');    // → format: 'docx'
await doc.load('data.json');        // → format: 'json'
await doc.load('readme.md');        // → format: 'markdown'

// Explicit format for buffers
const buffer = Buffer.from(jsonString, 'utf-8');
await doc.load(buffer, { format: 'json' });

loadContent() behavior:

  • Auto-detects JSON: Looks for {"type": "doc"...} pattern
  • Auto-detects HTML: Looks for <html>, <h1>, <p>, etc.
  • Auto-detects Markdown: Everything else treated as markdown
  • No format parameter needed - detection is automatic

// All auto-detected
await doc.loadContent('# Heading\n\nText');           // → markdown
await doc.loadContent('<h1>Title</h1><p>Text</p>');   // → HTML
await doc.loadContent('{"type": "doc", ...}');        // → JSON

API Reference

DocFlow

Constructor

new DocFlow(options?: {
  headless?: boolean;        // Default: true
  validateSchema?: boolean;  // Default: true
  errorMode?: 'throw' | 'collect' | 'silent';  // Default: 'throw'
})
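
For instance, a constructor call overriding the defaults above (the option values here are illustrative):

const doc = new DocFlow({
  headless: true,        // run without a browser (the default)
  errorMode: 'collect'   // record errors instead of throwing
});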

Methods

load(source, options?)

Load a document from file or buffer.

load(source: string | Buffer, options?: {
  format?: 'docx' | 'html' | 'markdown' | 'json' | 'text'
}): Promise<DocFlow>

loadContent(content)

Load content directly. Automatically detects whether the content is markdown, HTML, JSON, or plain text. Auto-initializes a blank document if none exists.

loadContent(content: string): Promise<DocFlow>

toJSON()

Get document as ProseMirror JSON.

toJSON(): ProseMirrorJSON

query(selector)

Query document structure.

query(selector: string | Function): QueryResult

Selectors:

  • "heading" - All heading nodes
  • "heading[level=2]" - H2 headings
  • "paragraph" - All paragraphs
  • node => condition - Custom predicate

transform(selector, transformer)

Transform matching nodes.

transform(
  selector: string | Function,
  transformer: (node) => node | Promise<node> | null
): Promise<DocFlow>

transformDocument(transformer)

Transform entire document.

transformDocument(
  transformer: (json) => json | Promise<json>
): Promise<DocFlow>

export(format?, options?)

Export to format.

export(
  format?: 'docx' | 'html' | 'markdown' | 'json' | 'text',
  options?: ExportOptions
): Promise<Buffer | string | Object>

save(filepath, format?)

Save to file.

save(filepath: string, format?: string): Promise<DocFlow>

validate()

Validate document.

validate(): {
  valid: boolean;
  errors: string[];
  warnings: string[];
  document: ProseMirrorJSON;
}

getHistory()

Get operation history.

getHistory(): Operation[]
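
For example, a minimal sketch of inspecting the history after a pipeline (the exact shape of an Operation record is an assumption here; log the raw entries to see what your version records):

const doc = new DocFlow();
await doc
  .load('input.docx')
  .transform('heading', node => node)
  .save('output.docx');

// Each entry describes one step of the pipeline above
for (const op of doc.getHistory()) {
  console.log(op);
}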

getErrors()

Get collected errors (when errorMode: 'collect').

getErrors(): ErrorRecord[]

QueryResult

Properties

  • count: Number of matching nodes
  • nodes: Array of found nodes
  • selector: Selector used

Methods

first()

Get first match.

first(): Node | null

last()

Get last match.

last(): Node | null

map(fn)

Map over results.

map<T>(fn: (node, index) => T): T[]

filter(fn)

Filter results.

filter(fn: (node, index) => boolean): QueryResult

text()

Get concatenated text content.

text(): string

transform(transformer)

Transform all found nodes.

transform(transformer: Function): Promise<DocFlow>
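
A short sketch chaining these helpers on a loaded document (the attrs access mirrors Example 8 above):

const headings = doc.query('heading');
const h2s = headings.filter(h => h.attrs?.level === 2);  // narrowed QueryResult

console.log(h2s.count);               // number of H2 headings
console.log(h2s.first());             // first matching node, or null
console.log(h2s.text());              // concatenated text of all matches
const titles = h2s.map(h => h.text);  // plain array of heading texts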

BatchProcessor

Process multiple documents concurrently.

const batch = new BatchProcessor({
  concurrency: 5,
  errorMode: 'collect'
});

const results = await batch.process(
  ['file1.docx', 'file2.docx'],
  async (doc, file) => {
    // Process each document
    await doc.transform(...).save(...);
    return { success: true };
  }
);

Transforms

Built-in transformation helpers.

import { Transforms } from '@paulmeller/docflow';

Transforms.toTitleCase(text)        // "hello world" → "Hello World"
Transforms.toSentenceCase(text)     // "HELLO WORLD" → "Hello world"
Transforms.replace(pattern, repl)   // Replace matching text
Transforms.addPrefix(prefix)        // Add prefix to text
Transforms.remove(condition)        // Remove matching nodes
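
As a hedged example of the remove helper (assuming it takes a node predicate and returns a transformer that deletes matching nodes, consistent with transform() accepting null to drop a node):

// Drop empty paragraphs (the predicate signature is an assumption)
await doc.transform('paragraph', Transforms.remove(node => !node.content?.length));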

ProseMirror JSON Structure

Documents are represented as ProseMirror JSON:

{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": { "level": 1 },
      "content": [
        { "type": "text", "text": "Title" }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        { "type": "text", "text": "Content" }
      ]
    }
  ]
}

Common Node Types

  • doc - Root document
  • heading - Heading (attrs: level)
  • paragraph - Paragraph
  • text - Text content
  • blockquote - Block quote
  • codeBlock - Code block
  • bulletList - Bullet list
  • orderedList - Numbered list
  • listItem - List item

Best Practices

1. Chain Operations

// ✓ Good - Chainable, readable
await doc
  .load('input.docx')
  .transform('heading', ...)
  .transform('paragraph', ...)
  .save('output.docx');

// ✗ Avoid - Verbose
await doc.load('input.docx');
await doc.transform('heading', ...);
await doc.transform('paragraph', ...);
await doc.save('output.docx');

2. Use Specific Selectors

// ✓ Good - Specific
doc.query('heading[level=2]')

// ✗ Avoid - Too broad
doc.query('heading').filter(h => h.attrs.level === 2)

3. Handle Errors

// ✓ Good - Handle errors
try {
  await doc.load('file.docx');
} catch (error) {
  console.error('Failed to load:', error);
}

// Or use collect mode
const doc = new DocFlow({ errorMode: 'collect' });
await doc.load('file.docx');
if (doc.getErrors().length > 0) {
  // Handle errors
}

4. Validate Complex Transformations

// ✓ Good - Validate after transformation
await doc.transform(...);
const validation = doc.validate();
if (!validation.valid) {
  console.error('Validation failed:', validation.errors);
}

5. Use Batch Processing for Multiple Files

// ✓ Good - Concurrent processing
const batch = new BatchProcessor({ concurrency: 5 });
await batch.process(files, pipeline);

// ✗ Avoid - Sequential processing
for (const file of files) {
  await new DocFlow().load(file).transform(...).save(...);
}

TypeScript Usage

Full TypeScript support included:

import DocFlow, { QueryResult, Transforms } from '@paulmeller/docflow';

const doc = new DocFlow({
  errorMode: 'collect'
});

await doc.load('document.docx');

const headings: QueryResult = doc.query('heading');
const count: number = headings.count;

Known Limitations

Due to limitations in the underlying SuperDoc library's DOCX conversion engine:

DOCX Round-Trip Limitations

  • Lists: DOCX export/import loses list items beyond the first (a confirmed SuperDoc limitation, even when using its command API)
  • Tables: ✅ Work perfectly for DOCX round-trips
  • Recommendation: For list transformations, use Markdown↔HTML conversions instead of DOCX round-trips

Working Perfectly

  • Markdown → HTML → Markdown: Full fidelity for lists, tables, links, formatting
  • HTML → Markdown: Complete preservation of all content
  • DOCX → Markdown/HTML: One-way conversion works well for extracting content
  • JSON export: Perfect for analyzing document structure

Best Practices

// ✅ GOOD: Use HTML/Markdown for list transformations
await doc.createBlank();
await doc.loadContent('- Item 1\n- Item 2\n- Item 3');
const html = await doc.export('html');  // Preserves all items
const md = await doc.export('markdown');  // Preserves all items

// ⚠️ LIMITED: DOCX round-trips may lose list structure
await doc.load('document.docx');
const docx = await doc.export('docx');
await doc.load(docx);  // May lose list items 2+

For applications requiring full DOCX round-trip fidelity with complex lists and tables, consider using Microsoft Word's native APIs or alternative DOCX libraries.

Upstream Issue Tracker: These limitations originate from the SuperDoc library. You can track progress at SuperDoc GitHub Issues.

Performance Tips

  1. Use batch processing for multiple files
  2. Reuse DocFlow instances when possible (see the sketch after this list)
  3. Use specific selectors to minimize traversal
  4. Validate only when necessary (disable with validateSchema: false)
  5. Handle large documents with streaming (future feature)
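
A minimal sketch of tips 2 and 4 combined (assuming one instance can load successive documents, as the chainable examples above imply, and that validateSchema: false simply skips schema checks):

// Validation off for trusted inputs; one instance handles both files
const doc = new DocFlow({ validateSchema: false });

await doc.load('report.docx');
await doc.save('report.md', 'markdown');

await doc.load('summary.docx');   // reuse the same instance
await doc.save('summary.html', 'html');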

Troubleshooting

Error: ENAMETOOLONG: name too long

Error message:

Error: ENAMETOOLONG: name too long, open '{"type": "doc", "content": [...]}'

Cause: You're passing content to load() instead of a file path.

Fix: Use loadContent() for content strings:

// ❌ WRONG - Treats JSON string as file path
const json = await doc.export('json');
await doc.load(JSON.stringify(json), { format: 'json' });

// ✅ CORRECT - Use loadContent() for content
const json = await doc.export('json');
await doc.loadContent(JSON.stringify(json));

// OR use load() with a buffer
const buffer = Buffer.from(JSON.stringify(json), 'utf-8');
await doc.load(buffer, { format: 'json' });

Document fails to load

// Check the file exists first (assumes: import fs from 'node:fs')
if (fs.existsSync('document.docx')) {
  await doc.load('document.docx');
}

// Or pass the format explicitly instead of relying on extension detection
// (assumes: import path from 'node:path')
const format = path.extname('document.docx').slice(1);  // → 'docx'
await doc.load('document.docx', { format });

Transformation not applying

// Verify selector matches
const matches = doc.query('heading[level=2]');
console.log(`Found ${matches.count} matches`);

// Check transformation logic
await doc.transform('heading', node => {
  console.log('Transforming:', node);
  return { ...node, text: node.text.toUpperCase() };
});

Export format issues

// Explicitly specify format
await doc.save('output.docx', 'docx');

// Check supported formats
const formats = ['docx', 'html', 'markdown', 'json', 'text'];

License

MIT

Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

Credits

Built on SuperDoc by Harbour Enterprises.

Support