docx2js

v0.1.0-alpha.5

Docx parser for JavaScript/TypeScript

docx2js

Simple docx parser and transformer for JavaScript/TypeScript

** THIS IS AN EARLY-STAGE WORK IN PROGRESS, NOT READY FOR WIDESPREAD USE **

Does the world need another docx parser?

Good question - maybe? What I found was that I needed to extract more information, in a structured form suitable for further processing, than the up-to-date (or at least maintained) packages I could find would give me. So here we are.

At its heart, a docx file - or rather, the Office Open XML format - is a zip archive containing a bunch of XML files, so this package is essentially a glorified XML traverser. There are quite a few intricacies to the format, though, and as such this is not a feature-complete parser (nor is it intended to be).

Features

  • Converts DOCX to a JSON structure with the following information:
    • Paragraphs
    • Tables
    • Suggestions (inserts and deletions)
    • Comments
    • Basic styling for runs
  • Uses fast-xml-parser for parsing the XML, so it's fairly fast

Installation

yarn add docx2js

Usage, CLI

You can convert a DOCX file to JSON using the CLI:

docx2js path/to/docx/file [path/to/output/file]

If you don't specify an output file, stdout will be used instead.

Usage, API

For a simple demo of how to use the API, take a look at markdown.ts, which contains a very silly and simple Markdown transformer.

Loading and parsing from a filename

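import { Parse } from 'docx2js';

const main = async () => {
  // Parse the DOCX file at the given path into a JSON document structure.
  const doc = await Parse('path/to/docx/file');
  console.log(doc);
};

// Run the example and report any parse error.
main().catch(console.error);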

Loading and parsing from a buffer

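import { readFile } from 'fs/promises';
import { ParseBuffer } from 'docx2js';

const main = async () => {
  // Read the raw DOCX bytes first, then hand the buffer to the parser.
  const buffer = await readFile('path/to/docx/file');
  const doc = await ParseBuffer(buffer);
  console.log(doc);
};

// Run the example and report any read or parse error.
main().catch(console.error);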

Both parse functions return a document consisting of some meta information and the actual content. The content is kept in document order, so iterating through it makes it possible to reproduce the text of the original document.
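
As a rough illustration of iterating the content in order, here is a minimal sketch that rebuilds plain text from a content array. It does not import docx2js: the types are hypothetical stand-ins based only on the descriptions in this README. In particular, the `text` field on a run and the name `toPlainText` are assumptions, and this README does not state which property of the returned document holds the content array, so the function simply takes the array as its argument.

```typescript
// Hypothetical shapes inferred from this README; the real docx2js types may differ.
type ParagraphType =
  | 'paragraph'
  | 'paragraph-deletion'
  | 'paragraph-insertion'
  | 'paragraph-comment';

interface Run { text: string } // assumption: a run exposes its text as `text`
interface Paragraph { type: ParagraphType; contents: Run[] }
interface Table { type: 'table'; rows: Paragraph[][][] } // rows -> cells -> paragraphs
type Content = Paragraph | Table;

// Concatenate the text of every run in a paragraph.
const paragraphText = (p: Paragraph): string =>
  p.contents.map((run) => run.text).join('');

// Walk the content in document order, emitting one line per paragraph
// and one tab-separated line per table row.
function toPlainText(content: Content[]): string {
  const lines: string[] = [];
  for (const item of content) {
    if (item.type === 'table') {
      for (const row of item.rows) {
        lines.push(row.map((cell) => cell.map(paragraphText).join(' ')).join('\t'));
      }
    } else {
      lines.push(paragraphText(item));
    }
  }
  return lines.join('\n');
}
```

Note that this sketch treats tracked deletions like any other paragraph; whether to include them depends on which version of the text you want to reproduce.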

There are two kinds of content - Paragraph and Table.

Table content is an object containing the following properties:

  • type - the type of content, in this case table
  • rows - an array of rows, each row being an array of cells, and each cell being an array of paragraph objects
  • caption - the caption of the table, fetched from the contents of the first preceding paragraph
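
The rows/cells/paragraphs nesting can be easier to picture with a small example. The sketch below flattens a table into a grid of strings. It uses hypothetical local types rather than the package's own: the `text` field on a run and the helper names `cellText` and `tableToGrid` are assumptions, and `caption` is left out because its exact type is not stated here.

```typescript
// Hypothetical shapes inferred from this README; the real docx2js types may differ.
interface Run { text: string } // assumption: a run exposes its text as `text`
interface Paragraph { type: string; contents: Run[] }
interface Table { type: 'table'; rows: Paragraph[][][] } // rows -> cells -> paragraphs

// Flatten one cell (an array of paragraphs) into a single string.
const cellText = (cell: Paragraph[]): string =>
  cell.map((p) => p.contents.map((run) => run.text).join('')).join(' ');

// Turn the three-level nesting into a two-dimensional grid of cell strings.
function tableToGrid(table: Table): string[][] {
  return table.rows.map((row) => row.map(cellText));
}
```

A grid like this is a convenient intermediate form for further processing, for example writing CSV or rendering a Markdown table.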

Paragraph content is an object containing the following properties:

  • type - the type of content, one of:
    • paragraph - a regular paragraph
    • paragraph-deletion - a paragraph that's tracked as "deleted"
    • paragraph-insertion - a paragraph that's tracked as "inserted"
    • paragraph-comment - a paragraph that has a comment attached to all of it
  • contents - an array of runs (see below)
  • properties - paragraph properties
    • style - the style of the paragraph
    • alignment - the alignment of the paragraph
    • indent - the indentation of the paragraph
    • spacing - the spacing of the paragraph
    • shading - the shading of the paragraph
    • outlineLevel - the outline level of the paragraph
    • bidi - the bidirectional (right-to-left) setting of the paragraph
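
Since the paragraph type records tracked changes, one possible use is to build "accepted" and "rejected" views of a document. This is only a sketch under assumptions: the types are local stand-ins rather than the package's own, the `text` field on a run and the function names are hypothetical, and only paragraph-level suggestions are handled, because this README does not describe how suggestions inside a run are represented.

```typescript
// Hypothetical shapes inferred from this README; the real docx2js types may differ.
type ParagraphType =
  | 'paragraph'
  | 'paragraph-deletion'
  | 'paragraph-insertion'
  | 'paragraph-comment';

interface Run { text: string } // assumption: a run exposes its text as `text`
interface Paragraph { type: ParagraphType; contents: Run[] }

// "Accept all" view: paragraphs tracked as deleted disappear, insertions stay.
const acceptSuggestions = (paragraphs: Paragraph[]): Paragraph[] =>
  paragraphs.filter((p) => p.type !== 'paragraph-deletion');

// "Reject all" view: paragraphs tracked as inserted disappear, deletions stay.
const rejectSuggestions = (paragraphs: Paragraph[]): Paragraph[] =>
  paragraphs.filter((p) => p.type !== 'paragraph-insertion');
```

Both helpers return new arrays and leave the parsed content untouched, so the same document can be viewed either way.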

License

MIT. See LICENSE for the full license text.