docling-book-translator

v0.2.2

Local-first CLI to convert English PDF books to Portuguese (Brazil) EPUB files using Docling and Hugging Face.

Docling Book Translator

Local-first CLI to convert PDF books in English to Portuguese (Brazil) EPUB files, using Docling and an open-source Neural Machine Translation model from Hugging Face.

The goal is to provide an end-to-end pipeline that runs on a typical desktop machine (CPU only), without paid APIs, and with a simple interactive workflow:

PDF in the current folder → answer a few questions → get a translated EPUB ready for Kindle.


Features

  • PDF → structured text using Docling (layout-aware parsing).
  • English → Brazilian Portuguese translation via a Marian NMT model.
  • EPUB generation from translated Markdown.
  • Interactive CLI:
    • Asks for the input PDF (or auto-detects a PDF in the current folder).
    • Asks where to save the output (default: same folder as the PDF).
    • Asks for EPUB title and author (with sensible defaults).
    • Shows a progress bar during translation.
  • CPU-only by default; no paid APIs or external cloud services.

Requirements

  • Python 3.10+ (tested with 3.13).
  • A machine with at least:
    • Quad-core CPU
    • 16 GB RAM recommended (32 GB preferred) for large PDFs.
  • Internet access is needed only on the first run, to download:
    • Docling models,
    • Hugging Face translation model:
      • Helsinki-NLP/opus-mt-tc-big-en-pt

All models are cached locally (Hugging Face cache and Docling artifacts), so subsequent runs can be offline.
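As a rough illustration of the translation step, the sketch below loads the model named above through the transformers library and translates a small batch on CPU. It is not taken from this project's translation.py. The `>>pob<<` target-language prefix is an assumption based on the model card and should be checked there, and `tag_sentences`, `translate` and the `offline` flag are hypothetical names.

```python
import os

MODEL_NAME = "Helsinki-NLP/opus-mt-tc-big-en-pt"


def tag_sentences(sentences, target_token=">>pob<<"):
    """Prefix each sentence with a target-language token.

    Assumption: the model expects such a token; confirm on the model card.
    """
    return [f"{target_token} {s}" for s in sentences]


def translate(sentences, offline=False):
    """Translate a small batch of English sentences on CPU."""
    if offline:
        # Only takes effect if set before the Hugging Face libraries
        # are imported for the first time in this process.
        os.environ["HF_HUB_OFFLINE"] = "1"
    # Imported lazily so tag_sentences works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME)
    batch = tokenizer(
        tag_sentences(sentences),
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["The test suite passed."])` downloads the model on the first run. Later calls with `offline=True` should be served from the local cache.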


Installation

Clone the repository:

git clone https://github.com/EduardoXavier16/docling-book-translator.git
cd docling-book-translator

Create and activate a virtual environment (recommended):

python -m venv .venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/macOS

Install dependencies:

python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Install via npm / npx

The package is published on npm, so you can run the CLI directly with npx (without cloning):

npx docling-book-translator

Or install it globally:

npm install -g docling-book-translator
docling-book-translator

Both commands invoke the Node wrapper, which:

  • Verifies that a Python 3 executable is available (python or python3, or the value of the DBT_PYTHON environment variable).
  • Delegates the interactive workflow to the underlying cli.py script.
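The interpreter lookup described above can be sketched in Python for illustration; the actual wrapper is written for Node and its code is not shown here. The lookup order (DBT_PYTHON first, then python, then python3) and the names `find_python` and `is_python3` are assumptions.

```python
import os
import shutil
import subprocess
import sys


def find_python(env=None):
    """Return the Python executable to use, or None if none is found.

    Assumed order: DBT_PYTHON override, then python, then python3 on PATH.
    """
    env = os.environ if env is None else env
    override = env.get("DBT_PYTHON")
    if override:
        return override
    for name in ("python", "python3"):
        path = shutil.which(name)
        if path:
            return path
    return None


def is_python3(executable):
    """Check that the executable runs and reports major version 3."""
    try:
        result = subprocess.run(
            [executable, "-c", "import sys; print(sys.version_info[0])"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and result.stdout.strip() == "3"
```

A wrapper built this way would call `find_python()`, confirm the result with `is_python3()`, and then start cli.py with that interpreter, for example via `subprocess.call([python, "cli.py"])`.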

Usage (Python CLI)

Open a terminal in a folder that contains at least one PDF, for example:

cd C:\books\qa-testing

Then run the CLI by its full path, so the terminal stays in the PDF folder and auto-detection can find the file:

python C:\projects\docling-book-translator\cli.py

The CLI will:

  1. Ask for the input PDF:
    • If you press ENTER and there is a single .pdf in the current folder, it will use that file.
  2. Ask for the output folder:
    • ENTER = same folder as the PDF.
  3. Ask for title and author:
    • ENTER = title "<PDF name> (PT-BR)" and author "Unknown".
  4. Show a short summary and ask for confirmation.
  5. Run the full pipeline:
    • PDF → Docling → document.md
    • Translation (streaming) → document_translated.md
    • Export → book_translated.epub
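The ENTER defaults in steps 1 and 3 can be sketched as below. This is a minimal illustration, not the code in cli.py; `resolve_input_pdf` and `default_metadata` are hypothetical helpers, and returning `None` when zero or several PDFs are found is an assumption about how an ambiguous folder might be handled.

```python
from pathlib import Path


def resolve_input_pdf(answer, cwd):
    """Return the PDF to convert, or None if the choice is ambiguous.

    An empty answer (ENTER) selects the only .pdf in cwd, if exactly one exists.
    """
    answer = answer.strip()
    if answer:
        return Path(answer)
    pdfs = sorted(
        p for p in Path(cwd).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return pdfs[0] if len(pdfs) == 1 else None


def default_metadata(pdf_path, title="", author=""):
    """Fill in title and author defaults when the user just presses ENTER."""
    stem = Path(pdf_path).stem
    return (title.strip() or f"{stem} (PT-BR)", author.strip() or "Unknown")
```

For a file named qa-testing.pdf, `default_metadata("qa-testing.pdf")` gives the title "qa-testing (PT-BR)" and the author "Unknown".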

The final EPUB will be in:

<output-folder>/<book-id>/book_translated.epub

Where <book-id> defaults to the PDF file name without extension.
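The output location rule can be written in a few lines with pathlib. This is only a sketch of the rule as described; `epub_output_path` is a hypothetical helper, not a function from this project.

```python
from pathlib import Path


def epub_output_path(pdf_path, output_folder=None, book_id=None):
    """Build <output-folder>/<book-id>/book_translated.epub.

    The output folder defaults to the PDF's folder, and the book id
    defaults to the PDF file name without its extension.
    """
    pdf_path = Path(pdf_path)
    folder = Path(output_folder) if output_folder else pdf_path.parent
    return folder / (book_id or pdf_path.stem) / "book_translated.epub"
```

For example, `epub_output_path("books/qa-testing.pdf")` points at books/qa-testing/book_translated.epub.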


CLI design (v2 – streaming markdown)

The entry point is cli.py, which orchestrates the pipeline in three steps:

  1. PDF preparation (prepare_pdf.py)
    • Uses Docling to convert the PDF into document.md (plus HTML/JSON artifacts).
  2. Translation, streaming (translation.py)
    • Reads document.md block by block and writes the translated text to document_translated.md as it goes (no giant in‑memory buffer).
  3. EPUB export (export_epub.py)
    • Converts document_translated.md into book_translated.epub.

Internally, some legacy scripts (like export_translated.py) may still exist for experimentation, but the official CLI v2 flow is:

document.md → document_translated.md → book_translated.epub
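The streaming idea in step 2 can be sketched as follows, assuming that a "block" means a run of lines separated by blank lines. The real translation.py may split the file differently. `iter_blocks` and `translate_stream` are hypothetical names, and the `translate` callable stands in for the model call.

```python
import io
from typing import Callable, Iterable, Iterator, TextIO


def iter_blocks(lines: Iterable[str]) -> Iterator[str]:
    """Yield blank-line-separated blocks one at a time."""
    block = []
    for line in lines:
        if line.strip():
            block.append(line.rstrip("\n"))
        elif block:
            yield "\n".join(block)
            block = []
    if block:
        yield "\n".join(block)


def translate_stream(src: TextIO, dst: TextIO, translate: Callable[[str], str]) -> int:
    """Translate src block by block, writing each result to dst immediately.

    Only one block is held in memory at a time. Returns the block count.
    """
    count = 0
    for block in iter_blocks(src):
        dst.write(translate(block) + "\n\n")
        dst.flush()
        count += 1
    return count


if __name__ == "__main__":
    # Demo with an in-memory file and a stand-in "translator".
    source = io.StringIO("# Title\n\nFirst paragraph.\n")
    target = io.StringIO()
    translate_stream(source, target, str.upper)
    print(target.getvalue())
```

With real files, `src` and `dst` would be `open("document.md", encoding="utf-8")` and `open("document_translated.md", "w", encoding="utf-8")`. This sketch treats fenced code inside the Markdown like any other block, which a real pipeline would probably want to skip.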


Planned npm package

The long-term goal is to publish a Node.js wrapper on npm, named:

  • Package: docling-book-translator
  • CLI command: npx docling-book-translator

The npm CLI would:

  • Ask the same questions as cli.py (input PDF, output folder, title, author).
  • Internally call the Python CLI, ensuring that:
    • Python is installed,
    • This project (and its dependencies) are available.

This repository contains the Python core. A separate TypeScript/Node wrapper can import or shell out to cli.py to expose the same UX on npm.


Licensing and third‑party components

  • This project is released under the MIT License (see LICENSE).
  • It uses third‑party components:
    • Docling – MIT License.
    • Hugging Face models (e.g. Helsinki-NLP/opus-mt-tc-big-en-pt) – see each model’s page on Hugging Face for specific license terms.
    • transformers, sentencepiece, ebooklib, markdown, tqdm, etc.

When publishing to npm or deploying in production, make sure that your usage of these components complies with their respective licenses.


Limitations

  • Currently focuses on English → Brazilian Portuguese translation.
  • Images and figures from the original PDF are not preserved in the EPUB. The output is text‑only.
  • Some very complex PDFs (heavy graphics, scanned pages, etc.) may produce warnings from Docling about memory allocation or failing stages. In most cases, text extraction still succeeds, but portions of some pages might be incomplete.

These limitations are acceptable for a first version focused on reading technical/QA books in Portuguese on Kindle. Future versions can extend support for additional languages and image handling.


Contributing

Contributions are welcome. Suggested areas for improvement:

  • Better handling of images and figures in the EPUB output.
  • Additional language pairs, configurable via CLI flags.
  • A robust Node.js wrapper and npm packaging.
  • More detailed progress reporting (per chapter or per page).

Before opening a pull request, please:

  • Run the existing scripts locally on at least one sample book.
  • Keep Python code readable and small, with single‑responsibility modules.
  • Avoid introducing breaking changes to the cli.py UX without discussion.