pdf-engine-tools

v1.1.0

Optimized PDF processing with parallel workers, text extraction, signature detection, and smart splitting.

Features

  • Parallel workers with automatic concurrency tuning based on CPU usage
  • Clean Architecture: a pure core with no external dependencies
  • Optimized text extraction via pdf2json
  • Splitting of large PDFs into chunks via pdf-lib
  • Digital signature detection
  • PdfEngine facade for a simplified API
  • TypeScript with complete type definitions

Installation

npm install pdf-engine-tools

Quick start

import { NodePdfEngine } from 'pdf-engine-tools';
import { readFileSync } from 'fs';

// The facade sets up its adapters and worker pool internally
const engine = new NodePdfEngine();

const buffer = readFileSync('documento.pdf');
const result = await engine.process(buffer, {
  pageLimit: 12,      // page limit
  maxChunkSize: 3000, // text chunk size
  timeout: 120000,    // timeout in ms
});

console.log(result.text);
console.log(result.pageCount);
console.log(result.isSigned);

// Shut the engine down when finished
await engine.shutdown();

Processing flows

Main processing (sequence)

sequenceDiagram
    participant Client
    participant NodePdfEngine
    participant WorkerPool
    participant WorkerThread
    participant PdfPipeline

    Client->>NodePdfEngine: process(buffer, config)
    NodePdfEngine->>WorkerPool: run('full-pipeline.worker', data)
    WorkerPool->>WorkerThread: postMessage(data)
    activate WorkerThread
    WorkerThread->>PdfPipeline: execute(buffer)
    activate PdfPipeline
    PdfPipeline-->>WorkerThread: PipelineResult (text, chunks, etc)
    deactivate PdfPipeline
    WorkerThread-->>WorkerPool: postMessage(result)
    deactivate WorkerThread
    WorkerPool-->>NodePdfEngine: resolve(result)
    NodePdfEngine-->>Client: ProcessResult

Extraction pipeline (internal flow)

flowchart TD
    A[PDF buffer] --> B(ParsePdfTask)
    B --> C{Pages > pageLimit?}
    C -- Yes --> D(Truncate PDF)
    C -- No --> E(Keep intact)
    D --> F(ExtractTextTask)
    E --> F
    F --> G(ChunkTextTask)
    G --> H(SplitPdfTask)
    H --> I[Final result]

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style I fill:#bbf,stroke:#333,stroke-width:2px
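
The page-limit branch and the chunking step in the flowchart can be illustrated with a minimal sketch. It is not the package's implementation: `SketchConfig`, `applyPageLimit`, `chunkText` and `runSketchPipeline` are hypothetical names, the sketch works on already-extracted page strings rather than a PDF buffer, it omits the final SplitPdfTask step, and fixed-size character slicing is only an assumption about how text chunks might be produced.

```typescript
// Hypothetical stand-ins for the pipeline steps; not the library's API.
interface SketchConfig {
  pageLimit: number;
  enablePageLimit: boolean;
  maxChunkSize: number;
}

// Page-limit branch: truncate only when the limit is enabled and exceeded.
function applyPageLimit(pages: string[], config: SketchConfig): string[] {
  if (config.enablePageLimit && pages.length > config.pageLimit) {
    return pages.slice(0, config.pageLimit);
  }
  return pages;
}

// Chunking stand-in: consecutive slices of at most maxChunkSize characters.
function chunkText(text: string, maxChunkSize: number): string[] {
  if (maxChunkSize <= 0) {
    throw new RangeError('maxChunkSize must be positive');
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChunkSize) {
    chunks.push(text.slice(start, start + maxChunkSize));
  }
  return chunks;
}

// Runs the two steps in the same order as the flowchart.
function runSketchPipeline(pages: string[], config: SketchConfig) {
  const kept = applyPageLimit(pages, config);
  const text = kept.join('\n');
  return {
    pageCount: kept.length,
    text,
    chunks: chunkText(text, config.maxChunkSize),
  };
}
```

With three pages and a `pageLimit` of 2, only the first two pages reach the chunking step; setting `enablePageLimit` to `false` keeps all three.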

API

NodePdfEngine

Main facade: it instantiates the adapters and the worker pool internally.

// The logger argument is optional
const engine = new NodePdfEngine(logger);

// Process a PDF
const result = await engine.process(buffer, config);

// Combine multiple results
const combined = await engine.processMultiple(results);

// Split a PDF into parts
const split = await engine.split(buffer, 'upload-id', { chunkSize: 10 });

// Count pages (via worker)
const pages = await engine.getPageCount(buffer);

// Worker pool statistics
const stats = engine.getStats();

// Shutdown
await engine.shutdown();

Using workers directly

import { NodeWorkerPool } from 'pdf-engine-tools';

const pool = new NodeWorkerPool();
const result = await pool.run('parse-pdf.worker.js', { buffer }, 60000);
await pool.shutdown();

PDF splitting

import { PdfLibSplitter } from 'pdf-engine-tools';

const splitter = new PdfLibSplitter();
const { chunks, totalParts } = await splitter.split(buffer, { chunkSize: 10 });
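
To get a feel for what a `chunkSize` of 10 means in terms of pages, here is a small sketch that plans the page ranges of a split. The README does not document how `PdfLibSplitter` partitions pages, so this assumes consecutive, non-overlapping ranges with a shorter final part; `planSplit` and `PageRange` are hypothetical names, not part of the package.

```typescript
// Hypothetical helper: 1-based, inclusive page ranges for each part.
interface PageRange {
  part: number;
  firstPage: number;
  lastPage: number;
}

function planSplit(pageCount: number, chunkSize: number): PageRange[] {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError('pageCount must be a non-negative integer');
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges: PageRange[] = [];
  for (let first = 1; first <= pageCount; first += chunkSize) {
    ranges.push({
      part: ranges.length + 1,
      firstPage: first,
      // The last part may be shorter than chunkSize
      lastPage: Math.min(first + chunkSize - 1, pageCount),
    });
  }
  return ranges;
}
```

Under these assumptions, a 25-page document with `chunkSize: 10` yields three parts covering pages 1 to 10, 11 to 20, and 21 to 25.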

Text extraction

import { Pdf2JsonExtractor } from 'pdf-engine-tools';

const extractor = new Pdf2JsonExtractor();
const result = await extractor.extract(buffer, { pageLimit: 20 });
console.log(result.text, result.isSigned, result.signatureDates);

Configuration

Environment variables

| Variable | Default | Description |
|---|---|---|
| PDF_CPU_USAGE_LIMIT | 80 | CPU usage limit (%) |
| PDF_MAX_WORKERS | CPUs | Maximum number of workers |
| PDF_MAX_CONCURRENCY | min(CPUs-1, 4) | Initial concurrency |
| PDF_QUEUE_MAX | 100 | Queue size |
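
To make the defaults in the table concrete, here is a minimal sketch of how such settings could be resolved. It is not the package's actual code: `resolvePoolSettings`, `PoolSettings` and `readInt` are hypothetical, and both the fallback for invalid values and the clamp to a minimum concurrency of 1 are assumptions. Only the variable names and default values come from the table.

```typescript
// Hypothetical helper, not part of pdf-engine-tools.
interface PoolSettings {
  cpuUsageLimit: number;  // PDF_CPU_USAGE_LIMIT, in percent
  maxWorkers: number;     // PDF_MAX_WORKERS
  maxConcurrency: number; // PDF_MAX_CONCURRENCY (initial value)
  queueMax: number;       // PDF_QUEUE_MAX
}

// Parse a positive integer; fall back when the value is missing or invalid
// (this fallback rule is an assumption of the sketch).
function readInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function resolvePoolSettings(
  env: Record<string, string | undefined>,
  cpuCount: number,
): PoolSettings {
  return {
    cpuUsageLimit: readInt(env['PDF_CPU_USAGE_LIMIT'], 80),
    maxWorkers: readInt(env['PDF_MAX_WORKERS'], cpuCount),
    // min(CPUs-1, 4), clamped to at least 1 (assumption) for single-core hosts
    maxConcurrency: readInt(
      env['PDF_MAX_CONCURRENCY'],
      Math.max(1, Math.min(cpuCount - 1, 4)),
    ),
    queueMax: readInt(env['PDF_QUEUE_MAX'], 100),
  };
}
```

On an 8-core machine with none of the variables set, this sketch gives a maximum of 8 workers and an initial concurrency of 4. In a real Node.js process, `process.env` and `os.cpus().length` would be the natural inputs.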

PdfProcessingConfig

{
  pageLimit?: number;        // Page limit (default: 12)
  enablePageLimit?: boolean; // Enable the page limit (default: true)
  maxChunkSize?: number;     // Text chunk size (default: 3000)
  timeout?: number;          // Timeout in ms (default: 120000)
  debug?: boolean;           // Debug logs (default: false)
}
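
Since every field is optional, a caller-side view of the effective configuration can be sketched by filling omitted fields with the documented defaults. How the library applies its defaults internally is not shown in the README, so `DEFAULT_CONFIG` and `withDefaults` below are hypothetical illustrations; the interface is re-declared locally so the snippet stands alone.

```typescript
// Local copy of the documented shape, so the sketch is self-contained.
interface PdfProcessingConfig {
  pageLimit?: number;
  enablePageLimit?: boolean;
  maxChunkSize?: number;
  timeout?: number;
  debug?: boolean;
}

// Default values as documented in the listing above.
const DEFAULT_CONFIG: Required<PdfProcessingConfig> = {
  pageLimit: 12,
  enablePageLimit: true,
  maxChunkSize: 3000,
  timeout: 120000,
  debug: false,
};

// Hypothetical helper: fall back to the default for any omitted or
// undefined field.
function withDefaults(
  config: PdfProcessingConfig = {},
): Required<PdfProcessingConfig> {
  return {
    pageLimit: config.pageLimit ?? DEFAULT_CONFIG.pageLimit,
    enablePageLimit: config.enablePageLimit ?? DEFAULT_CONFIG.enablePageLimit,
    maxChunkSize: config.maxChunkSize ?? DEFAULT_CONFIG.maxChunkSize,
    timeout: config.timeout ?? DEFAULT_CONFIG.timeout,
    debug: config.debug ?? DEFAULT_CONFIG.debug,
  };
}
```

Using `??` rather than object spread means an explicit `undefined` also falls back to the default, while `false` and `0` are kept as given.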

Architecture (Clean Architecture)

graph TD
    subgraph Node[Node.js Environment]
        Facade[NodePdfEngine]
        Pool[NodeWorkerPool]
        Workers[Worker Threads]
    end

    subgraph Adapters[Adapters / Infra]
        P2J[Pdf2JsonExtractor]
        PLib[PdfLibSplitter]
    end

    subgraph Core[Core / Pure Rules]
        Pipeline[PdfPipeline]
        Tasks[Tasks: Parse, Extract, Split]
        Contracts[Interfaces]
    end

    Facade -->|uses| Pool
    Pool -->|spawns| Workers
    Workers -->|inject adapters| Adapters
    Workers -->|run| Pipeline
    Pipeline -->|orchestrates| Tasks
    Tasks -->|depend on| Contracts
    Adapters -.->|implement| Contracts

    classDef core fill:#d4edda,stroke:#28a745,color:#333,stroke-width:2px;
    classDef adapter fill:#fff3cd,stroke:#ffc107,color:#333,stroke-width:2px;
    classDef node fill:#cce5ff,stroke:#007bff,color:#333,stroke-width:2px;

    class Pipeline,Tasks,Contracts core;
    class P2J,PLib adapter;
    class Facade,Pool,Workers node;

Directory structure:

src/
├── core/           # Pure rules, no external deps
│   ├── contracts/  # Interfaces: PdfParser, PdfSplitter, PdfTextExtractor, PdfChunker
│   ├── errors/     # PdfEngineError, PdfParseError, PdfWorkerError
│   ├── pipeline/   # PdfPipeline, PipelineExecutor
│   ├── tasks/      # ParsePdfTask, ExtractTextTask, SplitPdfTask, ChunkTextTask
│   └── types.ts
├── adapters/       # Concrete implementations
│   ├── pdf-lib/    # PdfLibSplitter (uses pdf-lib)
│   └── pdf2json/   # Pdf2JsonExtractor (uses pdf2json)
├── node/           # Node.js specific
│   ├── NodePdfEngine, NodeWorkerPool, NodeFsAdapter
│   └── buffer-utils
├── workers/        # Worker threads
│   ├── parse-pdf.worker.ts
│   ├── extract-text.worker.ts
│   └── full-pipeline.worker.ts
├── pdf-engine.ts   # PdfEngine interface
└── index.ts

Rule: core/ knows nothing about pdf-lib, pdf2json, worker_threads, or fs.
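
The rule can be illustrated with a small dependency-inversion sketch: a task depends only on a contract, and a concrete adapter is injected from the outside. The names `PdfTextExtractor` and `ExtractTextTask` appear in the directory listing, but their real signatures are not documented here, so the shapes below are assumptions, and `FakeExtractor` is a made-up adapter standing in for something like `Pdf2JsonExtractor`.

```typescript
// core/contracts (assumed shape): the only thing the core knows about.
interface PdfTextExtractor {
  extract(buffer: Uint8Array): Promise<{ text: string }>;
}

// core/tasks (assumed shape): depends on the contract, never on pdf2json.
class ExtractTextTask {
  private readonly extractor: PdfTextExtractor;

  constructor(extractor: PdfTextExtractor) {
    this.extractor = extractor;
  }

  async run(buffer: Uint8Array): Promise<string> {
    const { text } = await this.extractor.extract(buffer);
    return text.trim();
  }
}

// adapters/ (hypothetical): any implementation of the contract can be
// injected, which is also what makes the core easy to test in isolation.
class FakeExtractor implements PdfTextExtractor {
  async extract(buffer: Uint8Array): Promise<{ text: string }> {
    return { text: `  page text from ${buffer.length} bytes  ` };
  }
}
```

Because the task only sees the interface, swapping the fake for a real adapter requires no change inside the core.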

Error handling

import { PdfEngineError, PdfParseError, PdfWorkerError } from 'pdf-engine-tools';

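try {
  await engine.process(buffer);
} catch (error) {
  if (error instanceof PdfParseError) {
    console.error(`Parse error [${error.code}]:`, error.message);
  } else if (error instanceof PdfWorkerError) {
    console.error('Worker error:', error.message);
  } else if (error instanceof PdfEngineError) {
    // Any other PdfEngineError not matched above
    console.error('PDF engine error:', error.message);
  } else {
    // Not a pdf-engine-tools error: rethrow instead of swallowing it
    throw error;
  }
}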

Performance

| PDF | Pages | Sequential | Parallel | Improvement |
|---|---|---|---|---|
| 5MB | 50 | 2.5s | 0.8s | 3.1x |
| 20MB | 200 | 12.3s | 3.2s | 3.8x |
| 100MB | 1000 | 45.7s | 8.9s | 5.1x |
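
The last column of the table is consistent with the sequential time divided by the parallel time, rounded to one decimal place. The short sketch below reproduces that arithmetic from the published figures; the `speedup` helper is a hypothetical name used only for this check.

```typescript
// Speedup factor: sequential time over parallel time, one decimal place.
function speedup(sequentialSeconds: number, parallelSeconds: number): number {
  if (parallelSeconds <= 0) {
    throw new RangeError('parallelSeconds must be positive');
  }
  return Math.round((sequentialSeconds / parallelSeconds) * 10) / 10;
}

// Timings copied from the table above.
const rows = [
  { pdf: '5MB', sequential: 2.5, parallel: 0.8 },
  { pdf: '20MB', sequential: 12.3, parallel: 3.2 },
  { pdf: '100MB', sequential: 45.7, parallel: 8.9 },
];

for (const row of rows) {
  console.log(`${row.pdf}: ${speedup(row.sequential, row.parallel)}x`);
}
```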

License

MIT (see LICENSE)

Author

Amilton Brune (@amiltonbrune)