doclens

v1.0.0

Published

2 months ago

Framework-agnostic universal document viewer with search, highlight, zoom, and deep customization. Supports PDF, XLSX, CSV, PPTX, DOCX, XML, JSON, HTML, Markdown, images, video, audio, and plain text.

amit641

document-viewer pdf-viewer xlsx-viewer csv-viewer file-viewer react search highlight zoom pptx docx xml json markdown

npm install doclens

import { DocViewer } from 'doclens';

<DocViewer document={{ uri: '/report.pdf' }} />

That's it. Renders PDFs, spreadsheets, presentations, code, media, and more — with search, zoom, and dark mode built in.

Why doclens?

| | doclens | react-doc-viewer | @cyntler/react-doc-viewer |
|---|---|---|---|
| Formats | 14+ (PDF, XLSX, CSV, PPTX, DOCX, XML, JSON, HTML, MD, images, video, audio, text) | 7 | 7 |
| Search | Unified across all formats with highlighting & keyboard nav | None | None |
| OCR | Tesseract.js for scanned PDFs and images | None | None |
| PDF engine | PDFium via pdfnova (Chrome-grade rendering) | PDF.js | Google Docs iframe |
| Pre-search | Auto-highlight terms on load | None | None |
| Zoom | Keyboard, pinch, fit-to-width | Basic | None |
| Theming | CSS variables, light/dark presets | Limited | Limited |
| Multi-document | Tabbed UI with switcher | None | None |
| TypeScript | First-class, fully typed | Partial | Yes |
| Tree-shakeable | Dynamic imports per format | No | No |
| Bundle impact | ~6 KB core + format-specific chunks | Monolithic | Monolithic |

Install

npm install doclens    # npm
pnpm add doclens       # pnpm
yarn add doclens       # yarn

Optional peer dependencies

Install only the renderers you need — doclens lazy-loads them:

npm install pdfnova          # PDF (PDFium-powered, Chrome-grade)
npm install xlsx             # Excel (XLSX/XLS)
npm install papaparse        # CSV
npm install mammoth          # Word (DOCX)
npm install marked           # Markdown
npm install tesseract.js     # OCR (search text in images / scanned PDFs)

Quick Start

React

import { DocViewer } from 'doclens';

function App() {
  return (
    <DocViewer
      document={{ uri: '/report.pdf', fileName: 'Q4 Report.pdf' }}
      initialSearchTerms={['revenue', 'growth']}
      theme="dark"
      height={700}
      onDocumentLoad={(meta) => console.log('Loaded:', meta)}
    />
  );
}

Vanilla JavaScript

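import { DocViewerEngine } from 'doclens/core';

// Look up the mount point and fail early if it is missing.
// (The `!` non-null assertion is TypeScript-only syntax and is invalid in plain JavaScript.)
const container = document.getElementById('viewer');
if (!container) throw new Error('#viewer element not found');

const engine = new DocViewerEngine(container, {
  document: { uri: '/file.pdf' },
  theme: 'dark',
});

await engine.load();          // top-level await requires an ES module
engine.search('quarterly');
engine.nextMatch();
engine.setZoom(1.5);

// Tear the viewer down when it is no longer needed
engine.destroy();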

Document Sources

// Remote URL (S3, CDN, any HTTP endpoint)
<DocViewer document={{ uri: 'https://bucket.s3.amazonaws.com/report.xlsx' }} />

// Authenticated endpoint
<DocViewer
  document={{
    uri: 'https://api.example.com/documents/123/download',
    requestInit: { headers: { Authorization: 'Bearer eyJ...' } },
  }}
/>

// File from <input> or drag-and-drop
<DocViewer document={{ fileData: file, fileName: file.name, fileType: file.type }} />

// Raw ArrayBuffer
<DocViewer document={{ fileData: arrayBuffer, fileType: 'application/pdf' }} />

// Base64 data URI
<DocViewer document={{ uri: 'data:application/pdf;base64,JVBERi0xLjQ...' }} />
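
The source shapes above fall into three cases: a remote URL, a data URI, and in-memory bytes. As a rough sketch of how calling code might tell them apart (this is not doclens's actual SourceResolver; `SourceLike`, `SourceKind`, and `classifySource` are hypothetical names, and `fileData` is narrowed here to types that also exist outside the browser):

```typescript
// Hypothetical, simplified mirror of the DocumentSource shape described in this README.
interface SourceLike {
  uri?: string;
  fileData?: ArrayBuffer | Uint8Array;
  fileName?: string;
  fileType?: string;
}

type SourceKind = 'binary' | 'data-uri' | 'remote';

// Decide how a source would need to be read.
// Assumption: in-memory data takes precedence over a URI when both are set;
// the README does not state which one wins.
function classifySource(source: SourceLike): SourceKind {
  if (source.fileData !== undefined) return 'binary';
  if (source.uri === undefined) {
    throw new Error('A source needs either `uri` or `fileData`');
  }
  // URI schemes are case-insensitive, so match "data:" without regard to case.
  return /^data:/i.test(source.uri) ? 'data-uri' : 'remote';
}
```

A check like this can be useful before rendering, for example to decide whether to show a network loading state.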

Multi-Document Tabs

<DocViewer
  documents={[
    { uri: '/report.pdf', fileName: 'Report.pdf' },
    { uri: '/data.xlsx', fileName: 'Data.xlsx' },
    { uri: '/notes.md', fileName: 'Notes.md' },
  ]}
  activeDocument={0}
  onDocumentChange={(index, doc) => console.log('Switched to', doc.fileName)}
/>

Pre-Search & Auto-Highlight

Pass initialSearchTerms to highlight terms on load. The viewer scrolls to the first match automatically.

<DocViewer
  document={{ uri: '/contract.pdf' }}
  initialSearchTerms={['liability', 'indemnification', 'termination']}
/>

Each term gets a distinct highlight color. Five colors are built in, and each can be overridden through the CSS variables --dv-highlight-term-1 to --dv-highlight-term-5.
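
To make the term-to-color relationship concrete, here is a minimal sketch that maps a term's position to one of the five per-term CSS variables. `termColorVariable` is a hypothetical helper, and wrapping around after the fifth term is an assumption; the README does not say what happens when more than five terms are passed.

```typescript
const BUILT_IN_TERM_COLORS = 5;

// Map a zero-based term index to its per-term CSS variable name.
// Assumption: indices past the fifth term cycle back to the first color.
function termColorVariable(termIndex: number): string {
  if (!Number.isInteger(termIndex) || termIndex < 0) {
    throw new RangeError(`Invalid term index: ${termIndex}`);
  }
  const slot = (termIndex % BUILT_IN_TERM_COLORS) + 1; // variables are numbered 1..5
  return `--dv-highlight-term-${slot}`;
}

// Pair each search term with the variable that would color it.
function termColorPairs(terms: string[]): Array<[string, string]> {
  return terms.map((term, i): [string, string] => [term, termColorVariable(i)]);
}
```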

OCR

doclens supports OCR via Tesseract.js for finding text inside images and scanned PDFs.

<DocViewer document={{ uri: '/scanned-invoice.pdf' }} enableOCR />

How it works:

- OCR processing starts eagerly when the document loads, so by the time you search, results are usually ready
- Text-layer matches and OCR matches are merged and sorted by document position (page, then top-to-bottom)
- If OCR is still processing when you search, the count shows "1 of 27+" with a spinner, then updates when ready
- Spatial deduplication prevents double-highlighting text found by both the PDF engine and OCR
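
The merge, sort, and deduplication steps above can be sketched as follows. This is an illustration under stated assumptions, not doclens's implementation: the `Match` shape, the function names, and the 0.5 overlap threshold are all hypothetical.

```typescript
// Hypothetical match record: a bounding box on a page plus where it came from.
interface Match {
  page: number;   // 1-based page number
  top: number;    // y coordinate of the box's top edge
  left: number;   // x coordinate of the box's left edge
  width: number;
  height: number;
  source: 'text' | 'ocr';
}

// Intersection area divided by the area of the smaller box (0 when disjoint).
function overlapRatio(a: Match, b: Match): number {
  const w = Math.min(a.left + a.width, b.left + b.width) - Math.max(a.left, b.left);
  const h = Math.min(a.top + a.height, b.top + b.height) - Math.max(a.top, b.top);
  if (w <= 0 || h <= 0) return 0;
  const smaller = Math.min(a.width * a.height, b.width * b.height);
  return smaller > 0 ? (w * h) / smaller : 0;
}

// Drop OCR matches that overlap a text-layer match on the same page,
// then sort everything by page, top, and left.
function mergeMatches(textMatches: Match[], ocrMatches: Match[], threshold = 0.5): Match[] {
  const uniqueOcr = ocrMatches.filter(
    (ocr) => !textMatches.some((t) => t.page === ocr.page && overlapRatio(t, ocr) >= threshold),
  );
  return [...textMatches, ...uniqueOcr].sort(
    (a, b) => a.page - b.page || a.top - b.top || a.left - b.left,
  );
}

// Append "+" to the total while OCR is still running, as in "1 of 27+".
function countLabel(activeIndex: number, total: number, ocrPending: boolean): string {
  return `${activeIndex + 1} of ${total}${ocrPending ? '+' : ''}`;
}
```

Preferring the text-layer match over its OCR duplicate is a design choice in this sketch: text-layer positions come directly from the PDF, so they are likely to be more precise than recognized boxes.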

npm install tesseract.js  # required for OCR

If tesseract.js is not installed, OCR is skipped silently — everything else works normally.
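
One common way to get this "skip silently when missing" behavior is to wrap a dynamic import and treat a failed load as "not installed". The sketch below shows the pattern in general terms; `loadOptional` and `ocrAvailable` are hypothetical helpers, and how doclens itself detects the dependency is an assumption.

```typescript
// Try to load an optional dependency; resolve to null when loading fails.
async function loadOptional<T>(load: () => Promise<T>): Promise<T | null> {
  try {
    return await load();
  } catch {
    // Any load failure is treated as "not installed" in this sketch.
    return null;
  }
}

// Report whether the optional module could be loaded.
// A caller would pass a loader such as () => import('tesseract.js').
async function ocrAvailable(load: () => Promise<unknown>): Promise<boolean> {
  return (await loadOptional(load)) !== null;
}
```

Passing the loader as a function keeps the helper free of hard-coded module names. Note that swallowing every error also hides genuine load failures, so a fuller version might log them.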

Supported Formats

| Format | Extension | Library | Search |
|--------|-----------|---------|--------|
| PDF | .pdf | pdfnova (PDFium) | Yes (text layer + OCR) |
| Excel | .xlsx, .xls | xlsx (SheetJS) | Yes (cell matching) |
| CSV | .csv | papaparse | Yes |
| PowerPoint | .pptx | Built-in | Yes (slide text) |
| Word | .docx | mammoth | Yes |
| XML | .xml | Built-in DOMParser | Yes |
| JSON | .json | Built-in | Yes (tree view) |
| HTML | .html | Shadow DOM | Yes |
| Markdown | .md | marked | Yes |
| Images | .png, .jpg, .gif, .webp, .svg | Native + OCR | Yes (via OCR) |
| Video | .mp4, .webm, .ogg | Native <video> | No |
| Audio | .mp3, .wav, .flac | Native <audio> | No |
| Plain Text | .txt, .log, .cfg | None | Yes |
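
As an illustration of the table above, the extension-to-format mapping can be written as a lookup. This is a sketch only: the `Format` names, `formatForFileName`, and `isSearchable` are hypothetical, and doclens may also use `fileType` or other signals to pick a renderer.

```typescript
type Format =
  | 'pdf' | 'excel' | 'csv' | 'powerpoint' | 'word' | 'xml' | 'json'
  | 'html' | 'markdown' | 'image' | 'video' | 'audio' | 'text';

// Extensions taken from the Supported Formats table.
const EXTENSION_TO_FORMAT = new Map<string, Format>([
  ['pdf', 'pdf'], ['xlsx', 'excel'], ['xls', 'excel'], ['csv', 'csv'],
  ['pptx', 'powerpoint'], ['docx', 'word'], ['xml', 'xml'], ['json', 'json'],
  ['html', 'html'], ['md', 'markdown'],
  ['png', 'image'], ['jpg', 'image'], ['gif', 'image'], ['webp', 'image'], ['svg', 'image'],
  ['mp4', 'video'], ['webm', 'video'], ['ogg', 'video'],
  ['mp3', 'audio'], ['wav', 'audio'], ['flac', 'audio'],
  ['txt', 'text'], ['log', 'text'], ['cfg', 'text'],
]);

// Look up a format from a file name's extension, ignoring case.
function formatForFileName(fileName: string): Format | undefined {
  const dot = fileName.lastIndexOf('.');
  if (dot < 0 || dot === fileName.length - 1) return undefined; // no extension
  return EXTENSION_TO_FORMAT.get(fileName.slice(dot + 1).toLowerCase());
}

// Per the table, only video and audio have no search support.
function isSearchable(format: Format): boolean {
  return format !== 'video' && format !== 'audio';
}
```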

Theming

Presets

<DocViewer theme="dark" />
<DocViewer theme="light" />

CSS Custom Properties

.my-viewer {
  --dv-bg: #1a1a2e;
  --dv-text: #eee;
  --dv-primary: #e94560;
  --dv-highlight-color: #ffeb3b;
  --dv-toolbar-bg: #16213e;
}

Theme Object

<DocViewer
  theme={{
    preset: 'dark',
    variables: {
      '--dv-primary': '#e94560',
      '--dv-highlight-color': '#ffd700',
    },
  }}
/>
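
The `variables` map in a theme object and the CSS custom properties shown earlier carry the same settings. As a small sketch of that correspondence, the hypothetical helper below (`themeToCss` is not part of doclens) turns a variables map into a CSS rule:

```typescript
// Turn a map of custom properties into a CSS rule for the given selector.
function themeToCss(selector: string, variables: Record<string, string>): string {
  const lines = Object.keys(variables).map((name) => {
    // Custom property names must start with two dashes.
    if (name.slice(0, 2) !== '--') {
      throw new Error(`Not a CSS custom property: ${name}`);
    }
    return `  ${name}: ${variables[name]};`;
  });
  return `${selector} {\n${lines.join('\n')}\n}`;
}

// Example input, using variable names from this README.
const css = themeToCss('.my-viewer', {
  '--dv-primary': '#e94560',
  '--dv-highlight-color': '#ffd700',
});
```

Generating a stylesheet this way could be handy when the same theme has to be shared between a React app and a vanilla page.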

Available Variables

| Variable | Description |
|----------|-------------|
| --dv-bg | Background color |
| --dv-text | Text color |
| --dv-primary | Primary accent color |
| --dv-toolbar-bg | Toolbar background |
| --dv-highlight-color | Search highlight color |
| --dv-highlight-active-color | Active match highlight |
| --dv-highlight-term-1 to --dv-highlight-term-5 | Per-term colors |
| --dv-border | Border color |
| --dv-font-family | Font family |
| --dv-font-size | Base font size |
| --dv-border-radius | Border radius |

Custom Renderers

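import { registerRenderer, BaseRenderer } from 'doclens';

class MyRenderer extends BaseRenderer {
  async render(ctx) {
    // Decode the raw bytes as UTF-8 text
    const text = new TextDecoder().decode(ctx.data);
    // Use textContent instead of innerHTML so markup in the file is shown, not interpreted
    const div = document.createElement('div');
    div.textContent = text;
    ctx.container.replaceChildren(div);
    // Point the search highlighter at the rendered content
    this.highlighter.setContainer(ctx.container);
  }
}

// Associate the renderer with a custom type identifier
registerRenderer('custom-type', MyRenderer);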

Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| Enter | Next search match |
| Shift + Enter | Previous search match |
| Escape | Clear search |
| Ctrl/Cmd + + | Zoom in |
| Ctrl/Cmd + - | Zoom out |
| Ctrl/Cmd + 0 | Reset zoom |
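
For a custom toolbar that needs the same bindings, the table can be expressed as a pure function from a key event to an action. This is a sketch, not doclens's handler: `KeyLike`, `Action`, and `actionForKey` are hypothetical, `resetZoom` is an invented action name, and accepting `=` for zoom in is an assumption.

```typescript
// The subset of KeyboardEvent fields this sketch needs.
interface KeyLike {
  key: string;
  ctrlKey?: boolean;
  metaKey?: boolean;
  shiftKey?: boolean;
}

type Action = 'nextMatch' | 'prevMatch' | 'clearSearch' | 'zoomIn' | 'zoomOut' | 'resetZoom';

// Translate a key press into an action, or null when the key is not a shortcut.
function actionForKey(e: KeyLike): Action | null {
  const modifier = Boolean(e.ctrlKey || e.metaKey); // Ctrl on Windows/Linux, Cmd on macOS
  if (modifier) {
    // Assumption: '=' is accepted too, since it shares a key with '+' on many layouts.
    if (e.key === '+' || e.key === '=') return 'zoomIn';
    if (e.key === '-') return 'zoomOut';
    if (e.key === '0') return 'resetZoom';
    return null;
  }
  if (e.key === 'Enter') return e.shiftKey ? 'prevMatch' : 'nextMatch';
  if (e.key === 'Escape') return 'clearSearch';
  return null;
}
```

Because the function takes a plain object rather than a DOM event, it can be unit tested without a browser.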

API Reference

`<DocViewer>` Props

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| document | DocumentSource | — | Single document source |
| documents | DocumentSource[] | — | Multiple documents (renders tabs) |
| activeDocument | number | 0 | Active document index |
| initialSearchTerms | string[] | — | Terms to highlight on load |
| theme | 'light' \| 'dark' \| ThemeConfig | 'light' | Theme |
| header | boolean \| HeaderConfig | true | Toolbar configuration |
| defaultZoom | number | 1 | Initial zoom level |
| minZoom / maxZoom | number | 0.25 / 5 | Zoom bounds |
| zoomStep | number | 0.25 | Zoom increment |
| disableSelection | boolean | false | Disable text selection |
| disablePrint | boolean | false | Disable printing |
| enableOCR | boolean | false | Enable OCR (requires tesseract.js) |
| height | string \| number | '100%' | Viewer height |
| width | string \| number | '100%' | Viewer width |
| onDocumentLoad | (meta) => void | — | Document loaded callback |
| onDocumentChange | (index, doc) => void | — | Tab switch callback |
| onError | (error) => void | — | Error callback |
| onSearchChange | (query, results) => void | — | Search results callback |
| onDrop | (files) => void | — | Drag-and-drop callback |
| renderHeaderLeft/Center/Right | () => ReactNode | — | Header slot overrides |
| className | string | — | Root class name |
| style | CSSProperties | — | Root inline styles |
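
The zoom props interact in a simple way: each step moves the level by `zoomStep`, and the result stays within `minZoom` and `maxZoom`. The sketch below uses the defaults from the table; the helper names and the clamping behavior at the bounds are assumptions, not doclens's ZoomManager.

```typescript
interface ZoomOptions {
  minZoom: number;
  maxZoom: number;
  zoomStep: number;
}

// Defaults as listed in the props table.
const ZOOM_DEFAULTS: ZoomOptions = { minZoom: 0.25, maxZoom: 5, zoomStep: 0.25 };

// Keep a zoom level inside [minZoom, maxZoom].
function clampZoom(level: number, opts: ZoomOptions = ZOOM_DEFAULTS): number {
  return Math.min(opts.maxZoom, Math.max(opts.minZoom, level));
}

// Step up or down by zoomStep, never leaving the bounds.
function zoomIn(level: number, opts: ZoomOptions = ZOOM_DEFAULTS): number {
  return clampZoom(level + opts.zoomStep, opts);
}

function zoomOut(level: number, opts: ZoomOptions = ZOOM_DEFAULTS): number {
  return clampZoom(level - opts.zoomStep, opts);
}

// Convert a zoom level to a whole-number percentage for display.
function toPercentage(level: number): number {
  return Math.round(level * 100);
}
```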

`DocumentSource`

interface DocumentSource {
  uri?: string;
  fileData?: ArrayBuffer | Blob | Uint8Array | File;
  fileName?: string;
  fileType?: string;
  requestInit?: RequestInit;
}

`DocViewerEngine` (Vanilla JS)

const engine = new DocViewerEngine(container, options);

await engine.load(source?);
engine.search(query);
engine.nextMatch();
engine.prevMatch();
engine.clearSearch();
engine.setZoom(level);
engine.zoomIn();
engine.zoomOut();
engine.goToPage(page);
engine.rotate(degrees);
engine.destroy();

engine.on('load', (meta) => { });
engine.on('search', ({ query, results }) => { });
engine.on('zoom', ({ level }) => { });
engine.on('error', (error) => { });
engine.on('ocrProgress', ({ processing }) => { });

React Hooks

import { useDocViewerContext, useSearch, useZoom } from 'doclens';

function CustomToolbar() {
  const { query, count, activeIndex, search, nextMatch, prevMatch } = useSearch();
  const { level, zoomIn, zoomOut, percentage } = useZoom();

  return (
    <div>
      <input value={query} onChange={(e) => search(e.target.value)} />
      <span>{activeIndex + 1} / {count}</span>
      <button onClick={zoomOut}>-</button>
      <span>{percentage}%</span>
      <button onClick={zoomIn}>+</button>
    </div>
  );
}

Architecture

┌─────────────────────────────────────────────────┐
│           DocViewer (React) / Engine (Vanilla)   │  ← Public API
├─────────────────────────────────────────────────┤
│  SearchEngine  │  ZoomManager  │  SourceResolver │  ← Core services
├───────┬────────┬────────┬──────┬────────────────┤
│  PDF  │ XLSX   │ DOCX   │ JSON │ 10 more...     │  ← Renderers (lazy)
├───────┴────────┴────────┴──────┴────────────────┤
│  pdfnova │ xlsx │ mammoth │ marked │ tesseract   │  ← Optional deps
└─────────────────────────────────────────────────┘

Design principles:

- Framework-agnostic core: vanilla TypeScript engine, React wrapper for convenience
- Dynamic imports: format-specific deps load only when needed
- Plugin architecture: register custom renderers without modifying core
- CSS custom properties: full visual customization without JS
- SSR-safe: no browser API access during initialization
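
The plugin principle above comes down to a registry that maps a type identifier to a renderer class, so new formats can be added without touching the core. The following is a minimal, self-contained sketch of that idea; `SketchRenderer` and `RendererRegistry` are hypothetical stand-ins and do not reflect doclens's internal types.

```typescript
// Hypothetical renderer contract, reduced to text in and text out.
interface SketchRenderer {
  render(text: string): string;
}

type RendererConstructor = new () => SketchRenderer;

// Maps a type identifier to the class that renders it.
class RendererRegistry {
  private readonly renderers = new Map<string, RendererConstructor>();

  register(type: string, ctor: RendererConstructor): void {
    this.renderers.set(type, ctor);
  }

  // Instantiate the renderer for a type, or fail with a clear message.
  create(type: string): SketchRenderer {
    const Ctor = this.renderers.get(type);
    if (!Ctor) throw new Error(`No renderer registered for "${type}"`);
    return new Ctor();
  }
}

// A plugin supplies its own class; the registry itself is unchanged.
class UpperCaseRenderer implements SketchRenderer {
  render(text: string): string {
    return text.toUpperCase();
  }
}

const registry = new RendererRegistry();
registry.register('custom-type', UpperCaseRenderer);
```

Storing constructors rather than instances means a renderer is created only when a document of that type is actually opened.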

Running the Demo

git clone https://github.com/amit641/doclens.git
cd doclens
npm install
npm run demo

Open http://localhost:3200 — interactive demo with sample files for every format.

License

MIT