@heripo/model

v0.1.2

Document models and type definitions for heripo engine

Note: Please read the root README first for the project overview, installation instructions, and roadmap.

@heripo/model provides data models and TypeScript type definitions used in heripo engine.

Overview

heripo engine's data processing pipeline:

DoclingDocument (Docling SDK raw output)
    ↓
ProcessedDocument (LLM-optimized intermediate model)
    ↓
(Various models to be added per roadmap)

@heripo/model defines the data models currently used in the PDF parsing and document structure extraction stages. Domain-specific models for archaeological data analysis, standardization, semantic modeling, and related work will be added in the future.

Installation

# Install with npm
npm install @heripo/model

# Install with pnpm
pnpm add @heripo/model

# Install with yarn
yarn add @heripo/model

Data Models

DoclingDocument

Raw output format from Docling SDK.

import type { DoclingDocument } from '@heripo/model';

Key Fields:

  • type: Document type (e.g., "pdf")
  • item_index: Item index
  • json_content: Document content (JSON object)

ProcessedDocument

Intermediate data model optimized for LLM analysis.

import type { ProcessedDocument } from '@heripo/model';

interface ProcessedDocument {
  reportId: string; // Report ID
  pageRangeMap: PageRange[]; // PDF page → document page mapping
  chapters: Chapter[]; // Hierarchical chapter structure
  images: ProcessedImage[]; // Extracted image metadata
  tables: ProcessedTable[]; // Extracted table data
}
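Because chapters reference images and tables by ID, a lookup index is often convenient when walking the document. The following is a minimal sketch, not part of the package: `indexById` and `imagesForChapter` are hypothetical helper names, and the interfaces are re-declared locally (trimmed to the fields used) so the snippet is self-contained. In real code, import `Chapter` and `ProcessedImage` from '@heripo/model' instead.

```typescript
// Local structural copies of the README interfaces, trimmed to the fields used here.
interface ImageLike {
  id: string;
  filePath: string;
}

interface ChapterWithImages {
  title: string;
  imageIds: string[];
  children: ChapterWithImages[];
}

// Hypothetical helper: build an id -> item index once so each lookup is a Map access.
function indexById<T extends { id: string }>(items: T[]): Map<string, T> {
  return new Map(items.map((item) => [item.id, item] as const));
}

// Hypothetical helper: resolve a chapter's image IDs to image records.
// IDs without a matching image are skipped rather than treated as errors.
function imagesForChapter(
  chapter: ChapterWithImages,
  imageIndex: Map<string, ImageLike>,
): ImageLike[] {
  return chapter.imageIds
    .map((id) => imageIndex.get(id))
    .filter((image): image is ImageLike => image !== undefined);
}
```

The same `indexById` helper would work for `doc.tables` and `chapter.tableIds`, since it only requires an `id` field.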

Chapter

Hierarchical section structure of the document.

import type { Chapter } from '@heripo/model';

interface Chapter {
  id: string; // Chapter ID
  title: string; // Chapter title
  level: number; // Hierarchy level (1, 2, 3, ...)
  pageNo?: number; // Start page number
  textBlocks: TextBlock[]; // Text blocks
  imageIds: string[]; // Image ID references
  tableIds: string[]; // Table ID references
  children: Chapter[]; // Sub-chapters
}

TextBlock

Atomic text unit.

import type { TextBlock } from '@heripo/model';

interface TextBlock {
  text: string; // Text content
  pageNo?: number; // Page number
}
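Since text lives in `textBlocks` at every level of the chapter tree, gathering the full text of a chapter means visiting its sub-chapters as well. A rough sketch under that reading: `collectText` is a hypothetical helper, and the interfaces below are local copies limited to the fields it needs.

```typescript
// Local structural copies of the README interfaces, trimmed to the fields used here.
interface TextBlockLike {
  text: string;
  pageNo?: number;
}

interface ChapterNode {
  title: string;
  textBlocks: TextBlockLike[];
  children: ChapterNode[];
}

// Hypothetical helper: return the chapter's own text blocks first,
// followed by the text of each sub-chapter in depth-first order.
function collectText(chapter: ChapterNode): string[] {
  const result = chapter.textBlocks.map((block) => block.text);
  for (const child of chapter.children) {
    result.push(...collectText(child));
  }
  return result;
}
```

Joining the result with newlines gives one string per top-level chapter, which may be a convenient unit to pass to downstream analysis.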

ProcessedImage

Image metadata and reference information.

import type { ProcessedImage } from '@heripo/model';

interface ProcessedImage {
  id: string; // Image ID
  caption?: Caption; // Caption (optional)
  pdfPageNo?: number; // PDF page number
  filePath: string; // Image file path
}

ProcessedTable

Table structure and data.

import type { ProcessedTable } from '@heripo/model';

interface ProcessedTable {
  id: string; // Table ID
  caption?: Caption; // Caption (optional)
  pdfPageNo?: number; // PDF page number
  data: ProcessedTableCell[][]; // 2D grid data
  numRows: number; // Row count
  numCols: number; // Column count
}

ProcessedTableCell

Table cell metadata.

import type { ProcessedTableCell } from '@heripo/model';

interface ProcessedTableCell {
  text: string; // Cell text
  rowspan: number; // Row span
  colspan: number; // Column span
  isHeader: boolean; // Is header cell
}
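As an illustration of how the 2D `data` grid can be consumed, the sketch below renders a table as Markdown. It is only an assumption-laden example: `tableToMarkdown` is a hypothetical helper, the first row is treated as the header regardless of `isHeader`, and `rowspan`/`colspan` are ignored, so merged cells will not be reproduced faithfully. The interfaces are local copies of the ones above.

```typescript
// Local structural copies of the README interfaces.
interface CellLike {
  text: string;
  rowspan: number;
  colspan: number;
  isHeader: boolean;
}

interface TableLike {
  data: CellLike[][];
  numRows: number;
  numCols: number;
}

// Hypothetical helper: render the grid as a Markdown table.
// Assumption: the first row is the header; spans are not expanded.
function tableToMarkdown(table: TableLike): string {
  // Escape pipes and collapse line breaks so each cell stays on one line.
  const renderCell = (cell: CellLike): string =>
    cell.text.replace(/\|/g, '\\|').replace(/\s*\n\s*/g, ' ');
  const renderRow = (row: CellLike[]): string =>
    `| ${row.map(renderCell).join(' | ')} |`;

  const [header, ...body] = table.data;
  if (header === undefined) {
    return ''; // Empty table: nothing to render.
  }
  const separator = `| ${header.map(() => '---').join(' | ')} |`;
  return [renderRow(header), separator, ...body.map(renderRow)].join('\n');
}
```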

Caption

Image and table captions.

import type { Caption } from '@heripo/model';

interface Caption {
  num?: number; // Caption number (e.g., 1 in "Figure 1")
  fullText: string; // Full caption text
}

PageRange

PDF page to document page mapping.

import type { PageRange } from '@heripo/model';

interface PageRange {
  pdfPageNo: number; // PDF page number
  pageNo: number; // Document logical page number
}
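One way to use `pageRangeMap` is to translate a PDF page number (such as `pdfPageNo` on an image or table) into the logical page number printed in the document. The sketch below assumes each entry maps exactly one PDF page to one logical page, which is how the fields above read but is not confirmed here. `buildPageLookup` is a hypothetical helper, and `PageRangeLike` is a local copy of the interface.

```typescript
// Local structural copy of the README interface.
interface PageRangeLike {
  pdfPageNo: number;
  pageNo: number;
}

// Hypothetical helper: returns a function that maps a PDF page number to the
// document's logical page number, or undefined when no mapping exists.
// Assumption: one entry per PDF page; later duplicates overwrite earlier ones.
function buildPageLookup(
  pageRangeMap: PageRangeLike[],
): (pdfPageNo: number) => number | undefined {
  const byPdfPage = new Map(
    pageRangeMap.map((range) => [range.pdfPageNo, range.pageNo] as const),
  );
  return (pdfPageNo) => byPdfPage.get(pdfPageNo);
}
```

Returning `undefined` for unmapped pages leaves the decision to the caller, for example falling back to the PDF page number for front matter.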

Usage

Reading ProcessedDocument

import type { ProcessedDocument } from '@heripo/model';

function analyzeDocument(doc: ProcessedDocument) {
  console.log('Report ID:', doc.reportId);

  // Iterate chapters
  doc.chapters.forEach((chapter) => {
    console.log(`Chapter: ${chapter.title} (level ${chapter.level})`);
    console.log(`  Text blocks: ${chapter.textBlocks.length}`);
    console.log(`  Images: ${chapter.imageIds.length}`);
    console.log(`  Tables: ${chapter.tableIds.length}`);
    console.log(`  Sub-chapters: ${chapter.children.length}`);
  });

  // Check images
  doc.images.forEach((image) => {
    console.log(`Image ${image.id}:`);
    if (image.caption) {
      console.log(`  Caption: ${image.caption.fullText}`);
    }
    console.log(`  Path: ${image.filePath}`);
  });

  // Check tables
  doc.tables.forEach((table) => {
    console.log(`Table ${table.id}:`);
    console.log(`  Size: ${table.numRows} x ${table.numCols}`);
    if (table.caption) {
      console.log(`  Caption: ${table.caption.fullText}`);
    }
  });
}

Recursive Chapter Traversal

import type { Chapter, ProcessedDocument } from '@heripo/model';

// Print a chapter title, indented two spaces per nesting level
function traverseChapters(chapter: Chapter, depth: number = 0) {
  const indent = '  '.repeat(depth);
  console.log(`${indent}- ${chapter.title}`);

  // Recursively traverse sub-chapters
  chapter.children.forEach((child) => {
    traverseChapters(child, depth + 1);
  });
}

// Usage: print the outline of a whole document
function printOutline(doc: ProcessedDocument) {
  doc.chapters.forEach((chapter) => traverseChapters(chapter));
}

Type Guards

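import type {
  Caption,
  ProcessedDocument,
  ProcessedImage,
  ProcessedTable,
} from '@heripo/model';

type Resource = ProcessedImage | ProcessedTable;

// Narrows `caption` from optional to required
function hasCaption(
  resource: Resource,
): resource is Resource & { caption: Caption } {
  return resource.caption !== undefined;
}

// Usage: keep only the images and tables that have a caption
function captionedResources(doc: ProcessedDocument) {
  return [...doc.images, ...doc.tables].filter(hasCaption);
}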

License

This package is distributed under the Apache License 2.0.

Contributing

Contributions are always welcome! Please see the Contributing Guide.

Project-Wide Information

For project-wide information not covered in this package, see the root README:

  • Citation and Attribution: Academic citation (BibTeX) and attribution methods
  • Contributing Guidelines: Development guidelines, commit rules, PR procedures
  • Community: Issue tracker, discussions, security policy
  • Roadmap: Project development plans
