
structurecc

v4.0.0

Claude Code plugin for extracting structured data from documents using native vision and parallel Task agents

Document Structure Extraction for Claude Code

Extract structured data from PDFs, Word documents, and images using Claude's native vision capabilities and parallel Task agents.

Installation

npx structurecc

This installs the plugin to ~/.claude/plugins/structurecc/.

Usage

Single Document

/structure document.pdf
/structure lab_image.png
/structure report.docx

Batch Processing

/structure:batch ./documents/
/structure:batch ./patient_files/ --output ./extracted/

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | .pdf | Multi-page supported, chunked for large documents |
| Word | .docx, .doc | Text and embedded images extracted |
| Images | .png, .jpg, .jpeg, .tiff, .bmp | Single-page extraction |
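To see which files in a folder fall under the formats above, you can filter by extension. This is a minimal Python sketch, not how `/structure:batch` selects files: `find_supported_files` is a hypothetical helper, and the non-recursive scan and case-insensitive matching are assumptions.

```python
from pathlib import Path

# Extensions from the Supported Formats table. Matching them case-insensitively
# is an assumption, not documented structurecc behavior.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".docx", ".doc",
    ".png", ".jpg", ".jpeg", ".tiff", ".bmp",
}


def find_supported_files(folder):
    """Return supported files directly inside `folder` (non-recursive), sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `find_supported_files("./patient_files/")` lists the PDFs, Word files and images a batch run over that folder could be expected to cover, skipping subdirectories and other file types.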

Output

For each document, structurecc generates:

document_extracted/
├── chunks/              # Individual chunk extractions (for debugging)
├── structure.json       # Complete structured extraction
└── STRUCTURE.md         # Human-readable markdown summary

structure.json

{
  "source": "/path/to/document.pdf",
  "extracted": "2026-01-30T14:30:22Z",
  "pages": [
    {
      "page": 1,
      "elements": [
        {
          "id": "element_1",
          "type": "table",
          "title": "Table 1. Lab Results",
          "data": {
            "headers": ["Test", "Result", "Units", "Reference"],
            "rows": [
              ["Glucose", "126", "mg/dL", "70-100"]
            ]
          },
          "confidence": 0.98
        }
      ]
    }
  ],
  "summary": {
    "total_pages": 5,
    "tables": 3,
    "figures": 4,
    "equations": 1,
    "average_confidence": 0.94
  }
}
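Downstream scripts can read structure.json with the standard library. This minimal sketch uses only the field names visible in the example above (`pages`, `page`, `elements`, `type`, `confidence`); the helper names are hypothetical, and whether structurecc computes `summary.average_confidence` exactly this way (a plain mean over all elements) is an assumption.

```python
import json
from collections import Counter
from typing import Optional, Tuple


def load_structure(path: str) -> dict:
    """Parse a structure.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_elements(structure: dict):
    """Yield (page_number, element) pairs in document order."""
    for page in structure.get("pages", []):
        for element in page.get("elements", []):
            yield page["page"], element


def element_stats(structure: dict) -> Tuple[Counter, Optional[float]]:
    """Count elements by `type` and average their `confidence` scores."""
    counts = Counter()
    scores = []
    for _, element in iter_elements(structure):
        counts[element["type"]] += 1
        scores.append(element["confidence"])
    mean = sum(scores) / len(scores) if scores else None
    return counts, mean
```

Counting by the raw `type` string avoids guessing the names used for figures and equations, since only `"table"` appears in the example.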

Architecture

structurecc uses a chunk-based parallel processing approach:

  1. Document Analysis - Determine page count and split into chunks (5 pages each)
  2. Parallel Extraction - Launch one Task agent per chunk for parallel processing
  3. Chunk Merge - Combine chunk results with page offset correction
  4. Output Generation - Create JSON and Markdown outputs

Document (20 pages)
       │
       ├── Chunk 1 (Pages 1-5)   → Agent 1
       ├── Chunk 2 (Pages 6-10)  → Agent 2
       ├── Chunk 3 (Pages 11-15) → Agent 3
       └── Chunk 4 (Pages 16-20) → Agent 4
               │
               ▼
         Merged Output
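The split and merge steps can be sketched in Python. This is an illustration, not structurecc's implementation: `chunk_ranges` and `merge_chunks` are hypothetical helpers, and the merge assumes each chunk agent numbers its pages starting from 1, so "page offset correction" means adding `first_page - 1` to every page number in that chunk.

```python
from typing import Dict, List, Tuple

PageRange = Tuple[int, int]  # inclusive (first_page, last_page)


def chunk_ranges(total_pages: int, chunk_size: int = 5) -> List[PageRange]:
    """Split pages 1..total_pages into consecutive ranges of chunk_size pages."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def merge_chunks(ranges: List[PageRange], chunk_results: List[List[Dict]]) -> List[Dict]:
    """Concatenate per-chunk page lists, converting chunk-local page numbers
    (assumed to start at 1 in every chunk) into document page numbers."""
    if len(ranges) != len(chunk_results):
        raise ValueError("need exactly one result per chunk")
    merged = []
    for (first, _last), pages in zip(ranges, chunk_results):
        offset = first - 1
        for page in pages:
            merged.append({**page, "page": page["page"] + offset})
    return merged
```

With the default chunk size, `chunk_ranges(20)` yields the four ranges shown in the diagram; a 12-page document would end with a short final chunk covering pages 11-12.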

This approach:

  • Maximizes throughput via parallel processing
  • Preserves context within chunks (figures and captions stay together)
  • Uses Claude's native vision (no external APIs)
  • Gives each agent its own 200K-token context window for thorough extraction
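Task agents are a Claude Code feature, not a Python API, but the fan-out/fan-in shape can be modeled with a thread pool. In this sketch `extract_chunk` is a hypothetical stand-in for one agent, and `extract_all` is likewise illustrative. The point it shows is that results must come back in chunk order (as `Executor.map` guarantees) so the merge step can apply the right page offsets, even when later chunks finish first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for one Task agent: takes an inclusive page range and
# returns that chunk's extracted pages.
ExtractFn = Callable[[int, int], List[Dict]]


def extract_all(ranges: Sequence[Tuple[int, int]], extract_chunk: ExtractFn,
                max_workers: Optional[int] = None) -> List[List[Dict]]:
    """Run one extraction per chunk concurrently and return results in chunk order."""
    if not ranges:
        return []
    workers = max_workers or len(ranges)  # default: one worker per chunk
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, even if chunks finish out of order.
        return list(pool.map(lambda r: extract_chunk(*r), ranges))
```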

Element Types

Tables

Extracted with:

  • Headers and all rows
  • Cell values with exact formatting
  • Value flags such as H (high), L (low), *, and †
  • Footnotes
  • Merged cell information
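A table element's `headers` and `rows` map directly onto CSV. This minimal sketch assumes the element shape from the structure.json example; `table_to_csv` is a hypothetical helper. Flags, footnotes and merged-cell details are not written out because their field names are not shown in that example.

```python
import csv
import io


def table_to_csv(element: dict) -> str:
    """Serialize a table element's headers and rows as CSV text.

    Cell values are written unchanged (strings in the example), so exact
    formatting such as "70-100" survives the conversion.
    """
    if element.get("type") != "table":
        raise ValueError("not a table element")
    data = element["data"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(data["headers"])
    writer.writerows(data["rows"])
    return buffer.getvalue()
```

Applied to the "Table 1. Lab Results" element from the example, this produces a header line `Test,Result,Units,Reference` followed by `Glucose,126,mg/dL,70-100`.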

Figures

Supports various figure types:

  • Charts/Graphs: Line, bar, scatter, pie with data series and axes
  • Scientific Images: Western blots, gels, micrographs
  • Diagrams: Flowcharts, illustrations, photographs

Each figure includes:

  • Title and caption
  • Data points (when visible)
  • Axis labels and ranges
  • Annotations and legends

Equations

Extracted as:

  • LaTeX representation
  • Plain text fallback
  • Variable definitions

Text Blocks

Captured with:

  • Full content
  • Type (header, paragraph, caption, footnote)
  • Formatting information

Confidence Scores

Every element includes a confidence score (0.0-1.0):

| Score | Meaning |
|-------|---------|
| 0.95-1.00 | Crystal clear extraction |
| 0.85-0.94 | Clear with minor uncertainty |
| 0.70-0.84 | Readable but some ambiguity |
| < 0.70 | Needs manual verification |

Low-confidence items are flagged in the output for review.
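If you want your own review list, the score bands above translate into a few comparisons. This is a sketch, not structurecc's flagging logic: the helper names are hypothetical, the 0.70 review threshold is taken from the table's last row rather than from documented plugin behavior, and scores falling in the table's gaps (for example 0.945) are assumed to belong to the lower band.

```python
from typing import Dict, List, Tuple

# Bands from the Confidence Scores table, highest first. Scores in the gaps
# between rows (e.g. 0.945) fall into the lower band: an assumption.
BANDS = [
    (0.95, "Crystal clear extraction"),
    (0.85, "Clear with minor uncertainty"),
    (0.70, "Readable but some ambiguity"),
]


def confidence_band(score: float) -> str:
    """Map a 0.0-1.0 confidence score to its meaning from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    for floor, meaning in BANDS:
        if score >= floor:
            return meaning
    return "Needs manual verification"


def needs_review(structure: Dict, threshold: float = 0.70) -> List[Tuple[int, str]]:
    """List (page, element id) for every element scoring below threshold."""
    return [
        (page["page"], element["id"])
        for page in structure.get("pages", [])
        for element in page.get("elements", [])
        if element["confidence"] < threshold
    ]
```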

Use Cases

  • Medical Lab Results: Extract patient data from PDF reports
  • Research Papers: Structure tables and figures from publications
  • Scientific Images: Transcribe gel/blot data for documentation
  • Patient Records: Batch process document folders
  • Data Digitization: Convert scanned documents to structured data

Requirements

  • Claude Code CLI
  • Node.js with npx (only needed to run the installer)
  • No external dependencies (uses Claude's native capabilities)

How It Works

structurecc leverages Claude's multimodal capabilities:

  1. Claude Vision: Reads PDFs and images natively without OCR
  2. Parallel Agents: Task tool spawns chunk agents for parallel processing
  3. Structured Output: JSON schema ensures consistent, parseable output
  4. Markdown Summary: Human-readable format for quick review

No web searches, no external APIs, no Python dependencies. Just Claude + document = structured data.

Limitations

  • Very large documents (100+ pages) may require multiple runs
  • Handwritten content has lower accuracy than printed text
  • Low-resolution images may have reduced confidence scores
  • Complex nested tables may require manual verification

License

MIT