@evomap/pdf2gep

v1.2.0

Convert a PDF into GEP (Genome Evolution Protocol) retrieval bundles for EvoMap

pdf2gep

Convert a PDF document into GEP (Genome Evolution Protocol) assets suitable for retrieval inside the EvoMap network.

pdf2gep fetches a PDF (local path or URL), splits the text into chunks, and writes one GEP bundle per chunk:

  • A Gene of category knowledge_reference -- a compact retrieval pointer.
  • A KnowledgeCapsule with source_type = "pdf_knowledge" -- the chunk text itself, carried as reference material.
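As a rough picture of the splitting step, here is a minimal sketch of fixed-width slicing. It assumes slices of about 4000 characters, the figure this README gives for chunk size. `sliceFixedWidth` is a hypothetical helper written for illustration; it is not the package's exported `chunkText`, whose exact behavior is documented in index.js.

```javascript
// Hypothetical helper for illustration only; not the package's chunkText.
// Slices by UTF-16 code units, so a surrogate pair can be split at a boundary.
function sliceFixedWidth(text, width = 4000) {
  if (!Number.isInteger(width) || width <= 0) {
    throw new RangeError('width must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += width) {
    chunks.push(text.slice(start, start + width));
  }
  return chunks;
}

const sample = 'a'.repeat(9000);
const chunks = sliceFixedWidth(sample);
console.log(chunks.map((c) => c.length)); // [ 4000, 4000, 1000 ]
```

Slicing this way ignores sentence and section boundaries, which is why the output suits topic-level retrieval rather than structured extraction.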

Honest scope note (please read before using)

pdf2gep is a retrieval-oriented protocol adapter. It does not produce the kind of Capsule that proves a Gene works.

  • A standard GEP Capsule is an auditable record of one real execution of a Gene (execution_trace with exit codes, non-zero blast_radius, etc.). PDFs contain knowledge, not executions, so pdf2gep deliberately emits a different variant: source_type = "pdf_knowledge", outcome.status = "knowledge_reference", and an empty execution_trace. Treating these as proof-of-validation is a misuse.
  • The paper that motivates GEP -- Wang, Ren, Zhang, "From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution" (arXiv:2604.15097) -- validates Gene-as-control-interface on 45 scientific code-solving tasks with Gemini 3.1 Pro and Flash Lite. That result does not carry over automatically to retrieval-style knowledge Genes. The Gene emitted by this tool is explicitly a retrieval pointer, not a control interface.
  • Chunking is naive: fixed-width slices of roughly 4000 characters. This is fine for retrieval-by-topic, but it is not structured extraction. Do not expect the output to replace a proper RAG ingestion pipeline.

Downstream consumers (EvoMap hub, local agents) should filter on source_type and treat pdf_knowledge Capsules as reference material only.
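A consumer-side filter might look like the sketch below. It assumes the consumer holds an array of { gene, capsule } entries; `splitBySourceType` is a hypothetical name, and the "execution" value in the demo data is made up purely for contrast.

```javascript
// Hypothetical consumer-side helper; assumes an array of { gene, capsule } entries.
function splitBySourceType(entries) {
  const reference = []; // pdf_knowledge Capsules: reference material only
  const other = [];     // everything else, to be handled by the usual flow
  for (const entry of entries) {
    const isPdfKnowledge = entry?.capsule?.source_type === 'pdf_knowledge';
    (isPdfKnowledge ? reference : other).push(entry);
  }
  return { reference, other };
}

const demo = [
  { gene: { id: 'g1' }, capsule: { source_type: 'pdf_knowledge' } },
  { gene: { id: 'g2' }, capsule: { source_type: 'execution' } }, // made-up value
];
const { reference, other } = splitBySourceType(demo);
console.log(reference.length, other.length); // 1 1
```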

Install

Recommended: from npm

npm install -g @evomap/pdf2gep

This installs the pdf2gep CLI globally. Requires Node.js 18+ (for built-in fetch).

For one-off use, npx works without a global install:

npx @evomap/pdf2gep "https://arxiv.org/pdf/2604.15097.pdf"

Alternative: from source

git clone https://github.com/EvoMap/pdf2gep.git
cd pdf2gep
npm install

Usage

After npm install -g @evomap/pdf2gep:

# From a URL (arXiv, etc.)
pdf2gep "https://arxiv.org/pdf/2604.15097.pdf"

# From a local file
pdf2gep "./manual.pdf"

When working from a source checkout, the equivalent is node index.js "<url-or-path>".

Bundles are written to temp/evomap_assets/batch_<timestamp>.json under the current working directory. Each entry in the batch is { gene, capsule }.
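To read a batch back, something like the following sketch may help. It assumes the batch file is a JSON array of { gene, capsule } entries, which this README implies but does not state outright. It picks the newest file by modification time because the format of <timestamp> is not specified here. `loadLatestBatch` is a hypothetical helper, not part of the package.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper. Assumption: each batch file is a JSON array of { gene, capsule }.
function loadLatestBatch(cwd = process.cwd()) {
  const dir = path.join(cwd, 'temp', 'evomap_assets');
  if (!fs.existsSync(dir)) return null;
  const files = fs
    .readdirSync(dir)
    .filter((name) => /^batch_.+\.json$/.test(name))
    .map((name) => ({ name, mtime: fs.statSync(path.join(dir, name)).mtimeMs }))
    .sort((a, b) => b.mtime - a.mtime); // newest first
  if (files.length === 0) return null;
  const file = path.join(dir, files[0].name);
  const entries = JSON.parse(fs.readFileSync(file, 'utf8'));
  if (!Array.isArray(entries)) throw new TypeError(`${file} is not a JSON array`);
  return { file, entries };
}

const batch = loadLatestBatch();
if (batch) {
  for (const { gene, capsule } of batch.entries) console.log(gene.id, '->', capsule.id);
} else {
  console.log('No batch found under temp/evomap_assets');
}
```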

Library API

pdf2gep also exposes its building blocks for programmatic use:

const {
  chunkText,
  createGene,
  createKnowledgeCapsule,
  processChunk,
  SCHEMA_VERSION,
} = require('@evomap/pdf2gep');

This is useful when embedding the adapter inside a custom ingestion pipeline. The exported helpers are documented inline in index.js.

Output schema (GEP 1.6.0)

Gene

{
  "type": "Gene",
  "id": "gene_pdf2gep_<slug>_chunk<N>_<sha8>",
  "category": "knowledge_reference",
  "summary": "Retrieval pointer for <slug> chunk #<N>",
  "signals_match": ["knowledge_lookup", "pdf_reference", "<slug>"],
  "preconditions": ["Agent needs to consult the source document to answer or plan."],
  "strategy": [
    "Retrieve the backing KnowledgeCapsule (source_type=pdf_knowledge) to read the chunk verbatim.",
    "Do NOT treat the chunk as a validated procedure. It is reference material only."
  ],
  "constraints": { "max_files": 0, "forbidden_paths": [".git", "node_modules"] },
  "validation": [],
  "schema_version": "1.6.0",
  "_source": {
    "kind": "pdf2gep",
    "source_type": "pdf_knowledge",
    "source_ref": "<url or absolute path>",
    "source_sha256": "<sha256 of the whole pdf>",
    "chunk_index": 0,
    "chunk_sha256": "<sha256 of this chunk>",
    "claims_outside_scope": "knowledge_extraction",
    "paper_scope_note": "Gene-as-control-interface was validated by arXiv:2604.15097 on code-science tasks. A knowledge_reference Gene is NOT a control interface; it is a retrieval pointer."
  }
}

KnowledgeCapsule

{
  "type": "Capsule",
  "id": "cap_pdf2gep_<chunk_sha12>_<idkey>",
  "gene": "<gene.id>",
  "source_type": "pdf_knowledge",
  "trigger": ["knowledge_lookup", "pdf_reference", "<slug>"],
  "summary": "PDF chunk #<N> from <name>",
  "confidence": null,
  "blast_radius": { "files": 0, "lines": 0, "chunk_chars": 4000 },
  "outcome": { "status": "knowledge_reference", "score": null },
  "env_fingerprint": { "platform": "...", "node": "..." },
  "content": "<chunk text verbatim>",
  "execution_trace": [],
  "schema_version": "1.6.0",
  "_source": {
    "source_ref": "<url or path>",
    "source_sha256": "<sha256 of the whole pdf>",
    "chunk_index": 0,
    "chunk_sha256": "<sha256 of this chunk>",
    "claims_outside_scope": "knowledge_extraction"
  }
}
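Because the Capsule carries both the chunk text and its hash, a consumer could check that the content has not been altered. The sketch below assumes chunk_sha256 is the lowercase hex SHA-256 of the UTF-8 bytes of content. The README does not spell out the encoding, so treat that as an assumption to confirm against index.js. Both function names are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Assumption: chunk_sha256 = hex SHA-256 over the UTF-8 bytes of `content`.
function sha256Hex(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical check: returns true only when both fields exist and the hash matches.
function chunkHashMatches(capsule) {
  const expected = capsule?._source?.chunk_sha256;
  if (typeof expected !== 'string' || typeof capsule.content !== 'string') return false;
  return sha256Hex(capsule.content) === expected.toLowerCase();
}

const content = 'example chunk text';
const capsule = { content, _source: { chunk_sha256: sha256Hex(content) } };
console.log(chunkHashMatches(capsule)); // true
console.log(chunkHashMatches({ ...capsule, content: 'tampered' })); // false
```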

Key invariants validators can rely on:

  • outcome.status === "knowledge_reference" -- never "success" / "failed".
  • execution_trace is empty.
  • blast_radius.files === 0 && blast_radius.lines === 0.
  • source_type === "pdf_knowledge" on both the Gene _source and the Capsule.
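These invariants can be checked mechanically. The following is a minimal sketch of such a check, using only the field names shown in the schema above. `findInvariantViolations` is a hypothetical helper, not something the package exports.

```javascript
// Hypothetical validator; returns a list of human-readable violations (empty when valid).
function findInvariantViolations({ gene, capsule }) {
  const problems = [];
  if (capsule?.outcome?.status !== 'knowledge_reference') {
    problems.push('outcome.status must be "knowledge_reference"');
  }
  if (!Array.isArray(capsule?.execution_trace) || capsule.execution_trace.length !== 0) {
    problems.push('execution_trace must be an empty array');
  }
  if (capsule?.blast_radius?.files !== 0 || capsule?.blast_radius?.lines !== 0) {
    problems.push('blast_radius.files and blast_radius.lines must both be 0');
  }
  if (gene?._source?.source_type !== 'pdf_knowledge') {
    problems.push('gene._source.source_type must be "pdf_knowledge"');
  }
  if (capsule?.source_type !== 'pdf_knowledge') {
    problems.push('capsule.source_type must be "pdf_knowledge"');
  }
  return problems;
}

const entry = {
  gene: { _source: { source_type: 'pdf_knowledge' } },
  capsule: {
    source_type: 'pdf_knowledge',
    outcome: { status: 'knowledge_reference', score: null },
    execution_trace: [],
    blast_radius: { files: 0, lines: 0, chunk_chars: 4000 },
  },
};
console.log(findInvariantViolations(entry)); // []
```

Returning a list rather than throwing on the first failure lets a caller report every violation in a bundle at once.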

Publishing to EvoMap

Use evolver (the GEP reference runtime) to publish a bundle:

evolver publish --bundle temp/evomap_assets/batch_<ts>.json

The EvoMap hub routes pdf_knowledge Capsules to the retrieval index, separately from execution Capsules. Installation and consumption are handled via the usual evolver run / gep_install_gene flow; agents that match a knowledge_lookup signal will pick the retrieval Gene and fetch the backing Capsule for citation.

Relationship to other tools

  • skill2gep -- protocol adapter that converts SKILL.md into Gene+ExecutionCapsule bundles. That tool is for procedural knowledge where the Capsule's execution_trace comes from real runs. pdf2gep is complementary: it covers reference knowledge and deliberately does not fabricate execution evidence.
  • kitchen-engineer42/pdf2skills -- prior art that inspired this tool. pdf2skills targets Claude Code's SKILL.md format; pdf2gep targets the GEP protocol and is explicit about being retrieval-only.

License

MIT. See LICENSE.