parseleaf
v0.1.1
Parseleaf CLI converts EPUB books into semantically structured markdown for downstream AI workflows.
Parseleaf
Parseleaf turns ebooks into a clean, semantically structured Markdown workspace so that you can operationalise their content, themes, and ideas with AI.
If you give an AI an ebook as one giant blob of text, it does not really encounter the book as a book. Chapters, prefaces, appendices, notes, and internal references all get flattened together. That makes it much harder for the model to follow the structure of the argument, answer precise questions, or stay grounded in where an idea came from.
Parseleaf preserves that structure instead of destroying it. It reads the EPUB properly, splits it into meaningful sections, keeps notes and internal links intact, extracts relevant assets, and writes everything out in a form an AI can work with directly. The output is essentially the book behaving like a markdown wiki.
The result is easier to read, easier to search, and much easier to feed into embeddings, retrieval pipelines, agent tools, and other downstream systems.
That improves several practical use cases: building structured notes, extracting methods or concepts, turning the content into reusable skills and workflows, and applying the principles from a book to decision intelligence.
Right now Parseleaf ships as a CLI tool that creates a markdown workspace from your EPUBs. The broader project direction is to take any form of media and make it AI-usable.
Why Parseleaf
- Preserves chapter and section boundaries instead of emitting one monolithic file
- Extracts assets and keeps links stable in the generated workspace
- Writes YAML frontmatter and a machine-readable manifest.json
- Produces output that works for both humans and downstream software
- Handles EPUB 3 navigation, EPUB 2 NCX fallbacks, and degraded books with weak structure
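The navigation fallback mentioned in the last bullet can be illustrated with a short sketch. This is not Parseleaf's own implementation; it only shows, using the Python standard library, how an EPUB's package document signals whether an EPUB 3 navigation document or an EPUB 2 NCX file is available. The function name `find_navigation` and its return values are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_navigation(epub_file):
    """Return (kind, path), where kind is 'nav', 'ncx', or 'none'.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml points at the package document (the .opf file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # Manifest hrefs are relative to the directory of the package document.
    opf_dir = posixpath.dirname(opf_path)
    items = package.findall("opf:manifest/opf:item", OPF_NS)

    # EPUB 3: the navigation document is the manifest item
    # whose properties include "nav".
    for item in items:
        if "nav" in (item.get("properties") or "").split():
            return "nav", posixpath.join(opf_dir, item.get("href"))

    # EPUB 2 fallback: the spine's toc attribute names the NCX manifest item.
    spine = package.find("opf:spine", OPF_NS)
    ncx_id = spine.get("toc") if spine is not None else None
    if ncx_id is not None:
        for item in items:
            if item.get("id") == ncx_id:
                return "ncx", posixpath.join(opf_dir, item.get("href"))

    # Neither is declared: a degraded book with weak structure.
    return "none", None
```

A book that declares neither file falls into the "degraded" case, where structure has to be inferred some other way.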
Install
NPM:
npm install -g parseleaf
Homebrew:
brew install shomoD9/parseleaf/parseleaf
Quick Start
Convert an EPUB:
parseleaf convert path/to/book.epub
Write the output to a custom directory:
parseleaf convert path/to/book.epub --out ./my-output
Without --out, Parseleaf writes to:
./output/<book-slug>/
What You Get
A successful run produces a structured workspace that looks roughly like this:
output/the-book/
  01-contents.md
  02-preface.md
  03-chapter-1.md
  04-appendix-a.md
  assets/
  manifest.json
Each Markdown file contains YAML frontmatter with source metadata. manifest.json records the generated order, section titles, source mappings, and extracted assets.
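To show how downstream code might consume such a workspace, here is a minimal Python sketch. It assumes the frontmatter follows the common convention of a YAML block delimited by `---` lines at the top of each file, and it deliberately makes no assumption about the keys inside manifest.json, since the schema is not described here. The helper names `split_frontmatter` and `load_workspace` are hypothetical.

```python
import json
from pathlib import Path


def split_frontmatter(text):
    """Split a Markdown document into (frontmatter, body).

    Assumes a YAML block delimited by '---' lines at the very top.
    Returns ('', text) when no such block is found.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # opening delimiter without a closing one


def load_workspace(workspace_dir):
    """Load manifest.json and every top-level Markdown section, sorted by file name."""
    workspace = Path(workspace_dir)
    manifest = json.loads((workspace / "manifest.json").read_text(encoding="utf-8"))

    sections = []
    # The numeric prefixes (01-, 02-, ...) make a plain name sort follow book order.
    for md_path in sorted(workspace.glob("*.md")):
        frontmatter, body = split_frontmatter(md_path.read_text(encoding="utf-8"))
        sections.append(
            {"file": md_path.name, "frontmatter": frontmatter, "body": body}
        )
    return manifest, sections
```

The frontmatter is returned as raw text so the sketch needs no third-party packages; a YAML parser can be applied to it afterwards if structured metadata is needed.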
Example
parseleaf convert ./books/the-idea.epub --out ./parsed/the-idea
Example result:
parsed/the-idea/
  01-contents.md
  02-introduction.md
  03-chapter-1-the-river.md
  04-notes.md
  assets/
  manifest.json
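To convert a whole folder of books, the same command can be wrapped in a small script. The Python sketch below assumes `parseleaf` is installed and on your PATH, and uses only the `convert` subcommand and `--out` flag shown above. The function names and the choice to name each output folder after the EPUB's file stem are illustrative assumptions, not Parseleaf behaviour.

```python
import subprocess
from pathlib import Path


def build_convert_command(epub_path, out_root):
    """Build the argument list for one `parseleaf convert` call.

    The output folder is named after the EPUB file stem; this is a choice
    made for this sketch, not something Parseleaf requires.
    """
    epub_path = Path(epub_path)
    out_dir = Path(out_root) / epub_path.stem
    return ["parseleaf", "convert", str(epub_path), "--out", str(out_dir)]


def convert_all(books_dir, out_root):
    """Convert every .epub directly inside books_dir, stopping on the first failure."""
    for epub_path in sorted(Path(books_dir).glob("*.epub")):
        # check=True raises CalledProcessError if parseleaf exits non-zero.
        subprocess.run(build_convert_command(epub_path, out_root), check=True)
```

Calling `convert_all("./books", "./parsed")` would then produce one workspace per book under `./parsed/`.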
