# @behagoras/chat-with-pdf
Local PDF QA CLI (RAG) with OpenAI chat + embeddings. Ingest PDFs to a JSON index on disk and ask grounded questions with inline citations. Includes an interactive chat mode with short-term conversation memory.
- npm: `@behagoras/chat-with-pdf`
- Repo: https://github.com/behagoras/chat-with-pdf
## Installation

Global (recommended):

```bash
npm i -g @behagoras/chat-with-pdf@latest
# ensure the npm global bin directory is on your PATH
```

Set your OpenAI key via the environment:

```bash
export OPENAI_API_KEY=sk-...
```

or pass it per command with `--openai-key sk-...`, or persist it once:

```bash
chat-with-pdf auth:set sk-...
```

To manage a stored key:

```bash
chat-with-pdf auth:set sk-...
chat-with-pdf auth:show   # does not print the key itself; it is stored in ~/.config/chat-with-pdf/config.json
chat-with-pdf auth:clear  # remove the stored key
```

## Quick Start
Ingest one or more PDFs:

```bash
chat-with-pdf ingest ./my.pdf -o ./pdfqa.index.json
# or with a specific OpenAI key
chat-with-pdf ingest ./my.pdf -o ./pdfqa.index.json --openai-key sk-...
# index multiple PDFs
chat-with-pdf ingest ./a.pdf ./b.pdf -o ./pdfqa.index.json
```

Ask a one-off question:
```bash
chat-with-pdf ask -i ./pdfqa.index.json "Where do I start?"
# with a specific OpenAI key
chat-with-pdf ask -i ./pdfqa.index.json --openai-key sk-... "Where do I start?"
```

Start an interactive chat:
```bash
chat-with-pdf chat -i ./pdfqa.index.json
# type follow-ups; quit with q, quit, :q, or exit (or Ctrl+C)
```

Options:

- `--model` chat model (default `gpt-4o-mini`); GPT-5 is not supported yet
- `--embed-model` embedding model (default `text-embedding-3-small`)
- `--top-k` retrieval size (default 5)
- `--json` structured output
- `--openai-key` use a specific OpenAI key
- `--save <path>` save the conversation JSON to a path
- `--index <path>` index path (default `./pdfqa.index.json`)
## Architecture

Functional core, IO at the edges, for easy swapping:

- PDF extraction: `pdf-parse` first, with a fallback to `pdfjs-dist` (legacy build) for page text
- Chunking: simple character-based with overlap
- Embeddings: OpenAI `text-embedding-3-*`
- Retrieval: in-memory cosine similarity over the JSON index (sketched below)
- Chat: OpenAI chat completions (`gpt-4o-mini` by default)
- `ModelProvider` interface allows swapping providers
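Retrieval is deliberately simple: embed the question, score it against every stored chunk, and keep the best matches. A minimal TypeScript sketch of the idea, where the `Chunk` shape is an assumption for illustration rather than the package's actual on-disk format:

```ts
// Sketch of in-memory cosine retrieval over a JSON index.
// The Chunk shape is an assumed, illustrative layout.
interface Chunk {
  text: string;
  source: string;      // originating PDF path
  embedding: number[]; // vector from a text-embedding-3-* model
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

// Score every chunk against the query embedding and keep the top k
// (k = 5 matches the --top-k default).
function topK(query: number[], chunks: Chunk[], k = 5): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ c }) => c);
}
```

A linear scan like this is fine while the index fits in memory; the SQLite + sqlite-vec seam noted below is the escape hatch when it does not.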
Conversation memory:

- Interactive `chat` keeps a small rolling window of (Q, A) turns
- Retrieval is conditioned on recent turns to resolve referents
- The prompt includes PRIOR_CONVERSATION and the retrieved CONTEXT (sketched below)
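A minimal sketch of how such a rolling window and prompt assembly could look. The `Turn` type, window size, and exact prompt wording are illustrative assumptions, not the package's actual implementation:

```ts
// Sketch of short-term conversation memory: keep the last few
// (question, answer) turns and splice them into the prompt.
interface Turn {
  question: string;
  answer: string;
}

const WINDOW = 4; // number of recent turns to keep (assumed value)
const memory: Turn[] = [];

function remember(turn: Turn): void {
  memory.push(turn);
  if (memory.length > WINDOW) memory.shift(); // drop the oldest turn
}

function buildPrompt(question: string, contextChunks: string[]): string {
  const prior = memory
    .map((t) => `Q: ${t.question}\nA: ${t.answer}`)
    .join("\n");
  return [
    "PRIOR_CONVERSATION:",
    prior || "(none)",
    "CONTEXT:",
    contextChunks.join("\n---\n"),
    `QUESTION: ${question}`,
  ].join("\n\n");
}
```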
Clean seams to grow:
- Replace JSON with SQLite + sqlite-vec (swap persistence + retrieval)
- Add streaming responses
- Per-page citations
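The `ModelProvider` seam mentioned above is what makes these swaps tractable. As an illustration, a hypothetical provider shape (the real interface in the codebase may differ):

```ts
// Hypothetical ModelProvider seam: swap OpenAI for another backend
// by implementing two methods. The actual interface may differ.
interface ModelProvider {
  embed(texts: string[]): Promise<number[][]>;
  chat(prompt: string): Promise<string>;
}
```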
## Commands

`ingest [options] <files...>`

- `-o, --out` index path (default `./pdfqa.index.json`)
- `--chunk` target characters per chunk (default 3500)
- `--overlap` overlap characters (default 200)
- `--embed-model` embedding model
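The `--chunk` and `--overlap` options drive the simple character-based splitter. A minimal sketch matching the documented defaults (the function name and exact stepping are assumptions):

```ts
// Sketch of character-based chunking with overlap: each chunk starts
// `size - overlap` characters after the previous one, so consecutive
// chunks share `overlap` characters of context.
function chunkText(text: string, size = 3500, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    start += size - overlap;
  }
  return chunks;
}
```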
`ask [options] <question...>`

- One-off Q&A; prints the answer and citations

`chat [options] [question...]`

- Interactive loop; optional initial question
- Quit with `q`, `quit`, `:q`, or `exit` (or Ctrl+C)
## Release/publish (maintainers)

Release a new version (defaults to patch):

```bash
npm run release
npm run release patch
npm run release minor
npm run release major
```

## Requirements
- Node.js >= 18
- `OPENAI_API_KEY` in the environment
## Repository
- Source and issues: https://github.com/behagoras/chat-with-pdf
## License
MIT © 2025 David Behar
