# @behagoras/chat-with-pdf
Local PDF QA CLI (RAG) with OpenAI chat + embeddings. Ingest PDFs to a JSON index on disk and ask grounded questions with inline citations. Includes an interactive chat mode with short-term conversation memory.
- npm: `@behagoras/chat-with-pdf`
- Repo: https://github.com/behagoras/chat-with-pdf
## Installation

Global (recommended):

```bash
npm i -g @behagoras/chat-with-pdf@latest
# ensure the npm global bin directory is on your PATH
```

Set your OpenAI key via the environment:

```bash
export OPENAI_API_KEY=sk-...
```

or pass it per command with `--openai-key sk-...`, or persist it once:

```bash
chat-with-pdf auth:set sk-...
```

To manage a stored key:

```bash
chat-with-pdf auth:set sk-...
chat-with-pdf auth:show   # does not print the key itself; it is stored in ~/.config/chat-with-pdf/config.json
chat-with-pdf auth:clear  # remove the stored key
```

## Quick Start
Ingest one or more PDFs:

```bash
chat-with-pdf ingest ./my.pdf -o ./pdfqa.index.json
# or with a specific OpenAI key
chat-with-pdf ingest ./my.pdf -o ./pdfqa.index.json --openai-key sk-...
# index multiple PDFs
chat-with-pdf ingest ./a.pdf ./b.pdf -o ./pdfqa.index.json
```

Ask a one-off question:
```bash
chat-with-pdf ask -i ./pdfqa.index.json "Where do I start?"
# with a specific OpenAI key
chat-with-pdf ask -i ./pdfqa.index.json --openai-key sk-... "Where do I start?"
```

Start an interactive chat:
```bash
chat-with-pdf chat -i ./pdfqa.index.json
# type follow-ups; quit with q, quit, :q, or exit (or Ctrl+C)
```

Options:

- `--model` chat model (default `gpt-4o-mini`); GPT-5 is not supported yet
- `--embed-model` embedding model (default `text-embedding-3-small`)
- `--top-k` retrieval size (default 5)
- `--json` structured output
- `--openai-key` use a specific OpenAI key
- `--save <path>` save the conversation JSON to a path
- `--index <path>` index path (default `./pdfqa.index.json`)
## Architecture

Functional core, IO at the edges, for easy swapping:

- PDF extraction: `pdf-parse` first, with a fallback to `pdfjs-dist` (legacy build) for page text
- Chunking: simple character-based with overlap
- Embeddings: OpenAI `text-embedding-3-*`
- Retrieval: in-memory cosine similarity over the JSON index (sketched below)
- Chat: OpenAI chat completions (`gpt-4o-mini` by default)
- `ModelProvider` interface allows swapping providers
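Retrieval is deliberately simple: embed the question, score it against every stored chunk, and keep the best matches. A minimal TypeScript sketch of the idea, where the `Chunk` shape is an assumption for illustration rather than the package's actual on-disk format:

```ts
// Sketch of in-memory cosine retrieval over a JSON index.
// The Chunk shape is an assumed, illustrative layout.
interface Chunk {
  text: string;
  source: string;      // originating PDF path
  embedding: number[]; // vector from a text-embedding-3-* model
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

// Score every chunk against the query embedding and keep the top k
// (k = 5 matches the --top-k default).
function topK(query: number[], chunks: Chunk[], k = 5): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ c }) => c);
}
```

A linear scan like this is fine while the index fits in memory; the SQLite + sqlite-vec seam noted below is the escape hatch when it does not.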
Conversation memory:

- Interactive `chat` keeps a small rolling window of (Q, A) turns
- Retrieval is conditioned on recent turns to resolve referents
- The prompt includes PRIOR_CONVERSATION and the retrieved CONTEXT (sketched below)
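A minimal sketch of how such a rolling window and prompt assembly could look. The `Turn` type, window size, and exact prompt wording are illustrative assumptions, not the package's actual implementation:

```ts
// Sketch of short-term conversation memory: keep the last few
// (question, answer) turns and splice them into the prompt.
interface Turn {
  question: string;
  answer: string;
}

const WINDOW = 4; // number of recent turns to keep (assumed value)
const memory: Turn[] = [];

function remember(turn: Turn): void {
  memory.push(turn);
  if (memory.length > WINDOW) memory.shift(); // drop the oldest turn
}

function buildPrompt(question: string, contextChunks: string[]): string {
  const prior = memory
    .map((t) => `Q: ${t.question}\nA: ${t.answer}`)
    .join("\n");
  return [
    "PRIOR_CONVERSATION:",
    prior || "(none)",
    "CONTEXT:",
    contextChunks.join("\n---\n"),
    `QUESTION: ${question}`,
  ].join("\n\n");
}
```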
Clean seams to grow:
- Replace JSON with SQLite + sqlite-vec (swap persistence + retrieval)
- Add streaming responses
- Per-page citations
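The `ModelProvider` seam mentioned above is what makes these swaps tractable. As an illustration, a hypothetical provider shape (the real interface in the codebase may differ):

```ts
// Hypothetical ModelProvider seam: swap OpenAI for another backend
// by implementing two methods. The actual interface may differ.
interface ModelProvider {
  embed(texts: string[]): Promise<number[][]>;
  chat(prompt: string): Promise<string>;
}
```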
## Commands

`ingest [options] <files...>`

- `-o, --out` index path (default `./pdfqa.index.json`)
- `--chunk` target characters per chunk (default 3500)
- `--overlap` overlap characters (default 200)
- `--embed-model` embedding model
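The `--chunk` and `--overlap` options drive the simple character-based splitter. A minimal sketch matching the documented defaults (the function name and exact stepping are assumptions):

```ts
// Sketch of character-based chunking with overlap: each chunk starts
// `size - overlap` characters after the previous one, so consecutive
// chunks share `overlap` characters of context.
function chunkText(text: string, size = 3500, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    start += size - overlap;
  }
  return chunks;
}
```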
`ask [options] <question...>`

- One-off Q&A; prints the answer and citations

`chat [options] [question...]`

- Interactive loop; optional initial question
- Quit with `q`, `quit`, `:q`, or `exit` (or Ctrl+C)
## Release/publish (maintainers)

Release a new version (defaults to patch):

```bash
npm run release
npm run release patch
npm run release minor
npm run release major
```

## Requirements
- Node.js >= 18
- `OPENAI_API_KEY` in the environment
## Repository
- Source and issues: https://github.com/behagoras/chat-with-pdf
## License
MIT © 2025 David Behar
