# Loop 🔄

**AI that learns from every correction.**
Loop is a CLI tool that ingests documents (PDF, Excel, CSV), answers questions with citations, and improves accuracy through an eval feedback loop.
```bash
loop ingest ./portfolio/       # Parse documents into searchable text
loop chat                      # Ask questions, get cited answers
loop eval --benchmark custom   # Measure accuracy
loop eval --improve            # Learn from mistakes
loop eval --history            # Track the curve ↗
```

## The Curve
This is the product. Not the chat. Not the search. The curve.
```
Accuracy over time
──────────────────────────────────────────────────
Run 1 ████████████████░░░░░░░░░░░░░░░░ 52%  baseline
Run 2 ████████████████████████░░░░░░░░ 68%  +16%  fix: numerical extraction
Run 3 ██████████████████████████████░░ 81%  +13%  fix: cross-doc references
Run 4 ████████████████████████████████ 89%   +8%  fix: date parsing
```
Every cycle: measure → analyze failures → improve → measure again.

Loop generates QA benchmarks from your documents, measures accuracy, analyzes failure patterns, and suggests improvements. Accuracy goes up with every cycle.
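The whole cycle is scriptable with the flags documented in this README (each step is explained in detail in "The Eval Loop" below):

```bash
#!/usr/bin/env bash
# One full improvement cycle: measure → analyze → improve → re-measure.
set -euo pipefail

loop eval --benchmark custom   # measure current accuracy
loop eval --analyze            # break failures down by dimension
loop eval --improve            # apply the suggested prompt improvement
loop eval --benchmark custom   # re-measure
loop eval --history            # watch the curve
```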
## Install
```bash
npm install -g loop-ai
```

Requires Node.js 18+. Python 3.10+ needed only for PDF parsing (`pip install pymupdf4llm`). Excel and CSV work without Python.
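A quick preflight check (the `pymupdf4llm` import name matches the pip package):

```bash
node --version                                              # should be v18 or newer
python3 --version                                           # 3.10+, PDF parsing only
python3 -c "import pymupdf4llm" && echo "PDF support ready"
```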
## LLM Provider Setup
Loop uses Pi SDK for LLM reasoning. Configure your provider:
```bash
# Anthropic (recommended)
export ANTHROPIC_API_KEY=sk-ant-...

# Or OpenRouter (for model variety)
export OPENROUTER_API_KEY=sk-or-...
```

## Quick Start
### Option 1: Interactive demo (no setup needed)
```bash
loop demo
```

Downloads 10 real SEC filings (Best Buy, AMD, Boeing, Microsoft, PepsiCo, Nike, Pfizer, American Express, General Mills, J&J — 1,792 pages) from FinanceBench. Interactive walkthrough with 5 suggested query types.
### Option 2: Your own documents
```bash
loop ingest ./your-docs/   # Ingest a folder of PDFs, Excel, CSV
loop chat                  # Start asking questions
```

## Commands
### `loop ingest <source>`
Parse and ingest documents. Supports PDF, Excel (.xlsx/.xls), and CSV.
```bash
loop ingest report.pdf     # Single file
loop ingest ./portfolio/   # Folder (recursive)
loop ingest data.xlsx      # Excel spreadsheet
```

Re-running skips already-ingested files. Each document is auto-classified by type (lease, amendment, financial report, etc.) and indexed.
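Because re-runs skip what is already ingested, a scheduled re-ingest is cheap. A sketch as a crontab entry (the path is illustrative):

```bash
# crontab: pick up new documents nightly at 02:00; existing files are skipped
0 2 * * * loop ingest /home/me/portfolio/
```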
### `loop chat`
Interactive multi-turn conversation with citations.
```bash
loop chat                    # Default persona
loop chat --persona finance  # Finance-tuned answers
loop chat --persona legal    # Legal analysis style
```

Features:
- Multi-turn context (follow-up questions work)
- Tool progress in real time (`▸ Reading...`, `▸ Searching...`)
- Page-level citations on every answer
- Session logging for later analysis (see the sketch below)
- `/new` to start fresh, `/quit` to exit
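Session logs land in `~/.loop/chat-logs/` as JSONL (see Storage below). A minimal sketch for skimming them; the `.jsonl` extension and per-line schema are assumptions:

```bash
# Count events per session; each line of a JSONL log is one event.
for f in ~/.loop/chat-logs/*.jsonl; do
  echo "$f: $(wc -l < "$f") events"
done
```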
### `loop query <question>`
One-shot query for scripting and pipelines.
loop query "What is the lease term for MSN 4521?"
loop query "Total revenue?" --json # Structured JSON output
loop query "Risk factors?" --persona executive # Brief styleExit code 0 on success, 1 on error. Progress goes to stderr, answer to stdout — pipe-safe.
### `loop generate-qa`
Generate QA benchmark from your corpus.
```bash
loop generate-qa --count 30             # Generate 30 QA pairs
loop generate-qa --export csv           # Export for human review
# ... edit the CSV: keep, edit, or discard each pair ...
loop generate-qa --import reviewed.csv  # Import as benchmark v1
loop generate-qa --from-chats           # Mine QA pairs from chat history
```

Each pair is tagged with dimensions (question type, difficulty, source format, edge cases) for granular error analysis.
### `loop eval`
The eval loop — measure, analyze, improve, repeat.
```bash
loop eval --benchmark custom     # Run benchmark, get accuracy
loop eval --analyze              # Failure analysis by dimension
loop eval --judge-create         # Build domain-specific LLM judge
loop eval --improve              # AI suggests prompt improvement
loop eval --history              # Show THE CURVE
loop eval --benchmark custom@v1  # Run specific version
```

### `loop demo`
Interactive walkthrough with real SEC filings.
```bash
loop demo          # Full demo (ingest + interactive queries)
loop demo --quick  # Ingest only, skip queries
```

### `loop status`
Corpus stats, persona, session history, signal counts.
### `loop config`
```bash
loop config show                 # Current settings
loop config set persona finance  # Change default persona
```

## The Eval Loop (Step by Step)
This is how Loop gets better over time:
```bash
# 1. Ingest your documents
loop ingest ./docs/

# 2. Generate a benchmark
loop generate-qa --count 30
loop generate-qa --export csv
# Human reviews CSV → keeps good pairs, fixes bad ones, discards noise
loop generate-qa --import reviewed.csv   # → benchmark v1

# 3. Measure baseline
loop eval --benchmark custom
# → Accuracy: 52%

# 4. Analyze failures
loop eval --analyze
# → Worst: numerical questions at 31% accuracy
# → Pattern: misreading tabular data

# 5. Improve
loop eval --improve
# → Suggests: "When answering numerical questions, read the full table row
#    and verify units before responding."
# → Applied to system prompt

# 6. Re-measure
loop eval --benchmark custom
# → Accuracy: 68% (+16%)

# 7. See the curve
loop eval --history
# → The curve goes up ↗
```

## Personas
| Persona | Style | Best for |
|---------|-------|----------|
| general | Balanced, clear | Default |
| portfolio_manager | Asset focus, returns, portfolio view | Investment oversight |
| finance | Calculations, rates, per-unit breakdowns | Financial analysis |
| legal | Clause references, contractual precision | Contract review |
| technical | Specs, serial numbers, maintenance data | Technical due diligence |
| executive | Brief, high-level, key takeaways | Quick decisions |
| junior | Explanations, context, approachable | Learning & onboarding |
```bash
loop config set persona finance        # Set default
loop chat --persona legal              # Override per session
loop query "..." --persona executive   # Override per query
```

## Architecture
Documents are pre-parsed at ingest into plain text with page/sheet markers. At query time, the AI reads files using native tools (read, grep, search). No embeddings. No vector store. No RAG pipeline. File search IS the retrieval.
```
┌─────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  PDF    │    │          │    │  Plain   │    │          │
│  Excel  │───▶│  Ingest  │───▶│  Text    │───▶│  Query   │──▶ Answer
│  CSV    │    │          │    │  Corpus  │    │          │    + Citation
└─────────┘    └──────────┘    └──────────┘    └──────────┘
                                     │
                                ┌────┴────┐
                                │  Eval   │
                                │  Loop   │──▶ The Curve ↗
                                └─────────┘
```

### How it works
- **Ingest** — PDFs parsed via PyMuPDF4LLM with `--- PAGE N ---` markers. Excel sheets become pipe-delimited tables. CSV stays as-is with headers.
- **Index** — Every document listed in `INDEX.md` with type classification. This is the AI's table of contents.
- **Query** — AI reads `INDEX.md`, picks relevant files, greps for keywords, reads specific pages. Cites `[filename, Page N]`. (You can replicate this by hand; see the sketch below.)
- **Eval** — QA pairs with expected answers. AI answers each, LLM grader scores pass/fail. Dimensions enable slice-and-dice analysis.
- **Improve** — Reflector analyzes failure patterns, Curator proposes prompt delta, Tester validates no regressions.
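Since the corpus is plain text on disk (see Storage below), the retrieval step can be reproduced by hand. A sketch; the `report.txt` filename is illustrative:

```bash
cat ~/.loop/corpus/INDEX.md                        # the AI's table of contents
grep -rn "MSN 4521" ~/.loop/corpus/                # keyword search, same as the AI's grep
grep -n -- "--- PAGE " ~/.loop/corpus/report.txt   # locate page markers
```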
## Storage
All data in `~/.loop/`:
```
~/.loop/
├── corpus/        # Parsed text files + INDEX.md
├── config.json    # Persona, model settings
├── system.md      # Current system prompt (auto-improved)
├── chat-logs/     # Session JSONL files
├── benchmarks/    # QA pairs (draft, reviewed, versioned)
└── eval/          # Eval runs, judge prompt, improvement history
```
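Everything is ordinary files, so backup and reset need no special commands. A sketch:

```bash
tar czf loop-backup-$(date +%F).tgz -C ~ .loop   # snapshot corpus, benchmarks, history
rm -rf ~/.loop                                   # wipe everything (destructive!)
```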
## CI

212 acceptance tests. Real LLM calls. No mocks. Runs on every push.
```
Test Files  34 passed (34)
     Tests  212 passed | 22 skipped (234)
```

## Requirements
- Node.js 18+ (required)
- Python 3.10+ with `pymupdf4llm` (only for PDF parsing)
- LLM API key — Anthropic, OpenRouter, or compatible provider

```bash
pip install pymupdf4llm   # PDF support
```

## Built for
DeltaXY — aviation leasing consultancy. Loop was built to answer questions across portfolios of aircraft lease documents, fleet spreadsheets, and utilization reports. But it works on any document corpus.
## License
MIT
