seta-indexer
v0.1.1
Vector database indexer for documentation
seta-indexer
A standalone CLI tool for vector database indexing and semantic search over documentation. Supports PDF, Markdown, text, and more. Powered by local embeddings and LanceDB.
Features
- Indexes PDF, Markdown, MDX, TXT, JSON, YAML, XML, CSV
- Intelligent chunking with sentence/paragraph boundary detection
- Fast local embeddings with all-MiniLM-L6-v2 (via @xenova/transformers)
- Vector similarity search with filtering (library, topic, difficulty)
- Incremental updates and file change detection
- Flexible file selection with --include and --exclude glob patterns
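To make the "vector similarity search with filtering" feature concrete, here is a minimal brute-force sketch. It is an illustration only: in seta-indexer the search is handled by LanceDB, and the `IndexedChunk` shape, the `searchChunks` function, and the exact-match filter semantics below are assumptions, not the tool's actual API.

```typescript
// Hypothetical record shape; the real LanceDB table schema may differ.
interface IndexedChunk {
  text: string;
  vector: number[];
  library?: string;
  topic?: string;
  difficulty?: string;
}

type ChunkFilter = Partial<Pick<IndexedChunk, "library" | "topic" | "difficulty">>;

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep chunks whose metadata matches every filter field that is set,
// score them against the query vector, and return the best `limit` results.
function searchChunks(
  query: number[],
  chunks: IndexedChunk[],
  filter: ChunkFilter = {},
  limit = 5
): { chunk: IndexedChunk; score: number }[] {
  const keys = Object.keys(filter) as (keyof ChunkFilter)[];
  return chunks
    .filter((chunk) =>
      keys.every((key) => filter[key] === undefined || chunk[key] === filter[key])
    )
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A call such as `searchChunks(queryVector, chunks, { library: "react" }, 3)` would return the three closest chunks tagged with that library. A vector database avoids the full scan shown here by using an index.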
Installation
npm install -g seta-indexer
# or use npx
npx seta-indexer <folder> [options]
Usage
Index a documentation folder
npx seta-indexer /path/to/docs
Local usage (from cloned repo)
# Clone and setup
git clone https://github.com/techformist/seta-indexer.git
cd seta-indexer
npm install
# Build the project
npm run build
# Run locally with node
node dist/cli.js /path/to/docs
# Or use the dev script for development
npm run dev -- index /path/to/docs
Available commands (local usage)
# Index documents
node dist/cli.js index /path/to/docs --verbose
# Search indexed content
node dist/cli.js search "your query" /path/to/docs
# Show database statistics
node dist/cli.js stats /path/to/docs
# Clean/remove database
node dist/cli.js clean /path/to/docs
# Run tests
npm test
With options
- --verbose, -v: Detailed logging
- --force: Force re-index all files
- --chunk-size <size>: Chunk size (default: 1000)
- --chunk-overlap <overlap>: Overlap (default: 200)
- --model <model>: Embedding model (default: all-MiniLM-L6-v2)
- --db-path <path>: Custom DB path
- --include <patterns...>: Glob patterns to include (e.g. **/*.md docs/**/*.pdf)
- --exclude <patterns...>: Glob patterns to exclude (e.g. **/drafts/**)
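The interaction between --chunk-size and --chunk-overlap can be pictured with the sketch below. It assumes both values are measured in characters and that a chunk is cut at the last paragraph break or sentence end inside its window; neither assumption is confirmed by this README, and `chunkText` is a hypothetical helper, not seta-indexer's actual chunker.

```typescript
// Hypothetical sketch of overlapping, boundary-aware chunking.
// Defaults mirror the documented CLI defaults (1000 and 200).
function chunkText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      // Prefer a paragraph break, then a sentence end, inside the window.
      const candidate = text.slice(start, end);
      const paragraphBreak = candidate.lastIndexOf("\n\n");
      const sentenceEnd = candidate.lastIndexOf(". ");
      const cut =
        paragraphBreak > 0 ? paragraphBreak + 2 : sentenceEnd > 0 ? sentenceEnd + 2 : -1;
      // Accept the boundary only if the next chunk still moves forward.
      if (cut > chunkOverlap) end = start + cut;
    }
    const chunk = text.slice(start, end).trim();
    if (chunk.length > 0) chunks.push(chunk);
    if (end >= text.length) break;
    // Step back by the overlap so neighbouring chunks share context.
    start = end - chunkOverlap;
  }
  return chunks;
}
```

With the defaults, each chunk repeats up to 200 characters from the end of the previous one, so a sentence that straddles a cut still appears whole in at least one chunk when it is shorter than the overlap.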
Supported File Types
- .pdf, .md, .mdx, .txt, .json, .yaml, .yml, .xml, .csv (by default)
- Use --include/--exclude for custom file selection
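As a rough picture of the default selection, a file could be checked against the extension list above as follows. This is a sketch: the case-insensitive comparison and the `hasSupportedExtension` name are assumptions, and glob handling for --include/--exclude is left out.

```typescript
// Default extensions as listed in this README.
const DEFAULT_EXTENSIONS = [
  ".pdf", ".md", ".mdx", ".txt", ".json", ".yaml", ".yml", ".xml", ".csv",
];

// Hypothetical helper: true when the path ends with a default extension.
// Case-insensitive matching is an assumption, not documented behaviour.
function hasSupportedExtension(filePath: string): boolean {
  const lower = filePath.toLowerCase();
  return DEFAULT_EXTENSIONS.some((ext) => lower.endsWith(ext));
}
```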
Output Examples
Indexing:
🚀 Starting indexing process for: /docs
📁 Documentation path: /docs
🗄️ Database path: /docs/.seta_lancedb
📋 Loading existing index state...
🔍 Scanning documentation files...
📄 Found 25 documentation files
🧠 Initializing embedding model...
🔗 Connecting to LanceDB...
⚙️ Processing documentation files...
📄 Processing: main_guide.md
📝 Generated 12 chunks
✅ Generated 12 embedded chunks
✅ Indexing completed
Troubleshooting
- Ensure all dependencies are installed (npm install)
- For PDF extraction errors, check file integrity
- For embedding errors, ensure enough RAM and disk space for model caching
- For DB errors, use --force to re-index from scratch
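Why --force helps can be seen from how an incremental indexer typically decides what to process. The sketch below assumes the saved index state maps each file path to a fingerprint (for example a content hash or modification time); the `Fingerprints` type and `selectFilesToIndex` function are hypothetical, and the tool's real state format is not described in this README.

```typescript
// Hypothetical state: file path -> fingerprint (hash, mtime, or similar).
type Fingerprints = Record<string, string>;

// Return the files that need (re-)indexing. Without force, only new or
// changed files are selected; with force, every current file is selected.
function selectFilesToIndex(
  previous: Fingerprints,
  current: Fingerprints,
  force = false
): string[] {
  const paths = Object.keys(current).sort();
  if (force) return paths;
  return paths.filter((path) => previous[path] !== current[path]);
}
```

If the stored state is stale or corrupted, the comparison can skip files that should be processed, which is the situation a forced re-index sidesteps.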
License
MIT
