udb-cli
v0.1.8
Personal knowledge base with RAG-powered chat. Save notes, ingest URLs, and search with natural language.
UDB (YouDB) - Personal Knowledge Base
A local RAG (Retrieval-Augmented Generation) CLI that lets you save and search personal knowledge using natural language.
Table of Contents
- What is UDB?
- Quick Start
- Usage
- How It Works
- Configuration
- Data Storage
- Requirements
- Troubleshooting
- Development
- License
What is UDB?
UDB is your personal knowledge assistant. You talk to it in plain English, and it can:
- Save notes, commands, and snippets
- Ingest web articles, YouTube videos, tweets, Confluence pages, Google Docs, and local files
- Search your knowledge base semantically
- Answer questions using only your saved knowledge
All data stays local on your machine. No cloud storage.
Quick Start
Option 1: Automated Install (Recommended)
```bash
curl -fsSL https://raw.githubusercontent.com/bhavidhingra/udb-cli/main/install.sh | bash
```

This script checks/installs dependencies (Ollama, yt-dlp), installs udb-cli from npm, and sets up the embedding model.
Option 2: npm (if you already have Ollama)
```bash
npm install -g udb-cli
```

Prerequisites:

```bash
# Start Ollama and enable auto-start on boot (macOS)
brew services start ollama

# Pull the embedding model
ollama pull nomic-embed-text
```

Usage
Just run udb to start chatting:
```
udb

UDB Chat - Your personal knowledge base assistant
Commands: "exit" to quit, "clear" to reset history
Multi-line input: end line with \ to continue
I can search, add, ingest URLs, list, and delete from your KB.

You: _
```

Examples
Save a note:
```
You: Save this command: git stash -u saves all changes including untracked files
UDB: Added successfully!
Source ID: kb-1234567890-abc
Chunks: 1
```

Ask a question:
```
You: How do I stash untracked files in git?
UDB: git stash -u saves all changes including untracked files
```

Ingest a URL:
```
You: Add this article: https://example.com/blog/post
UDB: Ingested successfully!
Source ID: kb-1234567890-xyz
Chunks: 5
```

Ingest a YouTube video:
```
You: Save this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ
UDB: Ingested successfully! (transcript extracted)
```

Ingest a Confluence page:
```
You: Save this Confluence page: https://mycompany.atlassian.net/wiki/spaces/TEAM/pages/123456/Meeting+Notes
UDB: Ingested successfully!
Source ID: kb-1234567890-conf
Chunks: 3
```

Ingest a Google Doc:
```
You: Save this doc: https://docs.google.com/document/d/1abc123xyz/edit
UDB: Ingested successfully!
Source ID: kb-1234567890-gdoc
Chunks: 4
```

Ingest a local file:
```
You: Ingest ~/notes/meeting.md
UDB: Ingested successfully!
Source ID: kb-1234567890-file
Chunks: 3

You: Add /path/to/README.md to my KB
UDB: Ingested successfully!
Source ID: kb-1234567891-file
Chunks: 5
```

List all sources:
```
You: What's in my knowledge base?
UDB: Sources (3):
• kb-123... Git Commands [text]
• kb-456... Blog Article [article]
• kb-789... Meeting Notes [text]
```

Delete a source:
```
You: Delete source kb-123
UDB: Deleted source: kb-123
```

Multi-line input:
```
You: Save this: \
... # Docker Commands \
... docker ps - list containers \
... docker logs <id> - view logs \
... docker exec -it <id> bash - shell into container
UDB: Added successfully!
```

How It Works
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│     You     │────▶│   Claude    │────▶│  KB Tools   │
│   (chat)    │     │ (reasoning) │     │   (MCP)     │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
     ┌─────────────────────┴─────────────────────┐
     │                     │                     │
┌────▼──────┐       ┌──────▼──────┐       ┌──────▼──────┐
│  Ollama   │       │   SQLite    │       │ sqlite-vss  │
│ embeddings│       │  (storage)  │       │  (vectors)  │
└───────────┘       └─────────────┘       └─────────────┘
```

- Chat Interface: You talk to UDB in natural language
- Claude: Understands your intent and calls the right KB tools
- KB Tools: Add, search, ingest, list, delete operations
- Ollama: Generates embeddings locally (nomic-embed-text, 768 dimensions)
- SQLite + sqlite-vss: Stores content and enables vector similarity search
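The "Chunks: N" counts in the examples above come from splitting content into overlapping chunks before embedding. A minimal sketch of that step, using the chunk parameters from the Configuration section (`chunkText` is a hypothetical helper; the real boundary handling may differ):

```typescript
// Sketch of fixed-size chunking with overlap. Defaults mirror src/config.ts
// (KB_CHUNK_SIZE, KB_CHUNK_OVERLAP, KB_MIN_CHUNK); illustrative only.
function chunkText(
  text: string,
  size = 800,    // KB_CHUNK_SIZE: characters per chunk
  overlap = 200, // KB_CHUNK_OVERLAP: characters shared between adjacent chunks
  min = 50       // KB_MIN_CHUNK: minimum chunk size
): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    const chunk = text.slice(start, start + size);
    // Drop fragments shorter than the minimum chunk size
    if (chunk.length >= min) chunks.push(chunk);
    if (start + size >= text.length) break;
  }
  return chunks;
}

// A 1,000-character note yields two overlapping chunks (0-800 and 600-1000)
console.log(chunkText("x".repeat(1000)).length); // 2
```

Note that content shorter than the minimum chunk size produces no chunks at all, which is why very short saves can be unsearchable (see Troubleshooting).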
Supported Content Types
| Type        | Source       | Extraction Method                |
| ----------- | ------------ | -------------------------------- |
| Articles    | Web URLs     | Mozilla Readability              |
| Videos      | YouTube      | yt-dlp (transcripts)             |
| Tweets      | Twitter/X    | FxTwitter API                    |
| Confluence  | Atlassian    | REST API                         |
| Google Docs | Google Drive | OAuth 2.0 API                    |
| Text        | Direct input | As-is                            |
| Local Files | File paths   | Direct fs read (.md, .txt, etc.) |
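Routing an ingest request to the right extractor (per the table above) might look like the following sketch; `detectSourceType` and the URL patterns are illustrative, not the package's actual dispatch logic:

```typescript
// Hypothetical content-type dispatcher; patterns are simplified examples.
type SourceType = "article" | "video" | "tweet" | "confluence" | "gdoc" | "file";

function detectSourceType(input: string): SourceType {
  if (/youtube\.com\/watch|youtu\.be\//.test(input)) return "video";
  if (/(?:\/\/|\.)(?:twitter|x)\.com\//.test(input)) return "tweet";
  if (/atlassian\.net\/wiki\//.test(input)) return "confluence";
  if (/docs\.google\.com\/document\//.test(input)) return "gdoc";
  if (/^https?:\/\//.test(input)) return "article"; // any other web URL -> Readability
  return "file"; // otherwise treat as a local path, e.g. ~/notes/meeting.md
}

console.log(detectSourceType("https://www.youtube.com/watch?v=dQw4w9WgXcQ")); // "video"
console.log(detectSourceType("~/notes/meeting.md")); // "file"
```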
Search
UDB uses semantic search, not keyword matching:
- Your query is converted to a 768-dimensional vector
- Cosine similarity finds the most relevant chunks
- Results are deduplicated by source
- Only content above 40% similarity is returned
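The ranking steps above can be sketched as a toy in-memory version; in UDB the similarity search actually runs inside sqlite-vss, and the names here are illustrative:

```typescript
// Illustrative search sketch: rank chunks by cosine similarity,
// keep only the best chunk per source, and apply a similarity threshold.
interface Chunk { sourceId: string; content: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function search(query: number[], chunks: Chunk[], minSim = 0.4): Chunk[] {
  const best = new Map<string, { chunk: Chunk; sim: number }>();
  for (const c of chunks) {
    const sim = cosine(query, c.embedding);
    const cur = best.get(c.sourceId);
    // Deduplicate by source: keep only the highest-scoring chunk per source
    if (sim >= minSim && (!cur || sim > cur.sim)) best.set(c.sourceId, { chunk: c, sim });
  }
  return [...best.values()].sort((x, y) => y.sim - x.sim).map((e) => e.chunk);
}

// Toy 2-d embeddings: two chunks from source "a", one unrelated from "b"
console.log(search([1, 0], [
  { sourceId: "a", content: "git stash -u ...", embedding: [1, 0] },
  { sourceId: "a", content: "other chunk", embedding: [0.9, 0.1] },
  { sourceId: "b", content: "unrelated", embedding: [0, 1] },
]).map((c) => c.content)); // ["git stash -u ..."]
```

Real queries use 768-dimensional vectors from nomic-embed-text rather than these toy 2-d examples.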
System Prompt
UDB uses a dynamic system prompt that includes current context (date, time, timezone) so it can answer time-related questions. The prompt guides Claude:
```
You are UDB, a personal knowledge base assistant. Your job is to help users by answering questions based on their knowledge base.

CURRENT CONTEXT:
- Date: Wednesday, February 19, 2026
- Time: 1:30:00 PM
- Timezone: Asia/Kolkata

You have access to these KB tools:
- kb_search: Search the knowledge base for relevant content
- kb_add: Add text content (notes, commands, snippets) to the KB
- kb_ingest: Ingest content from URLs or local files (articles, YouTube videos, tweets, .md, .txt files)
- kb_list: List all sources in the KB
- kb_delete: Delete a source by ID
- kb_get_source_chunks: Get ALL chunks from a specific source by its ID

WORKFLOW FOR ANSWERING QUESTIONS:
1. First, use kb_search to find relevant content
2. If kb_search finds a relevant source BUT the specific answer is NOT in the returned chunks:
   - Note the source_id from the search results
   - IMMEDIATELY use kb_get_source_chunks with that source_id to read ALL chunks
   - The answer is likely in a chunk that wasn't returned by similarity search
3. Only after reading all relevant chunks, provide your answer

IMPORTANT RULES:
- Be CONCISE - give direct answers without excessive formatting, headers, or repetition
- Use the KB as your ONLY source of truth - NEVER make up information
- If you find a relevant source, ALWAYS use kb_get_source_chunks before saying "I couldn't find the specific information"
- Do NOT give up after kb_search alone - the information may be in other chunks of the same source
- Cite the source briefly when relevant

When the user wants to save information:
- Use kb_add for text or kb_ingest for URLs and local file paths
- Confirm the action briefly

When the user asks to see raw KB content or list sources:
- Use kb_list to show sources
- You can show the raw search results if the user explicitly asks for them
```

Configuration
Settings are in src/config.ts:
```ts
{
  DATA_DIR: '~/.udb',      // Where data is stored
  DB_FILE: 'kb.db',        // SQLite database
  OLLAMA_URL: 'http://127.0.0.1:11434',
  OLLAMA_MODEL: 'nomic-embed-text',
  KB_CHUNK_SIZE: 800,      // Characters per chunk
  KB_CHUNK_OVERLAP: 200,   // Overlap between chunks
  KB_MIN_CHUNK: 50,        // Minimum chunk size
  KB_SEARCH_LIMIT: 10,     // Default search results
  KB_MIN_SIMILARITY: 0.7,  // Similarity threshold
  CLAUDE_MODEL: 'us.anthropic.claude-sonnet-4-20250514-v1:0',
}
```

Environment Variables
Create a .env file in the project root (automatically loaded):
```bash
# Core settings
UDB_DATA_DIR=~/.udb   # Data directory
OLLAMA_URL=http://127.0.0.1:11434
OLLAMA_MODEL=nomic-embed-text
CLAUDE_MODEL=us.anthropic.claude-sonnet-4-20250514-v1:0

# Atlassian/Confluence (optional, for ingesting Confluence pages)
[email protected]
ATLASSIAN_API_TOKEN=your-api-token
```

To get an Atlassian API token:
- Go to https://id.atlassian.com/manage-profile/security/api-tokens
- Create a new API token
- Add it to your .env file along with your Atlassian email
Google Docs Setup
To ingest Google Docs, you need to set up OAuth 2.0 credentials:
- Go to Google Cloud Console
- Create or select a project
- Enable the Google Docs API (APIs & Services → Enable APIs)
- Create OAuth credentials (APIs & Services → Credentials → Create Credentials → OAuth client ID)
- Configure OAuth consent screen if prompted (External, add your email as test user)
- Application type: Desktop app
- Download the credentials JSON and save it as ~/.udb/credentials.json
On first use, UDB will open your browser for Google authorization. Tokens are stored in ~/.udb/google-tokens.json and refresh automatically.
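The refresh decision can be sketched as below. This assumes google-tokens.json stores an `expiry_date` in Unix milliseconds (the shape used by Google's Node OAuth client libraries); the actual file layout may differ:

```typescript
// Hypothetical sketch of deciding when stored Google tokens need a refresh.
interface StoredTokens {
  access_token: string;
  refresh_token: string;
  expiry_date: number; // Unix milliseconds (assumed field name)
}

function needsRefresh(tokens: StoredTokens, now = Date.now(), skewMs = 60_000): boolean {
  // Refresh slightly early so an almost-expired token is never used.
  return tokens.expiry_date - skewMs <= now;
}
```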
Data Storage
All data is stored locally in ~/.udb/:
```
~/.udb/
├── kb.db               # SQLite database
├── kb.db-shm           # (if WAL mode enabled)
├── kb.db-wal           # (if WAL mode enabled)
├── credentials.json    # Google OAuth credentials (you create this)
├── google-tokens.json  # Google OAuth tokens (auto-generated)
└── locks/              # Concurrency lock files
```

Database Schema
kb_sources - Original content

```
id, url, title, source_type, summary, raw_content,
content_hash (UNIQUE), tags, created_at, updated_at
```

kb_chunks - Chunked content with embeddings

```
id, source_id (FK), chunk_index, content,
embedding (BLOB), created_at
```

kb_chunks_vss - Vector search index

```
embedding(768)  -- sqlite-vss virtual table
```

Requirements
- Node.js 18+
- Ollama running locally with the nomic-embed-text model
- Claude CLI authenticated (for the chat interface)
- yt-dlp (optional, for YouTube transcripts)
Troubleshooting
"Ollama not available"
Ollama is required for generating embeddings (semantic search).
```bash
# Install Ollama
brew install ollama   # macOS
# OR download from https://ollama.ai/download

# Start Ollama
ollama serve

# Pull the embedding model
ollama pull nomic-embed-text

# Verify it's running
curl http://127.0.0.1:11434/api/tags
```

"yt-dlp not installed"
YouTube video ingestion requires yt-dlp:
```bash
# Install yt-dlp
brew install yt-dlp   # macOS
pip install yt-dlp    # any platform

# Verify installation
yt-dlp --version
```

"sqlite-vss extension failed to load"
The native module may need rebuilding:
```bash
npm rebuild
```

Search returns no results
- Check if Ollama is running
- Verify content was chunked: `sqlite3 ~/.udb/kb.db "SELECT COUNT(*) FROM kb_chunks;"`
- Content may be below minimum chunk size (50 chars)
Claude authentication errors
Ensure Claude CLI is authenticated:
```bash
claude --version
# If not logged in, authenticate first
```

Development
```bash
# Build
npm run build

# Run in development
npm run dev

# Type check
npx tsc --noEmit
```

License
MIT
