udb-cli
v0.1.8
Personal knowledge base with RAG-powered chat. Save notes, ingest URLs, and search with natural language.
UDB (YouDB) - Personal Knowledge Base
A local RAG (Retrieval-Augmented Generation) CLI that lets you save and search personal knowledge using natural language.
Table of Contents
- What is UDB?
- Quick Start
- Usage
- How It Works
- Configuration
- Data Storage
- Requirements
- Troubleshooting
- Development
- License
What is UDB?
UDB is your personal knowledge assistant. You talk to it in plain English, and it can:
- Save notes, commands, and snippets
- Ingest web articles, YouTube videos, tweets, Confluence pages, Google Docs, and local files
- Search your knowledge base semantically
- Answer questions using only your saved knowledge
All data stays local on your machine. No cloud storage.
Quick Start
Option 1: Automated Install (Recommended)
```bash
curl -fsSL https://raw.githubusercontent.com/bhavidhingra/udb-cli/main/install.sh | bash
```

This script checks/installs dependencies (Ollama, yt-dlp), installs udb-cli from npm, and sets up the embedding model.
Option 2: npm (if you already have Ollama)
```bash
npm install -g udb-cli
```

Prerequisites:

```bash
# Start Ollama and enable auto-start on boot (macOS)
brew services start ollama

# Pull the embedding model
ollama pull nomic-embed-text
```

Usage
Just run udb to start chatting:
```
udb

UDB Chat - Your personal knowledge base assistant
Commands: "exit" to quit, "clear" to reset history
Multi-line input: end line with \ to continue
I can search, add, ingest URLs, list, and delete from your KB.

You: _
```

Examples
Save a note:
```
You: Save this command: git stash -u saves all changes including untracked files
UDB: Added successfully!
Source ID: kb-1234567890-abc
Chunks: 1
```

Ask a question:
```
You: How do I stash untracked files in git?
UDB: git stash -u saves all changes including untracked files
```

Ingest a URL:
```
You: Add this article: https://example.com/blog/post
UDB: Ingested successfully!
Source ID: kb-1234567890-xyz
Chunks: 5
```

Ingest a YouTube video:
```
You: Save this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ
UDB: Ingested successfully! (transcript extracted)
```

Ingest a Confluence page:
```
You: Save this Confluence page: https://mycompany.atlassian.net/wiki/spaces/TEAM/pages/123456/Meeting+Notes
UDB: Ingested successfully!
Source ID: kb-1234567890-conf
Chunks: 3
```

Ingest a Google Doc:
```
You: Save this doc: https://docs.google.com/document/d/1abc123xyz/edit
UDB: Ingested successfully!
Source ID: kb-1234567890-gdoc
Chunks: 4
```

Ingest a local file:
```
You: Ingest ~/notes/meeting.md
UDB: Ingested successfully!
Source ID: kb-1234567890-file
Chunks: 3

You: Add /path/to/README.md to my KB
UDB: Ingested successfully!
Source ID: kb-1234567891-file
Chunks: 5
```

List all sources:
```
You: What's in my knowledge base?
UDB: Sources (3):
• kb-123... Git Commands [text]
• kb-456... Blog Article [article]
• kb-789... Meeting Notes [text]
```

Delete a source:
```
You: Delete source kb-123
UDB: Deleted source: kb-123
```

Multi-line input:
```
You: Save this: \
... # Docker Commands \
... docker ps - list containers \
... docker logs <id> - view logs \
... docker exec -it <id> bash - shell into container
UDB: Added successfully!
```

How It Works
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│     You     │────▶│   Claude    │────▶│  KB Tools   │
│   (chat)    │     │ (reasoning) │     │   (MCP)     │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
     ┌─────────────────────┴─────────────────────┐
     │                     │                     │
┌────▼──────┐       ┌──────▼──────┐       ┌──────▼──────┐
│  Ollama   │       │   SQLite    │       │ sqlite-vss  │
│ embeddings│       │  (storage)  │       │  (vectors)  │
└───────────┘       └─────────────┘       └─────────────┘
```

- Chat Interface: You talk to UDB in natural language
- Claude: Understands your intent and calls the right KB tools
- KB Tools: Add, search, ingest, list, delete operations
- Ollama: Generates embeddings locally (nomic-embed-text, 768 dimensions)
- SQLite + sqlite-vss: Stores content and enables vector similarity search
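The "Chunks: N" counts in the examples above come from splitting content into overlapping chunks before embedding. A minimal sketch of that step, using the chunk parameters from the Configuration section (`chunkText` is a hypothetical helper; the real boundary handling may differ):

```typescript
// Sketch of fixed-size chunking with overlap. Defaults mirror src/config.ts
// (KB_CHUNK_SIZE, KB_CHUNK_OVERLAP, KB_MIN_CHUNK); illustrative only.
function chunkText(
  text: string,
  size = 800,    // KB_CHUNK_SIZE: characters per chunk
  overlap = 200, // KB_CHUNK_OVERLAP: characters shared between adjacent chunks
  min = 50       // KB_MIN_CHUNK: minimum chunk size
): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    const chunk = text.slice(start, start + size);
    // Drop fragments shorter than the minimum chunk size
    if (chunk.length >= min) chunks.push(chunk);
    if (start + size >= text.length) break;
  }
  return chunks;
}

// A 1,000-character note yields two overlapping chunks (0-800 and 600-1000)
console.log(chunkText("x".repeat(1000)).length); // 2
```

Note that content shorter than the minimum chunk size produces no chunks at all, which is why very short saves can be unsearchable (see Troubleshooting).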
Supported Content Types
| Type        | Source       | Extraction Method                |
| ----------- | ------------ | -------------------------------- |
| Articles    | Web URLs     | Mozilla Readability              |
| Videos      | YouTube      | yt-dlp (transcripts)             |
| Tweets      | Twitter/X    | FxTwitter API                    |
| Confluence  | Atlassian    | REST API                         |
| Google Docs | Google Drive | OAuth 2.0 API                    |
| Text        | Direct input | As-is                            |
| Local Files | File paths   | Direct fs read (.md, .txt, etc.) |
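Routing an ingest request to the right extractor (per the table above) might look like the following sketch; `detectSourceType` and the URL patterns are illustrative, not the package's actual dispatch logic:

```typescript
// Hypothetical content-type dispatcher; patterns are simplified examples.
type SourceType = "article" | "video" | "tweet" | "confluence" | "gdoc" | "file";

function detectSourceType(input: string): SourceType {
  if (/youtube\.com\/watch|youtu\.be\//.test(input)) return "video";
  if (/(?:\/\/|\.)(?:twitter|x)\.com\//.test(input)) return "tweet";
  if (/atlassian\.net\/wiki\//.test(input)) return "confluence";
  if (/docs\.google\.com\/document\//.test(input)) return "gdoc";
  if (/^https?:\/\//.test(input)) return "article"; // any other web URL -> Readability
  return "file"; // otherwise treat as a local path, e.g. ~/notes/meeting.md
}

console.log(detectSourceType("https://www.youtube.com/watch?v=dQw4w9WgXcQ")); // "video"
console.log(detectSourceType("~/notes/meeting.md")); // "file"
```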
Search
UDB uses semantic search, not keyword matching:
- Your query is converted to a 768-dimensional vector
- Cosine similarity finds the most relevant chunks
- Results are deduplicated by source
- Only content above 40% similarity is returned
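The ranking steps above can be sketched as a toy in-memory version; in UDB the similarity search actually runs inside sqlite-vss, and the names here are illustrative:

```typescript
// Illustrative search sketch: rank chunks by cosine similarity,
// keep only the best chunk per source, and apply a similarity threshold.
interface Chunk { sourceId: string; content: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function search(query: number[], chunks: Chunk[], minSim = 0.4): Chunk[] {
  const best = new Map<string, { chunk: Chunk; sim: number }>();
  for (const c of chunks) {
    const sim = cosine(query, c.embedding);
    const cur = best.get(c.sourceId);
    // Deduplicate by source: keep only the highest-scoring chunk per source
    if (sim >= minSim && (!cur || sim > cur.sim)) best.set(c.sourceId, { chunk: c, sim });
  }
  return [...best.values()].sort((x, y) => y.sim - x.sim).map((e) => e.chunk);
}

// Toy 2-d embeddings: two chunks from source "a", one unrelated from "b"
console.log(search([1, 0], [
  { sourceId: "a", content: "git stash -u ...", embedding: [1, 0] },
  { sourceId: "a", content: "other chunk", embedding: [0.9, 0.1] },
  { sourceId: "b", content: "unrelated", embedding: [0, 1] },
]).map((c) => c.content)); // ["git stash -u ..."]
```

Real queries use 768-dimensional vectors from nomic-embed-text rather than these toy 2-d examples.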
System Prompt
UDB uses a dynamic system prompt that includes current context (date, time, timezone) so it can answer time-related questions. The prompt guides Claude:
```
You are UDB, a personal knowledge base assistant. Your job is to help users by answering questions based on their knowledge base.

CURRENT CONTEXT:
- Date: Wednesday, February 19, 2026
- Time: 1:30:00 PM
- Timezone: Asia/Kolkata

You have access to these KB tools:
- kb_search: Search the knowledge base for relevant content
- kb_add: Add text content (notes, commands, snippets) to the KB
- kb_ingest: Ingest content from URLs or local files (articles, YouTube videos, tweets, .md, .txt files)
- kb_list: List all sources in the KB
- kb_delete: Delete a source by ID
- kb_get_source_chunks: Get ALL chunks from a specific source by its ID

WORKFLOW FOR ANSWERING QUESTIONS:
1. First, use kb_search to find relevant content
2. If kb_search finds a relevant source BUT the specific answer is NOT in the returned chunks:
   - Note the source_id from the search results
   - IMMEDIATELY use kb_get_source_chunks with that source_id to read ALL chunks
   - The answer is likely in a chunk that wasn't returned by similarity search
3. Only after reading all relevant chunks, provide your answer

IMPORTANT RULES:
- Be CONCISE - give direct answers without excessive formatting, headers, or repetition
- Use the KB as your ONLY source of truth - NEVER make up information
- If you find a relevant source, ALWAYS use kb_get_source_chunks before saying "I couldn't find the specific information"
- Do NOT give up after kb_search alone - the information may be in other chunks of the same source
- Cite the source briefly when relevant

When the user wants to save information:
- Use kb_add for text or kb_ingest for URLs and local file paths
- Confirm the action briefly

When the user asks to see raw KB content or list sources:
- Use kb_list to show sources
- You can show the raw search results if the user explicitly asks for them
```

Configuration
Settings are in src/config.ts:
```ts
{
  DATA_DIR: '~/.udb',      // Where data is stored
  DB_FILE: 'kb.db',        // SQLite database
  OLLAMA_URL: 'http://127.0.0.1:11434',
  OLLAMA_MODEL: 'nomic-embed-text',
  KB_CHUNK_SIZE: 800,      // Characters per chunk
  KB_CHUNK_OVERLAP: 200,   // Overlap between chunks
  KB_MIN_CHUNK: 50,        // Minimum chunk size
  KB_SEARCH_LIMIT: 10,     // Default search results
  KB_MIN_SIMILARITY: 0.7,  // Similarity threshold
  CLAUDE_MODEL: 'us.anthropic.claude-sonnet-4-20250514-v1:0',
}
```

Environment Variables
Create a .env file in the project root (automatically loaded):
```bash
# Core settings
UDB_DATA_DIR=~/.udb   # Data directory
OLLAMA_URL=http://127.0.0.1:11434
OLLAMA_MODEL=nomic-embed-text
CLAUDE_MODEL=us.anthropic.claude-sonnet-4-20250514-v1:0

# Atlassian/Confluence (optional, for ingesting Confluence pages)
[email protected]
ATLASSIAN_API_TOKEN=your-api-token
```

To get an Atlassian API token:
- Go to https://id.atlassian.com/manage-profile/security/api-tokens
- Create a new API token
- Add it to your .env file along with your Atlassian email
Google Docs Setup
To ingest Google Docs, you need to set up OAuth 2.0 credentials:
- Go to Google Cloud Console
- Create or select a project
- Enable the Google Docs API (APIs & Services → Enable APIs)
- Create OAuth credentials (APIs & Services → Credentials → Create Credentials → OAuth client ID)
- Configure OAuth consent screen if prompted (External, add your email as test user)
- Application type: Desktop app
- Download the credentials JSON and save it as ~/.udb/credentials.json
On first use, UDB will open your browser for Google authorization. Tokens are stored in ~/.udb/google-tokens.json and refresh automatically.
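The refresh decision can be sketched as below. This assumes google-tokens.json stores an `expiry_date` in Unix milliseconds (the shape used by Google's Node OAuth client libraries); the actual file layout may differ:

```typescript
// Hypothetical sketch of deciding when stored Google tokens need a refresh.
interface StoredTokens {
  access_token: string;
  refresh_token: string;
  expiry_date: number; // Unix milliseconds (assumed field name)
}

function needsRefresh(tokens: StoredTokens, now = Date.now(), skewMs = 60_000): boolean {
  // Refresh slightly early so an almost-expired token is never used.
  return tokens.expiry_date - skewMs <= now;
}
```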
Data Storage
All data is stored locally in ~/.udb/:
```
~/.udb/
├── kb.db               # SQLite database
├── kb.db-shm           # (if WAL mode enabled)
├── kb.db-wal           # (if WAL mode enabled)
├── credentials.json    # Google OAuth credentials (you create this)
├── google-tokens.json  # Google OAuth tokens (auto-generated)
└── locks/              # Concurrency lock files
```

Database Schema
kb_sources - Original content

```
id, url, title, source_type, summary, raw_content,
content_hash (UNIQUE), tags, created_at, updated_at
```

kb_chunks - Chunked content with embeddings

```
id, source_id (FK), chunk_index, content,
embedding (BLOB), created_at
```

kb_chunks_vss - Vector search index

```
embedding(768)  -- sqlite-vss virtual table
```

Requirements
- Node.js 18+
- Ollama running locally with the nomic-embed-text model
- Claude CLI authenticated (for the chat interface)
- yt-dlp (optional, for YouTube transcripts)
Troubleshooting
"Ollama not available"
Ollama is required for generating embeddings (semantic search).
```bash
# Install Ollama
brew install ollama   # macOS
# OR download from https://ollama.ai/download

# Start Ollama
ollama serve

# Pull the embedding model
ollama pull nomic-embed-text

# Verify it's running
curl http://127.0.0.1:11434/api/tags
```

"yt-dlp not installed"
YouTube video ingestion requires yt-dlp:
```bash
# Install yt-dlp
brew install yt-dlp   # macOS
pip install yt-dlp    # any platform

# Verify installation
yt-dlp --version
```

"sqlite-vss extension failed to load"
The native module may need rebuilding:
```bash
npm rebuild
```

Search returns no results
- Check if Ollama is running
- Verify content was chunked: `sqlite3 ~/.udb/kb.db "SELECT COUNT(*) FROM kb_chunks;"`
- Content may be below minimum chunk size (50 chars)
Claude authentication errors
Ensure Claude CLI is authenticated:
```bash
claude --version
# If not logged in, authenticate first
```

Development
```bash
# Build
npm run build

# Run in development
npm run dev

# Type check
npx tsc --noEmit
```

License
MIT
