GNO
Your Local Second Brain — Index, search, and synthesize your entire digital life.
GNO is a local knowledge engine for privacy-conscious developers and AI agents. Index your notes, code, PDFs, and Office docs. Get hybrid search (BM25 + vector + reranking) and AI-powered answers—all running 100% on your machine.
Contents
- Quick Start
- Installation
- Search Modes
- Web UI
- REST API
- Agent Integration
- How It Works
- Features
- Local Models
- Architecture
- Development
Quick Start
gno init ~/notes --name notes # Point at your docs
gno index # Build search index
gno query "auth best practices" # Hybrid search
gno ask "summarize the API" --answer # AI answer with citations
Installation
Install GNO
Requires Bun >= 1.0.0.
bun install -g @gmickel/gno

macOS: Vector search requires Homebrew SQLite:

brew install sqlite3

Verify everything works:

gno doctor

Connect to AI Agents
MCP Server (Claude Desktop, Cursor, Zed, etc.)
One command to add GNO to your AI assistant:
gno mcp install # Claude Desktop (default)
gno mcp install --target cursor # Cursor
gno mcp install --target claude-code # Claude Code CLI
gno mcp install --target zed # Zed
gno mcp install --target windsurf # Windsurf
gno mcp install --target codex # OpenAI Codex CLI
gno mcp install --target opencode # OpenCode
gno mcp install --target amp # Amp
gno mcp install --target lmstudio # LM Studio
gno mcp install --target librechat # LibreChat

Check status: gno mcp status
Skills (Claude Code, Codex, OpenCode)
Skills integrate via CLI—no MCP overhead:
gno skill install --scope user # User-wide
gno skill install --target codex # Codex
gno skill install --target all # Both Claude + Codex

Full setup guide: MCP Integration · CLI Reference
Search Modes
| Command | Mode | Best For |
| :----------------- | :------------------ | :---------------------------------------- |
| gno search | Document-level BM25 | Exact phrases, code identifiers |
| gno vsearch | Contextual Vector | Natural language, concepts |
| gno query | Hybrid | Best accuracy (BM25 + vector + reranking) |
| gno ask --answer | RAG | Direct answers with citations |
BM25 indexes full documents (not chunks) with Snowball stemming—"running" matches "run". Vector search embeds chunks together with their document titles for context awareness.
gno search "handleAuth" # Find exact matches
gno vsearch "error handling patterns" # Semantic similarity
gno query "database optimization" # Full pipeline
gno ask "what did we decide" --answer # AI synthesisOutput formats: --json, --files, --csv, --md, --xml
Web UI
Visual dashboard for search, browsing, editing, and AI answers—right in your browser.
gno serve # Start on port 3000
gno serve --port 8080 # Custom port
Open http://localhost:3000 to:
- Search — BM25, vector, or hybrid modes with visual results
- Browse — Paginated document list, filter by collection
- Edit — Create, edit, and delete documents with live preview
- Ask — AI-powered Q&A with citations
- Manage Collections — Add, remove, and re-index collections
- Switch presets — Change models live without restart
Document Editing

Full-featured markdown editor with:
| Feature | Description |
| :------------------ | :----------------------------------- |
| Split View | Side-by-side editor and live preview |
| Auto-save | 2-second debounced saves |
| Syntax Highlighting | CodeMirror 6 with markdown support |
| Keyboard Shortcuts | ⌘S save, ⌘B bold, ⌘I italic, ⌘K link |
| Quick Capture | ⌘N creates new note from anywhere |
Collections Management

- Add collections with folder path input
- View document count, chunk count, embedding status
- Re-index individual collections
- Remove collections (documents preserved)
AI Answers

Ask questions in natural language—GNO searches your documents and synthesizes answers with inline citations linking to sources.
Everything runs locally. No cloud, no accounts, no data leaving your machine.
Detailed docs: Web UI Guide
REST API
Programmatic access to all GNO features via HTTP.
# Hybrid search
curl -X POST http://localhost:3000/api/query \
-H "Content-Type: application/json" \
-d '{"query": "authentication patterns", "limit": 10}'
# AI answer
curl -X POST http://localhost:3000/api/ask \
-H "Content-Type: application/json" \
-d '{"query": "What is our deployment process?"}'
# Index status
curl http://localhost:3000/api/status

| Endpoint | Method | Description |
| :------------------------- | :----- | :-------------------------- |
| /api/query | POST | Hybrid search (recommended) |
| /api/search | POST | BM25 keyword search |
| /api/ask | POST | AI-powered Q&A |
| /api/docs | GET | List documents |
| /api/docs | POST | Create document |
| /api/docs/:id | PUT | Update document content |
| /api/docs/:id/deactivate | POST | Remove from index |
| /api/doc | GET | Get document content |
| /api/collections | POST | Add collection |
| /api/collections/:name | DELETE | Remove collection |
| /api/sync | POST | Trigger re-index |
| /api/status | GET | Index statistics |
| /api/presets | GET | List model presets |
| /api/presets | POST | Switch preset |
| /api/models/pull | POST | Download models |
| /api/models/status | GET | Download progress |
No authentication. No rate limits. Build custom tools, automate workflows, integrate with any language.
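For example, a minimal TypeScript client for the hybrid search endpoint. The request fields mirror the curl example above; the response is treated as opaque JSON here, since its exact shape lives in the API documentation.

// Minimal sketch: call GNO's /api/query endpoint from TypeScript.
// Request fields (query, limit) match the curl example above; the
// response shape is documented in the API reference, so it is left
// untyped here.
const res = await fetch("http://localhost:3000/api/query", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "authentication patterns", limit: 10 }),
});
console.log(await res.json());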
Full reference: API Documentation
Agent Integration
MCP Server

GNO exposes 6 tools via the Model Context Protocol:
| Tool | Description |
| :-------------- | :-------------------------- |
| gno_search | BM25 keyword search |
| gno_vsearch | Vector semantic search |
| gno_query | Hybrid search (recommended) |
| gno_get | Retrieve document by ID |
| gno_multi_get | Batch document retrieval |
| gno_status | Index health check |
Design: MCP tools are retrieval-only. Your AI assistant (Claude, GPT-4) synthesizes answers from retrieved context—best retrieval (GNO) + best reasoning (your LLM).
Skills
Skills add GNO search to Claude Code/Codex without MCP protocol overhead:
gno skill install --scope user
Then ask your agent: "Search my notes for the auth discussion"
Detailed docs: MCP Integration · Use Cases
How It Works
graph TD
A[User Query] --> B(Query Expansion)
B --> C{Lexical Variants}
B --> D{Semantic Variants}
B --> E{HyDE Passage}
C --> G(BM25 Search)
D --> H(Vector Search)
E --> H
A --> G
A --> H
G --> I(Ranked Results)
H --> J(Ranked Results)
I --> K{RRF Fusion}
J --> K
K --> L(Top 20 Candidates)
L --> M(Cross-Encoder Rerank)
M --> N[Final Results]

- Strong Signal Check — Skip expansion if BM25 has a confident match (saves 1-3s)
- Query Expansion — LLM generates lexical variants, semantic rephrases, and a HyDE passage
- Parallel Retrieval — Document-level BM25 + chunk-level vector search on all variants
- Fusion — RRF with 2× weight for the original query, tiered bonus for top ranks (see the sketch after this list)
- Reranking — Qwen3-Reranker scores full documents (32K context), blended with fusion
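To make the fusion step concrete, here is a minimal TypeScript sketch of reciprocal rank fusion. The constant k = 60 (the value from the original RRF formulation) and the example weights are assumptions for illustration; GNO's actual constants and tiered rank bonus may differ.

// Illustrative reciprocal rank fusion (RRF) over several ranked lists.
// Each list contributes weight / (k + rank + 1) per document; scores
// are summed across lists and documents are re-sorted.
function rrfFuse(rankings: { ids: string[]; weight: number }[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of rankings) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// The original query's list gets 2x weight; expansion variants get 1x.
const fused = rrfFuse([
  { ids: ["doc1", "doc3", "doc2"], weight: 2 }, // original query, BM25
  { ids: ["doc2", "doc1"], weight: 1 }, // HyDE passage, vector search
]);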
Deep dive: How Search Works
Features
| Feature | Description |
| :------------------ | :---------------------------------------------------- |
| Hybrid Search | BM25 + vector + RRF fusion + cross-encoder reranking |
| Document Editor | Create, edit, delete docs with live markdown preview |
| Web UI | Visual dashboard for search, browse, edit, and AI Q&A |
| REST API | HTTP API for custom tools and integrations |
| Multi-Format | Markdown, PDF, DOCX, XLSX, PPTX, plain text |
| Local LLM | AI answers via llama.cpp—no API keys |
| Privacy First | 100% offline, zero telemetry, your data stays yours |
| MCP Server | Works with Claude Desktop, Cursor, Zed, + 8 more |
| Collections | Organize sources with patterns, excludes, contexts |
| Multilingual | 30+ languages, auto-detection, cross-lingual search |
| Incremental | SHA-256 tracking—only changed files re-indexed (sketch below) |
| Keyboard First | ⌘N capture, ⌘K search, ⌘/ shortcuts, ⌘S save |
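The incremental indexing noted above comes down to hash-and-compare. A minimal sketch, with a plain Map standing in for GNO's real index metadata:

import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Re-index a file only when its SHA-256 differs from the recorded one.
// `knownHashes` is a hypothetical stand-in for GNO's index metadata.
function needsReindex(path: string, knownHashes: Map<string, string>): boolean {
  const hash = createHash("sha256").update(readFileSync(path)).digest("hex");
  if (knownHashes.get(path) === hash) return false;
  knownHashes.set(path, hash);
  return true;
}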
Local Models
Models auto-download on first use to ~/.cache/gno/models/.
| Model | Purpose | Size |
| :------------------ | :------------------------------------ | :----------- |
| bge-m3 | Embeddings (1024-dim, multilingual) | ~500MB |
| Qwen3-Reranker-0.6B | Cross-encoder reranking (32K context) | ~700MB |
| Qwen/SmolLM | Query expansion + AI answers | ~600MB-1.2GB |
Model Presets
| Preset | Disk | Best For |
| :--------- | :----- | :--------------------- |
| slim | ~1GB | Fast, lower quality |
| balanced | ~2GB | Good balance (default) |
| quality | ~2.5GB | Best answers |
gno models use balanced
gno models pull --all # Optional: pre-download models (auto-downloads on first use)

Configuration: Model Setup
Architecture
┌─────────────────────────────────────────────────┐
│ GNO CLI / MCP / Web UI / API │
├─────────────────────────────────────────────────┤
│ Ports: Converter, Store, Embedding, Rerank │
├─────────────────────────────────────────────────┤
│ Adapters: SQLite, FTS5, sqlite-vec, llama-cpp │
├─────────────────────────────────────────────────┤
│ Core: Identity, Mirrors, Chunking, Retrieval │
└─────────────────────────────────────────────────┘

Details: Architecture
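As a rough illustration of the ports-and-adapters split, hypothetical TypeScript port interfaces might look like the following. The names and signatures are assumptions for illustration, not GNO's actual source:

// Hypothetical port interfaces -- illustrative only. The core retrieval
// pipeline depends on abstractions like these, while the SQLite, FTS5,
// sqlite-vec, and llama-cpp adapters supply the implementations.
interface EmbeddingPort {
  embed(texts: string[]): Promise<Float32Array[]>;
}
interface RerankPort {
  score(query: string, documents: string[]): Promise<number[]>;
}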
Development
git clone https://github.com/gmickel/gno.git && cd gno
bun install
bun test
bun run lint && bun run typecheck

Contributing: CONTRIBUTING.md
