npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

searchsocket

v0.5.0

Published

Semantic site search and MCP retrieval for SvelteKit static sites

Readme

SearchSocket

Semantic site search and MCP retrieval for SvelteKit content projects.

Requirements: Node.js >= 20

Features

  • Embeddings: Jina AI jina-embeddings-v5-text-small with task-specific LoRA adapters (configurable)
  • Vector Backend: Turso/libSQL with vector search (local file DB for development, remote for production)
  • Rerank: Jina jina-reranker-v3 enabled by default — same API key
  • Page Aggregation: Group results by page with score-weighted chunk decay
  • Meta Extraction: Automatically extracts <meta name="description"> and <meta name="keywords"> for improved relevance
  • SvelteKit Integrations:
    • searchsocketHandle() for POST /api/search endpoint
    • searchsocketVitePlugin() for build-triggered indexing
  • Client Library: createSearchClient() for browser-side search, buildResultUrl() for scroll-to-section links
  • Scroll-to-Text: searchsocketScrollToText() auto-scrolls to matching sections on navigation
  • MCP Server: Model Context Protocol tools for search and page retrieval

Install

# pnpm
pnpm add -D searchsocket

# npm
npm install -D searchsocket

SearchSocket is typically a dev dependency for CLI indexing. If you use searchsocketHandle() at runtime (e.g., in a Node server adapter), add it as a regular dependency instead.

Quickstart

1. Initialize

pnpm searchsocket init

This creates:

  • searchsocket.config.ts — minimal config file
  • .searchsocket/ — state directory (added to .gitignore)

2. Configure

Minimal config (searchsocket.config.ts):

export default {
  embeddings: { apiKeyEnv: "JINA_API_KEY" }
};

That's it! Turso defaults work out of the box:

  • Development: Uses local file DB at .searchsocket/vectors.db
  • Production: Set TURSO_DATABASE_URL and TURSO_AUTH_TOKEN to use remote Turso

3. Add SvelteKit API Hook

Create or update src/hooks.server.ts:

import { searchsocketHandle } from "searchsocket/sveltekit";

export const handle = searchsocketHandle();

This exposes POST /api/search with automatic scope resolution.

4. Set Environment Variables

The CLI automatically loads .env from the working directory on startup, so your existing .env file works out of the box — no wrapper scripts or shell exports needed.

Development (.env):

JINA_API_KEY=jina_...

Production (add these for remote Turso):

JINA_API_KEY=jina_...
TURSO_DATABASE_URL=libsql://your-db.turso.io
TURSO_AUTH_TOKEN=eyJ...

5. Index Your Content

pnpm searchsocket index --changed-only

SearchSocket auto-detects the source mode based on your config:

  • static-output (default): Reads prerendered HTML from build/
  • build: Discovers routes from SvelteKit build manifest and renders via preview server
  • crawl: Fetches pages from a running HTTP server
  • content-files: Reads markdown/svelte source files directly

The indexing pipeline:

  • Extracts content from <main> (configurable), including <meta> description and keywords
  • Chunks text with semantic heading boundaries
  • Prepends page title to each chunk for embedding context
  • Generates a synthetic summary chunk per page for identity matching
  • Generates embeddings via Jina AI (with task-specific LoRA adapters for indexing vs search)
  • Stores vectors in Turso/libSQL with cosine similarity index

6. Query

Via API:

curl -X POST http://localhost:5173/api/search \
  -H "content-type: application/json" \
  -d '{"q":"getting started","topK":5,"groupBy":"page"}'

Via client library:

import { createSearchClient } from "searchsocket/client";

const client = createSearchClient(); // defaults to /api/search
const response = await client.search({
  q: "getting started",
  topK: 5,
  groupBy: "page",
  pathPrefix: "/docs"
});

Via CLI:

pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs

Response (with groupBy: "page", the default):

{
  "q": "getting started",
  "scope": "main",
  "results": [
    {
      "url": "/docs/intro",
      "title": "Getting Started",
      "sectionTitle": "Installation",
      "snippet": "Install SearchSocket with pnpm add searchsocket...",
      "score": 0.89,
      "routeFile": "src/routes/docs/intro/+page.svelte",
      "chunks": [
        {
          "sectionTitle": "Installation",
          "snippet": "Install SearchSocket with pnpm add searchsocket...",
          "headingPath": ["Getting Started", "Installation"],
          "score": 0.89
        },
        {
          "sectionTitle": "Configuration",
          "snippet": "Create searchsocket.config.ts with your API key...",
          "headingPath": ["Getting Started", "Configuration"],
          "score": 0.74
        }
      ]
    }
  ],
  "meta": {
    "timingsMs": { "embed": 120, "vector": 15, "rerank": 0, "total": 135 },
    "usedRerank": false,
    "modelId": "jina-embeddings-v5-text-small"
  }
}

The chunks array appears when a page has multiple matching chunks above the minChunkScoreRatio threshold. Use groupBy: "chunk" for flat per-chunk results without page aggregation.

Source Modes

SearchSocket supports four source modes for loading pages to index.

static-output (default)

Reads prerendered HTML files from SvelteKit's build output directory.

export default {
  source: {
    mode: "static-output",
    staticOutputDir: "build"
  }
};

Best for: Sites with fully prerendered pages. Run vite build first, then index.

build

Discovers routes automatically from SvelteKit's build manifest and renders them via an ephemeral vite preview server. No manual route configuration needed.

export default {
  source: {
    build: {
      outputDir: ".svelte-kit/output",   // default
      previewTimeout: 30000,             // ms to wait for server (default)
      exclude: ["/api/*", "/admin/*"],   // glob patterns to skip
      paramValues: {                     // values for dynamic routes
        "/blog/[slug]": ["hello-world", "getting-started"],
        "/docs/[category]/[page]": ["guides/quickstart", "api/search"]
      },
      discover: true,                    // crawl internal links to find pages (default: false)
      seedUrls: ["/"],                   // starting URLs for discovery
      maxPages: 200,                     // max pages to discover (default: 200)
      maxDepth: 5                        // max link depth from seed URLs (default: 5)
    }
  }
};

Best for: CI/CD pipelines. Enables vite build && searchsocket index with zero route configuration.

How it works:

  1. Parses .svelte-kit/output/server/manifest-full.js to discover all page routes
  2. Expands dynamic routes using paramValues (skips dynamic routes without values)
  3. Starts an ephemeral vite preview server on a random port
  4. Fetches all routes concurrently for SSR-rendered HTML
  5. Provides exact route-to-file mapping (no heuristic matching needed)
  6. Shuts down the preview server

Dynamic routes: Each key in paramValues maps to a route ID (e.g., /blog/[slug]) or its URL equivalent. Each value in the array replaces all [param] segments in the URL. Routes with layout groups like /(app)/blog/[slug] also match the URL key /blog/[slug].

Link discovery: Enable discover: true to automatically find pages by crawling internal links from seedUrls. This is useful when dynamic routes have many parameter values that are impractical to enumerate. The crawler respects maxPages and maxDepth limits and only follows links within the same origin.

crawl

Fetches pages from a running HTTP server.

export default {
  source: {
    crawl: {
      baseUrl: "http://localhost:4173",
      routes: ["/", "/docs", "/blog"],  // explicit routes
      sitemapUrl: "https://example.com/sitemap.xml"  // or discover via sitemap
    }
  }
};

If routes is omitted and no sitemapUrl is set, defaults to crawling ["/"] only.

content-files

Reads markdown and svelte source files directly, without building or serving.

export default {
  source: {
    contentFiles: {
      globs: ["src/routes/**/*.md", "content/**/*.md"],
      baseDir: "."
    }
  }
};

Client Library

SearchSocket exports a lightweight client for browser-side search:

import { createSearchClient } from "searchsocket/client";

const client = createSearchClient({
  endpoint: "/api/search",  // default
  fetchImpl: fetch           // default; override for SSR or testing
});

const response = await client.search({
  q: "deployment guide",
  topK: 8,
  groupBy: "page",
  pathPrefix: "/docs",
  tags: ["guide"],
  rerank: true
});

for (const result of response.results) {
  console.log(result.url, result.title, result.score);
  if (result.chunks) {
    for (const chunk of result.chunks) {
      console.log("  ", chunk.sectionTitle, chunk.score);
    }
  }
}

Scroll-to-Text Navigation

When a visitor clicks a search result, SearchSocket can automatically scroll them to the relevant section on the destination page. This uses two utilities:

buildResultUrl(result)

Builds a URL from a search result that includes:

  • A _ssk query parameter for SvelteKit client-side navigation (read by searchsocketScrollToText)
  • A Text Fragment (#:~:text=) for native browser scroll-to-text on full page loads (Chrome 80+, Safari 16.1+, Firefox 131+)

Import from searchsocket/client:

import { createSearchClient, buildResultUrl } from "searchsocket/client";

const client = createSearchClient();
const { results } = await client.search({ q: "installation" });

// Use in your search UI
for (const result of results) {
  const href = buildResultUrl(result);
  // "/docs/getting-started?_ssk=Installation#:~:text=Installation"
}

If the result has no sectionTitle, the original URL is returned unchanged.

searchsocketScrollToText

A SvelteKit afterNavigate hook that reads the _ssk parameter and scrolls the matching heading into view. Add it to your root layout:

<!-- src/routes/+layout.svelte -->
<script>
  import { afterNavigate } from '$app/navigation';
  import { searchsocketScrollToText } from 'searchsocket/sveltekit';

  afterNavigate(searchsocketScrollToText);
</script>

The hook:

  • Matches headings (h1–h6) case-insensitively with whitespace normalization
  • Falls back to a broader text node search if no heading matches
  • Scrolls smoothly to the first match
  • Is a silent no-op when _ssk is absent or no match is found

Vector Backend: Turso/libSQL

SearchSocket uses Turso (libSQL) as its single vector backend, providing a unified experience across development and production.

Local Development

By default, SearchSocket uses a local file database:

  • Path: .searchsocket/vectors.db (configurable)
  • No account or API keys needed
  • Full vector search with libsql_vector_idx and vector_top_k
  • Perfect for local development and CI testing

Production (Remote Turso)

For production, switch to Turso's hosted service:

  1. Sign up for Turso (free tier available):

    # Install Turso CLI
    brew install tursodatabase/tap/turso
    
    # Sign up
    turso auth signup
    
    # Create a database
    turso db create searchsocket-prod
    
    # Get credentials
    turso db show searchsocket-prod --url
    turso db tokens create searchsocket-prod
  2. Set environment variables:

    TURSO_DATABASE_URL=libsql://searchsocket-prod-xxx.turso.io
    TURSO_AUTH_TOKEN=eyJhbGc...
  3. Index normally — SearchSocket auto-detects the remote URL and uses it.

Direct Credential Passing

Instead of environment variables, you can pass credentials directly in the config. This is useful for serverless deployments or multi-tenant setups:

export default {
  embeddings: {
    apiKey: "jina_..."  // direct API key (takes precedence over apiKeyEnv)
  },
  vector: {
    turso: {
      url: "libsql://my-db.turso.io",       // direct URL
      authToken: "eyJhbGc..."               // direct auth token
    }
  }
};

Direct values take precedence over environment variable lookups (apiKeyEnv, urlEnv, authTokenEnv).

Dimension Mismatch Auto-Recovery

When switching embedding models (e.g., from a 1536-dim model to Jina's 1024-dim), the vector dimension changes. SearchSocket automatically detects this and recreates the chunks table with the new dimension — no manual intervention needed. A full re-index (--force) is still required after switching models.

Why Turso?

  • Single backend — one unified Turso/libSQL store for vectors, metadata, and state
  • Local-first development — zero external dependencies for local dev
  • Production-ready — same codebase scales to remote hosted DB
  • Cost-effective — Turso free tier includes 9GB storage, 500M row reads/month
  • Vector search nativeF32_BLOB vectors, cosine similarity index, vector_top_k ANN queries

Serverless Deployment (Vercel, Netlify, etc.)

SearchSocket works on serverless platforms with a few adjustments:

Requirements

  1. Remote Turso database — local SQLite is not available in serverless (no persistent filesystem). Set TURSO_DATABASE_URL and TURSO_AUTH_TOKEN as platform environment variables.

  2. Inline config via rawConfig — the default config loader uses jiti to import searchsocket.config.ts from disk, which isn't bundled in serverless. Use rawConfig to pass config inline:

// hooks.server.ts (Vercel / Netlify)
import { searchsocketHandle } from "searchsocket/sveltekit";

export const handle = searchsocketHandle({
  rawConfig: {
    project: { id: "my-docs-site" },
    source: { mode: "static-output" },
    embeddings: { apiKeyEnv: "JINA_API_KEY" },
  }
});
  1. Environment variables — set these on your platform dashboard:
    • JINA_API_KEY
    • TURSO_DATABASE_URL
    • TURSO_AUTH_TOKEN

Rate Limiting

The built-in InMemoryRateLimiter auto-disables on serverless platforms (it resets on every cold start). Use your platform's WAF or edge rate-limiting instead.

What Only Applies to Indexing

The following features are only used during searchsocket index (CLI), not the search handler:

  • ensureStateDirs — creates .searchsocket/ state directories
  • Local SQLite fallback — only needed when TURSO_DATABASE_URL is not set

Adapter Guidance

| Platform | Adapter | Notes | |----------|---------|-------| | Vercel | adapter-auto (default) | Serverless — use rawConfig + remote Turso | | Netlify | adapter-netlify | Serverless — same as Vercel | | VPS / Docker | adapter-node | Long-lived process — no limitations, local SQLite works |

Embeddings: Jina AI

SearchSocket uses Jina AI's embedding models to convert text into semantic vectors. A single JINA_API_KEY powers both embeddings and optional reranking.

Default Model

  • Model: jina-embeddings-v5-text-small
  • Dimensions: 1024 (default)
  • Cost: ~$0.00005 per 1K tokens
  • Task adapters: Uses retrieval.passage for indexing, retrieval.query for search queries (LoRA task-specific adapters for better retrieval quality)

How It Works

  1. Chunking: Text is split into semantic chunks (default 2200 chars, 200 overlap)
  2. Title Prepend: Page title is prepended to each chunk for better context (chunking.prependTitle, default: true)
  3. Summary Chunk: A synthetic identity chunk is generated per page with title, URL, and first paragraph (chunking.pageSummaryChunk, default: true)
  4. Embedding: Each chunk is sent to Jina's embedding API with the retrieval.passage task adapter
  5. Batching: Requests batched (64 texts per request) for efficiency
  6. Storage: Vectors stored in Turso with metadata (URL, title, tags, depth, etc.)

Cost Estimation

Use --dry-run to preview costs:

pnpm searchsocket index --dry-run

Output:

pages processed: 42
chunks total: 156
chunks changed: 156
embeddings created: 156
estimated tokens: 32,400
estimated cost (USD): $0.000648

Reranking

Since embeddings and reranking share the same Jina API key, enabling reranking is one boolean:

export default {
  embeddings: { apiKeyEnv: "JINA_API_KEY" },
  rerank: { enabled: true }
};

Note: Changing the model after indexing requires re-indexing with --force.

Search & Ranking

Page Aggregation

By default (groupBy: "page"), SearchSocket groups chunk results by page URL and computes a page-level score:

  1. The top chunk score becomes the base page score
  2. Additional matching chunks contribute a decaying bonus: chunk_score * decay^i
  3. Optional per-URL page weights are applied multiplicatively

Configure aggregation behavior:

export default {
  ranking: {
    minScore: 0,                // minimum absolute score to include in results (default: 0, disabled)
    aggregationCap: 5,          // max chunks contributing to page score (default: 5)
    aggregationDecay: 0.5,      // decay factor for additional chunks (default: 0.5)
    minChunkScoreRatio: 0.5,    // threshold for sub-chunks in results (default: 0.5)
    pageWeights: {              // per-URL score multipliers
      "/": 1.1,
      "/docs": 1.15,
      "/download": 1.2
    },
    weights: {
      aggregation: 0.1,        // weight of aggregation bonus (default: 0.1)
      incomingLinks: 0.05,     // incoming link boost weight (default: 0.05)
      depth: 0.03,             // URL depth boost weight (default: 0.03)
      rerank: 1.0              // reranker score weight (default: 1.0)
    }
  }
};

pageWeights supports exact URL matches and prefix matching. A weight of 1.15 on "/docs" boosts all pages under /docs/ by 15%. Use gentle values (1.05-1.2x) since they compound with aggregation.

minScore filters out low-relevance results before they reach the client. Set to a value like 0.3 to remove noise. In page mode, pages below the threshold are dropped; in chunk mode, individual chunks are filtered. Default is 0 (disabled).

Chunk Mode

Use groupBy: "chunk" for flat per-chunk results without page aggregation:

curl -X POST http://localhost:5173/api/search \
  -H "content-type: application/json" \
  -d '{"q":"vector search","topK":10,"groupBy":"chunk"}'

Build-Triggered Indexing

Automatically index after each SvelteKit build.

vite.config.ts or svelte.config.js:

import { searchsocketVitePlugin } from "searchsocket/sveltekit";

export default {
  plugins: [
    svelteKitPlugin(),
    searchsocketVitePlugin({
      enabled: true,        // or check process.env.SEARCHSOCKET_AUTO_INDEX
      changedOnly: true,    // incremental indexing (faster)
      verbose: false
    })
  ]
};

Environment control:

# Enable via env var
SEARCHSOCKET_AUTO_INDEX=1 pnpm build

# Disable via env var
SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build

Commands

searchsocket init

Initialize config and state directory.

pnpm searchsocket init

searchsocket index

Index content into vectors.

# Incremental (only changed chunks)
pnpm searchsocket index --changed-only

# Full re-index
pnpm searchsocket index --force

# Preview cost without indexing
pnpm searchsocket index --dry-run

# Override source mode
pnpm searchsocket index --source build

# Limit for testing
pnpm searchsocket index --max-pages 10 --max-chunks 50

# Override scope
pnpm searchsocket index --scope staging

# Verbose output
pnpm searchsocket index --verbose

searchsocket status

Show indexing status, scope, and vector health.

pnpm searchsocket status

# Output:
# project: my-site
# resolved scope: main
# embedding model: jina-embeddings-v5-text-small
# vector backend: turso/libsql (local (.searchsocket/vectors.db))
# vector health: ok
# last indexed (main): 2025-02-23T10:30:00Z
# tracked chunks: 156
# last estimated tokens: 32,400
# last estimated cost: $0.000648

searchsocket dev

Watch for file changes and auto-reindex.

pnpm searchsocket dev

# With MCP server
pnpm searchsocket dev --mcp --mcp-port 3338

Watches:

  • src/routes/** (route files)
  • build/ (if static-output mode)
  • Build output dir (if build mode)
  • Content files (if content-files mode)
  • searchsocket.config.ts (if crawl or build mode)

searchsocket clean

Delete local state and optionally remote vectors.

# Local state only
pnpm searchsocket clean

# Local + remote vectors
pnpm searchsocket clean --remote --scope staging

searchsocket prune

Delete stale scopes (e.g., deleted git branches).

# Dry run (shows what would be deleted)
pnpm searchsocket prune --older-than 30d

# Apply deletions
pnpm searchsocket prune --older-than 30d --apply

# Use custom scope list
pnpm searchsocket prune --scopes-file active-branches.txt --apply

searchsocket doctor

Validate config, env vars, and connectivity.

pnpm searchsocket doctor

# Output:
# PASS config parse
# PASS env JINA_API_KEY
# PASS turso/libsql (local file: .searchsocket/vectors.db)
# PASS source: build manifest
# PASS source: vite binary
# PASS embedding provider connectivity
# PASS vector backend connectivity
# PASS vector backend write permission
# PASS state directory writable

searchsocket mcp

Run MCP server for Claude Desktop / other MCP clients.

# stdio transport (default)
pnpm searchsocket mcp

# HTTP transport
pnpm searchsocket mcp --transport http --port 3338

searchsocket search

CLI search for testing.

pnpm searchsocket search --q "turso vector search" --top-k 5 --rerank

MCP (Model Context Protocol)

SearchSocket provides an MCP server for integration with Claude Code, Claude Desktop, and other MCP-compatible AI tools. This gives AI assistants direct access to your indexed site content for semantic search and page retrieval.

Tools

search(query, opts?)

  • Semantic search across indexed content
  • Returns ranked results with URL, title, snippet, score, and routeFile
  • Options: scope, topK (1-100), pathPrefix, tags, groupBy ("page" | "chunk")

get_page(pathOrUrl, opts?)

  • Retrieve full indexed page content as markdown with frontmatter
  • Options: scope

Setup (Claude Code)

Add a .mcp.json file to your project root (safe to commit — no secrets needed since the CLI auto-loads .env):

{
  "mcpServers": {
    "searchsocket": {
      "type": "stdio",
      "command": "npx",
      "args": ["searchsocket", "mcp"],
      "env": {}
    }
  }
}

Restart Claude Code. The search and get_page tools will be available automatically. Verify with:

claude mcp list

Setup (Claude Desktop)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "searchsocket": {
      "command": "npx",
      "args": ["searchsocket", "mcp"],
      "cwd": "/path/to/your/project"
    }
  }
}

Restart Claude Desktop. The tools appear in the MCP menu.

HTTP Transport

For non-stdio clients, run the MCP server over HTTP:

npx searchsocket mcp --transport http --port 3338

This starts a stateless server at http://127.0.0.1:3338/mcp. Each POST request creates a fresh server instance with no session persistence.

Environment Variables

The CLI automatically loads .env from the working directory on startup. Existing process.env values take precedence over .env file values. This only applies to CLI commands (searchsocket index, searchsocket mcp, etc.) — library imports like searchsocketHandle() rely on your framework's own .env handling (Vite/SvelteKit).

Required

Jina AI:

  • JINA_API_KEY — Jina AI API key for embeddings and reranking

Optional (Turso)

Remote Turso (production):

  • TURSO_DATABASE_URL — Turso database URL (e.g., libsql://my-db.turso.io)
  • TURSO_AUTH_TOKEN — Turso auth token

If not set, uses local file DB at .searchsocket/vectors.db.

Optional (Scope/Build)

  • SEARCHSOCKET_SCOPE — Override scope (when scope.mode: "env")
  • SEARCHSOCKET_AUTO_INDEX — Enable build-triggered indexing
  • SEARCHSOCKET_DISABLE_AUTO_INDEX — Disable build-triggered indexing

Configuration

Full Example

export default {
  project: {
    id: "my-site",
    baseUrl: "https://example.com"
  },

  scope: {
    mode: "git",           // "fixed" | "git" | "env"
    fixed: "main",
    sanitize: true
  },

  source: {
    mode: "build",         // "static-output" | "crawl" | "content-files" | "build"
    staticOutputDir: "build",
    strictRouteMapping: false,

    // Build mode (recommended for CI/CD)
    build: {
      outputDir: ".svelte-kit/output",
      previewTimeout: 30000,
      exclude: ["/api/*"],
      paramValues: {
        "/blog/[slug]": ["hello-world", "getting-started"]
      },
      discover: false,
      seedUrls: ["/"],
      maxPages: 200,
      maxDepth: 5
    },

    // Crawl mode (alternative)
    crawl: {
      baseUrl: "http://localhost:4173",
      routes: ["/", "/docs", "/blog"],
      sitemapUrl: "https://example.com/sitemap.xml"
    },

    // Content files mode (alternative)
    contentFiles: {
      globs: ["src/routes/**/*.md"],
      baseDir: "."
    }
  },

  extract: {
    mainSelector: "main",
    dropTags: ["header", "nav", "footer", "aside"],
    dropSelectors: [".sidebar", ".toc"],
    ignoreAttr: "data-search-ignore",
    noindexAttr: "data-search-noindex",
    respectRobotsNoindex: true
  },

  chunking: {
    maxChars: 2200,
    overlapChars: 200,
    minChars: 250,
    headingPathDepth: 3,
    dontSplitInside: ["code", "table", "blockquote"],
    prependTitle: true,       // prepend page title to chunk text before embedding
    pageSummaryChunk: true    // generate synthetic identity chunk per page
  },

  embeddings: {
    provider: "jina",
    model: "jina-embeddings-v5-text-small",
    apiKey: "jina_...",          // direct API key (or use apiKeyEnv)
    apiKeyEnv: "JINA_API_KEY",
    batchSize: 64,
    concurrency: 4
  },

  vector: {
    dimension: 1024,  // optional, inferred from first embedding
    turso: {
      url: "libsql://my-db.turso.io",    // direct URL (or use urlEnv)
      authToken: "eyJhbGc...",            // direct token (or use authTokenEnv)
      urlEnv: "TURSO_DATABASE_URL",
      authTokenEnv: "TURSO_AUTH_TOKEN",
      localPath: ".searchsocket/vectors.db"
    }
  },

  rerank: {
    enabled: true,
    topN: 20,
    model: "jina-reranker-v3"
  },

  ranking: {
    enableIncomingLinkBoost: true,
    enableDepthBoost: true,
    pageWeights: {
      "/": 1.1,
      "/docs": 1.15
    },
    minScore: 0,
    aggregationCap: 5,
    aggregationDecay: 0.5,
    minChunkScoreRatio: 0.5,
    weights: {
      incomingLinks: 0.05,
      depth: 0.03,
      rerank: 1.0,
      aggregation: 0.1
    }
  },

  api: {
    path: "/api/search",
    cors: {
      allowOrigins: ["https://example.com"]
    },
    rateLimit: {
      windowMs: 60_000,
      max: 60
    }
  }
};

License

MIT