npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@libragen/core

v0.4.0

Published

Core library for embedding, chunking, vector storage, and hybrid search

Downloads

39

Readme


This package provides the core functionality for building and querying libragen libraries — private, local RAG libraries that ground your AI in real documentation instead of stale training data.

Full documentation →

Installation

npm install @libragen/core

Quick Start

Building a Library

The simplest way to build a library is using the Builder class:

import { Builder } from '@libragen/core';

const builder = new Builder();

// Build from a local directory
const result = await builder.build('./my-docs', {
   name: 'my-library',
   version: '1.0.0',
   description: 'My documentation library',
});

console.log(`Built library at ${result.outputPath}`);
console.log(`Chunks: ${result.stats.chunkCount}`);

// Build from a git repository
const gitResult = await builder.build('https://github.com/user/repo', {
   name: 'repo-docs',
   gitRef: 'main',
   include: ['docs/**/*.md'],
});

Querying a Library

import { Embedder, VectorStore, Searcher } from '@libragen/core';

const embedder = new Embedder();
await embedder.initialize();

const store = new VectorStore('./my-library.libragen');
store.initialize();

const searcher = new Searcher(embedder, store);
const results = await searcher.search({ query: 'authentication' });

console.log(results);

await embedder.dispose();
store.close();

API Reference

Builder

High-level API for building .libragen libraries from source files or git repositories.

import { Builder } from '@libragen/core';

const builder = new Builder();

// Build from local source
const result = await builder.build('./src', {
   name: 'my-library',
   version: '1.0.0',
   description: 'My library description',
   agentDescription: 'Use this library when...',
   exampleQueries: ['how to authenticate', 'error handling'],
   keywords: ['typescript', 'api'],
   programmingLanguages: ['typescript'],
   chunkSize: 1000,
   chunkOverlap: 100,
   include: ['**/*.ts', '**/*.md'],
   exclude: ['**/test/**'],
});

console.log(result.outputPath);           // Absolute path to .libragen file
console.log(result.stats.chunkCount);     // Number of chunks created
console.log(result.stats.sourceCount);    // Number of source files
console.log(result.stats.embedDuration);  // Embedding time in seconds

// Build from git with progress callback
const gitResult = await builder.build('https://github.com/user/repo', {
   gitRef: 'v2.0.0',
   gitRepoAuthToken: process.env.GITHUB_TOKEN,
   license: ['MIT'],
}, (progress) => {
   console.log(`${progress.phase}: ${progress.message} (${progress.progress}%)`);
});

if (gitResult.git) {
   console.log(gitResult.git.commitHash);
   console.log(gitResult.git.detectedLicense);
}

Build options:

interface BuildOptions {
   output?: string;              // Output path for .libragen file
   name?: string;                // Library name
   version?: string;             // Library version (default: '0.1.0')
   contentVersion?: string;      // Source content version
   description?: string;         // Short description
   agentDescription?: string;    // AI agent guidance
   exampleQueries?: string[];    // Example queries
   keywords?: string[];          // Searchable tags
   programmingLanguages?: string[];
   textLanguages?: string[];     // ISO 639-1 codes
   frameworks?: string[];
   chunkSize?: number;           // Target chunk size (default: 1000)
   chunkOverlap?: number;        // Chunk overlap (default: 100)
   include?: string[];           // Glob patterns to include
   exclude?: string[];           // Glob patterns to exclude
   noDefaultExcludes?: boolean;  // Disable default exclusions
   gitRef?: string;              // Git branch/tag/commit
   gitRepoAuthToken?: string;    // Auth token for private repos
   license?: string[];           // SPDX license identifiers
}

Embedder

Generates vector embeddings using Transformers.js with the Xenova/bge-small-en-v1.5 model (384 dimensions).

import { Embedder } from '@libragen/core';

const embedder = new Embedder();
await embedder.initialize();

// Single embedding
const embedding = await embedder.embed('some text');

// Batch embeddings
const embeddings = await embedder.embedBatch(['text1', 'text2']);

// Clean up
await embedder.dispose();

Configuration options:

const embedder = new Embedder({
   model: 'Xenova/bge-small-en-v1.5',  // Model to use
   quantization: 'q8',                  // 'fp32', 'fp16', 'q8', or 'q4'
   batchSize: 32,                       // Batch size for embedBatch
   cacheDir: './.cache',                // Model cache directory
   numThreads: 4,                       // ONNX Runtime threads (auto-detected by default)
});

Performance tuning:

The numThreads option controls ONNX Runtime's intra-op parallelism for CPU inference. By default, it auto-detects your CPU core count and uses cores - 1 threads. This typically provides 2-3x speedup on multi-core systems:

  • Auto-detect (default): Uses Math.max(1, cpuCores - 1)
  • Custom value: Set to specific number (e.g., numThreads: 4)
  • Single-threaded: Set to 1 to disable parallelism

Chunker

Splits text into semantic chunks using recursive character splitting.

import { Chunker } from '@libragen/core';

const chunker = new Chunker({
   chunkSize: 1500,    // Target chunk size in characters
   chunkOverlap: 200,  // Overlap between chunks
});

// Chunk a file
const chunks = await chunker.chunkFile('./src/index.ts');

// Chunk text directly
const chunks = chunker.chunkText(content, 'index.ts');

// Chunk a directory
const chunks = await chunker.chunkDirectory('./src', {
   patterns: ['**/*.ts', '**/*.md'],
   ignore: ['**/node_modules/**'],
});

Chunk structure:

interface Chunk {
   content: string;
   metadata: ChunkMetadata;
}

interface ChunkMetadata {
   sourceFile: string;
   startLine?: number;
   endLine?: number;
   language?: string;
}

VectorStore

SQLite-based storage with sqlite-vec for vector search and FTS5 for keyword search.

import { VectorStore } from '@libragen/core';

const store = new VectorStore('./library.libragen');
store.initialize();

// Add chunks with embeddings
store.addChunks(chunks, embeddings);

// Search
const vectorResults = store.searchVector(queryEmbedding, { k: 10 });
const keywordResults = store.searchKeyword('authentication', { k: 10 });

// Metadata
store.setMetadata({ name: 'my-lib', version: '1.0.0' });
const metadata = store.getMetadata();

store.close();

Searcher

Hybrid search combining vector similarity and BM25 keyword matching with RRF fusion.

import { Searcher, Reranker } from '@libragen/core';

const reranker = new Reranker(); // Optional
const searcher = new Searcher(embedder, store, { reranker });

const results = await searcher.search({
   query: 'how to handle errors',
   k: 10,
   contentVersion: 'v2.0.0',  // Optional filter
   rerank: true,              // Use cross-encoder reranking
});

Search result structure:

interface SearchResult {
   content: string;
   score: number;
   sourceFile: string;
   sourceType: string;
   sourceRef?: string;
   contentVersion?: string;
   startLine?: number;
   endLine?: number;
   language?: string;
}

Reranker

Cross-encoder reranking using mixedbread-ai/mxbai-rerank-xsmall-v1.

import { Reranker } from '@libragen/core';

const reranker = new Reranker();

const reranked = await reranker.rerank(
   'authentication middleware',
   candidates,  // Array of { content, ... }
   { topK: 5 }
);

Library

High-level API for creating and managing library files.

import { Library } from '@libragen/core';

// Create a new library
const library = await Library.create('./lib.libragen', {
   name: 'my-lib',
   version: '1.0.0',
   description: 'Description',
   agentDescription: 'Use this library for...',
   contentVersion: 'v2.0.0',
   contentVersionType: 'semver',
   keywords: ['typescript', 'utilities'],
   programmingLanguages: ['typescript'],
   textLanguages: ['en'],
   exampleQueries: ['how to parse JSON', 'error handling'],
});

// Add content
library.addChunks(chunks, embeddings);

// Close (automatically finalizes if chunks were added)
await library.close();

// Open existing library
const lib = await Library.open('./lib.libragen');
const metadata = lib.getMetadata();
await lib.close();

// Validate a library
const { valid, errors, warnings } = await Library.validate('./lib.libragen');

Configuration

Get default paths for libragen data. These respect the LIBRAGEN_HOME environment variable.

import {
   getLibragenHome,
   getDefaultLibraryDir,
   getDefaultManifestDir,
   getDefaultCollectionConfigDir,
   getModelCacheDir,
   detectProjectLibraryDir,
   hasProjectLibraryDir,
} from '@libragen/core';

// Get the base libragen directory
const home = getLibragenHome();
// macOS: ~/Library/Application Support/libragen
// Windows: %APPDATA%\libragen
// Linux: $XDG_DATA_HOME/libragen (defaults to ~/.local/share/libragen)
// Or: $LIBRAGEN_HOME if set

// Get specific directories
const librariesDir = getDefaultLibraryDir();  // $home/libraries
const manifestDir = getDefaultManifestDir();  // $home
const configDir = getDefaultCollectionConfigDir();  // $home
const modelDir = getModelCacheDir();  // $home/models (or LIBRAGEN_MODEL_CACHE)

// Project-local library detection
const projectDir = detectProjectLibraryDir();  // .libragen/libraries in cwd
const hasLocal = await hasProjectLibraryDir();  // true if directory exists

LibraryManager

Manages installed libraries across multiple locations.

import { LibraryManager } from '@libragen/core';

// Default: auto-detect .libragen/libraries in cwd + global directory
const manager = new LibraryManager();

// Or use explicit paths only (no global, no auto-detection)
const customManager = new LibraryManager({
   paths: ['.libragen/libraries', '/shared/libs'],
});

// Install a library (defaults to global directory)
const installed = await manager.install('./lib.libragen', {
   force: true,   // Overwrite existing
});

// List installed libraries
const libraries = await manager.listInstalled();

// Find a specific library (searches paths in order)
const lib = await manager.find('my-lib');

// Uninstall
await manager.uninstall('my-lib');

Options:

interface LibraryManagerOptions {
   // Explicit paths to use (excludes global and auto-detection)
   paths?: string[];

   // Auto-detect .libragen/libraries in cwd (default: true)
   autoDetect?: boolean;

   // Include global directory (default: true)
   includeGlobal?: boolean;

   // Current working directory for auto-detection
   cwd?: string;
}

CollectionClient

Fetch libraries from remote collections.

import { CollectionClient } from '@libragen/core';

const client = new CollectionClient();
await client.loadConfig();

// Add a collection
await client.addCollection({
   name: 'my-collection',
   url: 'https://example.com/collection.json',
   priority: 10,
});

// Search for libraries
const results = await client.search('react hooks');

// Get a specific library
const entry = await client.getEntry('some-library');

// Download
await client.download(entry, './downloaded.libragen');

Sources

FileSource

Read files from the local filesystem.

import { FileSource } from '@libragen/core';

const source = new FileSource();

const files = await source.getFiles({
   paths: ['./src', './docs'],
   patterns: ['**/*.ts', '**/*.md'],
   ignore: ['**/node_modules/**'],
   maxFileSize: 1024 * 1024,  // 1MB
});

GitSource

Clone and read files from git repositories. Automatically detects licenses from LICENSE files.

import { GitSource } from '@libragen/core';

const source = new GitSource();

const result = await source.getFiles({
   url: 'https://github.com/user/repo',
   ref: 'main',
   depth: 1,
   patterns: ['**/*.ts'],
});

console.log(result.files);           // Array of source files
console.log(result.commitHash);      // Full commit SHA
console.log(result.detectedLicense); // { identifier: 'MIT', file: 'LICENSE', confidence: 'high' }

// Clean up temp directory for remote repos
if (result.tempDir) {
   await source.cleanup(result.tempDir);
}

LicenseDetector

Detect SPDX license identifiers from license files.

import { LicenseDetector } from '@libragen/core';

const detector = new LicenseDetector();

// Detect from file content
const result = detector.detectFromContent(licenseText);
// { identifier: 'MIT', confidence: 'high' }

// Detect from a directory (checks LICENSE, LICENSE.md, COPYING, etc.)
const detected = await detector.detectFromDirectory('./my-project');
// { identifier: 'Apache-2.0', file: 'LICENSE', confidence: 'high' }

Supported licenses:

  • MIT
  • Apache-2.0
  • GPL-3.0, GPL-2.0
  • LGPL-3.0, LGPL-2.1
  • BSD-3-Clause, BSD-2-Clause
  • ISC
  • Unlicense
  • MPL-2.0
  • CC0-1.0
  • AGPL-3.0

Git URL Utilities

Helper functions for working with git URLs.

import { isGitUrl, parseGitUrl, getAuthToken, detectGitProvider } from '@libragen/core';

// Check if a string is a git URL
isGitUrl('https://github.com/user/repo');  // true
isGitUrl('/local/path');  // false

// Parse a git URL into components
const parsed = parseGitUrl('https://github.com/vercel/next.js/tree/main/docs');
// { repoUrl: 'https://github.com/vercel/next.js', ref: 'main', path: 'docs' }

// Get auth token for a repo (checks environment variables)
const token = getAuthToken('https://github.com/user/repo');
// Checks GITHUB_TOKEN, GH_TOKEN for GitHub; GITLAB_TOKEN for GitLab, etc.

// Detect git provider from URL
detectGitProvider('https://github.com/user/repo');  // 'github'
detectGitProvider('https://gitlab.com/user/repo');  // 'gitlab'

Migrations

Schema migration utilities for upgrading library files.

import {
   MigrationRunner,
   CURRENT_SCHEMA_VERSION,
   MigrationRequiredError,
   SchemaVersionError,
} from '@libragen/core';

// Check and run migrations on a library
const runner = new MigrationRunner();

try {
   const result = await runner.migrateIfNeeded('./library.libragen');
   if (result.migrated) {
      console.log(`Migrated from v${result.fromVersion} to v${result.toVersion}`);
   }
} catch (e) {
   if (e instanceof MigrationRequiredError) {
      console.log('Migration required but not run (use force option)');
   }
   if (e instanceof SchemaVersionError) {
      console.log('Unsupported schema version');
   }
}

// Current schema version
console.log(CURRENT_SCHEMA_VERSION);  // e.g., 2

Update Checking

Utilities for checking and applying library updates from collections.

import {
   LibraryManager,
   CollectionClient,
   checkForUpdate,
   findUpdates,
   performUpdate,
} from '@libragen/core';

const manager = new LibraryManager();
const client = new CollectionClient();
await client.loadConfig();

// Get installed libraries
const installed = await manager.listInstalled();

// Find all available updates
const updates = await findUpdates(installed, client, { force: false });

for (const update of updates) {
   console.log(`${update.name}: ${update.currentVersion} → ${update.newVersion}`);

   // Apply the update
   await performUpdate(update, manager);
}

// Or check a single library
const lib = await manager.find('my-library');
const entry = await client.getEntry('my-library');

if (lib && entry) {
   const update = checkForUpdate(lib, entry);
   if (update) {
      await performUpdate(update, manager);
   }
}

Types:

interface UpdateCandidate {
   name: string;
   currentVersion: string;
   currentContentVersion?: string;
   newVersion: string;
   newContentVersion?: string;
   source: string;  // Download URL
   location?: 'global' | 'project';
}

interface CheckUpdateOptions {
   force?: boolean;  // Update even if versions match
}

Utilities

Helper functions for common operations.

import { formatBytes, deriveGitLibraryName, formatDuration } from '@libragen/core';

// Format bytes into human-readable string
formatBytes(1536);      // "1.5 KB"
formatBytes(1048576);   // "1 MB"
formatBytes(0);         // "0 Bytes"

// Derive library name from git URL
deriveGitLibraryName('https://github.com/vercel/next.js.git');  // "vercel-next.js"
deriveGitLibraryName('https://github.com/microsoft/typescript'); // "microsoft-typescript"

// Format duration in seconds
formatDuration(45.5);   // "45.5s"
formatDuration(90);     // "1m 30s"
formatDuration(120);    // "2m"

Time Estimation

Estimate embedding time based on system capabilities. Useful for providing user feedback during builds.

import { getSystemInfo, estimateEmbeddingTime, formatSystemInfo } from '@libragen/core';

// Get system information
const info = getSystemInfo();
console.log(info.cpuModel);      // "Apple M2 Pro"
console.log(info.cpuCores);      // 12
console.log(info.totalMemoryGB); // 32
console.log(info.platform);      // "darwin"
console.log(info.arch);          // "arm64"

// Estimate time for embedding chunks
const estimate = estimateEmbeddingTime(500);
console.log(estimate.estimatedSeconds);  // 10
console.log(estimate.formattedTime);     // "10s"
console.log(estimate.chunksPerSecond);   // 50

// Format system info for display
const display = formatSystemInfo(info);
console.log(display);  // "Apple M2 Pro (12 cores)"

The estimation accounts for different CPU types:

  • Apple Silicon (M1-M4): 35-55 chunks/second
  • Intel/AMD x64: 10-30 chunks/second (scales with core count)
  • ARM Linux: 8-20 chunks/second

Library File Format

A .libragen file is a SQLite database with the following schema:

Tables:

  • metadata — Key-value store for library metadata
  • chunks — Code/documentation chunks with source info
  • vec_chunks — Virtual table for vector similarity search (sqlite-vec)
  • chunks_fts — FTS5 table for keyword search

Metadata fields:

interface LibraryMetadata {
   name: string;
   version: string;
   description?: string;
   agentDescription?: string;
   contentVersion?: string;
   contentVersionType?: 'semver' | 'date' | 'commit';
   keywords?: string[];
   programmingLanguages?: string[];
   textLanguages?: string[];
   exampleQueries?: string[];
   createdAt: string;
   embedding: {
      model: string;
      dimensions: number;
   };
   chunking: {
      strategy: string;
      chunkSize: number;
      chunkOverlap: number;
   };
   stats: {
      chunkCount: number;
      sourceCount: number;
      fileSize: number;
   };
   contentHash: string;
   source?: {
      type: 'local' | 'git';
      path?: string;       // For local sources
      url?: string;        // For git sources
      ref?: string;        // Branch/tag name
      commitHash?: string; // Full commit SHA
      licenses?: string[]; // SPDX license identifiers
   };
}

Collection Format

Collections are JSON files with the following structure:

{
   "name": "My Collection",
   "description": "A collection of libraries",
   "libraries": [
      {
         "name": "my-library",
         "description": "A helpful library",
         "versions": [
            {
               "version": "1.0.0",
               "contentVersion": "v2.0.0",
               "contentVersionType": "semver",
               "downloadURL": "https://example.com/my-library-1.0.0.libragen",
               "contentHash": "sha256:abc123...",
               "fileSize": 1234567
            }
         ]
      }
   ]
}

Related