@prodisco/search-libs
v0.1.3
Generic TypeScript library indexing and search using Orama
A generic library indexing + search solution using Orama. Extract types, methods, and functions from TypeScript libraries (via .d.ts) and ESM JavaScript libraries (best-effort), index TypeScript scripts, and provide unified structured search for AI agents.
Table of Contents
- Features
- Installation
- Quick Start
- API Reference
- Document Types
- Architecture
- How Library Indexing Works
- Extending the Schema
- License
Features
- Generic Library Extraction: Extract types (classes, interfaces, enums, type-aliases) and methods/functions from npm packages using TypeScript AST parsing (TypeScript .d.ts + ESM JavaScript fallback)
- Script Indexing: Index TypeScript scripts with automatic metadata extraction (description, keywords, API references)
- Unified Search: Search across types, methods, functions, and scripts with structured queries and structured output
- Extensible Schema: Base Orama schema with support for custom extensions
- AI-Optimized: Structured output designed for AI code generation agents
Installation
npm install @prodisco/search-libs
Quick Start
import { LibraryIndexer } from '@prodisco/search-libs';

// Create indexer with packages to extract
const indexer = new LibraryIndexer({
  packages: [
    { name: '@kubernetes/client-node' },
    { name: '@prodisco/prometheus-client' },
    { name: 'simple-statistics' },
  ],
});

// Initialize - extracts and indexes all packages
await indexer.initialize();

// Search across all indexed content
const results = await indexer.search({
  query: 'Pod',
  documentType: 'type',
  limit: 10,
});

console.log(results.results[0]);
// {
//   id: 'type:@kubernetes/client-node:V1Pod',
//   documentType: 'type',
//   name: 'V1Pod',
//   library: '@kubernetes/client-node',
//   category: 'interface',
//   description: 'Pod is a collection of containers...',
//   properties: [...],
//   typeKind: 'interface',
// }
API Reference
LibraryIndexer
The main entry point for indexing and searching.
interface LibraryIndexerOptions {
  packages: PackageConfig[];
  basePath?: string; // Defaults to process.cwd()
}

interface PackageConfig {
  name: string; // npm package name
  typeFilter?: RegExp | ((name: string) => boolean);
  methodFilter?: RegExp | ((name: string) => boolean);
}
Methods
initialize(): Promise<{ indexed: number; errors: ExtractionError[] }>
Extracts and indexes all configured packages.
search(options: SearchOptions): Promise<SearchResult>
Search the index with structured queries.
interface SearchOptions {
  query?: string;        // Full-text search term
  documentType?: string; // 'type' | 'method' | 'function' | 'script' | 'all'
  category?: string;     // Filter by category
  library?: string;      // Filter by library
  limit?: number;        // Max results (default: 10)
  offset?: number;       // Pagination offset
}

interface SearchResult {
  results: IndexedDocument[];
  totalMatches: number;
  facets: {
    documentType: Record<string, number>;
    library: Record<string, number>;
    category: Record<string, number>;
  };
  searchTime: number;
}
addScript(filePath: string): Promise<void>
Add a TypeScript script to the index. Automatically parses for:
- Description (from first comment block)
- Keywords (from description)
- Resource types (from filename and content AST)
- API references (from content AST)
addScriptsFromDirectory(dirPath: string): Promise<void>
Add all TypeScript scripts from a directory.
removeScript(filePath: string): Promise<void>
Remove a script from the index.
addDocuments(docs: IndexedDocument[]): Promise<void>
Add custom documents to the index (e.g., from external sources).
shutdown(): Promise<void>
Clean up resources.
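The limit and offset fields of SearchOptions suggest offset-based paging. The sketch below shows one way to walk every page until totalMatches is reached. The helper collectAll and the stand-in fakeSearch are hypothetical names introduced for illustration, and the interfaces are trimmed copies of the ones above; the sketch assumes offset counts documents (not pages), which the README does not state explicitly.

```typescript
// Trimmed copies of the SearchOptions / SearchResult shapes documented above.
interface PageOptions { query?: string; limit?: number; offset?: number }
interface Page<T> { results: T[]; totalMatches: number }
type SearchFn<T> = (options: PageOptions) => Promise<Page<T>>;

// Hypothetical helper: requests pages until totalMatches is reached.
async function collectAll<T>(search: SearchFn<T>, query: string, pageSize = 10): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const page = await search({ query, limit: pageSize, offset });
    all.push(...page.results);
    offset += page.results.length;
    // Stop when the index is exhausted or a page comes back empty.
    if (page.results.length === 0 || offset >= page.totalMatches) break;
  }
  return all;
}

// Stand-in for indexer.search, backed by an array, so the sketch runs on its own.
function fakeSearch(names: string[]): SearchFn<string> {
  return async ({ query = '', limit = 10, offset = 0 }) => {
    const matches = names.filter((n) => n.includes(query));
    return { results: matches.slice(offset, offset + limit), totalMatches: matches.length };
  };
}
```

With a real indexer, the first argument would presumably be something like (options) => indexer.search(options).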
Document Types
Type Documents
Extracted from .d.ts files (preferred). If no .d.ts is found, types/classes can be extracted from ESM JavaScript source (.js/.mjs) as a best-effort fallback (parameter/return types default to any).
{
  id: 'type:@kubernetes/client-node:V1Pod',
  documentType: 'type',
  name: 'V1Pod',
  library: '@kubernetes/client-node',
  category: 'interface',
  description: 'Pod is a collection of containers...',
  properties: [
    { name: 'metadata', type: 'V1ObjectMeta', optional: true },
    { name: 'spec', type: 'V1PodSpec', optional: true },
  ],
  typeKind: 'interface',
  nestedTypes: ['V1ObjectMeta', 'V1PodSpec'],
}
Method Documents
Extracted from class methods:
{
  id: 'method:@kubernetes/client-node:CoreV1Api:listNamespacedPod',
  documentType: 'method',
  name: 'listNamespacedPod',
  library: '@kubernetes/client-node',
  category: 'list',
  description: 'List pods in a namespace',
  parameters: [
    { name: 'namespace', type: 'string', optional: false },
  ],
  returnType: 'Promise<V1PodList>',
  signature: 'listNamespacedPod(namespace: string): Promise<V1PodList>',
}
Script Documents
Indexed from TypeScript files:
{
  id: 'script:get-pod-logs.ts',
  documentType: 'script',
  name: 'get-pod-logs',
  library: 'CachedScript',
  category: 'script',
  description: 'Retrieves logs from a Kubernetes pod',
  filePath: '/path/to/scripts/get-pod-logs.ts',
  keywords: 'logs pod kubernetes',
}
Architecture
search-libs/
├── extractor/              # TypeScript AST extraction
│   ├── type-extractor      # Extract classes, interfaces, enums
│   ├── method-extractor    # Extract methods from classes
│   ├── function-extractor  # Extract standalone functions
│   └── package-resolver    # Find .d.ts files or ESM JS entrypoints in node_modules
├── script/                 # Script parsing
│   └── script-parser       # Parse scripts for metadata
├── schema/                 # Orama schema
│   ├── base-schema         # Core schema fields
│   └── schema-builder      # Extensibility
└── search/                 # Search engine
    ├── search-engine       # Orama wrapper
    ├── query-builder       # Fluent query API
    └── result-formatter    # Format for AI consumption
How library indexing works (TypeScript + JavaScript)
This section explains what happens when you call LibraryIndexer.initialize() for a package in node_modules, and how search-libs decides whether to index from TypeScript declarations or JavaScript source.
High-level flow
At a high level, indexing a package looks like:
- Resolve package folder: basePath/node_modules/<packageName>/
- Decide extraction strategy:
  - Prefer TypeScript declarations (.d.ts) when discoverable
  - Otherwise, attempt ESM JavaScript source fallback (.js/.mjs)
- Extract documents:
  - Types (classes/interfaces/enums/type-aliases)
  - Methods (class methods)
  - Functions (standalone functions)
- Insert into Orama and expose them via search()
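The strategy decision in the flow above can be sketched as a small pure function. This is an illustration only: the Discovery shape, the chooseStrategy name, and the 'skip' outcome are assumptions, not part of the library's API (the library may instead report an ExtractionError when nothing is indexable).

```typescript
type Strategy = 'declarations' | 'esm-source' | 'skip';

// Hypothetical summary of what was found inside the package folder.
interface Discovery {
  declarationFiles: string[]; // discovered .d.ts files
  esmEntry?: string;          // resolved .js/.mjs entry, if any
}

// Declarations win whenever at least one .d.ts is discoverable;
// the ESM source fallback is only tried when none are.
function chooseStrategy(found: Discovery): Strategy {
  if (found.declarationFiles.length > 0) return 'declarations';
  if (found.esmEntry !== undefined) return 'esm-source';
  return 'skip';
}
```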
TypeScript packages (declaration-first)
For TypeScript libraries (or JS libraries that ship .d.ts), extraction is declaration-first:
1) Finding .d.ts files
search-libs attempts to locate a main .d.ts and then scans for additional .d.ts files:
- Main declaration candidates:
  - package.json "types" / "typings"
  - package.json "exports"["."]["types"]
  - common fallbacks like dist/index.d.ts, lib/index.d.ts, index.d.ts
- Additional declarations:
  - Walks the package’s types/, typings/, dist/, lib/, and src/ trees (bounded depth)
  - Skips common test/internal files (e.g. *.test.*, *.spec.*, names containing __)
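The candidate order described above can be sketched as follows. The function names and the PackageJsonTypes shape are hypothetical, the exact precedence between "types" and "typings" is an assumption, and the skip predicate is only one plausible reading of the listed patterns.

```typescript
// Minimal slice of package.json relevant to declaration lookup.
interface PackageJsonTypes {
  types?: string;
  typings?: string;
  exports?: unknown;
}

// Returns candidate paths (relative to the package root), most specific first.
function mainDeclarationCandidates(pkg: PackageJsonTypes): string[] {
  const candidates: string[] = [];
  if (pkg.types) candidates.push(pkg.types);
  if (pkg.typings) candidates.push(pkg.typings);

  // exports["."]["types"], when exports is an object with a "." entry
  const exportsField = pkg.exports;
  if (typeof exportsField === 'object' && exportsField !== null) {
    const root = (exportsField as Record<string, unknown>)['.'];
    if (typeof root === 'object' && root !== null) {
      const types = (root as Record<string, unknown>)['types'];
      if (typeof types === 'string') candidates.push(types);
    }
  }

  candidates.push('dist/index.d.ts', 'lib/index.d.ts', 'index.d.ts');
  // Strip a leading "./" and drop duplicates while keeping order.
  return Array.from(new Set(candidates.map((p) => p.replace(/^\.\//, ''))));
}

// Assumed skip rule for the directory walk: test/spec files and names containing "__".
function isSkippedDeclaration(fileName: string): boolean {
  return /\.(test|spec)\./.test(fileName) || fileName.includes('__');
}
```

A resolver would then take the first candidate that exists on disk; the file-system check is left out to keep the sketch free of I/O.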
2) Understanding what’s “public”
Some packages have internal class names that are re-exported or aliased at the entrypoint. To reduce noise for method indexing, search-libs parses the package’s main .d.ts and builds:
- Public export set: the names users can import
- Alias map: internal names → public names (e.g. ObjectCoreV1Api → CoreV1Api)
It follows export * from './x' chains (relative only) to build a more complete public view.
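As a rough illustration of the two structures, the sketch below builds a public export set and an alias map from a list of export specifiers and uses them to resolve the name a user would import. It assumes the specifiers have already been read from the entrypoint's AST; ExportSpecifier, buildPublicSurface, and publicNameFor are hypothetical names, not library exports.

```typescript
// One entry per `export { local as exported }` specifier found at the entrypoint.
interface ExportSpecifier { local: string; exported: string }

interface PublicSurface {
  publicNames: Set<string>;     // names users can import
  aliases: Map<string, string>; // internal name -> public name
}

function buildPublicSurface(specifiers: ExportSpecifier[]): PublicSurface {
  const publicNames = new Set<string>();
  const aliases = new Map<string, string>();
  for (const { local, exported } of specifiers) {
    publicNames.add(exported);
    if (local !== exported) aliases.set(local, exported);
  }
  return { publicNames, aliases };
}

// Resolve the importable name for a class seen in a .d.ts file,
// or undefined when the class is not part of the public surface.
function publicNameFor(className: string, surface: PublicSurface): string | undefined {
  const name = surface.aliases.get(className) ?? className;
  return surface.publicNames.has(name) ? name : undefined;
}
```

Method indexing can then skip any class for which publicNameFor returns undefined and use the returned name otherwise.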
3) Extracting types / methods / functions
Once .d.ts files are discovered, each file is parsed with the TypeScript compiler AST and we extract:
- Types:
  - class, interface, enum, and simple type aliases
  - Properties are captured as text (type strings from .d.ts)
  - Nested type references are detected for better searchability
- Methods:
  - Extracted from class declarations
  - If a public export set exists, methods are indexed only for publicly exported classes
  - Aliases are applied so class names match what users import
- Functions:
  - Extracts function declarations and exported function-valued variables
Notes:
- Types are extracted from all discovered .d.ts files (often includes internal-but-useful helper types).
- LibraryIndexer can expand complex parameter/return types by looking up extracted types and embedding a compact definition in method docs.
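Because properties are stored as type strings, nested type references can be found by scanning those strings for names of other extracted types. The sketch below is one simple way to do that with a token scan; it is an assumption about the approach, not the library's implementation, and findNestedTypes is a hypothetical name.

```typescript
interface PropertyInfo { name: string; type: string; optional: boolean }

// Collect every known type name mentioned in the property type strings.
function findNestedTypes(properties: PropertyInfo[], knownTypes: Set<string>): string[] {
  const found = new Set<string>();
  for (const prop of properties) {
    // Pull identifier-like tokens out of the type text, e.g. "Array<V1Container>".
    const tokens = prop.type.match(/[A-Za-z_$][\w$]*/g) ?? [];
    for (const token of tokens) {
      if (knownTypes.has(token)) found.add(token);
    }
  }
  return Array.from(found);
}
```

A token scan cannot tell a type reference from an identically named property key inside an inline object type, so a real extractor working on the AST would be more precise.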
JavaScript packages (ESM source fallback)
If no .d.ts files are discoverable, search-libs attempts to index the package’s ESM JavaScript source.
1) Finding an ESM entry file
Entry resolution is based on package.json and common build layouts:
- Prefer exports['.'] with "import" (then "default")
- Then "module"
- Then "main" only when the package has "type": "module" (for .js)
- Plus common fallbacks (dist/index.js, lib/index.js, index.js, and .mjs variants)
Only .js (ESM via "type":"module") and .mjs are considered. CommonJS (.cjs) is intentionally ignored.
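The resolution order above can be sketched as a function that returns candidate entry paths in priority order. The names PackageJsonEsm and esmEntryCandidates are hypothetical. The sketch handles only string-valued export conditions (nested condition objects are ignored), and it accepts a .mjs "main" without "type": "module", which is one reading of the rule above rather than confirmed behavior.

```typescript
// Minimal slice of package.json relevant to ESM entry lookup.
interface PackageJsonEsm {
  type?: string;
  main?: string;
  module?: string;
  exports?: unknown;
}

const ESM_FALLBACKS = [
  'dist/index.js', 'lib/index.js', 'index.js',
  'dist/index.mjs', 'lib/index.mjs', 'index.mjs',
];

function esmEntryCandidates(pkg: PackageJsonEsm): string[] {
  const out: string[] = [];

  // 1. exports["."] with "import", then "default" (string targets only)
  const exportsField = pkg.exports;
  if (typeof exportsField === 'object' && exportsField !== null) {
    const root = (exportsField as Record<string, unknown>)['.'];
    if (typeof root === 'object' && root !== null) {
      for (const condition of ['import', 'default']) {
        const target = (root as Record<string, unknown>)[condition];
        if (typeof target === 'string') out.push(target);
      }
    }
  }

  // 2. "module"
  if (pkg.module) out.push(pkg.module);

  // 3. "main": a .js main counts as ESM only with "type": "module"
  if (pkg.main && (pkg.main.endsWith('.mjs') || (pkg.type === 'module' && pkg.main.endsWith('.js')))) {
    out.push(pkg.main);
  }

  // 4. common fallbacks
  out.push(...ESM_FALLBACKS);

  // Normalise "./", keep only .js/.mjs (this drops .cjs), and de-duplicate in order.
  const cleaned = out
    .map((p) => p.replace(/^\.\//, ''))
    .filter((p) => p.endsWith('.js') || p.endsWith('.mjs'));
  return Array.from(new Set(cleaned));
}
```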
2) Computing the public surface (exports)
JavaScript libraries often re-export from multiple files. To avoid indexing internal helpers, search-libs first computes the public export surface by traversing the entry’s static export graph (relative only).
Supported patterns include:
- export { a, b as c } from './x.js'
- export * from './x.js' (does not re-export default)
- export { a, b as c } (local exports)
- import + re-export:
  import { foo as localFoo } from './x.js';
  export { localFoo as foo };
- direct exports:
  - export function foo() {}
  - export class Foo {}
  - export const foo = () => {}
  - export default <identifier> (best-effort; indexed under the name default when resolvable)
From this traversal, search-libs builds:
- a per-file allowlist of declaration names that are actually part of the public API
- a per-file alias map for renamed exports (internalFn → publicFn)
Only relative (./...) re-exports are followed. Non-relative re-exports (from dependencies) are ignored.
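To make the traversal concrete, here is a sketch that works on an in-memory graph instead of parsed files. Each module is reduced to its static export statements, and the result is a per-file map from declaration name to public name, which serves as both allowlist and alias map. All names (ModuleGraph, exportedNames, resolveOrigin, publicSurface) are hypothetical; specifiers are assumed to be already resolved to graph keys, and an import followed by a re-export is assumed to be normalised into a reexports entry by the parser.

```typescript
interface ModuleExports {
  local?: Array<{ local: string; exported: string }>;                   // export { a as b }, export function f() {}
  reexports?: Array<{ from: string; local: string; exported: string }>; // export { a as b } from './x.js'
  star?: string[];                                                      // export * from './x.js'
}
type ModuleGraph = Record<string, ModuleExports>;
interface Origin { file: string; name: string }

const isRelative = (s: string): boolean => s.startsWith('./') || s.startsWith('../');

// All names a module exports, following `export *` (which never forwards `default`).
function exportedNames(graph: ModuleGraph, file: string, seen = new Set<string>()): string[] {
  const mod = graph[file];
  if (!mod || seen.has(file)) return [];
  seen.add(file);
  const names = new Set<string>();
  for (const e of mod.local ?? []) names.add(e.exported);
  for (const r of mod.reexports ?? []) if (isRelative(r.from)) names.add(r.exported);
  for (const target of mod.star ?? []) {
    if (!isRelative(target)) continue; // re-exports from dependencies are ignored
    for (const n of exportedNames(graph, target, seen)) if (n !== 'default') names.add(n);
  }
  return Array.from(names);
}

// Trace one exported name back to the file and declaration that defines it.
function resolveOrigin(graph: ModuleGraph, file: string, exported: string, seen = new Set<string>()): Origin | undefined {
  const key = `${file}#${exported}`;
  const mod = graph[file];
  if (!mod || seen.has(key)) return undefined;
  seen.add(key);
  const own = (mod.local ?? []).find((e) => e.exported === exported);
  if (own) return { file, name: own.local };
  const re = (mod.reexports ?? []).find((r) => r.exported === exported);
  if (re) return isRelative(re.from) ? resolveOrigin(graph, re.from, re.local, seen) : undefined;
  if (exported === 'default') return undefined; // export * skips default
  for (const target of mod.star ?? []) {
    if (!isRelative(target)) continue;
    const hit = resolveOrigin(graph, target, exported, seen);
    if (hit) return hit;
  }
  return undefined;
}

// Per-file map: declaration name -> public name (identity when not renamed).
function publicSurface(graph: ModuleGraph, entry: string): Map<string, Map<string, string>> {
  const surface = new Map<string, Map<string, string>>();
  for (const publicName of exportedNames(graph, entry)) {
    const origin = resolveOrigin(graph, entry, publicName);
    if (!origin) continue;
    if (!surface.has(origin.file)) surface.set(origin.file, new Map());
    surface.get(origin.file)!.set(origin.name, publicName);
  }
  return surface;
}

// Example graph: one renamed re-export, one dependency re-export, one `export *`.
const exampleGraph: ModuleGraph = {
  './index.js': {
    reexports: [
      { from: './math.js', local: 'internalSum', exported: 'sum' },
      { from: 'lodash', local: 'chunk', exported: 'chunk' },
    ],
    star: ['./stats.js'],
  },
  './math.js': { local: [{ local: 'internalSum', exported: 'internalSum' }, { local: 'helper', exported: 'helper' }] },
  './stats.js': { local: [{ local: 'mean', exported: 'mean' }, { local: 'Stats', exported: 'default' }] },
};
const exampleSurface = publicSurface(exampleGraph, './index.js');
```

In this example, helper stays off the allowlist because the entry never exports it, chunk is dropped because it comes from a dependency, and the default export of ./stats.js is not forwarded by export *.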
3) Extracting from JavaScript source
For each JS module that contributes exports, search-libs runs the same AST extractors as TypeScript, but applies the allowlist/aliases so only public symbols are indexed:
- Exported functions: indexed with parameter/return types defaulting to any
- Exported classes: indexed as type documents, and their methods are indexed as method documents
- Descriptions: pulled from JSDoc comment blocks when present (e.g. /** ... */)
Filters and tuning
You can control noise and focus via PackageConfig:
- typeFilter: include only matching type names
- methodFilter: include only matching method/function names
- classFilter: include methods only from matching class names (applies to the public/aliased class name)
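Each filter accepts either a RegExp or a predicate, as in the PackageConfig interface. A sketch of how such a filter might be evaluated is shown below; NameFilter and matchesFilter are hypothetical names, and treating a missing filter as "accept everything" is an assumption. The example config only uses the two filters that appear in the documented interface.

```typescript
type NameFilter = RegExp | ((name: string) => boolean);

// A missing filter is assumed to accept every name.
function matchesFilter(filter: NameFilter | undefined, name: string): boolean {
  if (filter === undefined) return true;
  if (filter instanceof RegExp) {
    filter.lastIndex = 0; // guard against stateful /g or /y regexes
    return filter.test(name);
  }
  return filter(name);
}

// Example: keep only V1* types and list*/read* methods from the Kubernetes client.
const kubernetesConfig = {
  name: '@kubernetes/client-node',
  typeFilter: /^V1/,
  methodFilter: (name: string) => name.startsWith('list') || name.startsWith('read'),
};
```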
Limitations (by design)
- CommonJS (module.exports, exports.*) is not supported by the JS fallback.
- Dynamic exports are not supported (computed exports, runtime mutation, etc.).
- Re-exports from dependencies (non-relative specifiers like 'lodash') are ignored by the JS fallback.
- JS fallback is best-effort: it parses syntax but does not run a type checker; parameter/return types default to any.
Tips for best results
- If you can, ship .d.ts (or add @types/<pkg>): declaration-first indexing produces richer type signatures.
- For JS-only ESM libraries:
  - Prefer static named exports over dynamic export patterns
  - Add JSDoc descriptions on exported functions/classes/methods to improve search quality
  - Keep exports shallow and explicit at the entrypoint for a clearer public surface
Extending the Schema
For domain-specific fields, use the schema builder:
import { buildSchema, SearchEngine } from '@prodisco/search-libs';

const customSchema = buildSchema({
  extensions: {
    customField: 'string',
    customEnum: 'enum',
  },
});

const engine = new SearchEngine({ schema: customSchema });
License
MIT
