@x12i/search-adapter
v1.5.1
Tavily-backed web search adapter for Narrix and TypeScript apps.
@x12i/search-adapter
Tavily-backed web search adapter for Narrix and other TypeScript/Node applications.
This package separates two ideas that used to be conflated:
- Discovery — URLs and text the search provider returned (Tavily).
- Evidence — URLs this adapter actually fetched over HTTP and extracted text from (optional second stage).
SearchResult exposes discoveredSources and evidenceSources separately so downstream code can tell what was found vs what was scoped.
Environment
```bash
TAVILY_API_KEY=your_tavily_token_here
```
Override per adapter: `createSearchAdapter({ tavily: { apiKey: "…" } })`.
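A minimal sketch of the key precedence described above, assuming an explicit `tavily.apiKey` wins over the environment and that blank strings count as unset. `resolveApiKey` is a hypothetical helper, not part of the package's API; it takes the environment as a plain object so it stays easy to test.

```typescript
// Hypothetical helper: not exported by @x12i/search-adapter.
type Env = Record<string, string | undefined>;

interface TavilyKeyConfig {
  tavily?: { apiKey?: string };
}

// Assumption: an explicit config key wins; blank strings are treated as unset.
function resolveApiKey(config: TavilyKeyConfig | undefined, env: Env): string | undefined {
  const explicit = config?.tavily?.apiKey?.trim();
  if (explicit) return explicit;
  const fromEnv = env.TAVILY_API_KEY?.trim();
  return fromEnv ? fromEnv : undefined;
}
```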
Install & build
```bash
npm install
npm run build
npm test
```
- Integration tests call Tavily only when `TAVILY_API_KEY` is set.
- `npm run test:integration:proof`: with `SEARCH_ADAPTER_TEST_PROOF=1`, logs discovery/evidence previews.
- `npm run proof:search -- "query"`: live stats for discovery fields; set `SEARCH_ADAPTER_FETCH=1` to also run `fetchPages` on the top URLs (see the script header).
Public API
```typescript
import {
  createSearchAdapter,
  type SearchAdapter,
  type SearchAdapterConfig,
  type SearchRequest,
  type SearchManyRequest,
  type SearchResult,
  type DiscoveredSource,
  type EvidenceSource,
  type ProviderFinding,
} from "@x12i/search-adapter";
```
`createSearchAdapter(config?: SearchAdapterConfig)`
Top-level config
| Option | Default | Role |
|--------|---------|------|
| includeSourceSnippets | true | When false, every DiscoveredSource has snippet: "" and no providerContent / providerRawContent (overridable per request). |
| redactQuery | (none) | (query) => string run on the query after validation and before Tavily. The returned string is what leaves your process for search. Must not be empty/whitespace-only. |
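The `redactQuery` contract (runs after validation, before Tavily, and must return a non-empty string) can be sketched as below. `applyRedaction` is a hypothetical illustration, not the adapter's internal code, and the plain `Error` stands in for whatever validation error the adapter actually reports.

```typescript
// Hypothetical sketch of the redactQuery contract; not the package's internals.
type Redactor = (query: string) => string;

function applyRedaction(query: string, redactQuery?: Redactor): string {
  // Validation runs first: an empty query never reaches the redactor.
  if (query.trim() === "") {
    throw new Error("query must not be empty");
  }
  // Without a redactor, the validated query is what leaves the process.
  if (!redactQuery) return query;
  const redacted = redactQuery(query);
  // The redacted string must also be non-empty and not whitespace-only.
  if (redacted.trim() === "") {
    throw new Error("redactQuery returned an empty query");
  }
  return redacted;
}
```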
config.tavily
| Option | Default | Role |
|--------|---------|------|
| apiKey | process.env.TAVILY_API_KEY | Tavily API key. |
| apiBaseUrl | (SDK default) | Passed as Tavily apiBaseURL. |
| timeoutMs | 15000 | Client timeout budget in milliseconds, passed to Tavily as its `timeout` in seconds. |
| maxRetries | 1 | Retries for retryable provider failures. |
| defaultTopic / defaultSearchDepth / includeAnswer / includeRawContent / snippetMaxChars / maxResults | see src/config.ts | Request defaults. |
config.fetch (second-stage HTTP fetch)
Fetch runs only when config.fetch.enabled === true and the request sets fetchPages: true.
| Option | Default | Role |
|--------|---------|------|
| enabled | false | Master switch for evidence fetching. |
| topK | 5 | Max URLs to fetch per search (also capped by request.fetchTopK). |
| concurrency | 4 | Max concurrent GETs while fetching the batch from a single search() (bounded pool). |
| maxAttempts | 3 | Total HTTP GET attempts per URL (evidence fetch + fetchUrlContent). Retries use exponential backoff with jitter on 408, 429, 502, 503 and transient network errors; honors Retry-After when parseable. |
| timeoutMs | 12000 | Per-attempt fetch timeout (each retry gets a fresh budget). |
| maxContentChars | 500000 | Max extracted text per URL. |
| userAgent | (package string) | User-Agent header. |
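The retry policy in the `maxAttempts` row can be sketched as a delay function. The retryable statuses and the `Retry-After` preference come from the table above; the 500 ms base, 10 s cap, "full jitter" formula, and letting a parsed `Retry-After` replace the backoff entirely are assumptions, not the package's actual constants. Classifying transient network errors is left out.

```typescript
// Statuses the README lists as retryable for evidence GETs.
const RETRYABLE_STATUSES = new Set([408, 429, 502, 503]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Parse Retry-After as delta-seconds or an HTTP date; undefined if unparseable.
function parseRetryAfterMs(header: string | null, nowMs: number): number | undefined {
  if (header === null) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

// Assumed constants: 500 ms base, 10 s cap, full-jitter exponential backoff.
function retryDelayMs(
  retryNumber: number, // 1 = delay before the first retry
  retryAfter: string | null,
  nowMs: number = Date.now(),
  random: () => number = Math.random,
  baseMs = 500,
  capMs = 10_000,
): number {
  const hinted = parseRetryAfterMs(retryAfter, nowMs);
  if (hinted !== undefined) return hinted; // honor the server's hint when parseable
  const ceiling = Math.min(capMs, baseMs * 2 ** (retryNumber - 1));
  return Math.floor(random() * ceiling);
}
```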
Adapter methods
- `search(request)`: Resolves defaults, validates the `ResolvedSearchRequest`, applies `redactQuery` when configured, runs Tavily → `mapTavilyDiscovery` (discovery only), optionally runs `fetchEvidenceSources`, then assembles the `SearchResult` (discovery + evidence layers).
- `searchMany(request)`: Same concurrency / `stopOnError` behavior; merges `discoveredSources`, `evidenceSources`, `providerFindings`, and `findings` in separate maps.
- `fetchUrlContent(url, options?)`: Fetches one URL with the same rules as evidence GETs (timeouts, byte caps, `maxAttempts` / backoff). Never throws; always returns an `EvidenceSource` (check `fetchOk`). Does not require `fetch.enabled`.
- `healthCheck()`: Checks that an API key is configured (no network call).
Parallel URL fetches and rate limits
- Inside one `search({ fetchPages: true })` call: at most `fetch.concurrency` requests are in flight at any time; each URL is still tried up to `fetch.maxAttempts` times with backoff between attempts. There is no separate global QPS limit, so saturating many hosts in parallel is your tradeoff.
- Many parallel `fetchUrlContent` calls: each invocation runs its own retry loop with no cross-call throttling. To stay polite to a single origin, cap parallelism yourself (e.g. a pool of size ≤ `fetch.concurrency`) or serialize fetches.
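A small worker pool is one way to cap in-flight `fetchUrlContent` calls yourself. `mapWithConcurrency` is a generic sketch, not part of the package; in real use `fn` would wrap `adapter.fetchUrlContent(url)` and `limit` would be at most `fetch.concurrency`. Because `fetchUrlContent` never throws, the fail-fast behavior of `Promise.all` here only matters for other callbacks.

```typescript
// Generic bounded pool: at most `limit` calls to `fn` are in flight at once.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results; // same order as `items`
}
```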
PII / sensitive tokens in queries
- Prefer `redactQuery` on the adapter config to strip or replace patterns (hostnames, ticket IDs, bearer fragments) before `runTavilySearch`.
- Keep replacements non-empty so validation still passes, e.g. replace `internal-host` with `[REDACTED_HOST]` rather than deleting the whole string.
- Per-request override is not exposed; build separate adapter instances if you need different redactors.
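A redactor covering the pattern families listed above might chain a few regexes. The `.corp.example` host suffix, the ticket-ID shape, and `redactSensitive` itself are hypothetical placeholders; tune them to your own identifiers. The ticket pattern deliberately skips `CVE-` IDs so security queries survive redaction, and every replacement is a non-empty token.

```typescript
// Hypothetical patterns; adjust to your own hostnames and ticket formats.
const REDACTIONS: ReadonlyArray<[RegExp, string]> = [
  // Bearer tokens: keep the scheme word, drop the credential.
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED_TOKEN]"],
  // Internal hostnames such as build-01.corp.example
  [/\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.corp\.example\b/gi, "[REDACTED_HOST]"],
  // Ticket IDs shaped like OPS-1234, excluding CVE identifiers.
  [/\b(?!CVE-)[A-Z]{2,10}-\d+\b/g, "[REDACTED_TICKET]"],
];

function redactSensitive(query: string): string {
  return REDACTIONS.reduce((q, [pattern, replacement]) => q.replace(pattern, replacement), query);
}
```

Passed as `redactQuery: redactSensitive`, this would run once per adapter instance, consistent with the no-per-request-override rule above.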
Tavily `include_raw_content` accepts `TavilyIncludeRawContent = boolean | "markdown" | "text"` (see the Tavily API docs). A request value of `true` is sent to the SDK as `"markdown"`.
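The request-to-SDK mapping above is small enough to write out. `toSdkIncludeRawContent` is a hypothetical name; the documented rule is only that `true` becomes `"markdown"`, and forwarding every other value unchanged is an assumption.

```typescript
// Mirrors the union type named in the README.
type TavilyIncludeRawContent = boolean | "markdown" | "text";

// Hypothetical helper. Documented: `true` is sent as "markdown".
// Assumed: false, "markdown", and "text" are forwarded unchanged.
function toSdkIncludeRawContent(value: TavilyIncludeRawContent): false | "markdown" | "text" {
  return value === true ? "markdown" : value;
}
```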
DiscoveredSource (Tavily / search API)
What the provider returned for a hit—not proof you fetched the live page yourself.
| Field | Notes |
|--------|--------|
| url, normalizedUrl, domain, title, publishedAt | `normalizedUrl` is the key used to dedupe rows in `searchMany`. |
| snippet | Always present (string). Usable excerpt (Tavily snippet → content → raw, truncated by snippetMaxChars). Empty string when the provider sent nothing or when includeSourceSnippets is false. |
| providerContent | From Tavily’s content only, truncated. |
| providerRawContent | From raw_content / rawContent when requested; not capped by snippetMaxChars. |
| snippetKind | One of `"snippet"`, `"provider_content"`, or `"provider_raw_content"`. |
| providerScore, rank | From Tavily when present. |
| matchedQueries | Which queries returned this URL (filled/merged in searchMany). |
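The `snippet` row describes a fallback chain. A sketch of it, assuming a plain character cut at `snippetMaxChars`, blank strings treated as missing, and no `snippetKind` when nothing is usable; `ProviderHit` and `pickSnippet` are illustrative names, not the adapter's types.

```typescript
type SnippetKind = "snippet" | "provider_content" | "provider_raw_content";

// Hypothetical shape of one provider hit; field names are illustrative.
interface ProviderHit {
  snippet?: string;
  content?: string;
  rawContent?: string;
}

// Picks the first non-blank field in README order (snippet → content → raw), then truncates.
function pickSnippet(
  hit: ProviderHit,
  snippetMaxChars: number,
  includeSourceSnippets = true,
): { snippet: string; snippetKind?: SnippetKind } {
  if (!includeSourceSnippets) return { snippet: "" };
  const candidates: Array<[string | undefined, SnippetKind]> = [
    [hit.snippet, "snippet"],
    [hit.content, "provider_content"],
    [hit.rawContent, "provider_raw_content"],
  ];
  for (const [text, kind] of candidates) {
    const trimmed = text?.trim();
    if (trimmed) return { snippet: trimmed.slice(0, snippetMaxChars), snippetKind: kind };
  }
  // Nothing usable from the provider: snippet is still a string.
  return { snippet: "" };
}
```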
EvidenceSource (HTTP fetch)
What this process requested and optionally extracted. Check fetchOk before treating extractedText as reliable.
| Field | Notes |
|--------|--------|
| fetchOk, httpStatus, fetchError | Outcome of the GET. |
| origin | One of `fetched_html`, `fetched_text`, `fetched_json`, or `fetched_pdf` (PDF text is not extracted yet). |
| extractedText | Plain-ish text (HTML stripped heuristically). |
| authorityScore, freshnessScore, qualityScore | Simple heuristics (gov/CVE/vendor domains, publishedAt age, length/success). |
| derivedFromDiscoveredSourceIds | Discovery row IDs this fetch came from. |
| matchedQueries | Carried from discovery. |
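Because evidence rows point back to discovery rows, a consumer can keep only successful fetches and attach the provider hits each came from. The types below are minimal stand-ins with only the fields this sketch needs; the `id` field on discovery rows is an assumption, since the README only says evidence carries discovery row IDs.

```typescript
// Minimal stand-ins for the package types; only fields used here.
interface DiscoveredRow {
  id: string; // assumed identifier referenced by derivedFromDiscoveredSourceIds
  url: string;
  title?: string;
}

interface EvidenceRow {
  fetchOk: boolean;
  extractedText?: string;
  derivedFromDiscoveredSourceIds: string[];
}

// Keep fetches that succeeded and produced text, paired with their discovery rows.
function reliableEvidence(
  discovered: readonly DiscoveredRow[],
  evidence: readonly EvidenceRow[],
): Array<{ evidence: EvidenceRow; discovered: DiscoveredRow[] }> {
  const byId = new Map(discovered.map((d) => [d.id, d] as const));
  return evidence
    .filter((e) => e.fetchOk && (e.extractedText ?? "").trim() !== "")
    .map((e) => ({
      evidence: e,
      discovered: e.derivedFromDiscoveredSourceIds
        .map((id) => byId.get(id))
        .filter((d): d is DiscoveredRow => d !== undefined),
    }));
}
```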
Tavily mapper (mapTavilyDiscovery)
Returns `TavilyDiscoveryResult` only, never a full `SearchResult`:
- `discoveredSources`
- `providerSummary` / `providerSummaryOrigin` (from Tavily’s `answer`)
- `providerFindings` (answer + top snippet hints)
That keeps the Tavily step honest: it is discovery-stage output, not scoped evidence.
SearchResult
```typescript
interface SearchResult {
  ok: boolean;
  provider: "tavily";
  query: string;
  providerSummary?: string;
  providerSummaryOrigin?: "provider_answer";
  providerFindings: ProviderFinding[];
  findings: SearchFinding[]; // evidence-backed / merged; empty until you add that layer
  discoveredSources: DiscoveredSource[];
  evidenceSources: EvidenceSource[];
  request: ResolvedSearchRequest;
  timing: SearchTiming;
  error?: SearchError;
  raw?: { providerResponse?: unknown };
}
```
ProviderFinding (from discovery only)
- `provider_answer`: Tavily’s `answer`; `sourceIds` is empty (do not treat it as grounded in every URL).
- `provider_hint`: Short rows from top discovery snippets when there is no answer; hints, not verified claims.
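The two finding kinds can be sketched as a pure function over the provider answer and the discovery rows. The README fixes only the kind names, the empty `sourceIds` on an answer, and that hints appear when there is no answer; the `kind` and `text` field names, `maxHints`, `hintMaxChars`, and putting the source ID in a hint's `sourceIds` are assumptions, as is the `buildProviderFindings` name.

```typescript
// Local stand-ins; the real ProviderFinding may use different field names.
interface ProviderFindingSketch {
  kind: "provider_answer" | "provider_hint";
  text: string;
  sourceIds: string[];
}

interface DiscoveredHit {
  id: string;
  snippet: string; // always a string, possibly ""
}

function buildProviderFindings(
  answer: string | undefined,
  sources: readonly DiscoveredHit[],
  maxHints = 3,
  hintMaxChars = 200,
): ProviderFindingSketch[] {
  if (answer && answer.trim() !== "") {
    // The answer is not tied to a particular URL, so sourceIds stays empty.
    return [{ kind: "provider_answer", text: answer.trim(), sourceIds: [] }];
  }
  // No answer: short hints from the top non-empty snippets (not verified claims).
  return sources
    .filter((s) => s.snippet !== "")
    .slice(0, maxHints)
    .map((s) => ({
      kind: "provider_hint" as const,
      text: s.snippet.slice(0, hintMaxChars),
      sourceIds: [s.id],
    }));
}
```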
SearchFinding (evidence / merge layer)
Reserved for source_claim, derived, cross_source_consensus, etc. The adapter currently returns findings: []; populate when you merge fetched text or rank evidence.
SearchManyResult.merged
- `providerFindings`: Deduped provider hints across queries.
- `findings`: Deduped evidence-backed findings (usually empty until implemented).
- `discoveredSources` / `evidenceSources`: Merged separately by normalized URL.
- `queriesUsed`: Sub-query strings in order.
- `totalDiscoveredSources` / `totalEvidenceSources`: Counts after merge.
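The discovery merge keys rows by normalized URL and unions `matchedQueries`. A sketch with a stand-in row type; keeping the first-seen row's other fields and preserving first-seen order are assumptions about the real merge, and `mergeDiscovered` is an illustrative name. Under this sketch, `totalDiscoveredSources` would be the length of the returned array.

```typescript
// Stand-in with only the fields the merge needs.
interface DiscoveredRowLike {
  normalizedUrl: string;
  url: string;
  matchedQueries: string[];
}

// Dedupe by normalizedUrl; the first-seen row wins and matchedQueries are unioned.
function mergeDiscovered<T extends DiscoveredRowLike>(perQuery: ReadonlyArray<readonly T[]>): T[] {
  const merged = new Map<string, T>();
  for (const rows of perQuery) {
    for (const row of rows) {
      const existing = merged.get(row.normalizedUrl);
      if (!existing) {
        // Copy matchedQueries so the caller's input arrays are never mutated.
        merged.set(row.normalizedUrl, { ...row, matchedQueries: [...row.matchedQueries] });
        continue;
      }
      for (const q of row.matchedQueries) {
        if (!existing.matchedQueries.includes(q)) existing.matchedQueries.push(q);
      }
    }
  }
  return Array.from(merged.values()); // Map iteration follows insertion order
}
```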
Example: discovery only (default)
```typescript
import { createSearchAdapter } from "@x12i/search-adapter";

// Top-level await requires an ES module context.
const adapter = createSearchAdapter();
const result = await adapter.search({
  query: "CVE-2024-9999",
  maxResults: 5,
  includeAnswer: true,
});
if (result.ok) {
  // No fetch is configured, so evidenceSources is expected to be empty.
  console.log(result.discoveredSources.length, result.evidenceSources.length);
  console.log(result.providerSummary, result.providerSummaryOrigin);
  console.log(result.providerFindings.length, result.findings.length);
}
```
Example: discovery + evidence fetch
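```typescript
import { createSearchAdapter } from "@x12i/search-adapter";

// Evidence fetching needs both fetch.enabled and request.fetchPages.
const adapter = createSearchAdapter({
  fetch: { enabled: true, topK: 3, timeoutMs: 15000 },
});
const result = await adapter.search({
  query: "CVE-2024-9999 advisory",
  maxResults: 5,
  fetchPages: true,
  fetchTopK: 2, // effective cap: min(fetch.topK, fetchTopK) = 2
});
if (result.ok) {
  for (const e of result.evidenceSources) {
    // Only trust extractedText when the GET succeeded.
    if (e.fetchOk) console.log(e.url, e.extractedText?.slice(0, 500));
  }
}
```
Example: per-URL content after deduplication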
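```typescript
import { createSearchAdapter } from "@x12i/search-adapter";

// fetchUrlContent does not require fetch.enabled.
const adapter = createSearchAdapter({
  fetch: { maxAttempts: 3, timeoutMs: 15000, maxContentChars: 200_000 },
});
// Never throws: inspect fetchOk instead of wrapping in try/catch.
const row = await adapter.fetchUrlContent("https://example.com/doc");
if (row.fetchOk) {
  console.log(row.extractedText?.slice(0, 500));
} else {
  console.warn(row.fetchError, row.httpStatus);
}
```
Example: query redaction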
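```typescript
import { createSearchAdapter } from "@x12i/search-adapter";

const adapter = createSearchAdapter({
  tavily: { apiKey: process.env.TAVILY_API_KEY },
  // Replace internal hostnames with a non-empty placeholder before the query leaves the process.
  redactQuery: (q) =>
    q.replace(/\b[a-z0-9-]+\.internal\.company\b/gi, "[internal-host]"),
});
```
Errors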
`SearchError` includes an optional `context: { stage?, query?, provider? }`, where `stage` is one of `validate`, `provider_call`, `map`, or `fetch`.
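One way to branch on `context.stage` when `ok` is false. The stand-in type copies only what this section states (plus an assumed `message` field), and the retry decisions are illustrative policy choices, not behavior of the adapter; `describeSearchError` is a hypothetical name.

```typescript
type SearchErrorStage = "validate" | "provider_call" | "map" | "fetch";

// Stand-in for SearchError: only the optional context is documented here.
interface SearchErrorLike {
  message?: string; // assumed field
  context?: { stage?: SearchErrorStage; query?: string; provider?: string };
}

// Hypothetical triage: decide whether retrying the same request is worth it.
function describeSearchError(error: SearchErrorLike | undefined): { retryable: boolean; hint: string } {
  switch (error?.context?.stage) {
    case "validate":
      return { retryable: false, hint: "Fix the request (or the redactQuery output) first." };
    case "provider_call":
      return { retryable: true, hint: "Tavily call failed; check the API key and timeout." };
    case "map":
      return { retryable: false, hint: "Unexpected provider response shape." };
    case "fetch":
      return { retryable: true, hint: "Second-stage HTTP fetch failed." };
    default:
      return { retryable: false, hint: "Unknown stage; inspect the error directly." };
  }
}
```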
Migration
- `sources` / `SearchSource` → `discoveredSources` + `evidenceSources`.
- `summary` → `providerSummary`; provider-only rows → `providerFindings` (not `findings`).
- `findings` on `SearchResult` is now for evidence-backed claims only (often empty until you add merge logic).
