@exellix/narrix-web-scoper
v2.0.0
CNI-aware web context planner and mapper for Exellix. Uses @x12i/search-adapter for web retrieval.
An independent TypeScript library for generating web-search plans from an entity and mapping @exellix/search-adapter results into a stable WebContext shape.
It is designed to be embeddable in Narrix, but it does not require Narrix runner/engine to be useful—you can call it directly anywhere you can provide a SearchAdapter instance.
What it does:
- Builds web search queries from an entity (or from a provided WebScopingMap)
- Calls a SearchAdapter from @exellix/search-adapter (prefers searchMany(...) when available, with fallback to execute(...) / search(...))
- Maps the result into a stable WebContext shape for Narrix to consume
- Supports a second “gap-driven” search mode (scopeForGaps) when you have gapHints indicating missing context (Narrix uses this with detectGaps, but you can supply gapHints yourself)
- Supports multi-question “question packs” (scopeQuestionPack) to produce a stable multi-scope web context artifact
For detailed adapter usage (how to configure createSearchAdapter, what request/response shapes are supported, and integration patterns), see the @exellix/search-adapter docs.
Status
This repo currently implements:
- Eligibility gating (allowlist by datasetId / entityKind, or custom function)
- Query building (buildQueries, buildGapQueries)
- Search adapter integration (scope, scopeForGaps, scopeGeneric, scopeForGapsGeneric, scopeQuestionPack) via SearchAdapterLike.searchMany(...) (preferred) with fallback to execute(...) / search(...)
- Result mapping from SearchExecutionResult into a stable WebContext shape
- Deterministic caps for web context size (maxFindings, maxSources, and snippet caps when enabled)
It does not currently implement memory caching, per-call TTL support, or pack/runner integration described in docs/narrix-web-scoper-plan.md. See Gap analysis.
For the verified source-reading target pipeline (fetch → extract → claim-attributed reasoning), see docs/web-scoping-roadmap.md. The package exports stable contract types (WebAttributedClaim, DisciplinedReasoningInput, SourceContentFetcher, …) for orchestrators and future fetch packages—see Public API.
Requirements
- Node: >=18 (see package.json)
- Registry access: this package depends on @exellix/search-adapter from GitHub Packages
Environment variables
- GITHUB_TOKEN: required to install dependencies from GitHub Packages (see .npmrc)
- TAVILY_API_KEY: only required if you want to run the optional “real web” integration test (see tests/integration/scope.with-tavily.test.ts)
To get started quickly:
cp .env.example .env
# edit .env and set values
Install
1) Configure GitHub Packages auth
This repo uses GitHub Packages for scoped registries. The included .npmrc expects GITHUB_TOKEN to be set:
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}
Set GITHUB_TOKEN to a GitHub token with permission to read the relevant packages (and SSO enabled if required by the org).
If you don’t want to commit a real .npmrc, copy the template:
cp .npmrc.example .npmrc
2) Install dependencies
npm ci
Build & test
# Build to dist/ (types + sourcemaps)
npm run build
# Run tests once
npm test
# Watch mode
npm run test:watch
Live Tavily integration (tests/integration/scope.with-tavily.test.ts) runs whenever TAVILY_API_KEY is set to a non-placeholder value (including CI secrets). Set RUN_LIVE_TAVILY=0 to skip those tests while keeping a key in .env.
- npm run test:integration: same integration file with Vitest’s verbose reporter (easier to see what ran vs skipped).
- Console lines prefixed with [live-tavily] report gate status, timing, and result counts when live tests actually execute.
- If Tavily returns unauthorized, the first live test fails by default (it used to pass silently). Set TAVILY_LIVE_ALLOW_UNAUTHORIZED_PASS=1 only if you intentionally want a pass without proving Tavily.
Public API
The package exports a single factory plus types:
- createWebScoper(config: WebScoperConfig): NarrixWebScoper
- NarrixWebScoper methods: scope, scopeForGaps, scopeGeneric, scopeForGapsGeneric, scopeQuestionPack
- Search / context types: WebContext, WebFinding, WebSource, WebSourceContentSource, WebSourceExcerptFrom, WebSourceRetrievalStage, WebScoperResult, GapSearchResult, GapHints, WebScopingMap, NarrixScope
- Question pack types: WebScopeQuestion, WebScopePackInput, WebScopePackResult, WebScopeQuestionOutcome, WebContextScopes
- Web-scoped persistence types (from @xronoces/xmemory-scoper): WebScopedDataDoc, WebScopedEntityRef
- Verified-pipeline contracts (no I/O in this package): WebAttributedClaim, WebClaimFreshness, DisciplinedReasoningInput, SourceFetchRequest, SourceFetchResult, SourceFetchOk, SourceFetchErr, SourceContentFetcher
Entry point: src/index.ts. Roadmap: docs/web-scoping-roadmap.md.
Usage
Basic enrichment search (scope)
If enabled is false, scope() returns { available: false, reason: "disabled" }.
If enabled but no searchAdapter is provided, scope() returns { available: true, context: empty, cached: false } (a stubbed empty context with queriesUsed populated).
import { createWebScoper } from "@exellix/narrix-web-scoper";
import { createSearchAdapter } from "@exellix/search-adapter";

const adapter = createSearchAdapter({
  tavily: {
    apiKey: process.env.TAVILY_API_KEY!,
    maxResults: 3,
    includeAnswer: true,
  },
});

const scoper = createWebScoper({
  enabled: true,
  eligibility: { datasetIds: ["acme.vulnerabilities"] },
  searchAdapter: adapter,
  scoping: { maxQueries: 3 },
});

const result = await scoper.scope({
  datasetId: "acme.vulnerabilities",
  subjectId: "CVE-2024-9999",
  entityKind: "vulnerability",
  entity: { cveId: "CVE-2024-9999" },
  cni: {}, // passed through to planning and execution hints
});

if (result.available) {
  // result.context is the stable output shape for your app (and for Narrix, if embedded there)
  console.log(result.context.summary);
  console.log(result.context.findings);
} else {
  console.log(result.reason, result.error);
}
Gap-driven search (scopeForGaps)
This mode builds different queries based on gapHints (e.g. unknown dataset, missing schema, empty stories).
const gapResult = await scoper.scopeForGaps({
  datasetId: "acme.vulnerabilities",
  subjectId: "CVE-2024-9999",
  entityKind: "vulnerability",
  entity: { cveId: "CVE-2024-9999" },
  cni: {},
  gapHints: { missingSchema: true },
});

if (gapResult.found) {
  console.log(gapResult.gapType, gapResult.context.queriesUsed);
} else {
  console.log(gapResult.gapType, gapResult.reason);
}
Question packs (multi-scope web context)
Use scopeQuestionPack to run multiple simple web-scoping questions in one call. You get:
- results: a map keyed by each question’s id with one outcome per entry: property_resolved, db_hit, web_fetch, or miss.
- context.scopes: legacy-compatible per-scope entries (plus context.summary / findings / sources / queriesUsed from the primary scope).
Simple questions rule
Each question string must be a plain, direct English question someone would type into a search engine—short, not a mash-up of entity IDs, taxonomy labels, product names, or several sub-questions in one line. Cover more ground with more entries in the questions array, not longer composed strings. This package runs questions as-is and does not validate that rule at runtime; keeping questions simple is a caller responsibility (documentation and review).
Behaviour summary
- Empty questions: no-op (available: true, empty results and scopes).
- mappedProperty: optional dot-path into xmemorySnapshot. If it resolves to a non-empty value, web search and DB lookup are skipped for that question (unless forceWeb is true).
- DB layer (optional): set config.webScopedData.getWebScopedData / saveWebScopedData (for example methods from createWebScopedDataApi in @xronoces/xmemory-scoper). Lookup uses the question text plus linkedEntities (or subjectId + entityKind when both are set). If callbacks are omitted or persistence fails, the pack still runs (web-only where applicable).
- forceWeb: skips property resolution and getWebScopedData, but still calls saveWebScopedData after a successful web path when configured.
- Parallel search, deduped URL fetch: all pack searches run concurrently (subject to concurrency). When page fetch is used, each normalized URL (scheme + host + path, tracking params stripped) is downloaded at most once; content is reused for every question that referenced that URL. Per-URL fetch failures are non-fatal.
- Persisted shape: successful web answers are built as WebScopedDataDoc (imported from @xronoces/xmemory-scoper; linkedEntities, rawData.sources, and synthesizedData are always populated for new web rows).
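The mappedProperty rule in the behaviour summary can be pictured with a small resolver. This is a minimal sketch, not the package's code: resolveDotPath, isNonEmpty, and shouldSkipSearch are hypothetical helper names, and the definition of "non-empty" (not null/undefined, not a blank string, not an empty array) is an assumption.

```typescript
// Hypothetical helper: walk a dot-path such as "vuln.exploited" through a snapshot object.
function resolveDotPath(snapshot: unknown, path: string): unknown {
  let current: unknown = snapshot;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Assumption: "non-empty" excludes null/undefined, blank strings, and empty arrays.
function isNonEmpty(value: unknown): boolean {
  if (value === undefined || value === null) return false;
  if (typeof value === "string") return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// A question is answered from the snapshot only when forceWeb is off and the path resolves.
function shouldSkipSearch(snapshot: unknown, mappedProperty?: string, forceWeb = false): boolean {
  if (forceWeb || !mappedProperty) return false;
  return isNonEmpty(resolveDotPath(snapshot, mappedProperty));
}
```

With a snapshot like { vuln: { exploited: "yes" } }, a question mapped to vuln.exploited would be resolved from the snapshot, while the same question with forceWeb set would still go to the web path.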
De-dupe of identical question text is supported (dedupe: "normalized" by default). Failures are lenient: one scope can miss while others succeed.
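The deduped page fetch described in the behaviour summary can be sketched as a normalize-then-group step. This is an illustration under stated assumptions, not the package's implementation: normalizeUrl and planFetches are hypothetical names, the tracking-parameter pattern (utm_*, gclid, fbclid) is a guess, and keeping non-tracking query parameters is also an assumption.

```typescript
// Assumption: which query parameters count as "tracking" is a guess for illustration.
const TRACKING_PARAM = /^(utm_\w+|gclid|fbclid)$/i;

// Hypothetical helper: trim, drop the fragment, and strip tracking parameters.
function normalizeUrl(raw: string): string {
  const url = new URL(raw.trim());
  url.hash = "";
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => { keys.push(key); });
  for (const key of keys) {
    if (TRACKING_PARAM.test(key)) url.searchParams.delete(key);
  }
  return url.toString();
}

// Group question ids by normalized URL so each page is fetched at most once.
function planFetches(urlsByQuestion: Record<string, string[]>): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const [questionId, urls] of Object.entries(urlsByQuestion)) {
    for (const raw of urls) {
      const key = normalizeUrl(raw);
      const ids = plan.get(key) ?? [];
      if (!ids.includes(questionId)) ids.push(questionId);
      plan.set(key, ids);
    }
  }
  return plan;
}
```

Each key of the resulting map would be fetched once, and the fetched content handed to every question id listed under it; a failed fetch for one key would leave the other keys unaffected.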
const pack = await scoper.scopeQuestionPack({
  subject: "CVE-2024-9999",
  xmemorySnapshot: graphSnapshot, // optional: for `mappedProperty` on questions
  questions: [
    {
      id: "exploitationReality",
      purpose: "In-the-wild exploitation",
      question: "Is CVE-2024-9999 exploited in the wild?",
    },
    {
      id: "exploitCode",
      purpose: "Public exploit material",
      question: "What public exploit code exists for CVE-2024-9999?",
    },
  ],
  concurrency: 3,
});

if (pack.available) {
  console.log(pack.results.exploitationReality?.status);
  console.log(pack.context.summary); // primary scope (legacy)
  console.log(pack.context.scopes.exploitationReality.context?.findings);
}
Configuration
WebScoperConfig
Key fields used by the current implementation:
- enabled?: boolean
- searchAdapter?: SearchAdapterLike (from @exellix/search-adapter, expected to provide searchMany or search / execute at runtime; fetchUrlContent is used for deduped page fetch in question packs when present)
- webScopedData?: { getWebScopedData?, saveWebScopedData? }: optional hooks aligned with @xronoces/xmemory-scoper / WebScopedDataDoc persistence
- eligibility?: { datasetIds?: string[]; entityKinds?: string[]; isEligible?: (args) => boolean }
- scoping?: { maxQueries?: number; freshnessDays?: number; maxFindings?: number; maxSources?: number; includeSourceSnippets?: boolean; maxSnippetCharsPerSource?: number; maxTotalWebContextChars?: number; snippetIncludeRawContent?: boolean | "markdown" | "text"; sourceExcerptFrom?: "providerContent" | "providerRawContent" | "content" | "rawContent"; fetchPages?: boolean; fetchTopK?: number; ... } (content / rawContent are deprecated aliases for the adapter’s providerContent / providerRawContent.)
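To make the eligibility field concrete, here is a minimal sketch of an allowlist check. The shapes and the function name isEligibleSketch are hypothetical, and the combination rule is an assumption: a custom isEligible function decides alone when present, otherwise every configured allowlist must match, and a missing eligibility config admits everything. The package's actual rule may differ.

```typescript
// Hypothetical shapes for illustration; the real types are exported by the package.
interface EligibilityArgs {
  datasetId: string;
  entityKind?: string;
}

interface EligibilityConfig {
  datasetIds?: string[];
  entityKinds?: string[];
  isEligible?: (args: EligibilityArgs) => boolean;
}

// Assumed precedence: custom function first, then allowlists combined with AND.
function isEligibleSketch(cfg: EligibilityConfig | undefined, args: EligibilityArgs): boolean {
  if (!cfg) return true;
  if (cfg.isEligible) return cfg.isEligible(args);
  if (cfg.datasetIds && !cfg.datasetIds.includes(args.datasetId)) return false;
  if (cfg.entityKinds) {
    if (args.entityKind === undefined || !cfg.entityKinds.includes(args.entityKind)) return false;
  }
  return true;
}
```

Under these assumptions, a config of { datasetIds: ["acme.vulnerabilities"] } would admit the entity from the usage example and reject an entity from any other dataset.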
Note: other fields exist in types (e.g. memory, cache) but are not wired for runtime caching in this package yet.
Source body fields (WebSource)
When scoping.includeSourceSnippets is true, each WebContext.sources[] entry may include provider-layer text from @exellix/search-adapter (normalized SearchSource.snippet, providerContent, providerRawContent; legacy clients may still see content / rawContent on the wire—those are the same roles under older names). This is discovery-time material from the search provider, not a guarantee of full-page or live-site truth unless a later fetch stage exists.
- providerContent / providerRawContent: first-class copies of the adapter’s bounded excerpt and raw payload (when requested). Prefer these over legacy names.
- content / rawContent: deprecated mirrors of providerContent / providerRawContent for older consumers.
- snippet: primary excerpt for this source, chosen via scoping.sourceExcerptFrom (default providerContent, aliases content / rawContent):
  - providerContent (default): providerContent → snippet.
  - providerRawContent: providerRawContent → providerContent → snippet. If snippetIncludeRawContent is omitted, includeRawContent defaults to true so raw is requested.
- snippetCharCount: code-point length of snippet when set.
- contentOrigin, retrievalStage, matchedQueries: passed through from the adapter when present (provenance for trust and debugging).
- contentSource: when contentOrigin is a known WebSourceContentSource, it is copied here; otherwise, if contentOrigin is absent, web-scoper infers search_api_raw_content, search_api_content, or search_api_snippet from which provider fields were populated (raw wins over bounded content over display snippet).
- score / rank: passed through from the adapter when present.
- URLs are normalized (trimmed, fragment stripped) before domain / url are set.
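The snippet selection and contentSource inference rules above can be sketched as two small functions. This assumes the arrows denote a fallback order (first populated field wins); pickSnippet, inferContentSource, and the ProviderSource shape are hypothetical names for illustration only.

```typescript
type ExcerptFrom = "providerContent" | "providerRawContent" | "content" | "rawContent";
type InferredSource = "search_api_raw_content" | "search_api_content" | "search_api_snippet";

// Hypothetical subset of the provider-layer fields on a source.
interface ProviderSource {
  snippet?: string;
  providerContent?: string;
  providerRawContent?: string;
}

function firstNonEmpty(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => typeof c === "string" && c.length > 0);
}

// Assumption: the arrows in the field list are read as "fall back to the next field".
function pickSnippet(src: ProviderSource, from: ExcerptFrom = "providerContent"): string | undefined {
  // Map the deprecated aliases onto their current names.
  const mode = from === "content" ? "providerContent" : from === "rawContent" ? "providerRawContent" : from;
  return mode === "providerRawContent"
    ? firstNonEmpty(src.providerRawContent, src.providerContent, src.snippet)
    : firstNonEmpty(src.providerContent, src.snippet);
}

// Raw wins over bounded content, which wins over the display snippet.
function inferContentSource(src: ProviderSource): InferredSource | undefined {
  if (src.providerRawContent) return "search_api_raw_content";
  if (src.providerContent) return "search_api_content";
  if (src.snippet) return "search_api_snippet";
  return undefined;
}
```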
WebContext.summary / summaryOrigin / summaryIsProvider: top-level summary from the adapter; summaryOrigin labels synthesis (e.g. provider_answer); summaryIsProvider is true when that origin is provider_answer. Optional merge counters (discoveredSourceCount, etc.) are copied when the adapter supplies them.
WebFinding: isProviderDerived is set for provider_answer / provider_snippet kinds; isStrongEvidence is true when any linked source has retrievalStage fetched or extracted. Relevance is down-ranked for provider-answer findings before confidence is applied.
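The two WebFinding flags reduce to simple predicates. The sketch below restates them with hypothetical names (LinkedSource, isProviderDerivedKind, hasStrongEvidence); the string values are the ones named in the text, and everything else is illustrative.

```typescript
// Hypothetical minimal view of a source linked to a finding.
interface LinkedSource {
  retrievalStage?: string;
}

// Provider-derived kinds as named in the text above.
function isProviderDerivedKind(kind: string): boolean {
  return kind === "provider_answer" || kind === "provider_snippet";
}

// Strong evidence requires at least one linked source that was actually fetched or extracted.
function hasStrongEvidence(sources: LinkedSource[]): boolean {
  return sources.some((s) => s.retrievalStage === "fetched" || s.retrievalStage === "extracted");
}
```

A finding backed only by search-API snippets would therefore be provider-derived and not strong evidence, which matches the caution about discovery-time material earlier in this section.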
Defaults are backward compatible: includeSourceSnippets defaults to false, so these fields are omitted unless you opt in.
Output caps:
- maxFindings / maxSources: caps WebContext.findings and WebContext.sources. Resolution order:
  - input.cni.answerShapeHints.maxFindings / maxSources (if set)
  - otherwise config.scoping.maxFindings / maxSources (defaults)
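The resolution order above amounts to "per-call hint first, config second". A minimal sketch, assuming caps truncate by keeping the leading items; resolveCaps and applyCap are hypothetical names, not package exports.

```typescript
interface Caps {
  maxFindings?: number;
  maxSources?: number;
}

// Per-call hints (cni.answerShapeHints) take precedence over config.scoping values.
function resolveCaps(hints: Caps | undefined, scoping: Caps | undefined): Caps {
  return {
    maxFindings: hints?.maxFindings ?? scoping?.maxFindings,
    maxSources: hints?.maxSources ?? scoping?.maxSources,
  };
}

// Assumption: a cap keeps the first N items in their existing order.
function applyCap<T>(items: T[], cap: number | undefined): T[] {
  return cap === undefined ? items : items.slice(0, Math.max(0, cap));
}
```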
Snippet/text caps (only apply when snippets are enabled):
- maxSnippetCharsPerSource: Unicode code-point cap applied per source to providerContent, providerRawContent (and legacy content / rawContent mirrors), and the text chosen for snippet (after sourceExcerptFrom). When set to a positive number, it is also forwarded as snippetMaxChars on the shared search request so the adapter can normalize earlier.
- maxTotalWebContextChars: additional budget applied only to WebSource.snippet, across sources in array order (after each snippet’s per-source cap). It does not shrink stored providerContent / providerRawContent.
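The two snippet caps can be sketched as a per-source code-point cap followed by a running total budget. This is an illustration, not the package's code: capCodePoints and applySnippetBudget are hypothetical names, and emitting an empty string once the budget is exhausted is an assumption (the package may omit such snippets instead).

```typescript
// Cap by Unicode code points rather than UTF-16 units, so surrogate pairs are never split.
function capCodePoints(text: string, max: number): string {
  const points = Array.from(text); // string iteration yields whole code points
  return points.length <= max ? text : points.slice(0, max).join("");
}

// Apply the per-source cap first, then spend a shared budget in array order.
function applySnippetBudget(snippets: string[], perSource: number, total: number): string[] {
  let remaining = total;
  return snippets.map((snippet) => {
    const capped = capCodePoints(capCodePoints(snippet, perSource), Math.max(0, remaining));
    remaining -= Array.from(capped).length;
    return capped;
  });
}
```

For example, with perSource = 4 and total = 6, the snippets "abcdef", "ghij", "kl" would come out as "abcd", "gh", and an empty string: the first source uses four of the six code points, the second gets the remaining two, and nothing is left for the third.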
To request raw body text from the provider, set scoping.snippetIncludeRawContent (e.g. true or "markdown"); it is forwarded as includeRawContent (boolean true may be sent as "markdown" for SDK compatibility—see search-adapter docs).
WebFinding.support: when the adapter attaches support metadata (e.g. for provider_snippet findings), web-scoper preserves it on the mapped finding.
Query building
buildQueries (enrichment)
Order of precedence:
- If scopingMap.queries is provided: templates are interpolated from entity, sorted by (weight ?? 1) descending, de-duped, and capped by maxQueries.
- Otherwise: an “auto” strategy picks a primary identifier from common fields (id, cveId, name, hostname, productName, entityKey, identifier, key, …). It emits up to 3 queries:
  - The identifier alone
  - Identifier + "{entityKind} context"
  - Identifier + "{entityKind} details"
buildGapQueries (gap-driven)
Builds up to 5 queries depending on gapHints:
- unknownDataset: dataset/entity-kind discovery queries
- missingSchema: schema/documentation/example queries
- processorNotMatched: “what is this input” + context queries
- emptyStories: broader “analysis context” queries
- If no hints are set: falls back to a generic context query
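Both builders described in this section can be sketched in a few lines. Treat this as an approximation under stated assumptions, not the package's source: the {field} placeholder syntax, the field priority order, the exact query wording for each gap hint, and the names buildQueriesSketch and buildGapQueriesSketch are all hypothetical.

```typescript
// Hypothetical shapes; the real WebScopingMap and GapHints types live in the package.
interface QueryTemplate { template: string; weight?: number }
interface GapHintsSketch {
  unknownDataset?: boolean;
  missingSchema?: boolean;
  processorNotMatched?: boolean;
  emptyStories?: boolean;
}

// Only the identifier fields named above; the real list is longer and its order is assumed.
const ID_FIELDS = ["id", "cveId", "name", "hostname", "productName", "entityKey", "identifier", "key"];

// Assumption: templates use {field} placeholders resolved from the entity.
function interpolate(template: string, entity: Record<string, unknown>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, field: string) => String(entity[field] ?? "")).trim();
}

function dedupeAndCap(queries: string[], max: number): string[] {
  return queries.filter((q, i) => q.length > 0 && queries.indexOf(q) === i).slice(0, max);
}

function buildQueriesSketch(
  entity: Record<string, unknown>,
  entityKind: string,
  maxQueries: number,
  templates?: QueryTemplate[],
): string[] {
  if (templates && templates.length > 0) {
    // From-map: highest weight first, default weight 1.
    const sorted = [...templates].sort((a, b) => (b.weight ?? 1) - (a.weight ?? 1));
    return dedupeAndCap(sorted.map((t) => interpolate(t.template, entity)), maxQueries);
  }
  // Auto: first populated identifier field, then up to three queries.
  const field = ID_FIELDS.find((f) => typeof entity[f] === "string" && entity[f] !== "");
  if (!field) return [];
  const id = entity[field] as string;
  return dedupeAndCap([id, `${id} ${entityKind} context`, `${id} ${entityKind} details`], Math.min(3, maxQueries));
}

// Query wording below is invented for illustration; only the hint names come from the text.
function buildGapQueriesSketch(datasetId: string, entityKind: string, subjectId: string, hints: GapHintsSketch): string[] {
  const queries: string[] = [];
  if (hints.unknownDataset) queries.push(`what is ${datasetId}`, `${entityKind} dataset overview`);
  if (hints.missingSchema) queries.push(`${datasetId} schema documentation`, `${entityKind} data example`);
  if (hints.processorNotMatched) queries.push(`what is ${subjectId}`, `${subjectId} ${entityKind} context`);
  if (hints.emptyStories) queries.push(`${subjectId} analysis context`);
  if (queries.length === 0) queries.push(`${subjectId} ${entityKind} context`);
  return dedupeAndCap(queries, 5);
}
```

Sorting before de-duping means that when two templates interpolate to the same string, the higher-weighted one determines its position in the final list.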
Execution adapter integration & mapping
This package is adapter-centric at the type level: it expects a SearchAdapterLike that provides searchMany(...) (preferred) or, as a fallback, execute(request: SearchExecutionRequest) / search(...) at runtime (typically created via createSearchAdapter from @exellix/search-adapter). The adapter is responsible for talking to Tavily or other providers and returning a SearchExecutionResult that narrix-web-scoper maps into WebContext.
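The method preference can be pictured as a duck-typed check on the adapter. This is a sketch only: AdapterLikeSketch and pickSearchMethod are hypothetical, the request and result types are left as unknown because the real signatures are defined in @exellix/search-adapter, and trying execute before search is an assumption based on the order in which the two are listed.

```typescript
type SearchFn = (request: unknown) => Promise<unknown>;

// Hypothetical stand-in for SearchAdapterLike; real signatures come from the adapter package.
interface AdapterLikeSketch {
  searchMany?: SearchFn;
  execute?: SearchFn;
  search?: SearchFn;
}

type SearchMethodName = "searchMany" | "execute" | "search";

// Prefer searchMany, then fall back (assumed order: execute, then search).
function pickSearchMethod(adapter: AdapterLikeSketch): SearchMethodName | undefined {
  if (typeof adapter.searchMany === "function") return "searchMany";
  if (typeof adapter.execute === "function") return "execute";
  if (typeof adapter.search === "function") return "search";
  return undefined;
}
```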
Repo layout
- src/index.ts: main factory + search-adapter mapping
- src/query.ts: enrichment + gap query builders
- src/eligibility.ts: eligibility checker
- src/types.ts: public types (including adapter-facing types re-exported from @exellix/search-adapter)
- tests/: unit + integration tests (mock adapter + real Tavily-backed adapter)
- docs/narrix-web-scoper-plan.md: design/spec document (ahead of implementation)
- docs/nx-gap-analysis.md: notes on Nx + workspace alignment
Security notes
- Do not commit tokens. Use environment variables (this repo uses GITHUB_TOKEN via .npmrc).
- Treat orchestrator outputs as untrusted input if you surface them outside internal systems.
Gap analysis
Implemented in code
- Core API exists: createWebScoper(), scope(), scopeForGaps(), scopeGeneric(), scopeForGapsGeneric(), scopeQuestionPack(), buildQueries(), buildGapQueries(), isEligible()
- Query building: auto + from-map interpolation/weighting/deduping
- Search adapter mapping: supports the SearchExecutionResult shape from @exellix/search-adapter
- Tests: unit tests for eligibility/querying and integration tests using both a mock adapter and a real Tavily-backed adapter via @exellix/search-adapter
Missing vs docs/narrix-web-scoper-plan.md (high priority)
- Memory caching (config.memory, config.cache) is defined in types but not implemented
  - No ttlSeconds, staleCutoffSeconds, cniHashPolicy, datasetId.webContext namespacing, or cached / ageSeconds logic
- Config defaults from the plan are not enforced (e.g. enabled default false, cache defaults, freshness/maxEvidence, etc.)
- Query strategy config (scoping.queryStrategy: "auto" | "fromMap") exists in types but is not used; current code always uses:
  - from-map if scopingMap is provided to buildQueries()
  - otherwise auto
- Runner integration and _webContext CNI enrichment are not present (this is currently a library-only package)
- Domain controls (focusDomains, excludeDomains, maxEvidence, freshnessDays) exist only in the planning doc / types but are not wired into orchestrator calls
- Gap search caching policy (“not cached by default”) is not implemented because caching is not implemented at all
Repo/package hygiene gaps (recommended)
- License: UNLICENSED with empty author (decide the intended licensing model)
- Publishing metadata: consider adding repository, homepage, bugs, and files in package.json
- Exports map: consider adding "exports" for NodeNext consumers (ESM/CJS clarity)
- CI: no GitHub Actions/workflow included for npm test / npm run build
- Formatting/linting: no formatter/linter config (Prettier/ESLint) or npm run lint
- Release process: no changelog/versioning guidance (Changesets or similar)
Usability gaps (what’s unclear / what would make adoption smoother)
- Config validation: there’s no runtime validation (e.g., enabled: true but missing searchAdapter); today this silently returns an “empty context” stub, which is convenient for tests but surprising in production unless documented.
- Deterministic IDs / caching hooks: outputs include cached: false always; if you intend real caching, the API should document cache keys and how subjectId is expected to be chosen.
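Until the package validates its own config, a caller can guard against the silent empty-context stub before constructing the scoper. The sketch below is a caller-side workaround, not part of the package: ScoperConfigSketch and assertSearchable are hypothetical names, and only the two fields involved are modelled.

```typescript
// Hypothetical minimal view of WebScoperConfig: only the fields this guard inspects.
interface ScoperConfigSketch {
  enabled?: boolean;
  searchAdapter?: object;
}

// Fail fast when scoping is enabled but no adapter is wired in.
function assertSearchable(config: ScoperConfigSketch): void {
  if (config.enabled && !config.searchAdapter) {
    throw new Error("web scoper is enabled but no searchAdapter was provided");
  }
}
```

Calling assertSearchable(config) just before createWebScoper(config) in production code keeps the stub behaviour available for tests while making a missing adapter an explicit error elsewhere.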
