@tadalabz/descry
v1.0.0
Configurable URL discovery and processing pipeline library with built-in content handling, retention-aware sharded JSON persistence, hookable context stages, and pluggable selector logic.
Descry
Definition
descry (verb): to catch sight of something far off
Description
Descry is a URL discovery and processing pipeline library organized around a five-stage runtime lifecycle:
- descry(): emit candidate URLs plus discovery telemetry
- see(): build the canonical context for one candidate
- amass(): hydrate that context with content and extracted artifacts
- select(): run pluggable selector logic against the hydrated context
- remember(): persist the durable result of the run
It is a good fit when you want repeatable discovery, a clear context model, pluggable selectors, built-in content handling, and shard-backed JSON persistence.
Out of the box, descry can create stores, fetch content, extract useful artifacts, persist results, re-read stored content on later runs, and emit structured runtime logs when a channel enables them. In most cases, what you bring is channel configuration and selector logic.
This package is currently built and verified on Node.js 24.x.
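To make the stage order concrete, here is a minimal, self-contained sketch of how the five stages hand data to one another. It does not use descry's real API: every function, parameter, and object shape below (`descryStage`, `runOnce`, `fetchContent`, the `Map` store, the context fields) is a hypothetical stand-in chosen for illustration. See docs/DATA_MODEL.md for the actual shapes.

```javascript
// Illustrative only: hypothetical stand-ins, NOT descry's actual API.

// descry(): emit candidate URLs plus discovery telemetry.
function descryStage(seedUrls) {
  const candidates = seedUrls.map((url) => ({ url }));
  return { candidates, telemetry: { emitted: candidates.length } };
}

// see(): build the canonical context for one candidate.
function see(candidate) {
  return {
    url: new URL(candidate.url).href, // normalize the URL
    content: null,
    artifacts: {},
    decision: null,
  };
}

// amass(): hydrate the context with content and extracted artifacts.
function amass(context, fetchContent) {
  const content = fetchContent(context.url);
  const match = /<title>(.*?)<\/title>/i.exec(content);
  return { ...context, content, artifacts: { title: match ? match[1] : null } };
}

// select(): run pluggable selector logic against the hydrated context.
function select(context, selector) {
  return { ...context, decision: selector(context) };
}

// remember(): persist the durable result (an in-memory Map stands in for a store).
function remember(store, context) {
  store.set(context.url, {
    decision: context.decision,
    artifacts: context.artifacts,
  });
}

// One pipeline run: every candidate passes through all stages in order.
function runOnce(seedUrls, { fetchContent, selector, store }) {
  const { candidates, telemetry } = descryStage(seedUrls);
  for (const candidate of candidates) {
    const hydrated = amass(see(candidate), fetchContent);
    remember(store, select(hydrated, selector));
  }
  return telemetry;
}

// Usage with canned pages in place of real network fetches.
const pages = {
  "https://example.com/a": "<title>Alpha</title>",
  "https://example.com/b": "<title>Beta</title>",
};
const store = new Map();
const telemetry = runOnce(Object.keys(pages), {
  fetchContent: (url) => pages[url] ?? "",
  selector: (ctx) => (ctx.artifacts.title === "Alpha" ? "keep" : "skip"),
  store,
});
```

In this sketch the selector is the only piece the caller must write. That mirrors the division of labor described above, where you bring channel configuration and selector logic and the library handles the remaining stages.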
Best Fit
Descry is a strong fit for structured, channel-based discovery work:
- repeated runs against durable stores
- selector-driven processing decisions
- channel designs that stay bounded in responsibility
- discovery topologies that split or promote recurring hot areas into their own channels
It is not positioned as a monolithic high-scale crawler platform where one forever-hot channel owns an ever-growing universe of URLs.
Quality Statement
Descry is validated with automated tests across the pipeline, persistence, logging, promotion analysis, and public CLI/package surface, plus repeated multi-run scenario exercises that check durable-state behavior over time. These validations have been used to confirm backlog and recrawl behavior, known-work suppression, discovery-scope control, and promotion-trigger behavior under realistic channel workloads.
Canonical Scope
This document is the high-level introduction to the package.
It tells you:
- what descry is,
- what stages it owns,
- what the published package contains,
- where to go next.
When you want more detail, use:
- docs/USAGE.md: basic getting-started usage and first pipeline setup
- docs/RUNTIME_WALK_THROUGH.md: plain-English runtime walk-through of one pipeline run
- docs/DATA_MODEL.md: exact candidate, context, decision, and persistence shapes
- docs/CONTENT_HANDLING_GUIDE.md: built-in content handling and override contracts
- docs/PERSISTENCE.md: persistence behavior, mechanics, and limitations
- docs/SELECTOR_GUIDE.md: selector authoring guidance
- docs/PRIMARY_CHANNEL_GUIDE.md: primary-channel seed guidance
Published Surface
The published package includes:
- README.md: this concept summary
- index.js: package entrypoint
- src/: runtime implementation
- docs/: the documentation set listed above
- tools/create-store.js: public CLI for initializing default persistence stores
- tools/analyze-channel.js: public CLI for reading or calculating canonical channel promotionAnalysis
- tools/extract-channel.js: public CLI for birthing one child channel from persisted or explicit promotionAnalysis
- examples/: the packaged plain starter example and sample configuration
- LICENSE: license text
Start with docs/USAGE.md for a simple first setup, then use examples/README.md for the packaged starter example.
