wikipedia-title-index
v1.2.6
Local Wikipedia title index build and constrained query service.
Requirements
- Node.js >= 24.10.0
- CommonJS runtime
What it provides
- Streaming index build from file or URL
- Local SQLite index (`titles(t TEXT PRIMARY KEY)`)
- Local REST query service with SQLite authorizer policy
- CLI for build/serve/title lookup/cache/status/clean
Install
npm i wikipedia-title-index
CLI
npx wikipedia-title-index build [--file <path> | --url <url>]
npx wikipedia-title-index serve
npx wikipedia-title-index query "<title-or-prefix>" [limit]
npx wikipedia-title-index cache clear
npx wikipedia-title-index status
npx wikipedia-title-index clean
Installed-package usage:
- Use `npx wikipedia-title-index ...` after `npm i wikipedia-title-index`.
- Plain `wikipedia-title-index ...` works only when installed globally (or when your shell PATH includes local npm bins).
Query modes (important)
wikipedia-title-index has two query surfaces with different capabilities:
| Surface | How to use | Supports raw SQL? | Purpose |
|---|---|---:|---|
| CLI | npx wikipedia-title-index query "<title-or-prefix>" [limit] | No | Exact + prefix title lookup only |
| REST API | POST /v1/titles/query with JSON { sql, params, max_rows } | Yes | Policy-constrained SQL SELECT queries |
Notes:
- The CLI `query` command does not accept SQL text.
- SQL policy enforcement (SQLite authorizer) applies to the REST SQL endpoint.
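As a rough illustration of what an authorizer policy of this kind decides, the sketch below expresses the rule "only `SELECT`, only column `t` of `main.titles`" as a pure function. The numeric codes are the standard SQLite C API values; the function name `authorize` and its argument order are assumptions made for this example, and how such a callback is registered depends on the SQLite binding in use. It is not the package's actual callback.

```javascript
'use strict';

// Result and action codes from the SQLite C API (sqlite3.h).
const SQLITE_OK = 0;
const SQLITE_DENY = 1;
const SQLITE_READ = 20;
const SQLITE_SELECT = 21;

// Hypothetical policy: permit only SELECT statements that read main.titles.t.
// For SQLITE_READ, SQLite reports the table name, the column name, and the database name.
function authorize(action, arg1, arg2, dbName) {
  if (action === SQLITE_SELECT) return SQLITE_OK;
  if (action === SQLITE_READ) {
    const allowed = dbName === 'main' && arg1 === 'titles' && arg2 === 't';
    return allowed ? SQLITE_OK : SQLITE_DENY;
  }
  // Any other action (writes, PRAGMA, ATTACH, function calls, ...) is refused.
  return SQLITE_DENY;
}
```

A deny-by-default shape like this keeps the allowed surface explicit: anything not listed is rejected when the statement is prepared, before it can run.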
CLI title lookup example
npx wikipedia-title-index query "Albert" 5
REST SQL example
Start service:
npx wikipedia-title-index serve
Run SQL query:
curl -sS -X POST http://127.0.0.1:32123/v1/titles/query \
-H "content-type: application/json" \
-d "{\"sql\":\"SELECT t FROM titles WHERE t >= ?1 AND t < ?2 ORDER BY t\",\"params\":[\"Albert\",\"Albert\uffff\"],\"max_rows\":5}"
Environment variables
- `WIKIPEDIA_INDEX_DATA_DIR` (default: `data`)
- `WIKIPEDIA_INDEX_DB_PATH` (override full DB path)
- `WIKIPEDIA_INDEX_SOURCE_URL` (default: `https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz`)
- `WIKIPEDIA_INDEX_AUTOSETUP` (`0` disables auto-setup)
- `SECS_WIKI_INDEX_PORT` (default: `32123`)
- `WIKIPEDIA_INDEX_CACHE_ENABLED` (`0` disables cache, default: `1`)
- `WIKIPEDIA_INDEX_CACHE_TTL_SECONDS` (default: `86400`, `0` disables TTL pruning)
- `WIKIPEDIA_INDEX_CACHE_MAX_ENTRIES` (default: `10000`, `0` disables size pruning)
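To make the defaults concrete, here is a minimal sketch of how these variables could be resolved into a settings object. The helper name `resolveConfig` and the property names are hypothetical, and the package's own parsing may differ. The default DB file name is not stated here, so `dbPath` stays `null` unless overridden.

```javascript
'use strict';

// Hypothetical helper: resolves settings from an env-like object,
// falling back to the defaults documented in the list above.
function resolveConfig(env = process.env) {
  const int = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw === '') return fallback;
    const n = Number.parseInt(raw, 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    dataDir: env.WIKIPEDIA_INDEX_DATA_DIR || 'data',
    dbPath: env.WIKIPEDIA_INDEX_DB_PATH || null, // null means "no override"
    sourceUrl:
      env.WIKIPEDIA_INDEX_SOURCE_URL ||
      'https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz',
    autoSetup: env.WIKIPEDIA_INDEX_AUTOSETUP !== '0',
    port: int('SECS_WIKI_INDEX_PORT', 32123),
    cacheEnabled: env.WIKIPEDIA_INDEX_CACHE_ENABLED !== '0',
    cacheTtlSeconds: int('WIKIPEDIA_INDEX_CACHE_TTL_SECONDS', 86400), // 0 disables TTL pruning
    cacheMaxEntries: int('WIKIPEDIA_INDEX_CACHE_MAX_ENTRIES', 10000), // 0 disables size pruning
  };
}
```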
Query cache:
- Successful `/v1/titles/query` responses and CLI `query` results are cached in `data/cache/` by request shape.
- Cache keys include DB fingerprint (`path + size + mtime`) to avoid stale reuse after rebuilds.
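The idea behind fingerprinted cache keys can be sketched as follows. This is an assumption-laden illustration, not the package's implementation: the function names `dbFingerprint` and `cacheKey`, the use of SHA-256, and the exact serialization of the request shape are all made up for the example. Only the fingerprint ingredients (path, size, mtime) come from the description above.

```javascript
'use strict';
const crypto = require('node:crypto');
const fs = require('node:fs');

// Fingerprint of the database file: path + size + mtime.
function dbFingerprint(dbPath) {
  const st = fs.statSync(dbPath);
  return `${dbPath}:${st.size}:${st.mtimeMs}`;
}

// Hypothetical key derivation: SHA-256 over the fingerprint and the request shape.
function cacheKey(dbPath, request) {
  const shape = JSON.stringify([
    request.sql,
    request.params ?? [],
    request.max_rows ?? null,
  ]);
  return crypto
    .createHash('sha256')
    .update(dbFingerprint(dbPath))
    .update('\n')
    .update(shape)
    .digest('hex');
}
```

Because the fingerprint is part of the key, a rebuilt database (new size or mtime) yields different keys, so entries written against the old file are simply never looked up again.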
REST service
Start:
npx wikipedia-title-index serve
Endpoints:
- `GET /health`
- `POST /v1/titles/query`
POST /v1/titles/query request body:
- `sql` (required): SQL `SELECT` statement
- `params` (optional): SQL parameters
- `max_rows` (optional): response row cap (bounded by server limits)
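For callers working from Node.js, a request with this body could be assembled and sent roughly as shown below. The helper names `buildPrefixQuery` and `queryTitles` are hypothetical, the base URL assumes the default port, and the response schema is not described here, so the parsed JSON is returned as-is. The prefix search reuses the half-open range from the curl example (`t >= prefix AND t < prefix + U+FFFF`).

```javascript
'use strict';

// Builds a request body for a prefix search as the range [prefix, prefix + U+FFFF).
function buildPrefixQuery(prefix, maxRows = 10) {
  return {
    sql: 'SELECT t FROM titles WHERE t >= ?1 AND t < ?2 ORDER BY t',
    params: [prefix, prefix + '\uffff'],
    max_rows: maxRows,
  };
}

// Posts the body to a locally running service (started with `serve`).
// Uses the global fetch available in current Node.js releases.
async function queryTitles(body, baseUrl = 'http://127.0.0.1:32123') {
  const res = await fetch(`${baseUrl}/v1/titles/query`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`query failed: HTTP ${res.status}`);
  return res.json();
}
```

With the service running, `queryTitles(buildPrefixQuery('Albert', 5)).then(console.log)` would print whatever the endpoint returns for that prefix.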
OpenAPI contract: openapi/openapi.yaml
Docs
- Operations: `docs/OPS.md`
- Package contract: `docs/NPM.md`
- Agent behavior contract: `docs/AGENT.md`
Data and licensing
Wikipedia title data is subject to Creative Commons Attribution-ShareAlike (CC BY-SA). This package does not alter, reinterpret, or relicense the underlying data.
Notes
- `data/` artifacts are local runtime/build output and are not part of npm publish.
- SQL access is constrained to read-only `SELECT` on `main.titles.t`.
- Lock behavior (`v1.0.1+`): lock files are validated by recorded PID liveness. Stale locks are auto-removed; live locks block concurrent start.
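The PID-liveness rule for lock files can be illustrated with a short sketch. It assumes the lock file stores the owner's PID as plain text, which is a guess about the format. The function names `isPidAlive` and `checkLock` and the return values are hypothetical. The liveness probe itself uses `process.kill(pid, 0)`, which checks for the process without sending a signal.

```javascript
'use strict';
const fs = require('node:fs');

// Signal 0 only tests whether the process exists; nothing is delivered.
function isPidAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user, so it is alive.
    return err.code === 'EPERM';
  }
}

// Hypothetical lock check. Returns 'free', 'held', or 'stale-removed'.
function checkLock(lockPath) {
  let raw;
  try {
    raw = fs.readFileSync(lockPath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return 'free';
    throw err;
  }
  const pid = Number.parseInt(raw.trim(), 10);
  if (Number.isInteger(pid) && pid > 0 && isPidAlive(pid)) return 'held';
  // Recorded owner is gone (or the file is unreadable as a PID): remove the stale lock.
  fs.rmSync(lockPath, { force: true });
  return 'stale-removed';
}
```

A caller would start only on `'free'` or `'stale-removed'` and refuse to start on `'held'`.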
