@gozio/gosearch-ingest
v0.0.3
Node.js ingest pipeline: converts location/category JSON into gosearch SQLite FTS5 database
gosearch-ingest
Node.js ingest pipeline for the Gozio FTS5-based location search engine.
Published as the @gozio/gosearch-ingest npm package.
What's here
src/
  db.js                AUTOGENERATED: schema SQL, initDb(), and shared constants
                       (source of truth: shared/schema/constants.json + fts5.json)
  normalize.js         Text and tag normalization (shared by ingest modules)
  category_ingest.js   Ingest categories JSON → categories / category_tags tables
  location_ingest.js   Ingest places JSON → locations / location_tags /
                       locations_fts / location_curated_lists
bin/
  seed-db.js           CLI: generate a gosearch SQLite database from JSON files
index.js               Library entry point, exports ingestToSqlite()
data/                  Generated at runtime, not committed
Design overview
See SEARCH_STRATEGY.md for the full schema rationale, FTS5 column weights, query patterns, and ingest decisions. Key points:
- FTS5 + BM25 replaces the old C++ bigram/dice-coefficient engine.
- Tags are bucketed into three FTS columns (tags_high / tags_normal / tags_low) so per-tag weights survive into BM25 scoring.
- Short name tokens (< 5 codepoints) get a synthetic tags_high entry so exact-name locations outscore longer prefix matches for short queries like "lab" or "ent".
- Lowercasing and diacritic stripping are handled by SQLite's unicode61 remove_diacritics 2 tokenizer at both index and query time, so no application-layer normalization is needed.
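The tag bucketing and short-name rule above can be sketched in a few lines. This is an illustration only, not the package's code: `bucketTags`, the `{ tag, weight }` input shape, and the `HIGH_MIN` / `LOW_MAX` thresholds are assumptions made for the example. Only the three column names and the "fewer than 5 codepoints" rule come from this README.

```javascript
// Hypothetical thresholds: the real cut-offs are defined by the package, not here.
const HIGH_MIN = 0.75;
const LOW_MAX = 0.25;
// From the README: name tokens shorter than 5 codepoints get a tags_high entry.
const SHORT_NAME_MAX = 5;

// Hypothetical helper: splits weighted tags into the three FTS column values.
function bucketTags(name, tags) {
  const buckets = { tags_high: [], tags_normal: [], tags_low: [] };

  for (const { tag, weight } of tags) {
    if (weight >= HIGH_MIN) buckets.tags_high.push(tag);
    else if (weight <= LOW_MAX) buckets.tags_low.push(tag);
    else buckets.tags_normal.push(tag);
  }

  // Add a synthetic high-weight entry for each short name token.
  // Spreading the string counts codepoints rather than UTF-16 code units.
  for (const token of name.split(/\s+/).filter(Boolean)) {
    if ([...token].length < SHORT_NAME_MAX && !buckets.tags_high.includes(token)) {
      buckets.tags_high.push(token);
    }
  }

  // Each FTS column holds one space-separated string of tags.
  return {
    tags_high: buckets.tags_high.join(' '),
    tags_normal: buckets.tags_normal.join(' '),
    tags_low: buckets.tags_low.join(' '),
  };
}
```

No lowercasing happens here, in keeping with the last point above: the tokenizer handles case and diacritics.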
Requirements
- Node.js >= 22 (uses the built-in node:sqlite module)
Using the library
const { DatabaseSync } = require('node:sqlite');
const { initDb } = require('@gozio/gosearch-ingest/src/db');
const { ingestCategories } = require('@gozio/gosearch-ingest/src/category_ingest');
const { ingestLocations } = require('@gozio/gosearch-ingest/src/location_ingest');
const db = new DatabaseSync(outPath);
initDb(db, locale);
ingestCategories(db, categories, locale); // categories — array of category objects
ingestLocations(db, networkId, locations, locale); // locations — array of location objects
db.close();
- categories / locations: plain JS arrays (parsed from JSON)
- locale: e.g. 'en', 'es'
- outPath: path where the SQLite file will be written
For a single-command CLI wrapper, see bin/seed-db.js below.
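Since the ingest functions take already-parsed arrays, a caller has to load the JSON files first. A minimal sketch of that step follows; `readJsonArray` is a hypothetical helper written for this example and is not exported by the package.

```javascript
const fs = require('node:fs');

// Hypothetical helper: read a JSON file and verify it holds an array,
// which is the shape the categories and locations arguments need.
function readJsonArray(filePath) {
  const parsed = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (!Array.isArray(parsed)) {
    throw new TypeError(`${filePath}: expected a JSON array`);
  }
  return parsed;
}

// Possible use with the calls shown above (file names are placeholders):
//   const categories = readJsonArray('categories.json');
//   const locations = readJsonArray('locations.json');
```

Failing early on a non-array input gives a clearer error than letting the ingest step fail partway through a write.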
CLI: seed-db
Generate a database from JSON files:
# Production format
node bin/seed-db.js \
--db <path> \
--locale <locale> \
--network <network-id> \
--categories <categories.json> \
--locations <locations.json>
# Fixture format (gosearch test suite)
node bin/seed-db.js \
--db <path> \
--fixture <fixtures.json>
The --fixture format expects the gosearch test data schema:
{ categories: [...], locations: [...], curated_lists: [...] }
In fixture mode the CLI also applies post-ingest fixups: curated list insertion.
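When the CLI is driven from another Node.js script, the two invocation forms above can be assembled programmatically. The sketch below is an assumption-laden illustration: `seedDbArgs` and `runSeedDb` are hypothetical names, and the option keys simply mirror the flags listed in this section.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: build the argument list for bin/seed-db.js.
// Uses the fixture form when opts.fixture is set, else the production form.
function seedDbArgs(opts) {
  if (opts.fixture) {
    return ['bin/seed-db.js', '--db', opts.db, '--fixture', opts.fixture];
  }
  return [
    'bin/seed-db.js',
    '--db', opts.db,
    '--locale', opts.locale,
    '--network', opts.network,
    '--categories', opts.categories,
    '--locations', opts.locations,
  ];
}

// Hypothetical runner: invokes the CLI with the current Node.js binary.
// Assumes the working directory is the package root.
function runSeedDb(opts) {
  return spawnSync(process.execPath, seedDbArgs(opts), { stdio: 'inherit' });
}
```

Passing the arguments as an array avoids shell quoting problems with paths that contain spaces.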
