concept-miner
v1.1.0
Deterministic concept extraction from natural language.
Purpose
concept-miner is productized as a CommonJS Node.js package for deterministic, contract-driven concept extraction.
Given text input, it extracts explicit concepts in canonical, deduplicated, traceable form.
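To make the three properties concrete, here is a toy illustration. It is not concept-miner's algorithm or output schema: the function name `toyExtract` and the `{ name, offsets }` shape are invented for this sketch. It only shows one plausible reading of "canonical" (case-folded), "deduplicated" (one entry per form), and "traceable" (source offsets kept).

```javascript
// Toy illustration only: NOT concept-miner's algorithm or output schema.
// toyExtract and the { name, offsets } shape are hypothetical.
function toyExtract(text) {
  const concepts = new Map(); // canonical form -> list of source offsets
  const wordPattern = /[A-Za-z]+/g;
  let match;
  while ((match = wordPattern.exec(text)) !== null) {
    const canonical = match[0].toLowerCase(); // canonical: case-folded
    if (!concepts.has(canonical)) {
      concepts.set(canonical, []); // deduplicated: one entry per form
    }
    concepts.get(canonical).push(match.index); // traceable: keep offsets
  }
  // Sort by name so the same input always yields the same ordering.
  return [...concepts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, offsets]) => ({ name, offsets }));
}

console.log(toyExtract("alpha beta Alpha"));
```

Both spellings of "alpha" collapse into a single entry that still records where each occurrence came from.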
Current State
This repository currently contains:
- product-facing contracts:
  - openapi/openapi.yaml
  - schema/concepts.schema.json
The full productization backlog is tracked in TODO.md, and staged milestones are in ROADMAP.md.
Runtime boundary:
- package payload is product runtime only.
Target Mode Model
default-extended mode (the default and only product mode): extraction with wikipedia/wikipedia-title-index information.
Development
npm ci
npm run lint
npm test
npm run dev:check
npm run dev:report:metrics
npm run dev:report:maturity
npm run ci:check
$env:RELEASE_TARGET_VERSION = (node -p "require('./package.json').version")
npm run release:check
JavaScript API (Current)
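const { extractConcepts, validateConcepts } = require("concept-miner");
// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
  const doc = await extractConcepts("alpha beta alpha", {
    mode: "default-extended",
  });
  const validation = validateConcepts(doc);
  console.log(validation);
}
main();
CLI (Current)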
concept-miner extract --text "alpha beta alpha" --mode default-extended --out concepts.json
concept-miner validate-concepts --in concepts.json
Optional runtime enrichment flags for extract:
concept-miner extract --text "alpha beta alpha" \
--mode default-extended \
--wikipedia-title-index-endpoint "http://127.0.0.1:32123" \
--wikipedia-title-index-timeout-ms 1500
In default-extended mode, wikipedia-title-index is required. If it is unavailable or times out, extraction hard-fails.
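Because extraction hard-fails when the title index cannot be reached, callers of the JavaScript API may want to handle that case explicitly. The sketch below assumes the failure surfaces as a rejected promise; this README does not document the error shape, so only the message is read. `extractOrReport` and `failingExtract` are hypothetical names, and the extract function is passed in so the sketch runs without the package or a live index service.

```javascript
// Hypothetical wrapper; assumes a hard failure surfaces as a rejected promise.
async function extractOrReport(extract, text, options) {
  try {
    const doc = await extract(text, options);
    return { ok: true, doc };
  } catch (err) {
    // The error shape is undocumented here, so fall back to a string form.
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}

// Stand-in for extractConcepts when the title index is unreachable.
async function failingExtract() {
  throw new Error("wikipedia-title-index unavailable");
}

extractOrReport(failingExtract, "alpha beta alpha", {
  mode: "default-extended",
}).then((result) => console.log(result));
```

With the real package, `require("concept-miner").extractConcepts` would take the place of `failingExtract`.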
REST API
The REST contract is specified in openapi/openapi.yaml, and an in-repo runtime server is available:
npm run serve:api
Default server bind:
http://127.0.0.1:32180
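A Node.js client for this server could be sketched as follows. It uses the POST /v1/concepts/extract route, the view=compact query parameter, and the text/options request body documented in this section. The helper names `buildExtractRequest` and `extractOverHttp` are hypothetical, and a JSON response body is an assumption; openapi/openapi.yaml remains the authoritative contract.

```javascript
// Default server bind as stated in this README.
const DEFAULT_BASE_URL = "http://127.0.0.1:32180";

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildExtractRequest(text, options, baseUrl = DEFAULT_BASE_URL) {
  const url = new URL("/v1/concepts/extract", baseUrl);
  url.searchParams.set("view", "compact");
  // Only include "options" when the caller supplies them.
  const body = options ? { text, options } : { text };
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: needs a running server (npm run serve:api) and
// Node.js 18+ for the global fetch. Assumes the response body is JSON.
async function extractOverHttp(text, options) {
  const { url, init } = buildExtractRequest(text, options);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`extract failed with HTTP ${response.status}`);
  }
  return response.json();
}

const request = buildExtractRequest("A webshop accepts orders.");
console.log(request.url);
```

Keeping request construction separate from sending makes the request shape easy to inspect or test without a live server.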
POST /v1/concepts/extract:
curl -sS -X POST "http://127.0.0.1:32180/v1/concepts/extract?view=compact" \
-H "Content-Type: application/json" \
-d '{
"text": "A webshop accepts orders."
}'
Default-extended runtime enrichment options:
{
"text": "The quick brown fox jumps over the lazy dog.",
"options": {
"mode": "default-extended",
"wikipedia_title_index_endpoint": "http://127.0.0.1:32123",
"wikipedia_title_index_timeout_ms": 1500
}
}
Release
This repository follows a dual-stream release model:
- Git stream:
  - versioned commits
  - annotated tags
  - optional GitHub Releases
- npm stream:
  - npm publish
  - registry propagation checks
  - post-publish smoke tests
Relevant documentation:
- docs/NPM_RELEASE.md
- docs/REPO_WORKFLOWS.md
- docs/OPERATIONAL.md
- docs/DEV_TOOLING.md
- docs/RELEASE_NOTES_TEMPLATE.md
- docs/releases/v0.10.0.md
- docs/releases/v1.0.0.md
- docs/releases/v1.0.1.md
- docs/releases/v1.0.2.md
- docs/releases/v1.0.3.md
- docs/releases/v1.0.4.md
- docs/releases/v1.0.5.md
- docs/BASELINE_TEST_RUN.md
- docs/FROZEN_REFERENCES_POLICY.md
- docs/GENERATED_REPORT_ARTIFACTS_POLICY.md
- docs/CONTRACT_ALIGNMENT.md
- docs/GUARANTEES.md
- docs/STATUSQUO.md
- docs/TEMPLATE_SETUP.md
- CONTRIBUTING.md
- SECURITY.md
- CHANGELOG.md
- project.config.json
Release automation:
.github/workflows/release.yml provides a manual workflow_dispatch release check.
License
See LICENSE.
