knowlifijs
v0.2.1
Turn arbitrary JSON into structured, explainable knowledge: schema discovery, domain detection, event classification, entity extraction, validation and confidence scoring.
KnowlifiJS
Turn arbitrary JSON into structured, explainable knowledge.
KnowlifiJS inspects JSON of any shape — a news feed, a CRM export, a sensor log — and produces a canonical "knowledge" representation: it finds the title/description, extracts entities, infers the domain (crypto, healthcare, legal, sports, ...), classifies the event/topic, checks for internal contradictions, validates the result, and scores its own confidence. Every decision comes with a human-readable reasoning trail.
Overview
KnowlifiJS is a domain-agnostic knowledge extraction pipeline. It does not hardcode a single schema — instead it discovers structure on the fly:
- Schema Discovery — flattens any JSON object into typed paths (text, date, numeric, boolean).
- Dynamic Field Detection — scores fields to find the most likely title and description, plus candidate entities and dates.
- Entity Extraction — types and scores entities (organizations, people, locations, instruments, facilities, ...).
- Domain Inference — infers the topical domain (crypto, healthcare, legal, sports, etc.) from a weighted vocabulary lexicon.
- Event Classification — classifies the underlying event using a hierarchical taxonomy (Investment, Corporate, Legal, Security, Market...).
- Consistency Checking — flags contradictions between headline, summary, and detected event sentiment.
- Validation — sanity-checks that extracted facts are grounded in the source record.
- Confidence Scoring — combines every signal above into a single, explainable confidence score (always < 1).
Every result includes a reasoning trail explaining why a domain or
event was chosen.
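To make the Schema Discovery step concrete, here is a minimal sketch of flattening a record into typed paths. It is an illustration only: `flattenToTypedPaths`, `classifyLeaf`, and the date heuristic are assumptions made for this example, not KnowlifiJS internals.

```typescript
// Hypothetical sketch: none of these names are KnowlifiJS's actual internals.
type LeafType = 'text' | 'date' | 'numeric' | 'boolean';

interface TypedPath {
  path: string;
  type: LeafType;
}

// Assumed heuristic: a string that starts with YYYY-MM-DD and parses as a date is a date.
function classifyLeaf(value: string | number | boolean): LeafType {
  if (typeof value === 'number') return 'numeric';
  if (typeof value === 'boolean') return 'boolean';
  const looksLikeDate = /^\d{4}-\d{2}-\d{2}/.test(value) && !isNaN(Date.parse(value));
  return looksLikeDate ? 'date' : 'text';
}

// Recursively walk objects and arrays, emitting one path per primitive leaf.
function flattenToTypedPaths(value: unknown, prefix = ''): TypedPath[] {
  if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
    return [{ path: prefix, type: classifyLeaf(value) }];
  }
  if (typeof value !== 'object' || value === null) return []; // skip null, undefined, functions
  const out: TypedPath[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      out.push(...flattenToTypedPaths(item, `${prefix}[${i}]`));
    });
    return out;
  }
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    out.push(...flattenToTypedPaths(record[key], prefix ? `${prefix}.${key}` : key));
  }
  return out;
}

const typedPaths = flattenToTypedPaths({
  headline: 'Acme Robotics raises $50M',
  createdAt: '2025-12-28',
  round: { amountUsd: 50000000, oversubscribed: true },
  investors: ['Vertex Holdings']
});
// typedPaths holds five entries: headline (text), createdAt (date),
// round.amountUsd (numeric), round.oversubscribed (boolean), investors[0] (text)
```

The real scanner also enforces the safety limits described under Security Limits; this sketch leaves them out to keep the core idea visible.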
Installation
npm install knowlifijs
Works in Node.js (>= 18) and modern browsers. Ships ESM + CommonJS builds with full TypeScript declarations.
Quick Start
import { parse } from 'knowlifijs';
const result = await parse({
  headline: 'Acme Robotics raises $50M in Series B funding',
  body: 'Acme Robotics, a leading robotics company, announced today that it raised $50M in a Series B funding round led by Vertex Holdings.',
  company: 'Acme Robotics',
  createdAt: '2025-12-28'
});
console.log(result.knowledge.primarySubject);
// "Acme Robotics raises $50M in Series B funding"
console.log(result.domain);
// { name: 'venture_capital', confidence: 0.74 }
console.log(result.event);
// { category: 'Investment', type: 'Funding Round', confidence: 0.9 }
console.log(result.confidence);
// 0.85
parse() also accepts an array of records and returns an array of results:
const results = await parse([recordA, recordB, recordC]);
Advanced Usage
Use the Parser class for repeated parsing with shared options, and to
register plugins:
import { Parser } from 'knowlifijs';
const parser = new Parser({
  detectDomain: true,
  detectEvents: true,
  extractEntities: true,
  checkConsistency: true,
  validate: true
});
const result = await parser.parse(record);
const results = await parser.parseAll([recordA, recordB]);
Plugins
Plugins can post-process every parsed record without modifying core code:
import { Parser } from 'knowlifijs';
import type { KnowlifiPlugin } from 'knowlifijs';
const tagHighConfidence: KnowlifiPlugin = {
  name: 'tag-high-confidence',
  afterParse(result) {
    if (result.confidence > 0.8) {
      return {
        ...result,
        knowledge: {
          ...result.knowledge,
          keywords: [...result.knowledge.keywords, 'high-confidence']
        }
      };
    }
  }
};
const parser = new Parser({ plugins: [tagHighConfidence] });
// or: parser.use(tagHighConfidence);
See Extensibility below for the plugin contract.
Configuration
ParserOptions:
| Option | Type | Default | Description |
| ------------------ | ----------------------- | ---------- | ---------------------------------------------------- |
| detectDomain | boolean | true | Run domain inference (result.domain). |
| detectEvents | boolean | true | Run event/topic classification (result.event). |
| extractEntities | boolean | true | Run entity extraction (result.knowledge.entities). |
| checkConsistency | boolean | true | Run the headline/summary contradiction check. |
| validate | boolean | true | Run the integrity validation layer. |
| includeSentiment | boolean | true | Compute and expose result.sentiment. |
| outputMode | 'full' \| 'slim' | 'slim' | Whether to echo source.originalSchema. See Output Modes. |
| limits | Partial<ParserLimits> | see below | Recursion/payload safety limits. See Security Limits. |
| plugins | KnowlifiPlugin[] | [] | Plugins to register at construction time. |
Disabled phases return neutral placeholder values (e.g. { name: 'unknown', confidence: 0 })
so the result shape is always stable.
Output Modes
By default (outputMode: 'slim'), the result omits source.originalSchema
to keep memory usage down — only source.detectedSchema (the flattened
paths/field types KnowlifiJS discovered) is included. Use
outputMode: 'full' to also echo back the original record, e.g. for
debugging:
const slim = await parse(record); // result.source.originalSchema === undefined
const full = await parse(record, { outputMode: 'full' }); // result.source.originalSchema === record
Security Limits
Schema scanning enforces safety limits so that deeply nested, circular, or
oversized input can never cause unbounded work or crash the process.
Limits are never enforced by throwing — traversal stops early and a warning
is appended to result.validation.warnings (which also sets
result.validation.passed = false).
interface ParserLimits {
  maxDepth: number; // default 20
  maxLeaves: number; // default 500
  maxArrayItems: number; // default 100
  maxPayloadSizeBytes: number; // default 2 * 1024 * 1024 (2MB)
}
Possible warnings: max_depth_exceeded, max_leaves_exceeded,
max_array_items_exceeded, circular_reference_detected,
payload_size_exceeded.
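The stop-early-and-warn behaviour described above could look roughly like the walker below. This is a sketch under assumptions: `boundedWalk`, `WalkLimits`, and `WalkOutcome` are hypothetical names, and the exact depth and leaf accounting in KnowlifiJS may differ. Only the limit names and warning codes come from the text above.

```typescript
// Hypothetical sketch: the walker is an assumption; only limit and warning names follow the README.
interface WalkLimits {
  maxDepth: number;
  maxLeaves: number;
  maxArrayItems: number;
}

interface WalkOutcome {
  leaves: number;
  warnings: string[];
}

function boundedWalk(root: unknown, limits: WalkLimits): WalkOutcome {
  const outcome: WalkOutcome = { leaves: 0, warnings: [] };
  const ancestors: object[] = []; // objects on the current path, used to spot cycles

  // Record each warning code once instead of throwing.
  const warn = (code: string): void => {
    if (outcome.warnings.indexOf(code) === -1) outcome.warnings.push(code);
  };

  const visit = (value: unknown, depth: number): void => {
    if (typeof value !== 'object' || value === null) {
      if (outcome.leaves >= limits.maxLeaves) { warn('max_leaves_exceeded'); return; }
      outcome.leaves += 1;
      return;
    }
    if (ancestors.indexOf(value) !== -1) { warn('circular_reference_detected'); return; }
    if (depth >= limits.maxDepth) { warn('max_depth_exceeded'); return; }

    ancestors.push(value);
    let children: unknown[];
    if (Array.isArray(value)) {
      if (value.length > limits.maxArrayItems) warn('max_array_items_exceeded');
      children = value.slice(0, limits.maxArrayItems); // only scan the first N items
    } else {
      const record = value as Record<string, unknown>;
      children = Object.keys(record).map((key) => record[key]);
    }
    for (const child of children) visit(child, depth + 1);
    ancestors.pop();
  };

  visit(root, 0);
  return outcome;
}

const cyclic: Record<string, unknown> = { title: 'loop', tags: ['a', 'b', 'c'] };
cyclic.self = cyclic; // a record that references itself
const walkOutcome = boundedWalk(cyclic, { maxDepth: 20, maxLeaves: 500, maxArrayItems: 2 });
// walkOutcome.leaves === 3
// walkOutcome.warnings: ['max_array_items_exceeded', 'circular_reference_detected']
```

Tracking the current ancestor path, rather than every object ever seen, means a value shared by two sibling fields is not mistaken for a cycle. In KnowlifiJS itself, the limits are overridden per call through the limits option: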
const result = await parse(record, {
  limits: { maxDepth: 10, maxArrayItems: 50 }
});
Adapters
Lightweight, dependency-free adapters convert common feed formats into
records ready for parse()/parseAll():
import { parseJsonFeed, parseRssFeed, parse } from 'knowlifijs';
// JSON Feed (https://jsonfeed.org)
const records = parseJsonFeed(jsonFeedDocument);
const results = await parse(records);
// RSS 2.0 XML
const rssRecords = parseRssFeed(rssXmlString);
const rssResults = await parse(rssRecords);
parseRssFeed extracts title, description, link, pubDate, guid,
author and category from each <item>, decodes basic XML entities, and
strips CDATA wrappers. Unrecognized tags are ignored; malformed XML never
throws.
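For readers curious how a dependency-free RSS adapter of this kind can work, the following is a rough regex-based sketch. It is not the library's parseRssFeed: `sketchParseRssItems`, `readText`, and `decodeBasicEntities` are hypothetical names, and treating CDATA content as literal text (no entity decoding inside it) is an assumption of this example.

```typescript
// Hypothetical sketch, not the library's parseRssFeed implementation.
type RssRecord = Record<string, string>;

const RSS_FIELDS = ['title', 'description', 'link', 'pubDate', 'guid', 'author', 'category'];

// Decode the five predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
function decodeBasicEntities(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Unwrap a CDATA section as-is, otherwise decode entities.
function readText(raw: string): string {
  const cdata = /^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/.exec(raw);
  return cdata ? cdata[1].trim() : decodeBasicEntities(raw).trim();
}

function sketchParseRssItems(xml: string): RssRecord[] {
  const records: RssRecord[] = [];
  if (typeof xml !== 'string') return records; // tolerate bad input instead of throwing
  const items = xml.match(/<item[\s>][\s\S]*?<\/item>/g) || [];
  for (const item of items) {
    const record: RssRecord = {};
    for (const field of RSS_FIELDS) {
      const pattern = new RegExp('<' + field + '(?:\\s[^>]*)?>([\\s\\S]*?)</' + field + '>');
      const match = pattern.exec(item);
      if (match) record[field] = readText(match[1]);
    }
    records.push(record);
  }
  return records;
}

const sampleFeed =
  '<rss><channel>' +
  '<item><title><![CDATA[Acme & Co raises $50M]]></title><link>https://example.com/a</link></item>' +
  '<item><title>Q&amp;A with the founders</title><unknownTag>ignored</unknownTag></item>' +
  '<item><title>Broken' + // unterminated item: skipped rather than thrown on
  '</channel></rss>';
const rssSketchRecords = sketchParseRssItems(sampleFeed);
// rssSketchRecords[0].title === 'Acme & Co raises $50M'
// rssSketchRecords[1].title === 'Q&A with the founders'
```

A regex scan like this trades strict XML correctness for zero dependencies and tolerance of malformed input, which matches the goals stated above but would not handle nested or namespaced elements well.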
Event Signatures
Every result includes a deterministic eventSignature string, intended for
future semantic deduplication of equivalent records (e.g. the same funding
round reported by multiple sources):
result.eventSignature; // "funding_round:openai:microsoft"
It's built from the detected event type plus the top entities (or the
primary subject if no entities were found), slugified and joined with :.
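The recipe above can be sketched in a few lines. The names `slugify` and `sketchEventSignature`, the underscore slug style, and the cap of two entities are assumptions chosen to reproduce the example string; the library's own rules may differ.

```typescript
// Hypothetical sketch of the described recipe; function names and the entity cap are assumptions.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse runs of non-alphanumerics into one underscore
    .replace(/^_+|_+$/g, '');    // trim leading and trailing underscores
}

function sketchEventSignature(
  eventType: string,
  entityNames: string[],
  primarySubject: string,
  maxEntities = 2
): string {
  // Fall back to the primary subject when no entities were found.
  const subjects = entityNames.length > 0 ? entityNames.slice(0, maxEntities) : [primarySubject];
  return [eventType].concat(subjects).map(slugify).join(':');
}

const signatureA = sketchEventSignature('Funding Round', ['OpenAI', 'Microsoft'], 'OpenAI raises funds');
// "funding_round:openai:microsoft"
const signatureB = sketchEventSignature('Funding Round', [], 'Acme Robotics raises $50M');
// "funding_round:acme_robotics_raises_50m"
```

Because the output depends only on the inputs, two sources reporting the same event with the same detected entities would produce the same key.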
Sentiment Analysis
When includeSentiment is enabled (default), each result includes a
sentiment field:
result.sentiment;
// {
//   polarity: 'positive',
//   score: 0.6,
//   positiveMatches: ['surged', 'growth'],
//   negativeMatches: [],
//   confidence: 0.6,
//   reasoning: ['Positive words: surged, growth']
// }
Set includeSentiment: false to omit it.
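As an illustration of how a lexicon-based sentiment summary can be produced, here is a small sketch. The word lists, the scoring formula, and the `sketchSentiment` name are all assumptions made for this example; they are not the library's lexicon or its scoring, and the scores will not match the output shown above.

```typescript
// Hypothetical sketch: word lists and scoring are assumptions, not the library's lexicon.
const POSITIVE_WORDS = ['surged', 'growth', 'record', 'wins'];
const NEGATIVE_WORDS = ['plunged', 'lawsuit', 'breach', 'losses'];

interface SentimentSketch {
  polarity: 'positive' | 'negative' | 'neutral';
  score: number; // in [-1, 1]
  positiveMatches: string[];
  negativeMatches: string[];
}

function sketchSentiment(text: string): SentimentSketch {
  // Lowercase and split on anything that is not a letter.
  const tokens = text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
  const positiveMatches = tokens.filter((t) => POSITIVE_WORDS.indexOf(t) !== -1);
  const negativeMatches = tokens.filter((t) => NEGATIVE_WORDS.indexOf(t) !== -1);
  const total = positiveMatches.length + negativeMatches.length;
  // Assumed formula: net matches divided by total matches, 0 when nothing matched.
  const score = total === 0 ? 0 : (positiveMatches.length - negativeMatches.length) / total;
  const polarity = score > 0 ? 'positive' : score < 0 ? 'negative' : 'neutral';
  return { polarity, score, positiveMatches, negativeMatches };
}

const upbeat = sketchSentiment('Shares surged on strong growth');
// upbeat.polarity === 'positive', upbeat.positiveMatches: ['surged', 'growth']
const mixed = sketchSentiment('Revenue growth offset by a lawsuit');
// mixed.polarity === 'neutral' (one positive and one negative match cancel out)
```

Keeping the matched words in the result is what makes a reasoning line such as "Positive words: surged, growth" possible.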
Performance Notes
- Domain lexicon regexes are compiled once at module load, not per record.
- Entity occurrence counting avoids per-call regex compilation.
- The integrity validation layer skips JSON.stringify(record) entirely when there are no extracted entities to verify.
- Default outputMode: 'slim' avoids retaining the original record in every result.
See benchmarks/REPORT.md for before/after
numbers. Run npm run bench to reproduce locally.
Socialyx Integration Example
import { Parser, parseRssFeed } from 'knowlifijs';
const parser = new Parser({
  outputMode: 'slim',
  limits: { maxDepth: 15, maxPayloadSizeBytes: 1 * 1024 * 1024 },
  includeSentiment: true
});
const records = parseRssFeed(rssXmlFromUpstream);
const results = await parser.parseAll(records);
for (const result of results) {
  if (!result.validation.passed) continue; // skip records that hit safety limits
  socialyx.ingest({
    subject: result.knowledge.primarySubject,
    domain: result.domain.name,
    event: result.eventSignature,
    sentiment: result.sentiment?.polarity,
    confidence: result.confidence
  });
}
Architecture
JSON record
│
▼
payload size guard — reject-gracefully if > maxPayloadSizeBytes
│
▼
scanSchema() — flatten into typed paths/leaves
(depth/leaves/array limits, circular-ref detection)
│
▼
detectFields() — find title, description, entity candidates, dates
│
▼
extractEntities() — type + score entities
│
▼
buildKnowledge() — primarySubject, keywords, textSummary
│
├─▶ inferDomain() — domain + confidence + reasoning
├─▶ detectEvent() — event category/type + confidence + reasoning
├─▶ buildEventSignature() — deterministic dedup key
├─▶ buildSentimentSummary() — sentiment polarity + confidence (optional)
├─▶ checkConsistency() — headline/summary/event contradiction check
└─▶ validate() — integrity + safety-limit warnings
│
▼
computeConfidenceBreakdown() — combine all signals + explainable breakdown
│
▼
KnowledgeResult (outputMode: 'slim' | 'full')
Each phase lives in its own module under src/ and can be imported and
used independently for advanced/extension scenarios:
import { scanSchema, detectFields, inferDomain } from 'knowlifijs';
Output Format
interface KnowledgeResult {
  source: {
    originalSchema?: JsonRecord; // only present when outputMode: 'full'
    detectedSchema: {
      paths: string[];
      textFields: string[];
      dateFields: string[];
      numericFields: string[];
      booleanFields: string[];
    };
  };
  knowledge: {
    primarySubject: string;
    entities: EntityResult[];
    keywords: string[];
    textSummary: string;
    sourceDomain: string;
  };
  domain: { name: string; confidence: number };
  event: { category: string; type: string; confidence: number };
  eventSignature: string;
  sentiment?: SentimentSummary; // present unless includeSentiment: false
  consistency: ConsistencyResult;
  validation: ValidationResult;
  confidence: number;
  confidenceBreakdown: ConfidenceBreakdown;
  reasoning: { domain: string[]; event: string[] };
}
Examples
Runnable examples live in examples/:
node examples/crypto.js
node examples/healthcare.js
node examples/legal.js
node examples/sports.js
node examples/startup-funding.js
Extensibility
KnowlifiJS supports a lightweight plugin architecture via KnowlifiPlugin:
interface KnowlifiPlugin {
  name: string;
  setup?: (parser: ParserLike) => void;
  afterParse?: (result: KnowledgeResult, record: JsonRecord) => KnowledgeResult | void;
}
- setup runs once when the plugin is registered.
- afterParse runs after every record is parsed and can return an augmented result (or void to leave it unchanged).
This is enough to build domain-specific plugins (e.g. a CyberSecurityPlugin
or FinancePlugin) entirely outside of core — core never needs to change to
support them.
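A domain-specific plugin along these lines might look like the sketch below. To keep the snippet self-contained, it declares reduced local stand-ins (`MiniResult`, `MiniPlugin`) instead of importing the package's types, and `applyPlugins` is an assumed illustration of how afterParse hooks could be chained; none of these names are part of KnowlifiJS.

```typescript
// Reduced local stand-ins for the package's types, so the sketch stands alone (assumptions).
interface MiniResult {
  confidence: number;
  knowledge: { primarySubject: string; keywords: string[] };
}

interface MiniPlugin {
  name: string;
  afterParse?: (result: MiniResult, record: Record<string, unknown>) => MiniResult | void;
}

// Hypothetical domain plugin: tags results whose subject mentions a security term.
const SECURITY_TERMS = ['breach', 'ransomware', 'vulnerability'];

const cyberSecurityPlugin: MiniPlugin = {
  name: 'cyber-security',
  afterParse(result) {
    const subject = result.knowledge.primarySubject.toLowerCase();
    const hit = SECURITY_TERMS.some((term) => subject.indexOf(term) !== -1);
    if (!hit) return; // returning void leaves the result unchanged
    return {
      ...result,
      knowledge: { ...result.knowledge, keywords: [...result.knowledge.keywords, 'security'] }
    };
  }
};

// Assumed runner: apply each plugin in order, keeping the previous result when a hook returns void.
function applyPlugins(
  result: MiniResult,
  record: Record<string, unknown>,
  plugins: MiniPlugin[]
): MiniResult {
  let current = result;
  for (const plugin of plugins) {
    const next = plugin.afterParse ? plugin.afterParse(current, record) : undefined;
    if (next) current = next;
  }
  return current;
}

const tagged = applyPlugins(
  { confidence: 0.7, knowledge: { primarySubject: 'Ransomware hits hospital network', keywords: ['hospital'] } },
  {},
  [cyberSecurityPlugin]
);
// tagged.knowledge.keywords: ['hospital', 'security']
```

Returning a new object rather than mutating the input keeps each plugin's effect easy to trace when several are registered.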
FAQ
Does KnowlifiJS call any external APIs / LLMs?
No. Everything runs locally using deterministic heuristics and lexicons.
Can I use this in the browser?
Yes. The package has no Node-only dependencies and ships an ESM build.
Why is confidence always below 1?
Confidence is intentionally capped (at 0.97) — the heuristics are
probabilistic, and a perfect score would overstate certainty.
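The effect of the cap can be shown with a small sketch. Only the 0.97 ceiling comes from the answer above; the weighted-average formula, `combineConfidence`, and the `SignalScore` shape are assumptions for illustration and may not reflect how the library weighs its signals.

```typescript
// Hypothetical sketch: signal shape and weighting are assumptions; only the 0.97 cap is from the README.
const CONFIDENCE_CAP = 0.97;

interface SignalScore {
  value: number;  // the signal's own score in [0, 1]
  weight: number; // relative importance
}

function combineConfidence(signals: SignalScore[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const s of signals) {
    weightedSum += s.value * s.weight;
    totalWeight += s.weight;
  }
  const average = totalWeight > 0 ? weightedSum / totalWeight : 0;
  return Math.min(CONFIDENCE_CAP, average); // never report more than the cap
}

const capped = combineConfidence([{ value: 1, weight: 2 }, { value: 1, weight: 1 }]);    // 0.97
const typical = combineConfidence([{ value: 1, weight: 1 }, { value: 0.5, weight: 1 }]); // 0.75
```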
How do I add a new domain or event type?
Extend DOMAIN_LEXICON (in src/domains/domainInference.ts) or
EVENT_TAXONOMY (in src/events/eventDetector.ts), or — for purely
additive behavior — write a plugin.
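To give a feel for what extending a weighted lexicon involves, here is a sketch. The `WeightedLexicon` shape, `SKETCH_LEXICON`, and `sketchInferDomain` are assumptions made for this example; the actual structure of DOMAIN_LEXICON in the source may be different, so check the file before copying this pattern.

```typescript
// Hypothetical sketch: this lexicon shape is an assumption and may not match DOMAIN_LEXICON.
type WeightedLexicon = Record<string, Record<string, number>>;

const SKETCH_LEXICON: WeightedLexicon = {
  crypto: { bitcoin: 3, blockchain: 2, token: 1 },
  healthcare: { hospital: 2, patient: 2, clinical: 3 }
};

// Under this shape, adding a domain means adding one more weighted term map.
SKETCH_LEXICON['climate'] = { emissions: 3, carbon: 2, renewable: 2 };

function sketchInferDomain(text: string, lexicon: WeightedLexicon): { name: string; score: number } {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/);
  let best = { name: 'unknown', score: 0 };
  for (const name of Object.keys(lexicon)) {
    const terms = lexicon[name];
    let score = 0;
    for (const token of tokens) {
      // hasOwnProperty guards against tokens like "constructor" hitting the prototype
      if (Object.prototype.hasOwnProperty.call(terms, token)) score += terms[token];
    }
    if (score > best.score) best = { name, score };
  }
  return best;
}

const inferred = sketchInferDomain('New carbon emissions rules for renewable energy', SKETCH_LEXICON);
// inferred: { name: 'climate', score: 7 }
```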
Contributing
See CONTRIBUTING.md.
