@alexshrestha/marble

v0.3.0

Published

23 days ago

Hyper-personalized content curation through person synthesis and clone-population simulation.

simulation personalization person-synthesis knowledge-graph user-modeling ai recommendations vector-search link-prediction

Marble

Hyper-personalized content curation through person synthesis and simulation.

import { Marble } from 'marble';

const marble = new Marble({
  storage: './user-kg.json',
  llm: async (prompt) => callYourLLM(prompt),
});
await marble.init();

// Rank items — second arg is ephemeral context, not user data.
// User data is built up through react() / feedbackBatch() / learn().
const top10 = await marble.select(items, {
  calendar: ['investor call 14:00'],
  active_projects: ['launch prep'],
});

Marble creates multiple simulated versions of a user, tests them against real-world signals, and learns which version predicts their actual behavior. No thumbs-up buttons needed.

Why Marble?

Works from User One

Creates multiple synthetic versions of a user, tests them against real signals, and evolves the best-predicting clone daily — no large user base required.

Predicts from Day Zero

Creates personalized recommendations within hours of signup. From just 3 interactions, Marble synthesizes preferences across timing, context, and novelty without waiting for behavioral patterns.

Explains the WHY

Hypothesis-driven insights with confidence scores — "Why this matters to your specific goals right now."

Synthesizes Missing Intelligence

Generates insights about relationships, timing, and stakeholder concerns you never explicitly provided.

Models Your Network

Understands the people who influence your decisions and tailors recommendations for multi-party dynamics.

The result: Content that feels like "How did it know I needed to see this today?"

Marble vs. The Competition

What Marble Does

| Capability | How |
|------------|-----|
| Day-one intelligence | Synthetic clones work immediately, with no cold start |
| Temporal awareness | Considers calendar, deadlines, project phases |
| Relationship modeling | Understands stakeholder concerns and decision dynamics |
| Business outcome focus | Optimizes for your KPIs, not generic engagement |
| Predictive reasoning | "Why this will help your meeting", not just similarity scores |
| Privacy-first | Runs locally, no data upload required |

Real-World Scenarios

Day-Zero Personalization: A new user signs up and visits 3 stories on AI safety and startup hiring. Marble generates a clone, tests it against the stories they engaged with, and surfaces "Engineering culture during scaling" before they ask for it.

Implicit Preference Learning: No rating buttons. Marble reads dwell time (45 sec on an AI policy piece) and skip patterns (2 sec on crypto), detects emerging interests, then predicts what will be relevant tomorrow from what resonates today.

Temporal & Context Evolution: Predictions are refined as context shifts through the day, for example growth strategies at the morning check and customer success challenges in the afternoon.

Zero Data Cold Start: Immediate personalization through synthetic clone evolution, with no waiting for behavior data.

Technical Architecture

// Marble approach
const contextGraph = {
  interests: { ai: 0.8, startups: 0.6 },
  calendar: [{ event: "investor_pitch", time: "today 2pm" }],
  relationships: {
    skeptical_cto: { concerns: ["security"], influence: 0.9 }
  },
  activeProjects: [{ name: "product_launch", deadline: "2026-04-15" }]
};
// Predict based on business context + psychology

7-Dimensional Scoring vs. Similarity Matching

// Collaborative filtering (CF): a single similarity score
score = cosineSimilarity(userPrefs, itemFeatures);

// Marble: five weighted components, then two multipliers on the whole sum
magic_score = (interest(0.25) + temporal(0.30) + novelty(0.20)
             + actionability(0.15) + source_trust(0.10))
            × freshness_decay × stakeholder_alignment;

Why competitors can't easily copy this: Requires rebuilding recommendation infrastructure from scratch—context graphs, business metric optimization, relationship modeling, and temporal intelligence. Not a feature add to existing systems.

Quick Start

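# Published on npm as @alexshrestha/marble; the alias keeps the 'marble' import specifier working
npm install marble@npm:@alexshrestha/marble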

import { Marble } from 'marble';

const marble = new Marble({
  storage: './user-kg.json',
  llm: async (prompt) => callYourLLM(prompt), // enables L1.5-L3 pipeline
});
await marble.init();

// (Optional) Ingest existing user data
await marble.ingestConversations('./chatgpt-export.json');

// Rank items. Second arg is ephemeral context — calendar/projects/mood,
// not user profile data.
const ranked = await marble.select(items, {
  calendar: ['investor call 14:00'],
  active_projects: ['launch prep'],
});

ranked.forEach((item, i) => {
  console.log(`${i+1}. [${item.relevance_score.toFixed(3)}] ${item.title}`);
});

// Record a reaction — pass the full item, not just the id
await marble.react(items[0], 'up');

// Or process an entire batch at once (contrastive learning: Day 2 > Day 1)
await marble.feedbackBatch([
  { item: items[0], reaction: 'up' },
  { item: items[1], reaction: 'skip' },
  { item: items[2], reaction: 'share' },
]);

// Run the canonical learning pipeline. `learn()` is the one entry point
// you need to call — it runs every layer in order:
//   seedClones → L1.5 insight swarm → L2 inference → L3 clone evolution
//   → refreshClones → rebuildVectorIndex → cluster → predictLinks
//   → hypothesisTesting
// Each post-evolution stage is idempotent and graceful — they skip
// cleanly when their inputs aren't ready (no embeddings provider, no
// vector index, etc.). No separate orchestrator call needed.
const stats = await marble.learn();
// { insights: 7, candidates: 4, clones: 12 }

// (Optional) L2 trait synthesis — derives structured traits with replication,
// contradiction, and emergent-fusion origins. Persists to kg.user.syntheses[].
await marble.synthesize();

// (Optional) Churn scan + salience diagnostic. Detects "serial pivoter"-style
// traits that live in the time series of belief invalidations, and reports
// how many of the KG's nodes are stale-active one-offs.
const { churnSyntheses, distribution } = await marble.rebuild();

learn() is required for the "Day 2 > Day 1" progressive improvement claim. react() and feedbackBatch() record signals into the KG, but the clone population, inference engine, and insight swarm only update when you call learn(). A typical integration calls learn() after every N reactions (e.g. N=10) or on a daily schedule. Without it, ranking relies on interest aggregation alone and will not show clone-driven improvements over time.

Caveat: clones add the most value on warmer profiles. With roughly 20 or more signals and items that carry real category labels, clone consensus reranks correctly and warm > cold (validated on Last.fm; see Validation). On very thin cold-start data (3–10 history items, no real category metadata), the LLM-driven seedClones step can introduce variance large enough to hurt top-K. If you see this, lower cloneBoostWeight or gate clone consensus until the KG has enough signal.

synthesize() and rebuild() are optional and run on their own schedule. synthesize() is LLM-heavy (per-node trait extraction), so daily or weekly is typical; rebuild() is cheap and deterministic, and is safe to run after every learn() or on a cron.
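
The "learn() after every N reactions" pattern can be sketched as a thin wrapper. This is an illustration only: `withPeriodicLearn` is a hypothetical helper, not part of Marble, and it assumes nothing beyond an object that exposes the `react()` and `learn()` methods shown in the Quick Start.

```javascript
// Hypothetical helper (not a Marble API): counts reactions and runs
// learn() once every `every` calls.
function withPeriodicLearn(marble, every = 10) {
  let pending = 0;
  return {
    async react(item, reaction) {
      await marble.react(item, reaction);
      pending += 1;
      if (pending >= every) {
        pending = 0;           // reset first so a failed learn() is not retried on every call
        return marble.learn(); // resolves to whatever learn() returns (the stats object)
      }
      return null;             // no learning pass was triggered
    },
  };
}
```

Pass your initialized instance, e.g. `const tracked = withPeriodicLearn(marble, 10)`, and call `tracked.react(item, 'up')` wherever you would call `marble.react`. A daily schedule is the other option the text mentions and needs no wrapper.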

Run tests:

git clone https://github.com/AlexShrestha/marble.git
cd marble && npm install && npm test

Features

Pluggable providers — Anthropic, OpenAI, DeepSeek, or any OpenAI-compatible host (Moonshot, Together, Fireworks, Groq, OpenRouter, Azure, vLLM) for LLM; OpenAI or DeepSeek for embeddings
Privacy-first — All user state stays on your machine; only per-item scoring/enrichment calls go out to the provider you configure
Three modes — Score (fast), Swarm (rich), WorldSim (B2B PMF)
Implicit learning — Learns from dwell time, scroll depth, forwards, silence
Insight-driven KG — Reasons about WHY, not just WHAT (see docs/insight-kg.md)
Trait synthesis — Structured cross-domain traits with five origin types (single_node, trait_replication, contradiction, emergent_fusion, churn_pattern) — downstream tools match against traits / affinities / aversions as predicates, not prose labels
Salience-aware — getTopSalient() filters for important nodes before any pairwise pass; stale one-off facts fade automatically; churn scan surfaces "serial pivoter" traits that live in the time series of invalidations
Relationship-aware — Models the people in a user's life to improve recommendations
Narrative arc — Stories sequenced for flow, not just ranked by score

How It Works

┌──────────────────────────────────────────────────┐
│  1. GATHER                                        │
│  RSS, HN, NewsAPI + World Signals (trends,        │
│  search volume, social velocity)  ~100 stories    │
├──────────────────────────────────────────────────┤
│  2. SCORE / SWARM                                 │
│  Score: magic_score formula (embeddings-based)     │
│  Swarm: 6 agents evaluate through different lenses │
├──────────────────────────────────────────────────┤
│  3. ARC REORDER                                   │
│  Sequence into narrative flow (opener → closer)    │
├──────────────────────────────────────────────────┤
│  4. DELIVER                                       │
│  Telegram, Email, JSON API, Webhook, Video         │
├──────────────────────────────────────────────────┤
│  5. LEARN                                         │
│  L1.5 Insight Swarm (7 psychological lenses)       │
│  → L2 Inference + Temporal Patterns                │
│  → L3 Clone Evolution (kill bottom 20%, mutate)    │
├──────────────────────────────────────────────────┤
│  6. SYNTHESIZE (optional, LLM-heavy)              │
│  Trait extraction → Replication grouping           │
│  → Contradiction detection → K-way fusion          │
│  → kg.user.syntheses[] (5 origin types)            │
├──────────────────────────────────────────────────┤
│  7. REBUILD (optional, deterministic)             │
│  Churn scan (slots reassigned ≥3× in 180d)         │
│  + Salience distribution diagnostic                │
└──────────────────────────────────────────────────┘
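
The churn-scan rule in step 7 (a slot reassigned at least 3 times within 180 days) can be sketched as a plain counting pass. The event shape `{ slot, at }` and the function name are assumptions made for illustration; Marble's internal representation of belief invalidations is not described here.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// events: [{ slot: 'current_project', at: '2026-01-05' }, ...]  (hypothetical shape)
// Returns the slots that were reassigned at least `minReassignments` times
// inside the trailing window ending at `now`.
function findChurningSlots(events, now, { minReassignments = 3, windowDays = 180 } = {}) {
  const end = now.getTime();
  const start = end - windowDays * DAY_MS;
  const counts = new Map();
  for (const { slot, at } of events) {
    const t = new Date(at).getTime();
    if (t >= start && t <= end) {
      counts.set(slot, (counts.get(slot) || 0) + 1);
    }
  }
  return [...counts]
    .filter(([, n]) => n >= minReassignments)
    .map(([slot]) => slot);
}
```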

Three Modes

| Mode | What it does | Use case |
|------|-------------|----------|
| Score (v1) | Deterministic scoring against user KG | Fast, predictable, no API calls |
| Swarm (v2) | Multi-agent evaluation with 6 specialized lenses (injectable; see new Swarm(kg, { lenses })) | Richer selection, catches what scoring misses |
| Debate (v2+) | Swarm + a second LLM round on items where agents disagree (variance > 0.04) | When divergent agent opinions should be reconciled, not just averaged |
| WorldSim (v3) | Population-level simulation for product-market fit | B2B: "which users for this product?" |
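
The Debate trigger (variance > 0.04 across agent scores) can be illustrated with a few lines. This sketch assumes scores in [0, 1] and population variance; whether Marble uses population or sample variance is not stated, and `needsDebate` is a hypothetical name.

```javascript
// Population variance of the per-agent scores for one item.
function variance(scores) {
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  return scores.reduce((acc, s) => acc + (s - mean) ** 2, 0) / scores.length;
}

// True when the agents disagree enough to warrant a second LLM round.
function needsDebate(agentScores, threshold = 0.04) {
  return variance(agentScores) > threshold;
}
```

Scores of 0.9, 0.1, and 0.5 have a variance of about 0.107 and would go to debate; 0.6, 0.7, and 0.65 would not.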

The Magic Score

magic_score = (interest(0.25) + temporal(0.30) + novelty(0.20)
             + actionability(0.15) + source_trust(0.10))
            × freshness_decay

Interest match (25%) — Semantic similarity via local ONNX embeddings
Temporal relevance (30%) — Is this relevant TODAY? (calendar, projects, deadlines)
Novelty (20%) — Surprise factor (inverse topic frequency)
Actionability (15%) — Can the user act on this?
Source trust (10%) — Learned per-source credibility
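
A minimal sketch of the formula above, under two assumptions that the README does not confirm: freshness_decay multiplies the whole weighted sum, and it is the exponential decay with a 14-day half-life listed for core/decay.js. Component values are assumed to be in [0, 1]; the function names are illustrative.

```javascript
const WEIGHTS = {
  interest: 0.25,
  temporal: 0.30,
  novelty: 0.20,
  actionability: 0.15,
  source_trust: 0.10,
};

// Exponential decay: the multiplier halves every `halfLifeDays`.
function freshnessDecay(ageDays, halfLifeDays = 14) {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// components: { interest, temporal, novelty, actionability, source_trust }
// Missing components count as 0.
function magicScore(components, ageDays) {
  let sum = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    sum += weight * (components[name] ?? 0);
  }
  return sum * freshnessDecay(ageDays);
}
```

Because the five weights sum to 1.0, a fresh item that maxes every component scores 1.0, and the same item scores 0.5 once it is 14 days old.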

Swarm Agents (default lens set)

Six agents, each asking a different question. The set is the default when new Swarm(kg) is constructed without a lenses option; callers can inject their own lens array for tailored/per-user curation.

| Agent | Weight | Question |
|-------|--------|----------|
| Career | 25% | "Will this help their business?" |
| Timing | 25% | "Does this matter TODAY specifically?" |
| Serendipity | 20% | "Would this delight them unexpectedly?" |
| Growth | 15% | "Will this stretch their thinking?" |
| Contrarian | 15% | "What is everyone else missing?" |
| Social Proof | 10% | "How well-received is this among the broader population and similar users?" |
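
One way to combine per-agent scores is a weighted average. The listed weights add up to 110%, so this sketch divides by the total weight of the lenses that actually scored the item; whether Marble normalizes this way is an assumption, and the lens keys and function name are illustrative.

```javascript
const DEFAULT_LENS_WEIGHTS = {
  career: 0.25,
  timing: 0.25,
  serendipity: 0.20,
  growth: 0.15,
  contrarian: 0.15,
  social_proof: 0.10,
};

// agentScores: { career: 0.7, timing: 0.9, ... } with values in [0, 1].
// Lenses missing from agentScores are ignored rather than counted as 0.
function swarmConsensus(agentScores, weights = DEFAULT_LENS_WEIGHTS) {
  let weighted = 0;
  let total = 0;
  for (const [lens, weight] of Object.entries(weights)) {
    if (lens in agentScores) {
      weighted += weight * agentScores[lens];
      total += weight;
    }
  }
  return total > 0 ? weighted / total : 0;
}
```

Normalizing keeps the consensus in [0, 1] regardless of how many lenses are injected or what their weights sum to.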

Note on "swarm" naming. Marble has three distinct systems all called "swarm" — this one (narrative curation with static-by-default lenses), generateAgentFleet (programmatic per-story scoring, always dynamic), and runInsightSwarm (L1.5 psychological probing, always dynamic). They don't share wiring. See docs/architecture.md for the distinction.

Architecture

marble/
├── core/                   # The engine (standalone)
│   ├── index.js           # Main Marble class — select/react/learn/synthesize/rebuild
│   ├── kg.js              # Insight-driven knowledge graph (v2)
│   ├── scorer.js          # magic_score computation
│   ├── swarm.js           # Multi-agent curation (6 lenses)
│   ├── insight-swarm.js   # L1.5 psychological probe committee (7 lenses)
│   ├── inference-engine.js# L2 inference: L1.5 passthrough + temporal patterns
│   ├── trait-synthesis.js # L2 trait synthesis (4 origins) — per-node extraction,
│   │                      # replication, contradiction, K-way fusion
│   ├── salience.js        # Salience scoring + churn scan (5th origin:
│   │                      # churn_pattern) + getTopSalient/salienceDistribution
│   ├── clone.js           # Digital twin — user snapshot for simulation
│   ├── evolution.js       # Clone population evolution
│   ├── signals.js         # Implicit signal detection
│   ├── arc.js             # Narrative arc reranking (10 slots)
│   ├── decay.js           # Exponential decay (14-day half-life)
│   ├── embeddings.js      # Local ONNX embeddings (384-dim)
│   └── types.js           # Type definitions, weights
│
├── web/                 # Web reader + signal tracker + dashboard
│   ├── reader.js        # Story page (tracks dwell, scroll, clicks)
│   ├── tracker.js       # Signal collection endpoint
│   └── dashboard.js     # User profile visualization
│
├── adapters/
│   ├── sources/         # RSS, HackerNews, NewsAPI
│   ├── delivery/        # Telegram, Email, API, Webhook
│   └── signals/         # World signals (trends, velocity)
│
├── worldsim/            # World Clone — B2B product-market fit
│   ├── archetypes.js    # Synthetic user population
│   ├── pmf.js           # PMF analysis engine
│   └── index.js         # WorldSim class
│
├── api/                 # REST API server
├── test/                # Test harness (30 stories)
├── examples/            # Integration examples
└── docs/                # Detailed documentation
    ├── architecture.md
    ├── api-reference.md
    ├── insight-kg.md
    ├── archetypes-relationships.md
    └── contributing.md

Core Concepts

Knowledge Graph (Insight-Driven)

Not a flat interest tracker. Marble's KG generates hypotheses about WHY a user cares about something, then tests those hypotheses with content.

         YOU (root)
        / | \   \
 projects interests people calendar
    |        |        |       |
"side-app" "AI/ML" "co-founder" "call 14:00"
    |        |        |       |
[stories connecting to these nodes score higher]

Every signal triggers hypothesis generation, not just a weight increment. See docs/insight-kg.md for the full deep-dive.

Digital Twin (Clone)

A synthetic snapshot of the user for simulation. Captures weighted interests, behavioral patterns, today's context, and source trust. The evolution engine spawns N variants and kills the bottom 20% per cycle — survivors converge on real preferences over repeated learn() cycles as the KG accumulates signal.
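
One evolution cycle (rank by fitness, drop the bottom 20%, refill with mutated copies of survivors) might look like the sketch below. The clone shape, the mutation rule, and the seeded PRNG are all assumptions for illustration; this is not the code in core/evolution.js.

```javascript
// Small seeded PRNG (mulberry32) so a run can be reproduced.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// clones: [{ weights: { ai: 0.8, startups: 0.6 }, fitness: 0.42 }, ...]  (hypothetical shape)
function evolveOnce(clones, rand, { cullFraction = 0.2, mutationScale = 0.1 } = {}) {
  const ranked = [...clones].sort((a, b) => b.fitness - a.fitness);
  const cullCount = Math.floor(ranked.length * cullFraction);
  const survivors = ranked.slice(0, ranked.length - cullCount);
  if (survivors.length === 0) return ranked; // nothing left to breed from

  const children = [];
  for (let i = 0; i < cullCount; i++) {
    const parent = survivors[i % survivors.length]; // best survivors breed first
    const weights = {};
    for (const [topic, value] of Object.entries(parent.weights)) {
      const jitter = (rand() * 2 - 1) * mutationScale;
      weights[topic] = Math.min(1, Math.max(0, value + jitter));
    }
    children.push({ weights, fitness: 0 }); // fitness is measured in the next cycle
  }
  return [...survivors, ...children];
}
```

The population size stays constant across cycles, which is what lets repeated learn() runs converge instead of shrinking the pool.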

Narrative Arc

Top 10 stories aren't just ranked — they're sequenced:

| Position | Role | Purpose |
|----------|------|---------|
| 1 | Opener | High energy, attention-grabbing |
| 2 | Bridge | Transition to substance |
| 3-4 | Deep dives | Core insights |
| 5 | Pivot | Change of pace, surprise |
| 6 | Deep dive | Third substantive piece |
| 7 | Practical | Actionable, how-to |
| 8 | Horizon | Future-looking |
| 9 | Personal | Close to home |
| 10 | Closer | Warm, human, memorable |

Signal Layers

No thumbs-up/down needed. Three layers of implicit feedback:

| Layer | Weight | Signals | User effort |
|-------|--------|---------|-------------|
| World | ~80% | Trends, search volume, social velocity | Zero |
| Sector | ~15% | Industry forums, competitor activity | Zero |
| Personal | ~5% | Dwell time, forwards, replies, silence | Passive |

Reactions are multi-signal

A reaction is one record of user engagement with an item. Marble accepts a structured shape so a single physical action can decompose into multiple signals:

// One click on a recommended item produces three reactions:
kg.recordReaction({ item_id, signal: 'title_attention',  validity: 0.40, polarity: +1 })
kg.recordReaction({ item_id, signal: 'click_through',    validity: 0.20, polarity: +1 })
kg.recordReaction({ item_id, signal: 'destination_bounce', validity: 0.55, polarity: -1, meta: { reason: 'paywall' } })

Clones learn three different things: title patterns work, source has friction, recommend similar titles but route around the source. ClonePopulation.evolve multiplies fitness deltas by validity × polarity per reaction.

Validity calibration — caller-tunable, no enforcement:

| Signal cost | Example | Validity range |
|---|---|---|
| Real-world action | purchase, calendar booking, workout completed | ~1.0 |
| Sustained engagement | read-to-end, full-track-play | 0.65–0.85 |
| Click | tap, follow-link | 0.2–0.5 |
| Hover, visible-impression | mouse-over, scrolled-past | 0.1 |

Friction signals get polarity: -1 with validity proportional to confidence (quick bounce ≈ 0.4, explicit "this was bad" ≈ 0.85).
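
To make the validity × polarity product concrete, the sketch below computes the signed weight of each reaction and the net signal for an item. It shows only that product; how ClonePopulation.evolve applies it to fitness deltas is internal to Marble, and `summarizeReactions` is a hypothetical helper.

```javascript
// Signed weight of one reaction: positive for engagement, negative for friction.
function signedWeight({ validity, polarity }) {
  return validity * polarity;
}

// Groups signed weights by signal name and sums them into a net value.
function summarizeReactions(reactions) {
  const bySignal = {};
  let net = 0;
  for (const reaction of reactions) {
    const w = signedWeight(reaction);
    bySignal[reaction.signal] = (bySignal[reaction.signal] || 0) + w;
    net += w;
  }
  return { bySignal, net };
}

// The three reactions from the click example above.
const summary = summarizeReactions([
  { signal: 'title_attention', validity: 0.40, polarity: +1 },
  { signal: 'click_through', validity: 0.20, polarity: +1 },
  { signal: 'destination_bounce', validity: 0.55, polarity: -1 },
]);
```

Here the net is about +0.05: the title and click evidence barely outweigh the paywall bounce, while the per-signal breakdown keeps the three lessons separate.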

Marble does NOT ship a signal taxonomy. What signals exist and how strong each is are consumer decisions — a music app cares about "song-full-play", a fitness app about "workout-completed", a news app about "read-to-end". Marble accepts whatever recordReaction calls come in and multiplies. The legacy marble.react(item, 'up') surface still works as a convenience for the simple thumbs-up case.

Curator (fact verification, the keystone)

Marble's pipeline is one-shot at install: init → ingest → learn → investigate → synthesize. After it runs, evolution requires reactions — but until reactions are wired, the system is structurally inert past cold-start. The curator is the missing piece that turns this into a continuous learning loop.

# Run periodically (cron / launchd) — picks 15 suspect facts, classifies each
*/30 * * * * cd /path/to/project && marble curate --limit 15 >> ~/.marble/curator.log 2>&1

The curator never deletes (valid_to retirement is mechanical reconciliation's job). It applies one of four decisions per fact:

| Decision | Action |
|---|---|
| confirm | bump strength + evidence_count |
| unclear | keep fact, lower strength, flag _meta.challenge_candidate: true (surfaced via select({ includeChallenges: N })) |
| ambiguous | lower strength + write gap:<topic> belief (unblocks seedClones() 50-archetype path) |
| skip | leave alone |

If a curator pass made wrong calls, undo it with marble curate --revert <run_id>. Every decision has a per-fact _meta.history entry that the revert walks.
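
The decision-plus-history mechanism can be pictured with a small sketch. Everything here is hypothetical: the fact shape, the 0.1 adjustment step, and the layout of the history entry are invented for illustration, and the gap:<topic> belief write for "ambiguous" is left out.

```javascript
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// fact: { strength, evidence_count, _meta?: { challenge_candidate?, history? } }
// `step` is a made-up adjustment size, not Marble's real delta.
function applyDecision(fact, decision, runId, step = 0.1) {
  const meta = fact._meta || {};
  const entry = {
    run_id: runId,
    decision,
    prev: {
      strength: fact.strength,
      evidence_count: fact.evidence_count,
      challenge_candidate: meta.challenge_candidate === true,
    },
  };
  const next = {
    ...fact,
    _meta: { ...meta, history: [...(meta.history || []), entry] },
  };
  switch (decision) {
    case 'confirm':
      next.strength = clamp01(fact.strength + step);
      next.evidence_count = fact.evidence_count + 1;
      break;
    case 'unclear':
      next.strength = clamp01(fact.strength - step);
      next._meta.challenge_candidate = true;
      break;
    case 'ambiguous':
      next.strength = clamp01(fact.strength - step); // gap belief write omitted
      break;
    case 'skip':
      break; // recorded in history, otherwise untouched
    default:
      throw new Error(`unknown decision: ${decision}`);
  }
  return next;
}

// Restores the values captured before the given run touched the fact.
function revertRun(fact, runId) {
  const history = (fact._meta && fact._meta.history) || [];
  const entry = history.find((h) => h.run_id === runId);
  if (!entry) return fact;
  return {
    ...fact,
    strength: entry.prev.strength,
    evidence_count: entry.prev.evidence_count,
    _meta: {
      ...fact._meta,
      challenge_candidate: entry.prev.challenge_candidate,
      history: history.filter((h) => h !== entry),
    },
  };
}
```

Storing the previous values alongside each decision is what makes a revert mechanical: no decision needs an inverse rule, only a snapshot.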

Don't run marble learn and marble curate concurrently — both write KG state. Schedule them sequentially in cron.

Want to see the KG and the curator running live? There's a separate companion 3D visualization — Hono server + 3d-force-graph front-end — that renders your graph, streams the autonomous curator loop with animated decisions, and lets you chat with the KG. Not bundled with marble core (marble ships as a library); see docs/graph-visualization.md for setup including launchd / systemd auto-start.

Integration Modes

1. Local-First (Recommended)

const marble = new Marble({ mode: 'local' });
await marble.init(); // same initialization step as in the Quick Start
const results = await marble.select(stories, userContext);

2. Enhanced (Optional LLM)

const marble = new Marble({
  mode: 'enhanced',
  llm: async (prompt) => await yourLLMProvider(prompt)
});

Any (prompt) => string works. Or skip writing an adapter entirely: marble has six built-in providers driven by LLM_PROVIDER — anthropic, openai, deepseek, openai-compatible (Moonshot / Together / Groq / OpenRouter / vLLM / Ollama / LM Studio), opencode (free, local CLI, no API key), and claude-cli (uses your Claude Code subscription). See docs/llm-providers.md for the env vars, defaults, and budget controls.
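
If you do want to write the adapter yourself, a minimal sketch for an OpenAI-compatible chat completions endpoint is shown below. `makeOpenAICompatibleLlm` and its option names are hypothetical; the request and response shapes follow the standard chat completions format, and `fetchImpl` is injectable so the adapter can be exercised without a network call.

```javascript
// Returns an async (prompt) => string function suitable for the `llm` option.
// baseUrl is the API root of your provider, e.g. one ending in /v1.
function makeOpenAICompatibleLlm({ baseUrl, apiKey, model, fetchImpl = fetch }) {
  return async function llm(prompt) {
    const res = await fetchImpl(`${baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    if (!res.ok) {
      throw new Error(`LLM request failed with status ${res.status}`);
    }
    const data = await res.json();
    return data.choices[0].message.content;
  };
}
```

The default `fetchImpl` relies on the global fetch available in Node 18 and later; on older runtimes pass your own implementation.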

3. World Clone (B2B PMF)

import { WorldSim } from 'marble/worldsim';
const worldsim = new WorldSim();
const pmf = await worldsim.simulate(yourProduct);
console.log(`PMF Score: ${pmf.pmf_score}/1.0`);

Documentation

| Doc | What it covers |
|-----|---------------|
| Installation & Setup | Full setup guide, configuration, adapters, troubleshooting |
| How It Works | The data synthesis process explained simply |
| Architecture | Full system design, data flow, component interactions |
| API Reference | Every endpoint and function with examples |
| Usage Examples | Real code showing how to integrate Marble |
| LLM Providers | Wiring Anthropic / OpenAI / DeepSeek / opencode / Claude CLI / Ollama into the llm option, including free no-API-key paths |
| Graph Visualization | Companion 3D viz project, autonomous curator loop, launchd/systemd auto-start, building your own renderer |
| Glossary | Disambiguates overloaded terms (swarm / clone / curate); read this if you're confused which "curate" or "swarm" a piece of code means |
| Competitive Positioning | Why Marble isn't just "better collaborative filtering" |
| Insight-Driven KG | How Marble reasons about WHY, not just WHAT |
| Archetypes & Relationships | Relationship simulation, archetype generation |
| Contributing | How to contribute to Marble |

Validation

Marble's ranking claims have been validated against public datasets via the marble-bench benchmark suite. All runs use seeded PRNG and are reproducible. The harness covers six datasets across news, e-commerce, and music, with both cold-start and post-learn() measurements, run against three LLM providers (OpenAI gpt-4o-mini, Anthropic Claude Haiku 4.5, Moonshot Kimi K2.5).

| Dataset | Metric | Result |
|---|---|---|
| MIND-small (news, ~70K test impressions, 100 users) | nDCG@10 cold-start | +45–46% over popularity baseline; MRR ~2×, P@5 ~1.7× |
| Amazon Reviews 5-core, Gift_Cards | nDCG@10 (K=20) | 0.944 vs 0.629 popularity (+50%) |
| Amazon Reviews 5-core, Video_Games | nDCG@10 (K=20) | 0.928 vs 0.564 popularity (+65%) |
| Last.fm-1K (year-1 → year-2 drift, 5 users, post-learn()) | warm nDCG@10 | +28% on user with room to improve (1/5 users; 4/5 already at ceiling 1.000) |
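
For readers unfamiliar with the metric, nDCG@k can be computed as in the sketch below. This uses the linear-gain form with a log2 position discount; which variant marble-bench uses is not stated here, so treat it as a reference definition rather than the benchmark's code.

```javascript
// relevances: graded relevance of each item, in the order the ranker returned them.
function dcgAtK(relevances, k) {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// Normalizes DCG by the best achievable DCG for the same set of items.
function ndcgAtK(relevances, k = 10) {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(relevances, k) / idcg;
}
```

A ranking that puts the only relevant item first scores 1.0; putting it second scores about 0.63, which is why top-of-list mistakes dominate the metric.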

Honest caveats:

AUC sits near 0.5 on topic-thin MIND data — top-K ranking is strong (nDCG/MRR/P@5 all beat baselines), but the tied tail is what AUC penalizes. The OOTB-pass-3 #interestMatch embedding fallback addresses the structural cause; remaining ties are dataset artefacts (HuggingFace mteb/mind_small strips category metadata).
learn() clone consensus can hurt top-K on cold-start MIND profiles (3–10 history items, no real category labels). The variance comes from LLM non-determinism in seedClones — same seed, two runs, different archetype hypotheses. Two follow-ups suggested: down-weight cloneBoostWeight (currently 0.3) when seedClones ran in the cold-start branch, or gate clone consensus behind a KG-size minimum (e.g. ≥ 20 signals).
Provider behaviour matters. Kimi K2.5 emits 1–2K tokens of chain-of-thought prose before the JSON array on the seedClones prompt, hits the 4096 max_tokens cap mid-output, and returns LLM_UNPARSEABLE. This is now surfaced in learn() failures rather than silently degrading; raise max_tokens or use a model that respects "respond with only JSON" instructions.
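
When a model puts prose before the JSON array, a tolerant extractor on the caller side can recover the array. The sketch below is a hypothetical helper, not Marble's parser: it scans for the first balanced `[...]` span that parses as a JSON array. It cannot recover output that was cut off at the token cap, so the advice above (raise max_tokens or switch models) still applies.

```javascript
// Returns the first JSON array found in `text`, or null if none parses.
function extractJsonArray(text) {
  for (let start = text.indexOf('['); start !== -1; start = text.indexOf('[', start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Inside a JSON string: only track escapes and the closing quote.
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === '[') depth++;
      else if (ch === ']') {
        depth--;
        if (depth === 0) {
          try {
            const parsed = JSON.parse(text.slice(start, i + 1));
            if (Array.isArray(parsed)) return parsed;
          } catch (_) {
            // Balanced brackets but not valid JSON; try the next '['.
          }
          break;
        }
      }
    }
  }
  return null;
}
```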

The benchmark suite also caught 15 real Marble issues during build; fixes for 14 of them are merged across the OOTB integration passes (743091d, 9c69de5, 016eaa2, 6fe7cf6, f8cd52a).

License

MIT