melete-ai

v0.157.0

Published

5 days ago

Melete — the Self-Driving Discovery Brain. A closed-loop active-experiment engine with a pluggable oracle and a signed, offline-verifiable discovery trace. Mneme remembers; Melete discovers.

mneme_npm

self-driving-lab discovery active-learning experiment-design optimization provenance signed reproducible resonance ai-agent oracle replication cross-agent

Melete

The Sovereign Verifiable AI Analyst & Optimizer

Find the best — and most robust — settings for any system you can measure, in the fewest experiments — then hand over a signed verdict anyone can re-verify offline.

🌐 Live demo → melete.mneme-ai.space

MIT · zero runtime dependencies · runs on your machine

What it is (in one line)

You have a system you can measure — an ML pipeline, a server/DB/network config, a recipe, a simulation. Melete proposes the next setting to try, you measure it (or give a formula), and it converges to the best stable answer — then explains why in plain language and signs a verdict you (or an auditor) can re-check offline. Your data never leaves your machine.

Use it in 60 seconds — 3 ways

1) Through the website (no code). Open the live demo, pick your field (Pharma · Semiconductor · AI/ML · …), press Watch to see it discover, or use guided mode: it proposes → you measure in real life → you type the score → repeat.

2) CLI / npm (on your own machine):

npm i -g melete-ai
melete bench            # measured: beats random / grid search
melete gauntlet         # every engine's correctness check (must be 100)
melete poopt cert.json  # verify a signed certificate offline

3) API: connect your real process (air-gapped when self-hosted). First, get an endpoint to call. There are two options:

# A) hosted demo (quick try):        base URL = https://melete.mneme-ai.space
# B) self-host (sovereign — data never leaves your machine):
npm i -g melete-ai
melete-server                         # → serves on http://localhost:8790

Then POST to that base URL:

POST /next             { space, observations }              → the next setting to try
POST /aegis            { space, objective, budget }         → the best ROBUST setting (survives wobble)
POST /discover         { space, objective, budget }         → full run + signed Sovereign Verdict + Replay Token
POST /sovereign/verify { …verdict }                         → re-verify provenance OFFLINE
POST /replay/verify    { …token }                           → re-derive the decision step-by-step OFFLINE
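As a minimal sketch of calling the first endpoint from code, the helper below only builds the request for POST /next. The field names `space` and `observations` come from the list above; the `Knob` and `Observation` shapes, the `Content-Type` header, and the helper name `buildNextRequest` are assumptions for illustration, not the documented API.

```typescript
// Knob shape mirrors the pH example used elsewhere in this README.
type Knob = { name: string; type: "real"; min: number; max: number };
// Assumed observation shape: the setting tried plus the measured score.
type Observation = { x: Record<string, number>; y: number };

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the URL and JSON body for POST /next.
function buildNextRequest(baseUrl: string, space: Knob[], observations: Observation[]): HttpRequest {
  // Trim trailing slashes so the path joins cleanly.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/next`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ space, observations }),
    },
  };
}

const req = buildNextRequest(
  "http://localhost:8790/",
  [{ name: "pH", type: "real", min: 3, max: 9 }],
  [],
);
console.log(req.url); // http://localhost:8790/next
```

To send it, pass the two parts to any HTTP client, for example `fetch(req.url, req.init)`. The response format is not shown here, so inspect it before relying on specific fields.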

…or skip HTTP entirely and call the library in-process: import { sovereignAnalyze, aegisDiscover, proposeNext } from "melete-ai".

✦ What's inside — by category

68 independently verified modules. Every claim below is a check you can re-run: npx melete-ai gauntlet.

🔍 Optimize — the best setting in the fewest experiments

| capability | what it does |
|---|---|
| Adaptive discovery | a portfolio of search strategies reaches 99% of the optimum in ≈12 experiments, ≈8× fewer than random (measured over 300 seeds: avg 12.2, 300/300 reached; melete bench) |
| Mixed spaces | real · integer · categorical · conditional knobs, not just dials |
| Multi-objective | the Pareto front of best trade-offs (yield and cost) |
| Noise-robust | the value you can trust under measurement noise, not a lucky spike |

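import { proposeNext } from "melete-ai";          // loop: propose → you measure → repeat
// obs holds the measurements you have collected so far (declared elsewhere in your code)
const { next } = proposeNext({ space: [{ name: "pH", type: "real", min: 3, max: 9 }], observations: obs, goal: "maximize" });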
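The propose → measure → repeat loop can be sketched as below. This is an illustration under assumptions: the proposer is passed in as a function so the sketch runs without the package, the `Trial` shape is invented for the example, and the fixed sweep is only a stand-in, not Melete's search strategy. In real use the `propose` argument would wrap `proposeNext`.

```typescript
// One measured trial: the setting tried and the score it produced.
// This shape is an assumption; check the package docs for the real format.
type Trial<S> = { x: S; y: number };

function runLoop<S>(
  propose: (trials: Trial<S>[]) => S,
  measure: (x: S) => number,
  budget: number,
): { best: Trial<S>; trials: Trial<S>[] } {
  if (budget < 1) throw new Error("budget must be at least 1");
  const trials: Trial<S>[] = [];
  for (let i = 0; i < budget; i++) {
    const x = propose(trials);           // ask for the next setting
    trials.push({ x, y: measure(x) });   // measure it and record the result
  }
  // goal: maximize, so keep the trial with the highest score
  const best = trials.reduce((a, b) => (b.y > a.y ? b : a));
  return { best, trials };
}

// Stand-in proposer: a fixed sweep pH = 3, 4, ..., 9. NOT Melete's strategy.
const sweep = (trials: Trial<{ pH: number }>[]) => ({ pH: 3 + trials.length });
// Toy objective that peaks at pH 7 with score 10.
const score = (x: { pH: number }) => 10 - (x.pH - 7) ** 2;

const result = runLoop(sweep, score, 7);
console.log(result.best); // { x: { pH: 7 }, y: 10 }
```

Keeping the proposer and the measurement as separate functions matches the guided workflow: the measurement can be a formula, a simulation, or a value you type in after a real experiment.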

Hosted, no install: POST https://melete.mneme-ai.space/next

🛡 Trust & verify — the honesty stack (no other optimizer ships this)

| certificate | the question it answers — signed, offline-verifiable | |---|---| | 🏅 Trustworthy Discovery | is it REAL (not noise) · CAUSAL (not confounded) · ROBUST (survives wobble)? | | 🏔 Stability | is the optimum reproducible, or a lucky one-off? (STABLE ⇒ reproduced ≥97.5%, measured) | | 💎 Honest-Search Proof | is this a GENUINE search or a FAKED one? Re-derive the trace offline (no oracle) — a forgery is rejected. (360/360 forgeries caught; something an LLM cannot do) | | 🛡 Tolerance Certificate | the certified ±tolerance that still keeps ≥90% of the optimum — a worst-case Lipschitz guarantee, not an average. (8400/8400 off-grid adversarial samples held the floor) | | 📜 Proof of Improvement | switching from setting A to recipe B is a proven gain of ≥Δ — noise-aware 97.5% lower bound; refuses within noise. Common-random-numbers pairing certifies the same gain from ~8× fewer measurements; sequential early-stopping (Bonferroni α-split) stops the moment the gain is certified — ~1.9× fewer on average (41.9 vs 80). (Δ valid ≥97.5%, false-cert ≤2.5%) | | 🔐 Pre-Registration | commit the objective, space, budget & decision rule before running, then prove the result obeyed it — no goalpost-moving, no cherry-picking. (6 deviation classes all rejected; the scientific-integrity layer) | | 🪨 Decision-Breakdown | how many measurements would an adversary (fraud, a glitchy sensor) have to corrupt to flip your "B beats A" verdict? The exact tamper-distance — a strong clean call survives many corruptions, a marginal one flips on one. The cert ships the explicit minimal attack (a witness you re-apply), takes an arbitrary adversary range (real sensor/physical bounds), and a stronger adversary provably never raises the count. (witness truly flips 100%; monotone 100%; an inflated claim caught 100%) | | 📉 Winner's Curse | you searched N settings and reported the best — but that number is inflated (it's the max of N noisy trials, partly luck). 
The signed selection correction: the winner's TRUE value is ≥ this de-biased lower bound, the discount grows with N, and it works with σ unknown (estimated from replicates, studentized). (valid bound ≥97.5%, measured 99.5%; with σ estimated a plain plug-in breaks at 94.9% — studentized holds 99.3%; naive overstates 90%) | | 🧭 Extrapolation-Guard | is the recommended setting inside the data you measured, or a blind extrapolation? It's flagged with an exact separating-hyperplane witness — proof it's outside the convex hull of your evidence, in any direction (not just out-of-box; it catches an in-box point that's off a correlated-knob manifold, which an axis test misses) — plus a density signal for interior voids. (out-of-box & in-box-off-hull → flagged 100% with a valid, re-verifiable witness; never false-flags an in-data point; a fake "supported" is caught) | | ⏱ Anytime-Valid | an AI agent peeks after every experiment — and naive "stop when p<0.05" then false-alarms ~40% of the time. An e-value martingale (Ville's inequality) stays valid under unlimited peeking + optional stopping, plus a time-uniform confidence sequence — a running interval on the gain valid at all times at once, so the agent can read the estimate at any peek and trust it. (FP ≤ α measured 2.4% vs naive 42%; the CS covers the true gain uniformly 97.4% where a naive per-peek CI holds only 58%; it tightens as evidence accrues) | | 📏 Conformal Prediction | "ŷ ± 1.96σ" assumes Gaussian noise — wrong on real skewed/heavy-tailed data. Split-conformal wraps any predictor with a distribution-free interval, coverage guaranteed ≥ 1−α, finite-sample exact (exchangeability, no assumption). The normalized (adaptive) mode scales by a per-input difficulty so coverage is balanced across input regions under heteroscedastic noise, not just on average. 
(coverage on target across normal/heavy/skewed — spread 0.2pp vs Gaussian's 2.8pp; 22% tighter than the over-covering Gaussian on skew; exact at n=20; under input-dependent noise plain under-covers the hard region 83% while adaptive balances 90%/90%) | | 🎯 Calibration v2 | when a model/agent says "90% sure", is it right ~90% of the time? Two tests, not one: the global Spiegelhalter Z names over/under-confidence, and a per-bin Hosmer-Lemeshow test catches mid-range miscalibration near p=0.5 where the global Z is structurally blind — Bonferroni-split so the combined false-flag stays ≤ α. Reports ECE, recalibrates, and localizes the worst-calibrated bin. (global Z alone catches the mid-range blind spot only 2.7%; the v2 conditional test catches it 100% and localizes it 100%; calibrated falsely flagged ≤ α; over/under-confidence detected 100% with direction named 100%; recalibration cuts ECE 10.2%→6.4%) | | 👥 Subgroup Validity | "B beats A +3% overall" can hide a segment B actively harms (Simpson's paradox). It tests the effect in every subgroup — declaring UNIFORM-IMPROVEMENT via an intersection-union test (level α, correctly not over-penalized) and flagging HARMED-SUBGROUP via Holm step-down (more powerful than Bonferroni at the same family-wise error), naming the hurt one. (detects + names a harmed segment 100%; pooled test says "improvement" while a segment is harmed 99% — the trap; uniform-claim power 27% vs 6% naive, size-controlled; Holm beats Bonferroni at FWER ≤ α) | | 🔒 Privacy v2 (new) | sovereignty keeps your raw data home — but the moment you share an aggregate (a federated mean, a published stat, a pooled gradient) you can leak the individuals in it (membership inference). This certifies the release is (ε,δ)-differentially private via the tight analytic Gaussian (Balle-Wang) — the minimum noise — reveals only the noised value, and signs it. 
v2 adds a zCDP accountant: the cumulative budget accumulates ρ additively (composes like √k, not k like basic Σε), so far more releases fit under one budget while staying provably sound. The dishonest failure it catches: under-noising while claiming a small ε. (the optimal membership-inference attack on a certified release stays inside the (ε,δ) region; 1/5 the noise leaks far outside it and is rejected; tight calibration — achieved δ = target δ; for 50 releases zCDP certifies ε=5.3 vs basic 25; under one (ε=3,δ=1e-5) budget zCDP admits 17 releases vs basic's 6 — and the composed attack stays sound) | | 🗑 Unlearning v2 (new) | the "right to be forgotten" with proof, in batches. When users ask to be deleted, a provider's cheapest move is to do nothing and claim it's done. This forgets a whole batch of k records from a ridge model EXACTLY via a Woodbury block rank-k downdate — O(k³+kd²), touching only those records' own contributions, never the other n−k rows — proves the served model equals one retrained from scratch without them and equals deleting them one-by-one (sequential), reports the batch's influence + the residual influence left behind (must be ~0), and signs it. An auditor re-derives it from the Gram matrix alone (never the raw rows). The dishonest failure it catches: fake / partial deletion. (the block downdate equals full retraining and sequential deletion to ~1e-15; a provider that secretly keeps the batch is flagged RESIDUAL-INFLUENCE orders of magnitude above tolerance and a forged "DELETED" cert is rejected; forgets a batch without retraining) | | 🌐 Distribution-Shift (DRO) v2 (new) | every optimizer reports the value it measured on the data it saw — but deployment data drifts (the customer mix, the traffic, the population), and a setting that looks best on the nominal data can collapse under a modest shift. 
This certifies the worst-case mean over every distribution within a χ²-divergence ball of radius ρ — V = mean − √(ρ·Var), the exact Cauchy-Schwarz-tight bound — guaranteeing "under any shift up to χ² ≤ ρ, the value is provably ≥ V". AEGIS guards input wobble and Tolerance parameter wobble; this guards the data distribution itself, so a fragile high-variance setting is out-ranked by a robust one. v2 adds a confidence mode: setting ρ = z²/n makes V a calibrated (1−α) lower bound on the true mean — robust to sampling error too — the Duchi-Namkoong unification that DRO and a confidence interval are the same object. (over 24,000 reweightings none beat the certified worst case and the aligned adversary achieves it exactly; monotone in ρ; confidence mode covers the true mean 94.5% on light tails ≈95% and over-covers on skew; the confidence-mode bound equals the textbook one-sided CLT bound mean−z·SE exactly) | | ⚖️ Fairness v2 (new) | regulators (the EU AI Act, fair-lending law) demand proof an automated decision doesn't discriminate — but the naive "the rates look equal" check is a trap twice: a real gap can hide in sampling noise, and a harmless wobble can be mistaken for bias. This measures the demographic-parity gap (and, with outcomes, the equalized-odds TPR/FPR gaps) each with simultaneous Bonferroni-corrected Wilson confidence intervals, then returns a calibrated verdict — FAIR / UNFAIR (naming the metric + groups) / INCONCLUSIVE — and signs it. v2 adds intersectional fairness: it tests every intersection of protected attributes too, catching the fairness-gerrymandering bias that hides at an intersection while each attribute alone looks fair. (a biased model is detected + named 100%; a truly fair model is falsely accused ≤ α; the gap CI covers the true gap ≥ 1−α; an XOR-gerrymander model that every marginal test passes is caught UNFAIR at the named intersection 100%, with no false alarm ≤ α) | | 🧩 Attribution (new) | "why was I denied?" 
is a legal right (GDPR, EU AI Act) — but a single SHAP run or an LLM's rationalization gives numbers nobody can check, and a vendor can quietly tilt them. This computes the exact Shapley value for each feature from the model's own coalition value table and proves the fairness axioms — the credits sum exactly to the prediction (efficiency), identical features get equal credit (symmetry), a do-nothing feature gets zero (dummy), and attribution is additive across models (linearity) — then signs it. A tilted explanation whose credits don't add up to the prediction is rejected on re-derivation. (efficiency holds to ~1e-14; dummy = 0 exactly; symmetric features get exactly-equal credit; linearity to ~1e-14; a pairwise interaction is split fairly +0.2/+0.2; an inflated credit breaks efficiency and is caught) | | 🤝 Verification Receipt (new — two-sided) | every certificate above is a one-way proof an issuer signs about themselves. This turns it two-sided: a verifier (regulator / auditor / customer / counterparty agent) re-derives the issuer's certificate offline and counter-signs a receipt bound to it with their own key. Who benefits: ① the issuer gets a portable, independently counter-signed attestation (worth more to a buyer/regulator than a self-signed claim); ② the verifier gets an offline-checkable record of what they verified and when, tamper-evident — protection if the decision is challenged. Neither has to trust the other. (a receipt over a genuine cert verifies; it's bound to that exact cert (a different cert is rejected); a tampered cert yields a REJECTED verdict and a forged "VERIFIED" receipt is caught; issuer≠verifier independence is enforced — no self-rubber-stamp; works across every certificate kind) | | 📑 SLA Certificate v2 (new — two-sided) | AI is sold on uptime SLAs + "trust us" on quality. 
This puts the quality in an enforceable, both-party contract: the provider commits measurable terms (calibration ECE ≤ 5%, fairness gap ≤ 0.1, accuracy ≥ 90%, p95 latency ≤ 200 ms — each able to bind to its signed metric certificate) and each period is signed PASS, or BREACH naming exactly which term failed and by what margin. v2 adds a hash-chained compliance ledger over the billing cycle — a tamper-evident history with auto-accrued penalty (removing/altering a period breaks the chain). Who benefits: ① the provider turns "our model is good" into an enforceable promise + a signed track record that wins enterprise deals; ② the consumer gets a guarantee with teeth — provable breaches + the penalty owed, offline-checkable, so refunds aren't he-said-she-said. (compliant→PASS; a drifted term→BREACH named with margin; multi-breach all named; ≤/≥ handled; forged PASS rejected; the ledger computes breach-rate/streak/penalty exactly and catches a tampered or hidden period) | | ✍️ Consent Certificate (new — two-sided) | GDPR consent is a checkbox in a database the company can rewrite. This makes it a two-party signed artifact: the data subject signs a scoped grant (which purposes, which fields, an expiry); the controller's every use is adjudicated against it (ALLOWED / DENIED, re-derived from the grant) and signed; the subject can sign a revocation. Who benefits: ① the subject holds a signed record of exactly what they agreed to and can prove any out-of-scope / expired / post-revocation use (real recourse); ② the controller holds signed use-certificates proving each use was within consent — an audit-ready, liability-bounding trail. (in-scope use ALLOWED; off-purpose / off-field / expired / post-revocation use each DENIED + named; a use before a later revocation stays ALLOWED; a controller forging ALLOWED is rejected on re-derivation; subject≠controller two-party chain) | | 🎫 Trust Passport (new — two-sided) | a vendor shouldn't hand a regulator eight separate proof files. 
The passport composes many certificates (fairness + calibration + privacy + SLA + consent…) into one signed bundle — each member bound by an order-independent merkle root — that re-verifies every member in a single offline call and names any that fail. It's itself a signed cert, so the Verification Receipt counter-signs the whole bundle. Who benefits: ① the issuer ships one portable artifact (a swapped/tampered member is caught); ② the verifier checks the entire compliance posture at once + sees exactly which member failed, then counter-signs once. (composes 3+ kinds → ALL-VERIFIED; a tampered or swapped member is rejected by hash-binding; a forged "all-verified" with a failing member is caught; merkle root is order-independent; a two-party receipt over the passport verifies) | | 🧬 Model Supply-Chain (AIBOM) (new — multi-party) | a deployed model is many parties' work — a base-model vendor, a fine-tuner, an optimizer, a deployer. This is a hash-chained AI Bill of Materials where each step is signed by the key of the party responsible for it (4+ distinct signers) and declares the prior artifacts it consumed. Any downstream consumer verifies the whole provenance offline. Who benefits (≥3): the base-model vendor (attribution + scoped liability), the fine-tuner/optimizer (prove their layer), the deployer (prove an unbroken lineage), the regulator/end-user (verify the whole chain + who is accountable). (a 4-party chain with 4 distinct signers verifies + names each; tamper / reorder / remove / impersonation / a broken-provenance link are each caught; rides inside a Trust Passport) | | 🕵️ Private Audit Proof (new — multi-party · flagship) | the wall every AI audit hits: to verify a claim you must be handed the model and the whole (often private/regulated) dataset. 
This breaks the deadlock — the vendor Merkle-commits every per-record outcome, a Fiat–Shamir challenge derived from that commitment selects a tiny random sample the vendor cannot cherry-pick, and the auditor checks the claim on just those k records (e.g. 300 of 100,000). A claim inflated past tolerance is caught with probability rising toward 1 in k. Audit without handing over the data. Who benefits (≥3): the vendor (proves compliance without exposing model/data), the auditor/regulator (sound audit of a tiny sample), data subjects (minimal exposure), a relying party (re-checks offline). (honest claim accepted ~100%; a true-80%/claim-90% cheater caught 86%→96%→100% as k=30→100→300; only ~0.3% of records revealed; a tampered opening fails its Merkle path; the sample is a pure function of the committed root) · HONEST: not zero-knowledge (the k sampled records are seen) and not a SNARK — a data-minimizing, binding, sound spot-check; soundness is in the random-oracle model, a grinding prover faces work ~1/(1−ε)^k. | | 🎟️ Proof-Carrying Answers (new — multi-party · runtime) | batch audits prove a model was good last quarter; this proves THIS answer, right now. Every output ships a tiny signed trust tag a consumer (or another agent) verifies offline in microseconds: is the input inside the model's certified evidence envelope (with a re-derivable witness if not), is the confidence from a calibrated model (bound by hash), what is the bound provenance (AIBOM lineage + SLA) — verdict TRUSTED / OUT-OF-SCOPE / NEEDS-REVIEW. The runtime trust layer for every AI answer; the missing primitive for multi-agent AI. Who benefits (≥3): the provider (answers carry their own trust), the consuming agent (verifies + safely rejects out-of-scope answers), the platform/regulator (audits the signed stream), the end user (protected from confident-but-unbacked answers). 
(in-scope→TRUSTED 100% with zero false flags; out-of-scope flagged 100% with a witness dimension; under-confident→NEEDS-REVIEW; a forged TRUSTED is rejected on re-derivation; ~800-byte proof, O(d) data-free verify) · HONEST: proves an answer is BACKED + IN-SCOPE + from a calibrated, provenance-bound model — not that it is factually correct (no per-answer oracle); it catches the out-of-scope / under-confident answers most likely to be wrong. | | 🌍 AI Transparency Log (new — ecosystem · flagship) | Certificate Transparency (RFC 6962) made the whole web's TLS accountable — every certificate publicly logged, append-only, Merkle-auditable. This is that mechanism for AI claims. Every Melete certificate is appended to a public, tamper-evident Merkle log; anyone can prove a claim is included, and prove the log never rewrote history (a consistency proof between two signed tree heads). A vendor can no longer show a fair certificate to one auditor and bury the biased one. The global accountability substrate the whole stack sits on. Who benefits (a whole ecosystem): submitters (public, non-repudiable record), auditors/light-clients (verify inclusion + append-only offline), monitors (regulators/journalists/public detect a rewrite or fork), end users (trust only what is logged). (inclusion provable for every claim across sizes 1..100; a non-member rejected; append-only consistency proven for every m<n; rewriting a past claim is caught as inconsistent with the old signed tree head; a split view is caught; tree heads Ed25519-signed) · HONEST: proves WHAT was logged + that history was not rewritten; it does not force anyone to log (a policy/ecosystem incentive, exactly as with web CT), and a leaf is a claim hash, not a judgement of truth. | | 🛰️ Witness Network (new — ecosystem) | a transparency log is only as honest as the assumption the operator shows everyone the SAME history — the split-view attack CT had to solve. 
The fix: independent witnesses (other vendors, NGOs, clouds) co-sign the log's Signed Tree Head; a relying party trusts it only if a quorum of distinct witnesses co-signed the same root. The operator can no longer present two histories, and any split view is exposed by the conflicting co-signatures. Trust without trusting the operator. Who benefits: the operator (earns checkable honesty), witnesses (a public good + mutual accountability), relying parties (quorum of independents), regulators (proof of one history). (a head co-signed by ≥quorum distinct witnesses is accepted; below quorum rejected; forged or duplicate co-signatures ignored; a witness refuses a non-append-only head; a split view — two roots at one size — is detected 100%) | | 💾 Live Public Log (new — real, persistent) | not a demo — a real, file-backed public transparency log running on the server that survives restarts (identical Merkle root + signing key), with independent witnesses co-signing its current tree head and a live monitor (size · root · latest claims · witness quorum). Submit a claim and watch it appended. Who benefits: submitters (a permanent record that survives deploys), auditors (an old tree head still checks out), monitors (watch a live growing log), everyone (log + witnesses make AI claims accountable). (durableGauntlet: after a simulated restart the log rebuilds to the identical root + key; inclusion + append-only consistency proofs survive the restart; editing the persisted history is detected; a fresh log mints + persists its key once) | | 🚫 Revocation Registry (new — PKI-grade) | every certificate is valid forever — until it should not be (a model certified fair is later found to discriminate; a key is compromised). This is CRL/OCSP for AI: an authority appends a signed, hash-chained revocation (cert hash + reason + effective time); a relying party checks status GOOD / REVOKED before acting. 
Time-aware — a decision made before the revocation stays valid; only reliance after is blocked. Post it to the transparency log and a revocation cannot be silently dropped. Who benefits: issuer (withdraw a faulty claim, bound liability), relying parties (stop acting on an invalid cert), regulators (require + verify revocation), end users (protected from already-revoked certs). (GOOD/REVOKED with reason+since; time-aware; authority-signed + chain-linked so tamper / forged-revocation / silent un-revoke are caught; only the pinned authority key is trusted) | | ✅ Live Trust Report (new — one signed verdict) | the stack proves many things separately (fairness, calibration, attribution, lineage). A non-expert cannot read eight proofs — they ask one question: "is this AI trustworthy right now?". This composes the whole lifecycle into a single signed verdict: for every member certificate it checks three things at once — it VERIFIES, it is NOT REVOKED as of the reliance time (time-aware), and it is INCLUDED in the public transparency log — and returns TRUSTED-NOW only if every member passes all three, else NOT-TRUSTED-NOW naming the exact member + reason. Revoke one underlying claim and the same bundle flips live. Who benefits: a consumer / procurement (one trustworthy-or-not answer instead of eight proofs), the issuer (a live-good status that already accounts for revocation + logging), regulators (a current signed verdict, not a stale bundle), end users (protected the moment any claim is withdrawn or was never logged). (all-good → TRUSTED-NOW; one revoked/unlogged/tampered member → NOT-TRUSTED-NOW naming it; time-aware; forged TRUSTED-NOW caught on re-derivation; Ed25519-signed, offline-verifiable) | | 🏛️ Chain of Trust (new — AI Certificate Authority) | every certificate assumes the issuer's key is one to trust — but who authorized that issuer? 
This is the PKI answer for AI claims: a pinned ROOT authority signs a scoped delegation to an intermediate (which cert kinds, which subject namespace, a validity window, a max path length); intermediates may sub-delegate — but only ever narrower — down to a leaf issuer. A relying party pins ONE root key and verifies any issuer was transitively authorized to make exactly this claim. Who benefits: a root regulator sets policy once (every downstream issuer inherits a bounded, checkable mandate); intermediates get provable scoped power; issuers prove they were authorized, not merely that they signed; relying parties trust a whole ecosystem from one pinned key. (in-scope chain → AUTHORIZED; wrong-root / out-of-kind / out-of-namespace / expired (time-aware) / over-delegation (path length) / broken-link / a child broadening its parent / forgery all rejected naming the link; Ed25519-signed, offline-verifiable) | | 🤝 Swarm Evidence | many AI agents each with weak evidence pool into one verdict stronger than any single one — an agent that lies (claims more than its data shows) is re-derived and excluded, and a consensus check (Cochran's Q) flags whether the agents actually agree, so a pooled significance dragged by one disagreeing agent isn't trusted blindly (it names the culprit). Multi-agent trust, signed. (weak gain caught 61% pooled vs 29% single; null ≤ α; a 10⁶× liar excluded 100%; disagreement detected + attributed 100%, false-flagged only 2.8%) | | 📊 False-Discovery Control | report K findings at once and some are pure luck. It controls the fraction of your reported discoveries that are false at a target q, ships a per-hypothesis q-value (usable at any threshold from one signed cert), and offers a Benjamini-Yekutieli mode that holds under arbitrary dependence (the real case — knobs/metrics are correlated). 
(BH realized FDP ≤ q, measured 7.6%; naive inflates to 13%; q-values match BH at every threshold; BY safe under ρ=0.5 dependence → 1.6% ≤ q) | | ⬛ Null Engine | brave enough to say "there's nothing to find" on pure noise | | 👑 Sovereign Verdict + ⏪ Replay | Ed25519-signed, deterministic, re-derivable on any machine, forever |
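The Distribution-Shift (DRO) row in the table above quotes the bound V = mean − √(ρ·Var) and its confidence mode ρ = z²/n. The sketch below works through that arithmetic on a small sample. It is not the package's implementation; the function names are hypothetical, and the choice of the sample variance (n − 1 denominator) is an assumption.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((s, v) => s + v, 0) / xs.length;
}

// Sample variance with an n - 1 denominator (an assumption for this sketch).
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((s, v) => s + (v - m) * (v - m), 0) / (xs.length - 1);
}

// Worst-case mean over a chi-square ball of radius rho: V = mean - sqrt(rho * Var).
function worstCaseMean(xs: number[], rho: number): number {
  return mean(xs) - Math.sqrt(rho * sampleVariance(xs));
}

// Confidence mode: rho = z^2 / n.
function confidenceRho(z: number, n: number): number {
  return (z * z) / n;
}

const scores = [2, 4, 4, 4, 5, 5, 7, 9];
const z = 1.645; // one-sided 95% normal quantile
const v = worstCaseMean(scores, confidenceRho(z, scores.length));
// Textbook one-sided bound: mean - z * SE, with SE = sqrt(Var / n).
const clt = mean(scores) - z * Math.sqrt(sampleVariance(scores) / scores.length);
console.log(Math.abs(v - clt) < 1e-12); // true
```

Substituting ρ = z²/n gives √(z²·Var/n) = z·√(Var/n), which is why the two bounds coincide. A larger ρ always lowers V, so a high-variance setting loses ground faster than a stable one as the allowed shift grows.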

curl -X POST https://melete.mneme-ai.space/trust-certificate -d '{"scenario":"good"}'
curl -X POST https://melete.mneme-ai.space/stability         -d '{"scenario":"easy"}'
curl -X POST https://melete.mneme-ai.space/honest-search     -d '{"seed":3}'   # genuine VERIFIES, a fake is REJECTED
curl -X POST https://melete.mneme-ai.space/tolerance         -d '{"scenario":"broad"}'   # certified ±tolerance
curl -X POST https://melete.mneme-ai.space/improvement       -d '{"seed":7}'            # certified gain A→B (independent vs CRN-paired)
curl -X POST https://melete.mneme-ai.space/prereg            -d '{"seed":3}'            # genuine CONFORMS, a cherry-picked run is REJECTED
npx melete-ai poopt proof-of-optimization.json   # verify any signed certificate offline
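The False-Discovery Control row in the certificate table refers to Benjamini-Hochberg (BH) q-values. As a point of reference, the textbook BH q-value computation looks like the sketch below. This is the standard procedure, not code taken from the package, and the function name `bhQValues` is hypothetical.

```typescript
// Standard Benjamini-Hochberg q-values:
// q_(i) = min over ranks j >= i of p_(j) * m / j, with p sorted ascending.
function bhQValues(pValues: number[]): number[] {
  const m = pValues.length;
  const order = pValues
    .map((p, index) => ({ p, index }))
    .sort((a, b) => a.p - b.p);
  const q: number[] = pValues.map(() => 1);
  let running = 1;
  // Walk from the largest p-value down so the running minimum enforces monotonicity.
  for (let rank = m; rank >= 1; rank--) {
    const item = order[rank - 1];
    running = Math.min(running, (item.p * m) / rank);
    q[item.index] = running;
  }
  return q;
}

const pValues = [0.001, 0.008, 0.039, 0.041, 0.6];
const qValues = bhQValues(pValues);
// A finding is reported at FDR level `level` when its q-value is at most `level`.
const discoveriesAt = (level: number) => qValues.filter((q) => q <= level).length;
console.log(discoveriesAt(0.05), discoveriesAt(0.06)); // 2 4
```

Because each hypothesis carries its own q-value, one computation answers the question at any threshold: here two findings survive at 5% and four at 6%.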

🔬 Diagnose — plain-language why

| lens | tells you |
|---|---|
| Sensitivity · cliffs · shape | which knobs matter, where it breaks, the response shape |
| Ceiling · drift | the achievable best, and whether results drift over time |

🔌 Integrate — incl. MCP (trust middleware for AI agents)

npm i melete-ai · CLI npx melete-ai … · HTTP https://melete.mneme-ai.space — /next /discover /trust-certificate /stability /honest-search /tolerance /improvement /prereg /breakdown /selection /support /fdr /anytime /swarm /conformal /subgroup /calibration /privacy /unlearning /dro /fairness /attribution /receipt /sla /consent /passport /aibom /spotcheck /pca /translog /witness /log/submit /log/monitor /revocation /design /design.md /mcp /verify

🔌 Model Context Protocol — be the verification layer any AI agent plugs into. Any agent (Claude · GPT · Gemini · an autonomous coding agent) calls Melete over MCP and gets back a signed, offline-verifiable answer instead of a number to take on faith — de-bias a winner, check support, control the false-discovery rate, propose the next experiment. Plug-and-play, every result Ed25519-signed.

// Claude Desktop / Cursor MCP config:
{ "mcpServers": { "melete": { "command": "melete-mcp" } } }

…or over HTTP: POST /mcp with a JSON-RPC body (initialize · tools/list · tools/call).
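A JSON-RPC body for POST /mcp could be assembled as in this sketch. The envelope follows JSON-RPC 2.0 and the method names are the ones listed above; the tool name `propose_next` and its arguments are hypothetical, so call `tools/list` first to see what the server actually offers. A real session also starts with `initialize`, which is omitted here.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build a JSON-RPC 2.0 request; `params` is left out entirely when not given.
function rpc(id: number, method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return params === undefined
    ? { jsonrpc: "2.0", id, method }
    : { jsonrpc: "2.0", id, method, params };
}

const listTools = rpc(1, "tools/list");
// "propose_next" is a made-up tool name used only for illustration.
const callTool = rpc(2, "tools/call", {
  name: "propose_next",
  arguments: { goal: "maximize" },
});

console.log(JSON.stringify(listTools)); // {"jsonrpc":"2.0","id":1,"method":"tools/list"}
```

Each serialized object is the body of one POST to /mcp on whichever base URL you chose earlier.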

Every tool call is metered + audited into a signed trust ledger — a hash-chained, Ed25519-signed receipt per call (which agent, which tool, the hash of the signed result). POST /mcp/usage returns the tamper-evident usage tally (the number you bill on) + the chain-integrity check. One layer, two jobs: usage-based billing and a shared audit trail every agent and human re-verifies offline.
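To show why a hash-chained ledger is tamper-evident, here is a minimal sketch of the chaining idea only. The receipt fields (agent, tool, result hash) follow the description above; everything else is assumed. In particular, `toyHash` is a non-cryptographic stand-in so the example runs anywhere, the Ed25519 signature is left out, and none of this is the package's actual ledger format.

```typescript
type HashFn = (text: string) => string;
type Call = { agent: string; tool: string; resultHash: string };
interface Receipt extends Call { prev: string; hash: string }

const GENESIS = "genesis";

// Each receipt's hash covers its own fields plus the previous receipt's hash.
function digestOf(call: Call, prev: string, hash: HashFn): string {
  return hash(JSON.stringify([call.agent, call.tool, call.resultHash, prev]));
}

function appendReceipt(ledger: Receipt[], call: Call, hash: HashFn): Receipt[] {
  const prev = ledger.length > 0 ? ledger[ledger.length - 1].hash : GENESIS;
  const receipt: Receipt = {
    agent: call.agent,
    tool: call.tool,
    resultHash: call.resultHash,
    prev,
    hash: digestOf(call, prev, hash),
  };
  return ledger.concat([receipt]);
}

// Re-derive every link; any edited, removed, or reordered receipt breaks the chain.
function verifyChain(ledger: Receipt[], hash: HashFn): boolean {
  let prev = GENESIS;
  for (const r of ledger) {
    if (r.prev !== prev || r.hash !== digestOf(r, prev, hash)) return false;
    prev = r.hash;
  }
  return true;
}

// DEMO ONLY: a 32-bit rolling hash, NOT cryptographic. Use SHA-256 in practice.
function toyHash(text: string): string {
  let h = 7;
  for (let i = 0; i < text.length; i++) h = (h * 31 + text.charCodeAt(i)) >>> 0;
  return h.toString(16);
}

let ledger: Receipt[] = [];
ledger = appendReceipt(ledger, { agent: "agent-a", tool: "fdr", resultHash: "ab12" }, toyHash);
ledger = appendReceipt(ledger, { agent: "agent-b", tool: "support", resultHash: "cd34" }, toyHash);
console.log(verifyChain(ledger, toyHash)); // true

// Rewriting one field of an earlier receipt invalidates the chain.
const tampered = ledger.map((r, i) => (i === 0 ? { ...r, tool: "fdx" } : r));
console.log(verifyChain(tampered, toyHash)); // false
```

The hash function is injected so a real digest can be swapped in without touching the chaining logic. The usage tally is then just the length of a ledger that verifies.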

The moat

🔒 Sovereign — runs air-gapped, on your machine; data never touches a cloud.
👑 Verifiable — every verdict is Ed25519-signed; an auditor re-verifies it offline with the embedded public key, no trust in us required.
⏪ Replayable — the engine is fully deterministic, so a signed Replay Token re-derives the exact decision, step by step, on any machine, forever.

Honest by design (DIAKRISIS)

Melete is an optimizer + analyst, not a fortune-teller. "Verifiable" means provenance + reproducibility — proof of what was tested and the result reached, unaltered and re-derivable — not a proof that your code is bug-free or exploit-free (that is undecidable in general; we don't claim it). Efficiency, robustness, and Pareto results are exact and reproducible. Run melete gauntlet — every claim is a check you can re-run.