npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@metaharness/weight-eft

v0.1.1

Published

Fine-tune cheap open-source LLMs (GLM, Qwen, DeepSeek) on your AI coding agent's successful runs with LoRA (SFT + DPO) so your model cascade escalates to expensive frontier models (GPT, Claude) less often — cutting cost-per-resolved. Turns run history int

Downloads

200

Readme

@metaharness/weight-eft

Make cheap open-source LLMs solve more coding tasks on their own. Fine-tune them (LoRA) on your AI agent's past successful runs, so your pipeline calls expensive frontier models (GPT, Claude) less often — and your cost-per-fix drops.

npm version license: MIT node

npm i @metaharness/weight-eft

What is this? (plain language)

If you run an AI coding agent, you probably use a model cascade: a cheap model (GLM / Qwen / DeepSeek) tries first, and only the hard problems escalate to an expensive frontier model (GPT / Claude). Every escalation costs real money.

weight-eft makes the cheap model smarter by fine-tuning it with LoRA on the trajectories your agent already solved — turning your run history into training data. The cheap model then resolves more issues by itself, so you escalate less and pay less per solved task.

It's a self-improving loop: your agent's wins become the next model's training set.

  • Input: your agent's run archive (successful + failed trajectories).
  • Output: portable LoRA training data — SFT + DPO in standard formats (OpenAI chat JSONL / TRL / axolotl / unsloth) + a GPU training plan.
  • Goal: lower cost-per-resolved, not a leaderboard score.

Why it exists (the honest, bounded thesis)

We attack the cost axis, not the capability ceiling. A small (7-14B) local fine-tune will not out-reason a frontier model on the hardest problems — that's a model-capability ceiling (measured: clean-eval ~37.3%, ADR-198 / §53). The win is fewer escalations (lower cost), and the tooling keeps the telemetry honest about exactly that: the eval metric is escalation-rate-reduction + cost/resolved, never "we beat the frontier."

Under the hood this is the gradient/weight counterpart to Darwin's gradient-free policy evolution (freeze the model, evolve the harness) — here we also evolve the cheap model's weights, on the open tier, from the harness's own archive.

The data recipe (on/off-policy)

| Set | Contents | Why | |-----|----------|-----| | SFT | ALL gold-resolved trajectories — cheap-OWN and frontier-escalation | SFT (max-likelihood) is off-policy-stable, so a frontier success on an issue the cheap model couldn't solve is off-policy-safe DISTILLATION. | | DPO | ON-POLICY cheap-vs-cheap pairs ONLYchosen = a resolved sample, rejected = an empty/failed sample by the same cheap model on the same instance (BoN-derived) | A frontier-chosen-vs-cheap-rejected pair is off-policy and unstable (the reference policy never produced the chosen completion). That signal goes to SFT instead. |

Output formats (canonical / portable)

Exported files use standard schemas (portable to TRL / axolotl / unsloth / ruvllm-MicroLoRA), never a custom format. A thin runner-adapter at the training boundary maps standard → whatever the runner ingests.

  • SFT — OpenAI chat JSONL: {"messages":[{role:system},{role:user},{role:assistant,tool_calls:[…]},{role:tool,…},…,{role:assistant}]}. tool_calls are preserved — the ReAct loop is not flattened to plain text; the model learns real tool-use trajectories.
  • DPO — TRL/HF conversational preference: {"prompt":[system+issue], "chosen":[resolved trajectory], "rejected":[failed trajectory]}. ReAct diverges from the first action, so prompt is the shared system+issue and chosen/rejected are full trajectories from there.

The guards

  1. Contamination guard (the headline correctness property). Strict train/eval instance-ID disjointness. The exporter excludes any trajectory whose instance_id is in the caller's evalHoldout, and assertTrainEvalDisjoint throws on any overlap. Training on eval instances is fake lift — the exact contamination we debunk elsewhere.
  2. Reward-hacking filter (Ornith-1.0 borrow). A deterministic monitor drops any "success" that read a withheld gold/test path, modified the verification harness, or escaped the sandbox. An archived reward-hack would teach the model to reward-hack — this is the training-data analog of the conformance firewall, separate from and in addition to the disjointness guard.
  3. Long-context filter. SWE/ReAct trajectories can blow past a 7-14B context window (~32k). Over-budget trajectories are dropped (or truncated with --truncate) and REPORTED — never silently lost.

Every drop is surfaced in the export report (droppedRewardHacked, excludedByHoldout, droppedOverLength, truncatedOverLength).

The weightAdapter genome gene (prune-the-overfitter safety net)

A LoRA tune can overfit. Rather than trust it blindly, the adapter is a gene in the Darwin genome (packages/darwin-mode/bench/swebench/evolve-config.mjs):

  • weightAdapter: null = BASE (no adapter) — the default and the control. A genome that never opts in is byte-identical (by key) to a pre-gene genome.
  • weightAdapter: 'sft' = SFT-distilled adapter.
  • weightAdapter: 'sft-dpo' = SFT then on-policy DPO.

Base competes against the tuned variants under the same conformant fitness, so evolution prunes an adapter that doesn't actually lift held-out resolve. The gene is inert until an adapter is trained (a GPU job) — it only names an adapter; it does not create one.

The training runner (GPU-gated)

weight-eft train is $0 by default — it emits a training plan (config + the exact ruvllm microlora … command). A real run requires BOTH an explicit --train flag AND a detected GPU / endpoint; otherwise it dry-runs or refuses. Target is 7-14B (Qwen2.5-Coder-7B / GLM-4-9B class) — not 32B (§59: 32B q4 spills a 16GB GPU). Stages: SFT first, then optional on-policy DPO from the SFT checkpoint.

CLI

# Status / recipe summary
weight-eft status
metaharness weight-eft status         # via the umbrella CLI

# Export training sets ($0). evalHoldout enforces the contamination guard.
weight-eft export --archive archive.json --eval-holdout holdout.json --out-dir ./out

# Emit the training plan ($0 dry-run). Add --train on a GPU host to run.
weight-eft train --base Qwen/Qwen2.5-Coder-7B-Instruct --params-b 7 \
  --sft ./out/sft.jsonl --dpo ./out/dpo.jsonl --adapter glm5.2

# Measure the cost-Pareto delta (base vs adapter cascade runs).
weight-eft eval --base-outcomes base.json --adapter-outcomes adapter.json

The exact (later, GPU) command to train + eval

# 1) Export ($0) — disjoint train/eval, reward-hack-filtered, long-context-filtered
weight-eft export --archive darwin-archive.json --eval-holdout clean-eval-ids.json --out-dir ./eft

# 2) Train (GPU host) — SFT then on-policy DPO. ruvllm/MicroLoRA executes plan.command.
weight-eft train --base Qwen/Qwen2.5-Coder-7B-Instruct --params-b 7 \
  --sft ./eft/sft.jsonl --dpo ./eft/dpo.jsonl --adapter glm5.2 --train
# (refuses unless WEIGHT_EFT_BASE_URL / CUDA_VISIBLE_DEVICES is set)

# 3) Run the conformant cascade twice (base vs glm5.2-sft-dpo adapter) on the
#    HELD-OUT clean set via the existing darwin eval path, collect per-instance
#    CascadeOutcome[] for each, then:
weight-eft eval --base-outcomes base-outcomes.json --adapter-outcomes adapter-outcomes.json

Input contract

The exporter codes against DarwinTrajectory[] (see src/types.ts) — reconstructable from Firestore darwin_runs + the local prediction/trajectory artifacts (predictions-*.jsonl rows carry instance_id + model_patch; the agentic loop carries the messages array with tool_calls, see darwin-mode/bench/swebench/solve-agentic.mjs). A tiny mock fixture archive lives in __tests__/fixtures/.

Status (honest)

  • Runnable, $0: exporter (with all three guards), training-plan emission, cost-Pareto eval folding, the weightAdapter gene (wired into darwin's evolve-config genome + the umbrella metaharness weight-eft CLI).
  • Scaffolded, GPU-gated: the actual LoRA training (spawn(plan.command) on a GPU host implementing the ruvllm/MicroLoRA seam). No training run, no GPU job, no paid model call has been executed.

See ADR-198 for the full rationale, the SFT-distill / on-policy-DPO recipe, the disjointness invariant, and the self-scaffolding RL roadmap (Ornith-1.0).

License

MIT