harness-evolver

v6.4.2

LangSmith-native autonomous agent optimization for Claude Code

Harness Evolver

Point at any LLM agent codebase. Harness Evolver will autonomously improve it — prompts, routing, tools, architecture — using multi-agent evolution with LangSmith as the evaluation backend.


Install

Claude Code Plugin (recommended)

/plugin marketplace add raphaelchristi/harness-evolver-marketplace
/plugin install harness-evolver

npx (first-time setup or non-Claude Code runtimes)

npx harness-evolver@latest

Works with Claude Code, Cursor, Codex, and Windsurf.


Quick Start

cd my-llm-project
export LANGSMITH_API_KEY="lsv2_pt_..."
claude

/harness:setup      # explore the project, configure LangSmith
/harness:health     # check dataset quality (auto-corrects issues)
/harness:evolve     # run the optimization loop
/harness:status     # check progress (rich ASCII chart)
/harness:deploy     # tag, push, finalize

What It Looks Like

Tested on a RAG agent (Agno framework, Gemini 3.1 Flash Lite, light mode):

xychart-beta
    title "agno-deepknowledge: 0.575 → 1.000 (+74%)"
    x-axis ["base", "v001", "v002", "v003", "v004", "v005", "v006", "v007"]
    y-axis "Correctness" 0 --> 1
    line [0.575, 0.575, 0.950, 0.950, 0.950, 0.950, 0.950, 1.0]
    bar [0.575, 0.333, 0.950, 0.720, 0.875, 0.680, 0.880, 1.0]

| Iter | Score | Merged? | What the proposer did |
|---|---|---|---|
| baseline | 0.575 | n/a | Original agent: hallucinations, broken tool calls, no retry logic |
| v001 | 0.333 | Yes | Anti-hallucination prompt (100% correct when API responded, but 60% hit rate limits) |
| v002 | 0.950 | Yes | Breakthrough: inlined 17-line KB into prompt, eliminated vector search entirely. 5.7x faster, zero rate limits |
| v003 | 0.720 | No | Attempted hybrid retrieval: regressed, rejected by constraint gate |
| v004 | 0.875 | No | Response completeness fix: improved one case but regressed others |
| v005 | 0.680 | No | Reduced tool calls: broke edge cases, rejected |
| v006 | 0.880 | Yes | Evolution memory insight: combined v001's anti-hallucination with one-shot example from archive |
| v007 | 1.000 | Yes | One-shot example injection + rubric-aligned responses: perfect on held-out |

The line shows the best score so far (it only goes up, because regressions aren't merged). The bars show each candidate's raw score. 4 candidates were merged and 3 were rejected by gate checks. Not every iteration improves, and that's the point.
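
The relationship between the two series can be checked directly: the line values in the chart equal the running maximum of the bar values, and the "+74%" in the title is the relative gain from the baseline to the final best score. The snippet below is only a sanity check on the published numbers; the variable names are illustrative and not part of Harness Evolver.

```python
from itertools import accumulate

# Values copied from the chart above.
labels = ["base", "v001", "v002", "v003", "v004", "v005", "v006", "v007"]
candidate_scores = [0.575, 0.333, 0.950, 0.720, 0.875, 0.680, 0.880, 1.0]

# Best score seen so far at each iteration (the "line" series).
best_so_far = list(accumulate(candidate_scores, max))

# Relative improvement of the final best score over the baseline.
relative_gain = (best_so_far[-1] - candidate_scores[0]) / candidate_scores[0]

for label, raw, best in zip(labels, candidate_scores, best_so_far):
    print(f"{label:>5}  raw={raw:.3f}  best={best:.3f}")
print(f"relative gain: {relative_gain:+.0%}")
```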


How It Works

| Feature | Description |
|---|---|
| LangSmith-Native | No custom scripts. Uses LangSmith Datasets, Experiments, and LLM-as-judge. Everything visible in the LangSmith UI. |
| Real Code Evolution | Proposers modify actual code in isolated git worktrees. Winners merge automatically. |
| Self-Organizing Proposers | Two-wave spawning, dynamic lenses from failure data, archive branching from losing candidates. Self-abstention when redundant. |
| Rubric-Based Evaluation | LLM-as-judge with justification-before-score, rubrics, few-shot calibration, pairwise comparison. |
| Smart Gating | Constraint gates, efficiency gate (cost/latency pre-merge), regression guards, Pareto selection, holdout enforcement, rate-limit early abort, stagnation detection. |
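
To make "Pareto selection" concrete, here is a minimal sketch of picking the non-dominated candidates when two objectives compete (a higher score is better, a lower cost is better). It is a generic illustration, not Harness Evolver's actual selection code: the `Candidate` class, the `dominates` and `pareto_front` helpers, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; fields are assumptions for illustration."""
    name: str
    score: float  # higher is better
    cost: float   # lower is better (arbitrary unit)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    no_worse = a.score >= b.score and a.cost <= b.cost
    strictly_better = a.score > b.score or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up data: "c" is dominated by "b" (lower score and higher cost).
candidates = [
    Candidate("a", score=0.90, cost=1.2),
    Candidate("b", score=0.85, cost=0.6),
    Candidate("c", score=0.80, cost=0.9),
    Candidate("d", score=0.95, cost=2.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Candidates left on the front are trade-offs rather than outright winners, which is presumably why further gates (efficiency, constraints) are applied before a merge.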


Evolution Loop

/harness:evolve
  |
  +- 1. Preflight  (validate state + dataset health + baseline scoring)
  +- 2. Analyze    (trace insights + failure clusters + strategy synthesis)
  +- 3. Propose    (spawn N proposers in git worktrees, two-wave)
  +- 4. Evaluate   (canary → run target → auto-spawn LLM-as-judge → rate-limit abort)
  +- 5. Select     (held-out comparison → Pareto front → efficiency gate → constraint gate → merge)
  +- 6. Learn      (archive candidates + regression guards + evolution memory)
  +- 7. Gate       (plateau → target check → critic/architect → continue or stop)
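
The final Gate step can be pictured as a small decision function over the history of best scores. The sketch below is an assumption-laden illustration, not the plugin's implementation: `has_plateaued`, `gate_decision`, `patience`, `min_delta`, `target`, and the three return values are hypothetical names, and mapping a plateau to "escalate" (for example, handing off to the critic/architect) is a guess at the intent of the diagram.

```python
def has_plateaued(best_scores: list[float], patience: int = 3,
                  min_delta: float = 0.0) -> bool:
    """True if the best score has not improved by more than min_delta
    over the last `patience` iterations."""
    if len(best_scores) <= patience:
        return False  # not enough history to judge
    reference = best_scores[-(patience + 1)]
    return max(best_scores[-patience:]) - reference <= min_delta


def gate_decision(best_scores: list[float], target: float = 1.0,
                  patience: int = 3) -> str:
    """Return 'stop', 'escalate', or 'continue' (hypothetical outcomes)."""
    plateau = has_plateaued(best_scores, patience)
    if best_scores[-1] >= target:
        return "stop"        # target reached
    if plateau:
        return "escalate"    # e.g. hand off to a critic/architect step
    return "continue"


# Best-score history taken from the chart in "What It Looks Like".
history = [0.575, 0.575, 0.950, 0.950, 0.950, 0.950, 0.950, 1.0]
print(gate_decision(history[:3]))  # early in the run
print(gate_decision(history[:6]))  # three iterations without improvement
print(gate_decision(history))      # final score meets the target
```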


Agents

| Agent | Role |
|---|---|
| Proposer | Self-organizing: investigates a data-driven lens, decides own approach, may abstain |
| Evaluator | LLM-as-judge: rubric-aware scoring via langsmith-cli, few-shot calibration |
| Architect | ULTRAPLAN mode: deep topology analysis with Opus model |
| Critic | Active: detects evaluator gaming, implements stricter evaluators |
| Consolidator | Cross-iteration memory: anchored summarization, garbage collection |
| TestGen | Generates test inputs with rubrics + adversarial injection |


Requirements

  • LangSmith account + LANGSMITH_API_KEY
  • Python 3.10+ · Git · Claude Code (or Cursor/Codex/Windsurf)

Dependencies are installed automatically by the plugin hook or the npx installer.

LangSmith can trace any AI framework: LangChain/LangGraph (automatic), the OpenAI/Anthropic SDKs (wrap_* helpers, 2 lines), CrewAI/AutoGen (via OpenTelemetry), and any Python code (@traceable).
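
As a rough illustration of the last option, `traceable` is a decorator exported by the `langsmith` Python SDK. The sketch below decorates two plain functions; the function names and their stub bodies are invented for the example, and the `ImportError` fallback exists only so the snippet runs without the SDK installed (in which case nothing is traced). Whether traces are actually sent depends on your LangSmith environment configuration.

```python
try:
    from langsmith import traceable  # decorator from the LangSmith SDK
except ImportError:
    # Fallback for illustration only: a no-op decorator with the same call shapes.
    def traceable(func=None, **_kwargs):
        def decorate(f):
            return f
        return decorate(func) if callable(func) else decorate


@traceable(name="retrieve", run_type="chain")
def retrieve(question: str) -> list[str]:
    """Hypothetical retrieval step; returns a stub document."""
    return [f"stub document for: {question}"]


@traceable
def answer(question: str) -> str:
    """Hypothetical top-level agent function; nested calls appear as child runs."""
    docs = retrieve(question)
    return f"Answer based on {len(docs)} document(s)."


print(answer("What does the constraint gate do?"))
```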


Companion: LangSmith Tracing

For full observability into what each proposer does during evolution (every file read, edit, and commit), install the LangSmith tracing plugin:

/plugin marketplace add langchain-ai/langsmith-claude-code-plugins
/plugin install langsmith-tracing@langsmith-claude-code-plugins

With both plugins installed, the evolution loop traces to LangSmith as a hierarchy: iteration → proposers → tool calls.


License

MIT