
@lablnet/hyperagents

v1.0.0

HyperAgents

Self-improving agent framework powered by LangChain and LangGraph.

Inspired by HyperAgents (Meta Research, 2026) -- ported to TypeScript with a generic, pluggable architecture.

What it does

HyperAgents runs an evolutionary self-improvement loop where a MetaAgent rewrites a TaskAgent's code to make it better at solving tasks. Each generation:

  1. A parent agent is selected from the archive
  2. The MetaAgent reads past evaluation scores and edits the source code
  3. The modified TaskAgent is evaluated on domain tasks
  4. The score and code diff are saved to the archive
  5. Repeat

The TaskAgent gets better over generations without manual intervention.
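The steps above can be sketched as a plain function. This is a simplified, hypothetical sketch: the actual loop in src/core/generate_loop.ts calls LLMs and persists to a JSONL archive, and `runLoop`, `Mutate`, and `Evaluate` are illustrative names, not the library's API.

```typescript
// Simplified sketch of the evolutionary self-improvement loop.
interface Generation {
  id: number;
  parentId: number | null;
  code: string;
  score: number;
}

type Mutate = (parent: Generation) => string; // MetaAgent: rewrite the code
type Evaluate = (code: string) => number;     // harness: score the TaskAgent

function runLoop(
  seedCode: string,
  mutate: Mutate,
  evaluate: Evaluate,
  generations: number
): Generation[] {
  const archive: Generation[] = [
    { id: 0, parentId: null, code: seedCode, score: evaluate(seedCode) },
  ];
  for (let g = 1; g <= generations; g++) {
    // 1. Select a parent (here: best-scoring; see "Parent selection strategies")
    const parent = archive.reduce((a, b) => (b.score > a.score ? b : a));
    // 2-3. MetaAgent edits the code, then the modified TaskAgent is re-evaluated
    const code = mutate(parent);
    const score = evaluate(code);
    // 4. Save the score and code to the archive
    archive.push({ id: g, parentId: parent.id, code, score });
    // 5. Early termination once a perfect score is reached
    if (score >= 1.0) break;
  }
  return archive;
}
```

Swapping in different `mutate`, `evaluate`, and parent-selection implementations is what the framework's pluggable architecture generalizes.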

New here? Read docs/concepts.md for a detailed explanation of every concept with examples.

Quick start

# Install
pnpm install

# Set your API key
cp .env.example .env
# Edit .env with your OPENAI_API_KEY

# Run the self-improvement demo (watch the score go from 0.42 to 1.00)
pnpm demo:scoring

Architecture

docs/
└── concepts.md           Detailed concepts guide (archive, strategies, self-modification, etc.)
src/
├── agent/              Agents
│   ├── base_agent.ts     Abstract base class
│   ├── llm.ts            Multi-provider LLM factory (OpenAI, Anthropic, Gemini, Ollama)
│   ├── llm_with_tools.ts LangGraph ReAct agentic loop
│   ├── meta_agent.ts     Modifies code to improve the TaskAgent
│   ├── task_agent.ts     Solves domain tasks
│   └── tool_registry.ts  Generic tool registry
├── prompts/            Prompt templates (separated from logic)
│   ├── task_agent.ts     TaskAgent instruction prompt
│   ├── meta_agent.ts     MetaAgent improvement prompt
│   └── llm_judge.ts      LLM judge scoring prompt
├── tools/              Framework tools (used by MetaAgent)
│   ├── bash.ts           Shell command execution
│   └── editor.ts         File viewing and editing
├── core/               Evolutionary loop
│   ├── generate_loop.ts  Self-improvement loop
│   ├── select_parent.ts  Parent selection strategies
│   └── ensemble.ts       Best-of-archive ensemble
├── domains/            Evaluation framework
│   ├── base.ts           Domain interface
│   ├── harness.ts        Generic evaluation harness
│   ├── report.ts         Score reporting
│   └── evaluators.ts     Pluggable evaluators (static, LLM judge, human feedback)
└── utils/              Infrastructure
    ├── archive.ts        JSONL archive management
    ├── executor.ts       Local + Docker execution
    ├── docker.ts         Docker container management
    ├── git.ts            Git diff/patch operations
    └── common.ts         Shared utilities

Key concepts

TaskAgent vs MetaAgent

|        | TaskAgent                   | MetaAgent                      |
|--------|-----------------------------|--------------------------------|
| Role   | Solves tasks                | Rewrites the TaskAgent's code  |
| Input  | A task description          | Repo path + past eval scores   |
| Output | A prediction                | Modified source code on disk   |
| Tools  | Domain-specific (optional)  | bash + editor (built-in)       |

Three evaluator strategies

import { staticEvaluator, llmJudgeEvaluator, humanFeedbackEvaluator } from "hyperagents";

// 1. Static: exact string match (free, for tasks with one right answer)
staticEvaluator("42", "42") // => 1.0

// 2. LLM Judge: ask an LLM to score (for subjective tasks)
await llmJudgeEvaluator(prediction, {
  description: "Generate tasks from this email",
  rubric: "Score based on relevance and actionability",
}) // => 0.85

// 3. Human Feedback: pass in user ratings (for production apps)
humanFeedbackEvaluator(4 / 5) // => 0.8

Parent selection strategies

The archive stores every agent generation. Parent selection picks which ancestor to improve next (not necessarily the previous one -- it picks from all valid generations):

  • random -- any valid parent
  • latest -- most recent generation
  • best -- highest scoring
  • score_prop -- probability proportional to score
  • score_child_prop -- score-weighted, penalizes over-explored parents (default)

The loop also includes early termination: if the best score in the archive reaches 1.0 (100%), the loop stops automatically to avoid wasting compute.
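As a feel for how these strategies work, score-proportional selection (score_prop) can be sketched as roulette-wheel sampling over the archive. This is a self-contained, hypothetical version; the real strategies live in src/core/select_parent.ts. The injectable `rand` parameter is an illustration choice, not the library's signature.

```typescript
interface ArchiveEntry {
  id: number;
  score: number;
}

// Pick a parent with probability proportional to its score (score_prop).
// `rand` is injectable so the sampling can be tested deterministically.
function selectParentScoreProp(
  archive: ArchiveEntry[],
  rand: () => number = Math.random
): ArchiveEntry {
  const total = archive.reduce((sum, e) => sum + e.score, 0);
  // All-zero scores: fall back to uniform random selection.
  if (total === 0) return archive[Math.floor(rand() * archive.length)];
  // Roulette wheel: walk the archive until the threshold is spent.
  let threshold = rand() * total;
  for (const entry of archive) {
    threshold -= entry.score;
    if (threshold <= 0) return entry;
  }
  return archive[archive.length - 1]; // floating-point fallback
}
```

The default score_child_prop strategy additionally discounts parents that already have many children, trading exploitation for exploration.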

Self-referential improvement (prompt files)

Both agents can load prompts from editable files instead of hardcoded defaults. This enables the MetaAgent to modify its own instructions across generations:

// Per-agent prompt file
const metaAgent = new MetaAgent({ model, promptFile: "./prompts/meta_agent.txt" });

// Or auto-scaffold via the generate loop
const config: GenerateLoopConfig = {
  // ...
  promptsDir: "./prompts",  // creates meta_agent.txt + task_agent.txt
};

When promptsDir is set, the MetaAgent can edit meta_agent.txt to improve how it approaches future generations — the improver improves itself.

See docs/concepts.md for full details.

Execution modes

  • Local (default): runs in a temp directory, fast for development
  • Docker: container per generation, safe for untrusted LLM-generated code

Examples

Scoring demo (self-improvement in action)

pnpm demo:scoring

A math grading domain where the TaskAgent starts with a bad prompt (strict string matching). The MetaAgent reads the failures and rewrites the prompt to handle mathematical equivalence. Score jumps from 0.42 to 1.00 in one generation.

Bash scripting

pnpm example:bash          # single evaluation
npx tsx examples/bash/run.ts evolve  # evolutionary loop

TaskAgent generates bash commands from descriptions. Supports both single eval and full evolutionary self-improvement.

Calculator (tool improvement)

pnpm example:calculator

The TaskAgent has a deliberately buggy calculator tool (only supports +, -, *, /). The MetaAgent reads the failures and edits calc_tool.ts to add missing operations (power, modulo, sqrt, abs).

Fact-check

npx tsx examples/factcheck/run.ts          # single evaluation
npx tsx examples/factcheck/run.ts evolve   # evolutionary loop

TaskAgent classifies statements as true/false. Includes tricky common myths (e.g., "The Great Wall is visible from space"). Uses runGenerateLoop for full evolutionary self-improvement.

Paper review

pnpm example:paper-review

TaskAgent predicts accept/reject for research papers.

Creating your own domain

Implement the Domain interface:

import type { Domain, DomainConfig, DomainTask, EvalResult, ReportSummary } from "hyperagents";

class MyDomain implements Domain {
  config: DomainConfig = {
    name: "my_domain",
    evalSubsets: ["train"],
    splits: ["train"],
    stagedEvalSamples: 5,
    scoreKey: "accuracy",
  };

  async loadTasks(subset: string, numSamples?: number): Promise<DomainTask[]> {
    // Load from JSON, database, API, etc.
  }

  async evaluate(prediction: string, task: DomainTask): Promise<number> {
    // Use staticEvaluator, llmJudgeEvaluator, or humanFeedbackEvaluator
  }

  formatInput(task: DomainTask): string {
    // Format the task as a prompt for the TaskAgent
  }

  async report(results: EvalResult[]): Promise<ReportSummary> {
    // Aggregate scores
  }
}
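For a concrete feel, here is a tiny domain in the same shape, made self-contained with local stand-in types instead of the real hyperagents imports (so the snippet runs on its own; the stand-ins are illustrative, not the library's definitions):

```typescript
// Local stand-ins mirroring the hyperagents types above (illustrative only).
interface MiniTask { id: string; input: string; expected: string; }
interface MiniResult { taskId: string; score: number; }

class EchoMathDomain {
  // Two tiny arithmetic tasks, each with exactly one right answer.
  async loadTasks(): Promise<MiniTask[]> {
    return [
      { id: "t1", input: "What is 2 + 2?", expected: "4" },
      { id: "t2", input: "What is 3 * 3?", expected: "9" },
    ];
  }

  // Static evaluation: exact string match, like staticEvaluator.
  async evaluate(prediction: string, task: MiniTask): Promise<number> {
    return prediction.trim() === task.expected ? 1.0 : 0.0;
  }

  // Format the task as a prompt for the TaskAgent.
  formatInput(task: MiniTask): string {
    return `Answer with just the number. ${task.input}`;
  }

  // Aggregate per-task scores into a mean accuracy.
  async report(results: MiniResult[]): Promise<{ accuracy: number }> {
    const mean = results.reduce((s, r) => s + r.score, 0) / results.length;
    return { accuracy: mean };
  }
}
```

Because evaluation is just a score in [0, 1], the same domain works unchanged whether the scorer is an exact match, an LLM judge, or human feedback.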

LLM providers

import { createLLM } from "hyperagents";

createLLM({ model: "openai/gpt-4o" })
createLLM({ model: "anthropic/claude-sonnet-4-5-20250929" })
createLLM({ model: "gemini/gemini-2.5-pro" })
createLLM({ model: "ollama/llama3" })  // free, runs locally
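These model strings follow a "provider/model" convention, which a factory can dispatch on. A minimal, hypothetical sketch of that split (the real factory in src/agent/llm.ts constructs the actual LangChain chat models; `parseModelId` is not part of the library):

```typescript
// Split a "provider/model" id into its two parts for factory dispatch.
function parseModelId(id: string): { provider: string; model: string } {
  const slash = id.indexOf("/");
  if (slash === -1) {
    throw new Error(`expected "provider/model", got "${id}"`);
  }
  // Only split on the first slash, so model names may themselves contain "/".
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```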

Docker

Build and run without installing anything locally (except Docker):

# Build the image
docker build -t hyperagents .

# Run the scoring demo
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/scoring/run.ts

# Run the bash example
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/bash/run.ts

# Run the evolutionary loop
docker run --rm -e OPENAI_API_KEY=sk-... hyperagents examples/bash/run.ts evolve

# Use a different model
docker run --rm \
  -e OPENAI_API_KEY=sk-... \
  -e HYPERAGENTS_MODEL=openai/gpt-4o-mini \
  hyperagents examples/scoring/run.ts

# Mount a volume to persist outputs
docker run --rm \
  -e OPENAI_API_KEY=sk-... \
  -v $(pwd)/outputs:/hyperagents/outputs \
  hyperagents examples/scoring/run.ts

For Anthropic or Gemini models, pass the corresponding API key:

docker run --rm \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e HYPERAGENTS_MODEL=anthropic/claude-sonnet-4-5-20250929 \
  hyperagents examples/scoring/run.ts

Based on

License

MIT