
AgentForge — Intent Compiler for AI Agents

Define contracts. Compile agents. Verify deliverables.

AgentForge is a domain-agnostic intent compiler. It takes structured goals—contracts, specs, requirements—and executes them through dynamically generated agents with bounded capabilities and evidence-based verification.

What AgentForge actually is:

  • An agent compiler: proposed agents are validated, normalized, and policy-checked before instantiation
  • A contract enforcement system: VCC (Verifiable Contract Criteria) define acceptance criteria before execution
  • A verification framework: deliverables must pass auditable checks to be accepted
  • A bounded execution runtime: agents operate within capability, tool, and resource constraints

What AgentForge is NOT:

  • A chat assistant with extra steps
  • A "let AI figure it out" autonomous system
  • A software-only framework — AgentForge is domain-agnostic; domain behavior is derived from VCC, DomainContextExtractor, and learned patterns
  • A replacement for human judgment on critical decisions

What AgentForge compiles:

  • A plan (what will be produced)
  • Contracts (what "done" means)
  • Execution steps (how work proceeds)
  • Evidence requirements (how success is proven)

The output is not "an answer"—it is a bundle of artifacts + verification results.
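
Conceptually, that bundle can be pictured as a small TypeScript sketch; the type and field names below are illustrative, not AgentForge's actual API:

// Hypothetical shape of a compiled output bundle (names are illustrative).
interface CompiledBundle {
  plan: { deliverables: string[] };                                   // what will be produced
  contracts: Array<{ acId: string; severity: 'must' | 'should' }>;    // what "done" means
  steps: Array<{ id: string; dependsOn: string[] }>;                  // how work proceeds
  evidence: Array<{ acId: string; passed: boolean; detail: string }>; // how success is proven
}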

This is not a chat assistant. AgentForge is an intent compiler + bounded execution system that turns contracts into reproducible, verifiable deliverables.



What AgentForge Actually Does

Project → Analysis → Team Design → Agent Compilation → Execution → Review → Deliverables

The critical step is Agent Compilation: proposed agents are validated, normalized, and policy-checked before they are allowed to exist.

You give AgentForge a project—specifications, research notes, datasets, PRDs, codebases, or a single document. AgentForge figures out what experts are needed, generates them, enforces capability boundaries, and executes through quality gates.

# Analyze any project
agentforge analyze ./your-project --dry-run

# Domain: financial-analysis (91% confidence)
# Recommended Roles:
#   • Quantitative Analyst
#   • Portfolio Optimizer
#   • Compliance Auditor
# Key Open Questions:
#   ? Are regulatory constraints fully captured?

Why AgentForge Exists

| Typical AI Tools | AgentForge |
| ----------------------- | --------------------------------- |
| You explain context | AgentForge analyzes your input |
| One generic assistant | Dynamically generated specialists |
| Stateless conversations | Persistent learning across runs |
| Manual orchestration | Compiler-enforced execution |
| Writes snippets | Produces reviewable deliverables |

You don't manage the AI. The project tells AgentForge what's needed.


AgentForge v3.0 — Domain-Agnostic Agent Compiler

AgentForge v3.0 removes all hardcoded agent types.

There are no predefined roles, no enums, no fixed domains.

Instead:

  • The project is analyzed
  • Roles are recommended dynamically
  • Roles are normalized & deduplicated
  • Personas are researched by LLMs
  • Capabilities are enforced by policy
  • Agents are compiled, validated, and only then instantiated

AgentForge will reject agents that violate capability, security, or determinism guarantees—even if an LLM proposes them.

const { agents, result } = await agentforge.generateAgentTeam({
  projectName: 'Risk Platform',
  domain: 'quantitative finance',
  description: 'Portfolio risk optimization system',
  techStack: ['Python', 'NumPy', 'PostgreSQL'],
});

Generated roles are not limited to software development.


What Makes v3.0 Different (and Hard)

| Feature | Why It Matters |
| -------------------------- | ------------------------------------- |
| Dynamic Roles | Any domain, any expertise |
| Persona ≠ Capability | LLM creativity without security risk |
| Deterministic Backends | Same input → same execution |
| Role Normalization | No duplicate or overlapping agents |
| Compiler Gate | Invalid agents cannot run |
| Invariant Tests | Architecture is enforced, not implied |

AgentForge v3.0 behaves more like a compiler than a framework.

See: v3.0 Design Contract


VCC — Verifiable Contract Criteria

VCC is AgentForge's contract-driven quality system. Instead of hoping outputs are good, you define acceptance criteria upfront in YAML and AgentForge enforces them.

# vcc_research.yml
acceptance:
  - acId: AC-RES-1
    targetDeliverables: [A1]
    type: structure
    severity: must
    rule:
      requiredSections: [Queries, Sources, Methodology, Limitations]

How it works:

  1. Define VCC spec (YAML) with acceptance criteria
  2. Pass VCC to execution via task.metadata.vcc
  3. Quality pipeline scores artifacts against criteria
  4. Failed criteria trigger refinement with specific feedback
  5. Task fails if MUST criteria aren't satisfied

Type-safe integration:

// VCCContext enforces both fields required together
const vccContext: VCCContext = {
  vcc: loadedSpec,      // Required
  artifactId: 'A1'      // Required
};

// Pass to execution - scoring happens automatically
pipeline.assessQuality(output, { task, vccContext });

VCC supports:

  • Structure criteria — Required sections/headers
  • Rubric criteria — LLM-evaluated quality dimensions
  • Schema criteria — JSON/data format validation
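
As a rough TypeScript sketch, the three criterion kinds can be modeled as a discriminated union; the type and rule fields mirror the YAML example above, while the rubric and schema fields are assumptions:

// Illustrative union of the three VCC criterion kinds.
type VCCCriterion =
  | { type: 'structure'; rule: { requiredSections: string[] } }          // from the YAML above
  | { type: 'rubric'; rule: { dimensions: string[]; minScore: number } } // assumed fields
  | { type: 'schema'; rule: { jsonSchema: object } };                    // assumed fields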

Codebase Navigation & Agent Context

AgentForge now ships a 3-tier filesystem pyramid for agent navigability — all backends (Groq, Gemini, local models, Claude) get structural project context automatically:

| Tier | File | Content | Token cost |
|------|------|---------|------------|
| 1 | CLAUDE.md → Module Map section | 29-row module index (entry points, purposes) | ~870, once per session |
| 2 | src/*/README.md | Key files, entry point, dependency map per module | On-demand via scout |
| 3 | Source files | Implementation | On-demand |

27 per-directory README index files now ship in src/, covering every module from src/core/ through src/tui/. Each lists real file names, real exports, entry points, and Depends On / Depended On By relationships — not guesses.

projectIndexFile config option — the tier-1 file is configurable, not hardcoded to CLAUDE.md. Projects with PROJECT_CONTEXT.md, README.md, or any other index file work automatically:

# Use a different index file
agentforge orchestrate --project-index-file PROJECT_CONTEXT.md

# Via env var
AgentForge_PROJECT_INDEX_FILE=README.md agentforge orchestrate

The ScoutAgent loads and injects the Module Map section (or full file up to 3000 chars) into every agent's context before task execution. SubagentBootupRitual does the same for directly-spawned subagents.
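
A simplified sketch of that loading step, assuming a Markdown "## Module Map" heading in the tier-1 file (the real ScoutAgent implementation may differ):

import { readFileSync } from 'node:fs';

// Load the tier-1 index, extract the Module Map section if present,
// and cap the result at 3000 chars (the documented fallback limit).
function loadProjectContext(indexFile: string): string {
  const text = readFileSync(indexFile, 'utf8');
  const section = text.match(/## Module Map[\s\S]*?(?=\n## |$)/);
  return (section ? section[0] : text).slice(0, 3000);
}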

VCC Resolution

  • VCC is now resolved ONCE at composition root (orchestrate.ts) before engine creation
  • ResolvedVCCContext provides a 4-tier precedence chain: explicit VCC → PRD resource constraints → model defaults → system defaults
  • All 3 execution engines (local, API, Claude Code) consume the same resolved values — no split-brain threshold resolution
  • All resolved properties are readonly after construction
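
A minimal sketch of that precedence chain; the function and property names are illustrative, and the system default of 55 is taken from the quality-threshold default documented later:

// First defined value wins: explicit VCC → PRD resource constraints
// → model defaults → system defaults.
function resolveThreshold(
  explicit?: number,
  prd?: number,
  model?: number,
  systemDefault = 55,
): number {
  return explicit ?? prd ?? model ?? systemDefault;
}

// Resolved once at the composition root, then frozen (readonly thereafter).
const resolvedVCC = Object.freeze({
  qualityThreshold: resolveThreshold(undefined, 70), // PRD tier wins here → 70
});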

Execution System (Production-Grade)

In v3.0, all generated agents execute exclusively through this system.

AgentForge includes a full execution engine with:

  • Task state machine
  • Verification gates (checks, analysis, security)
  • AI + human review
  • Rework loops
  • Workspace isolation (e.g., via Git worktrees)
  • Audit logging
pending → in_progress → quality_gate → ai_review → human_review → done
             ↑                ↓              ↓
             └────── in_rework ──────────────┘
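
Read as code, the diagram is a small transition table. This sketch uses the state names from the diagram; the transition map is inferred from the arrows:

type TaskState =
  | 'pending' | 'in_progress' | 'quality_gate'
  | 'ai_review' | 'human_review' | 'in_rework' | 'done';

// Legal transitions per the diagram; anything else is rejected.
const transitions: Record<TaskState, TaskState[]> = {
  pending:      ['in_progress'],
  in_progress:  ['quality_gate'],
  quality_gate: ['ai_review', 'in_rework'],
  ai_review:    ['human_review', 'in_rework'],
  human_review: ['done'],
  in_rework:    ['in_progress'],
  done:         [],
};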

This system is enabled by default; risky changes cannot silently bypass review.

See: Execution System

Engine Architecture

LocalExecutionEngine decomposed from 3,820 to 1,307 lines — now a thin coordinator delegating to focused modules:

| Module | Lines | Responsibility |
|--------|-------|----------------|
| modes/atomic-workflow.ts | 1,667 | Atomic decomposition, remediation, compile gate |
| modes/critic-workflow.ts | 456 | VCC + generic critic passes |
| modes/ralph-workflow.ts | 65 | Ralph loop strategy |
| shared/domain-detection.ts | 477 | Tech stack detection, domain context |
| shared/execution-pipeline.ts | 97 | Quality pipeline factory shared by all engines |
| core/BaseExecutionEngine.ts | 90 | Abstract base class with template method pattern |


Provider-Agnostic by Design

AgentForge works with:

  • Claude (recommended)
  • OpenAI
  • Local LLMs (Ollama, LM Studio)
  • Custom providers via plugins

Switch providers without changing code or architecture.

See: Supported AI Providers


No Domain Packs Needed

AgentForge is domain-agnostic by design—no pluggable validator packs are required. Domain-specific behavior emerges automatically from three mechanisms that are always active:

  • VCC (Verifiable Contract Criteria) — acceptance criteria defined upfront in YAML drive what "done" means for any domain
  • DomainContextExtractor — detects tech stack, language, and domain signals from the project itself
  • SQLite pattern learning — persists rubrics, thresholds, and quality signals across runs, improving accuracy automatically

There is no plugin API to implement, no pack to install, and no domain to register. Point AgentForge at any project and it adapts.


Quick Start

git clone https://github.com/Platano78/AgentForge.git
cd AgentForge
npm install
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
npm run build
npm test
# Generate a team from a codebase
npm run agentforge -- create-team --analyze ./project

See: Quick Start Guide | CLI Reference


Who This Is For

AgentForge is built for:

  • Engineers shipping complex systems
  • Teams tired of "AI assistants" that don't understand context
  • Projects where quality, review, and determinism matter
  • Anyone who wants AI to behave like an organization, not a chatbot

Not designed for casual chat-based coding or one-off prompt experiments.


Quality Scoring System

AgentForge uses an LLM Judge to evaluate every generated artifact against domain-specific rubrics. The judge model is always different from the generator model to prevent self-evaluation bias.

4-Tier Rubric Lookup

Rubrics are resolved through a cascading lookup that balances speed and cost:

| Tier | Source | Latency | Cost |
|------|--------|---------|------|
| 1 | Memory cache — session-scoped, instant hit | ~0ms | Free |
| 2 | Hardcoded rubrics — built-in domain defaults | ~0ms | Free |
| 3 | SQLite learned rubrics — persisted across sessions | ~1ms | Free |
| 4 | LLM generation — on-demand for unknown domains | ~2-5s | 1 API call |

If all tiers fail, the system falls back to the general rubric.
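
A condensed sketch of the cascade; the helper names are illustrative, not the actual internals:

interface Rubric { criteria: string[] }

declare const memoryCache: Map<string, Rubric>;                                      // Tier 1
declare const HARDCODED_RUBRICS: Record<string, Rubric | undefined>;                 // Tier 2
declare const sqliteStore: { load(domain: string): Promise<Rubric | undefined> };    // Tier 3
declare function generateRubricViaLLM(domain: string): Promise<Rubric | undefined>;  // Tier 4
declare const GENERAL_RUBRIC: Rubric;                                                // Final fallback

// Try each tier in order; fall back to the general rubric if all miss.
async function resolveRubric(domain: string): Promise<Rubric> {
  return (
    memoryCache.get(domain) ??
    HARDCODED_RUBRICS[domain] ??
    (await sqliteStore.load(domain)) ??
    (await generateRubricViaLLM(domain)) ??
    GENERAL_RUBRIC
  );
}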

Supported Domain Rubrics

Hardcoded rubrics ship for these domains:

  • unity — correctness, performance (GC avoidance, pooling), architecture (MonoBehaviour, ScriptableObjects), maintainability, documentation
  • godot — correctness (GDScript 4.x), node-lifecycle, signals-and-patterns, type-safety, performance
  • typescript — correctness (strict mode), type-safety, architecture (SOLID), error-handling, testability
  • general — correctness, clarity, completeness, best-practices

Any domain not listed above triggers a Tier 3/4 lookup, and the result is cached for future use.

Judge Model Selection

The judge provider is selected by checking API keys in priority order:

| Priority | Provider | Model | Notes |
|----------|----------|-------|-------|
| 1 | Claude | claude-3-5-haiku-20241022 | High quality, good rate limits |
| 2 | Gemini | gemini-2.0-flash | Fast, generous free tier |
| 3 | DeepSeek | deepseek-chat | Strong reasoning |
| 4 | Groq | llama-3.3-70b-versatile | Fast but limited free tier |

If no API key is available, quality assessment is skipped and a warning is logged.
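
A sketch of that selection logic, using the models and environment variables documented here (the exact implementation is not shown):

// Walk the priority list; the first provider with an API key set wins.
const JUDGE_PRIORITY = [
  { provider: 'claude',   model: 'claude-3-5-haiku-20241022', keys: ['ANTHROPIC_API_KEY'] },
  { provider: 'gemini',   model: 'gemini-2.0-flash',          keys: ['GEMINI_API_KEY', 'GOOGLE_API_KEY'] },
  { provider: 'deepseek', model: 'deepseek-chat',             keys: ['DEEPSEEK_API_KEY'] },
  { provider: 'groq',     model: 'llama-3.3-70b-versatile',   keys: ['GROQ_API_KEY'] },
];

function selectJudge() {
  const hit = JUDGE_PRIORITY.find(j => j.keys.some(k => process.env[k]));
  if (!hit) console.warn('No judge API key available; quality assessment skipped.');
  return hit ?? null;
}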

Quality Recommendations

Each evaluation produces a score (0-100) and a recommendation:

  • pass (80-100) — artifact accepted, task complete
  • revise (50-79) — automatic retry with criterion-specific feedback
  • fail (0-49) — task marked as failed

Per-criterion scores and feedback are tracked for all domains, enabling continuous rubric improvement.
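
The banding reduces to a simple mapping; as a sketch:

type Recommendation = 'pass' | 'revise' | 'fail';

// Bands from the list above: 80-100 pass, 50-79 revise, 0-49 fail.
function recommend(score: number): Recommendation {
  if (score >= 80) return 'pass';
  if (score >= 50) return 'revise';
  return 'fail';
}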


Atomic Task Execution

Atomic mode decomposes a PRD (Product Requirements Document) into file-level tasks, each targeting the creation of a single file in 10 minutes or less. This replaces monolithic "generate the whole project" prompts with bounded, verifiable units of work.

How It Works

  1. Decomposition — The PRD is analyzed and split into atomic tasks with explicit dependencies
  2. Execution — Tasks run in dependency order, each with fresh LLM context
  3. Quality gate — Each task output is scored by the LLM Judge
  4. Retry — Tasks scoring "revise" (50-79%) are automatically re-generated with feedback
  5. Persistence — Task specs and results are stored in SQLite for resume capability
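
One way to picture a single decomposed unit is the hypothetical TypeScript shape below; the field names are assumptions for illustration only:

// One file-level unit of work produced by decomposition (illustrative).
interface AtomicTask {
  id: string;
  targetFile: string;    // exactly one file created per task
  dependsOn: string[];   // runs only after these task ids complete
  targetMinutes: number; // 10 by default, per the table below
  spec: string;          // the slice of the PRD this task implements
  result?: { score: number; recommendation: 'pass' | 'revise' | 'fail' };
}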

Default Configuration

All numeric defaults are centralized in src/config/engine-defaults.ts and can be overridden via an AgentForge_{CONSTANT_NAME} environment variable.

| Setting | Default | CLI Flag | Env Override |
|---------|---------|----------|--------------|
| Max tasks | 50 | --atomic-max-tasks | AgentForge_DEFAULT_ATOMIC_MAX_TASKS |
| Quality threshold | 55 | --quality-threshold | AgentForge_DEFAULT_QUALITY_THRESHOLD |
| Max retries | 2 | — | AgentForge_DEFAULT_ATOMIC_TASK_RETRIES |
| Target minutes/task | 10 | --atomic-target-minutes | AgentForge_DEFAULT_ATOMIC_TARGET_MINUTES |
| Max quality retries | 1 | — | AgentForge_DEFAULT_MAX_REGEN_ATTEMPTS |
| TDD mode | off | --atomic-tdd | — |
| Compile gate timeout | 120s | — | AgentForge_COMPILE_GATE_TIMEOUT_MS |
| Task timeout | 300s | — | AgentForge_DEFAULT_ATOMIC_TASK_TIMEOUT_MS |
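
The override pattern is conventional env-before-default resolution; a sketch with an assumed helper name:

// Resolve a numeric setting: the AgentForge_<CONSTANT_NAME> env var wins,
// otherwise fall back to the centralized default.
function envNumber(name: string, fallback: number): number {
  const raw = process.env[`AgentForge_${name}`];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

const ATOMIC_MAX_TASKS  = envNumber('DEFAULT_ATOMIC_MAX_TASKS', 50);
const QUALITY_THRESHOLD = envNumber('DEFAULT_QUALITY_THRESHOLD', 55);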

TDD Mode

When --atomic-tdd is enabled, each task follows the TDD cycle:

  1. RED — Generate the test file first (expects failure)
  2. GREEN — Generate the implementation to make tests pass
  3. REFACTOR — Optional cleanup pass

CLI Flags

# Enable TDD cycles
agentforge orchestrate -p myproject --atomic-tdd

# Limit decomposition to 30 tasks
agentforge orchestrate -p myproject --atomic-max-tasks 30

# Disable atomic mode (legacy behavior)
agentforge orchestrate -p myproject --no-atomic

# Resume an interrupted workflow
agentforge orchestrate -p myproject --resume

# Stop on first failure
agentforge orchestrate -p myproject --atomic-stop-on-failure

Execution Modes

AgentForge supports four execution modes. All modes use the same agent compiler and workflow structure but differ in how tasks are dispatched.

autonomous-local (Mode 2)

Fully autonomous execution using a local LLM. Zero API cost.

agentforge orchestrate -p myproject -m autonomous-local \
  --local-endpoint http://localhost:8080

Supports llamacpp-router, Ollama, LM Studio, or any OpenAI-compatible local server. The loaded model is auto-detected from the server.
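
Auto-detection plausibly queries the server's OpenAI-compatible model listing; a sketch under that assumption (not AgentForge's confirmed logic):

// Ask an OpenAI-compatible server (Ollama, LM Studio, llama.cpp) what is loaded.
async function detectLocalModel(endpoint: string): Promise<string | undefined> {
  const res = await fetch(`${endpoint}/v1/models`);
  if (!res.ok) return undefined;
  const body = (await res.json()) as { data?: Array<{ id: string }> };
  return body.data?.[0]?.id; // first listed model id
}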

autonomous-api (Mode 1)

Fully autonomous execution using a cloud API. Highest quality output.

agentforge orchestrate -p myproject -m autonomous-api \
  --api-provider claude

Supports claude, openai, groq, nvidia, deepseek, together, and custom providers.

interactive

Human-guided execution with approval checkpoints between tasks.

agentforge orchestrate -p myproject -m interactive \
  --checkpoint-threshold 0.75

The --checkpoint-threshold (0.0-1.0) controls how often AgentForge pauses for human review. Lower values mean more checkpoints.

claude-code

Persona-based orchestration through Claude Code CLI. Generates agent personas and a launcher script.

agentforge orchestrate -p myproject -m claude-code \
  --output ./agentforge-exports

Auto-detects available CLI orchestrators (Claude Code, Gemini CLI, Ollama).


Environment Variables

Required (at least one for quality scoring)

| Variable | Purpose |
|----------|---------|
| ANTHROPIC_API_KEY | Claude API access (judge + autonomous-api mode) |

Judge Fallbacks (checked in order)

| Variable | Purpose |
|----------|---------|
| GEMINI_API_KEY or GOOGLE_API_KEY | Gemini Flash judge provider |
| DEEPSEEK_API_KEY | DeepSeek judge provider |
| GROQ_API_KEY | Groq judge provider |

Execution

| Variable | Purpose |
|----------|---------|
| LOCAL_LLM_ENDPOINT | Local LLM server URL (e.g., http://localhost:8080) |
| LOCAL_LLM_MODEL | Override local model name (auto-detected if omitted) |
| OPENAI_API_KEY | OpenAI provider for autonomous-api mode |

Observability (optional)

| Variable | Purpose |
|----------|---------|
| LANGSMITH_API_KEY | LangSmith tracing and observability |


Documentation

| Document | Description |
|----------|-------------|
| Quick Start | Get running in 5 minutes |
| CLI Reference | Command-line usage |
| Architecture | System design |
| v3.0 Design Contract | Agent compiler guarantees |
| Execution System | Task lifecycle and review |
| Configuration | Environment and settings |
| Custom Providers | Provider plugin system |


Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Navigating the codebase? AgentForge uses a 3-tier navigation pyramid:

  1. CLAUDE.md → Module Map — 29-row table mapping every src/ module to its purpose and entry point. Start here for any architecture or module-location question.
  2. src/*/README.md — per-module index with key files, exports, entry point, and dependency relationships.
  3. Source files — implementation detail.

If you're unsure where a capability lives, the Module Map in CLAUDE.md resolves it without grepping.

git checkout -b feature/amazing-feature
npm test
# Submit pull request

License

Apache License 2.0 - see LICENSE


In One Sentence

AgentForge turns projects into reproducible, verifiable execution pipelines.