
@gaia-agent/sdk v0.1.23

Production-ready AI agent library using AI SDK v6 ToolLoopAgent for GAIA benchmarks with swappable providers

GAIA Super Agent SDK

🤖 Build GAIA-Benchmark-ready Super AI Agents in seconds, not weeks

Production-ready Super AI agent with 18+ tools and swappable providers
Built on AI SDK v6 ToolLoopAgent & ToolSDK.ai with ReAct reasoning


Quick Start · Features · GAIA Benchmark · Documentation


✨ Features

  • 🚀 Zero Configuration: Pre-configured agent ready for GAIA benchmarks out of the box
  • 🧠 ReAct Reasoning Pattern: Built-in Reasoning + Acting framework for structured thinking
  • 📋 Planning & Verification: Multi-step planning plus answer verification for complex tasks
  • 🔧 18+ Built-in Tools: Organized by category with official SDKs (Tavily, Exa, E2B, BrowserUse, Steel)
  • 🔄 Swappable Providers: Easy provider switching for sandbox, browser, search, and memory
  • 🌐 AI-Powered Search: Integrated Tavily and Exa for intelligent web search
  • 🛡️ Secure Sandbox: E2B cloud sandbox with code execution and filesystem operations
  • 🖥️ Browser Automation: Steel, BrowserUse, or AWS AgentCore for web interactions
  • 🧠 Agent Memory: Persistent memory with Mem0 or AWS AgentCore
  • 📦 Tree-Shaking Friendly: ESM with granular exports, TypeScript-first


🎯 Why GAIA Agent?

🌟 Our Mission

Empower developers to build world-class Super AI Agents in minutes, not months.

Whether you're creating a production-ready AI assistant for your product or competing in GAIA benchmarks, GAIA Agent provides the enterprise-grade foundation you need.

❌ Traditional Approach

  • Days/weeks setting up APIs
  • Writing tool wrappers manually
  • Error handling for each service
  • Figuring out which providers to use
  • Integration testing headaches

✅ With GAIA Agent

  • 3 lines of code to get started
  • 18+ tools ready with official SDKs
  • GAIA benchmark ready immediately
  • Swap providers with one line
  • Production-tested implementations

Time savings: From weeks of infrastructure setup → 3 lines of code

Result: A world-class, production-ready Super Agent that rivals top AI systems

🌟 What is the GAIA Benchmark?

The GAIA Benchmark is a comprehensive evaluation suite designed to test the capabilities of AI agents across a wide range of tasks, including reasoning, search, code execution, and browser automation.

📖 Read more about GAIA →


🚀 Quick Start

Installation

npm install @gaia-agent/sdk ai @ai-sdk/openai zod

Basic Usage

import { createGaiaAgent } from '@gaia-agent/sdk';

// Create the agent - reads from environment variables
const agent = createGaiaAgent();

const result = await agent.generate({
  prompt: 'Calculate 15 * 23 and search for the latest AI papers',
});

console.log(result.text);

Environment Setup

Create a .env file:

# Required
OPENAI_API_KEY=sk-...

# Default providers (at least one required)
TAVILY_API_KEY=tvly-...      # Search
E2B_API_KEY=...              # Sandbox
STEEL_API_KEY=steel_live_... # Browser
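
If your runtime does not load .env automatically, here is a minimal sketch using the dotenv package (an assumption; any env loader works) to make these keys available before the agent is created:

import 'dotenv/config';                        // loads .env into process.env
import { createGaiaAgent } from '@gaia-agent/sdk';

const agent = createGaiaAgent();               // picks up OPENAI_API_KEY, TAVILY_API_KEY, etc.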

📖 Complete environment variables guide →


🛠️ Built-in Tools

| Category | Tools | Providers |
|----------|-------|-----------|
| 🧮 Core | calculator, httpRequest | Built-in |
| 📋 Planning | planner, verifier | Built-in |
| 🔍 Search | tavilySearch, exaSearch, exaGetContents | Tavily (default), Exa |
| 🛡️ Sandbox | e2bSandbox, sandockExecute | E2B (default), Sandock |
| 🖥️ Browser | steelBrowser, browserUseTool, awsBrowser | Steel (default), BrowserUse, AWS |
| 🧠 Memory | mem0Remember, mem0Recall, memoryStore | Mem0 (default), AWS AgentCore |
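
To check which tools are active in your build, you can list the default tool registry (getDefaultTools appears in the Advanced Usage section below; the printed names here are illustrative):

import { getDefaultTools } from '@gaia-agent/sdk';

const tools = getDefaultTools();
console.log(Object.keys(tools));   // e.g. ['calculator', 'httpRequest', 'tavilySearch', ...]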

📖 Full tools documentation →
📖 Provider comparison →
📖 ReAct + Planning guide → ⭐ NEW


🔄 Swap Providers

Switch providers with one line:

import { createGaiaAgent } from '@gaia-agent/sdk';

const agent = createGaiaAgent({
  providers: {
    search: 'exa',              // Use Exa instead of Tavily
    sandbox: 'sandock',         // Use Sandock instead of E2B
    browser: 'browseruse',      // Use BrowserUse instead of Steel
  },
});

Or set via environment variables:

GAIA_AGENT_SEARCH_PROVIDER=exa
GAIA_AGENT_SANDBOX_PROVIDER=sandock
GAIA_AGENT_BROWSER_PROVIDER=browseruse

🎯 GAIA Benchmark

Run official GAIA benchmarks with enhanced results tracking:

# Basic benchmark
pnpm benchmark                  # Run validation set
pnpm benchmark --limit 10       # Test with 10 tasks

# Resume interrupted runs
pnpm benchmark --resume         # Continue from checkpoint

# Filter by capability
pnpm benchmark:files            # Tasks with file attachments
pnpm benchmark:code             # Code execution tasks
pnpm benchmark:search           # Web search tasks
pnpm benchmark:browser          # Browser automation tasks

# Stream mode (real-time thinking)
pnpm benchmark:random --stream  # Watch agent think in real-time

# Wrong answers collection
pnpm benchmark:wrong            # Retry only failed tasks

📚 Wrong Answers Collection

Automatically track and retry failed tasks:

# 1. Run benchmark (auto-creates wrong-answers.json)
pnpm benchmark --limit 20

# 2. View wrong answers
cat benchmark-results/wrong-answers.json

# 3. Retry only failed tasks
pnpm benchmark:wrong --verbose

# 4. Keep retrying until all pass
pnpm benchmark:wrong
# → "🎉 No wrong answers! All previous tasks passed."

📖 Wrong answers guide →
📖 Resume feature guide →
📖 Benchmark module docs →
📖 GAIA setup guide →


📊 Enhanced Benchmark Results

Benchmark results now include full task details:

{
  "taskId": "abc123",
  "question": "What year was X founded?",
  "level": 2,
  "files": ["image.png"],
  "answer": "1927",
  "expectedAnswer": "1927",
  "correct": true,
  "durationMs": 5234,
  "steps": 3,
  "toolsUsed": ["search", "browser"],
  "summary": {
    "totalToolCalls": 5,
    "uniqueTools": ["search", "browser", "calculator"],
    "hadError": false
  },
  "stepDetails": [ /* ... */ ]
}

Easier to analyze and debug! 🎉
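
Because each run is plain JSON, post-processing is straightforward. A minimal TypeScript sketch, assuming a hypothetical benchmark-results/results.json containing an array of entries shaped like the example above:

import { readFileSync } from 'node:fs';

// Hypothetical path and shape; adjust to your actual results file.
interface TaskResult {
  taskId: string;
  correct: boolean;
  durationMs: number;
  toolsUsed: string[];
}

const results: TaskResult[] = JSON.parse(
  readFileSync('benchmark-results/results.json', 'utf8'),
);

const passed = results.filter((r) => r.correct).length;
const accuracy = ((passed / results.length) * 100).toFixed(2);
console.log(`${passed}/${results.length} correct (${accuracy}%)`);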


📈 Benchmark Results

Latest benchmark performance across different task categories:

| Benchmark Command | Timestamp | Results | Accuracy | Model | Providers | Details |
|-------------------|-----------|---------|----------|-------|-----------|---------|
| pnpm benchmark | 2025-11-26 08:33 | 22/53 | 41.51% | gpt-4o | Search: tavily, Sandbox: e2b, Browser: steel | View Details |
| pnpm benchmark:level1 | 2025-11-27 10:38 | 16/53 | 30.19% | gpt-4o | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | - |
| pnpm benchmark:level1 | 2025-12-03 04:12 | 21/53 | 39.62% | Claude Sonnet 4.5 | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | View Details |

📖 See detailed task-by-task results →

Note: Benchmark results are automatically updated after each benchmark run.


🧪 Testing

Run unit tests with Vitest:

pnpm test                # Run all tests
pnpm test:watch          # Watch mode
pnpm test:coverage       # Coverage report
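
You can add your own Vitest tests on top of the SDK. A minimal hypothetical sketch, assuming the required API keys are set in the environment:

import { describe, expect, it } from 'vitest';
import { createGaiaAgent } from '@gaia-agent/sdk';

describe('createGaiaAgent', () => {
  it('returns an agent exposing generate()', () => {
    const agent = createGaiaAgent();
    expect(typeof agent.generate).toBe('function');
  });
});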

📖 Testing guide →


🎯 Advanced Usage

Custom Tools

import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { tool } from 'ai';
import { z } from 'zod';

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    weatherTool: tool({
      description: 'Get weather',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ temp: 72, condition: 'sunny' }),
    }),
  },
});
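
Once registered, the custom tool is available to the agent like any built-in one (the prompt below is illustrative):

const result = await agent.generate({
  prompt: "What's the weather in Tokyo? Use the weather tool.",
});
console.log(result.text);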

ToolSDK Integration

Integrate thousands of tools from ToolSDK.ai ecosystem:

import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { ToolSDKApiClient } from 'toolsdk/api'; // npm install toolsdk

// Initialize ToolSDK client
const toolSDK = new ToolSDKApiClient({ apiKey: process.env.TOOLSDK_AI_API_KEY });

// Load tools from ToolSDK packages
const emailTool = await toolSDK.package('@toolsdk.ai/mcp-send-email', {
  RESEND_API_KEY: process.env.RESEND_API_KEY,
}).getAISDKTool("send-email");

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    emailTool
  },
});

const result = await agent.generate({
  prompt: 'Help me search for the latest AI news and send it to [email protected]',
});

📖 ToolSDK Packages →

Extend GAIAAgent Class

import { GAIAAgent } from '@gaia-agent/sdk';

class ResearchAgent extends GAIAAgent {
  constructor() {
    super({
      instructions: 'Research assistant specialized in AI papers',
      additionalTools: { /* custom tools */ },
    });
  }
}
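
Using the subclass looks the same as using the default agent (a sketch, assuming ResearchAgent inherits generate() from GAIAAgent):

const researcher = new ResearchAgent();

const result = await researcher.generate({
  prompt: 'Summarize three recent papers on tool-using LLM agents',
});
console.log(result.text);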

📖 Advanced usage guide →
📖 API reference →


📚 Documentation

📖 Guides

🔧 Reference


🤝 Contributing

This project uses automated NPM publishing. When changes are merged to main:

  1. ✅ Tests run automatically
  2. 📦 Version bumps to next patch (e.g., 0.1.0 → 0.1.1)
  3. 📝 Changelog created in changelog/
  4. 🚀 Published to NPM
  5. 🏷️ Git tag created

For manual version bumps (minor/major), see docs/NPM_PUBLISH_SETUP.md.


📄 License

Apache License 2.0