@gaia-agent/sdk
v0.1.23
Production-ready AI agent library using AI SDK v6 ToolLoopAgent for GAIA benchmarks with swappable providers
GAIA Super Agent SDK
🤖 Build GAIA-Benchmark-ready Super AI Agents in seconds, not weeks
Production-ready Super AI agent with 18+ tools and swappable providers
Built on AI SDK v6 ToolLoopAgent & ToolSDK.ai with ReAct reasoning
Quick Start · Features · GAIA Benchmark · Documentation
✨ Features
🚀 Zero Configuration
Pre-configured agent ready for GAIA benchmarks out of the box
🧠 ReAct Reasoning Pattern
Built-in Reasoning + Acting framework for structured thinking
📋 Planning & Verification
Multi-step planning + answer verification for complex tasks
🔧 18+ Built-in Tools
Organized by category with official SDKs (Tavily, Exa, E2B, BrowserUse, Steel)
🔄 Swappable Providers
Easy provider switching for sandbox, browser, search, and memory
🌐 AI-Powered Search
Integrated Tavily and Exa for intelligent web search
🛡️ Secure Sandbox
E2B cloud sandbox with code execution + filesystem operations
🖥️ Browser Automation
Steel, BrowserUse or AWS AgentCore for web interactions
🧠 Agent Memory
Persistent memory with Mem0 or AWS AgentCore
📦 Tree-Shaking Friendly
ESM with granular exports, TypeScript-first
🎯 Why GAIA Agent?
🌟 Our Mission
Empower developers to build world-class Super AI Agents in minutes, not months.
Whether you're creating a production-ready AI assistant for your product or competing in GAIA benchmarks, GAIA Agent provides the enterprise-grade foundation you need.
❌ Traditional Approach
- Days/weeks setting up APIs
- Writing tool wrappers manually
- Error handling for each service
- Figuring out which providers to use
- Integration testing headaches
✅ With GAIA Agent
- 3 lines of code to get started
- 18+ tools ready with official SDKs
- GAIA benchmark ready immediately
- Swap providers with one line
- Production-tested implementations
Time savings: From weeks of infrastructure setup → 3 lines of code
Result: A world-class, production-ready Super Agent that rivals top AI systems
🌟 What is the GAIA Benchmark?
The GAIA Benchmark is a comprehensive evaluation suite designed to test the capabilities of AI agents across a wide range of tasks, including reasoning, search, code execution, and browser automation.
🚀 Quick Start
Installation
```bash
npm install @gaia-agent/sdk ai @ai-sdk/openai zod
```
Basic Usage
```ts
import { createGaiaAgent } from '@gaia-agent/sdk';

// Create the agent - reads from environment variables
const agent = createGaiaAgent();

const result = await agent.generate({
  prompt: 'Calculate 15 * 23 and search for the latest AI papers',
});

console.log(result.text);
```
Environment Setup
Create a .env file:
```
# Required
OPENAI_API_KEY=sk-...

# Default providers (at least one required)
TAVILY_API_KEY=tvly-...       # Search
E2B_API_KEY=...               # Sandbox
STEEL_API_KEY=steel_live_...  # Browser
```
📖 Complete environment variables guide →
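A misconfigured environment is the most common startup failure, so a quick preflight check can save debugging time. A minimal sketch, assuming only the variable names from the `.env` example above; `checkEnv` is an illustrative helper, not part of the SDK:

```typescript
// Preflight check for the environment variables listed above.
// checkEnv is illustrative only -- not part of @gaia-agent/sdk.
const REQUIRED = ['OPENAI_API_KEY'] as const;
const PROVIDERS = ['TAVILY_API_KEY', 'E2B_API_KEY', 'STEEL_API_KEY'] as const;

function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED) {
    if (!env[key]) problems.push(`missing required ${key}`);
  }
  // "At least one required": one provider key must be present
  if (!PROVIDERS.some((key) => env[key])) {
    problems.push(`set at least one of: ${PROVIDERS.join(', ')}`);
  }
  return problems;
}

// Example: a valid configuration produces no problems
const missing = checkEnv({ 'OPENAI_API_KEY': 'sk-x', 'E2B_API_KEY': 'e2b-x' });
// → []
```

Calling `checkEnv(process.env)` before `createGaiaAgent()` turns a vague runtime failure into an explicit list of missing keys.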
🛠️ Built-in Tools
| Category | Tools | Providers |
|----------|-------|-----------|
| 🧮 Core | calculator, httpRequest | Built-in |
| 📋 Planning | planner, verifier | Built-in |
| 🔍 Search | tavilySearch, exaSearch, exaGetContents | Tavily (default), Exa |
| 🛡️ Sandbox | e2bSandbox, sandockExecute | E2B (default), Sandock |
| 🖥️ Browser | steelBrowser, browserUseTool, awsBrowser | Steel (default), BrowserUse, AWS |
| 🧠 Memory | mem0Remember, mem0Recall, memoryStore | Mem0 (default), AWS AgentCore |
📖 Full tools documentation →
📖 Provider comparison →
📖 ReAct + Planning guide → ⭐ NEW
🔄 Swap Providers
Switch providers with one line:
```ts
import { createGaiaAgent } from '@gaia-agent/sdk';

const agent = createGaiaAgent({
  providers: {
    search: 'exa',          // Use Exa instead of Tavily
    sandbox: 'sandock',     // Use Sandock instead of E2B
    browser: 'browseruse',  // Use BrowserUse instead of Steel
  },
});
```
Or set via environment variables:
```
GAIA_AGENT_SEARCH_PROVIDER=exa
GAIA_AGENT_SANDBOX_PROVIDER=sandock
GAIA_AGENT_BROWSER_PROVIDER=browseruse
```
🎯 GAIA Benchmark
Run official GAIA benchmarks with enhanced results tracking:
```bash
# Basic benchmark
pnpm benchmark                  # Run validation set
pnpm benchmark --limit 10       # Test with 10 tasks

# Resume interrupted runs
pnpm benchmark --resume         # Continue from checkpoint

# Filter by capability
pnpm benchmark:files            # Tasks with file attachments
pnpm benchmark:code             # Code execution tasks
pnpm benchmark:search           # Web search tasks
pnpm benchmark:browser          # Browser automation tasks

# Stream mode (real-time thinking)
pnpm benchmark:random --stream  # Watch agent think in real-time

# Wrong answers collection
pnpm benchmark:wrong            # Retry only failed tasks
```
📚 Wrong Answers Collection
Automatically track and retry failed tasks:
```bash
# 1. Run benchmark (auto-creates wrong-answers.json)
pnpm benchmark --limit 20

# 2. View wrong answers
cat benchmark-results/wrong-answers.json

# 3. Retry only failed tasks
pnpm benchmark:wrong --verbose

# 4. Keep retrying until all pass
pnpm benchmark:wrong
# → "🎉 No wrong answers! All previous tasks passed."
```
📖 Wrong answers guide →
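The retry loop keys off the `correct` flag in each result record. A sketch of the filtering step in TypeScript, assuming `wrong-answers.json` holds an array of records shaped like the Enhanced Benchmark Results example below (the exact file format is an assumption; the sample data is invented for illustration):

```typescript
// Illustrative sketch: filter a benchmark run down to the failed tasks.
// Assumes wrong-answers.json is an array of records with these fields.
interface BenchmarkResult {
  taskId: string;
  question: string;
  correct: boolean;
}

function collectWrongAnswers(results: BenchmarkResult[]): BenchmarkResult[] {
  return results.filter((r) => !r.correct);
}

const run: BenchmarkResult[] = [
  { taskId: 'abc123', question: 'What year was X founded?', correct: true },
  { taskId: 'def456', question: 'Sum the values in the table', correct: false },
];

const wrong = collectWrongAnswers(run);
// wrong holds only taskId 'def456' -- the input for the next retry pass
```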
📖 Resume feature guide →
📖 Benchmark module docs →
📖 GAIA setup guide →
📊 Enhanced Benchmark Results
Benchmark results now include full task details:
```json
{
  "taskId": "abc123",
  "question": "What year was X founded?",
  "level": 2,
  "files": ["image.png"],
  "answer": "1927",
  "expectedAnswer": "1927",
  "correct": true,
  "durationMs": 5234,
  "steps": 3,
  "toolsUsed": ["search", "browser"],
  "summary": {
    "totalToolCalls": 5,
    "uniqueTools": ["search", "browser", "calculator"],
    "hadError": false
  },
  "stepDetails": [ /* ... */ ]
}
```
Easier to analyze and debug! 🎉
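Because each record carries timing and tool-usage fields, a whole run can be summarized in a few lines. A sketch over the documented fields (`correct`, `durationMs`, `toolsUsed`); the aggregation helper itself is illustrative, not an SDK API:

```typescript
// Aggregate an array of result records using the fields shown above.
interface ResultRecord {
  correct: boolean;
  durationMs: number;
  toolsUsed: string[];
}

function summarizeRun(results: ResultRecord[]) {
  const passed = results.filter((r) => r.correct).length;
  const totalMs = results.reduce((sum, r) => sum + r.durationMs, 0);
  const tools = new Set(results.flatMap((r) => r.toolsUsed));
  return {
    accuracy: results.length ? passed / results.length : 0,
    avgDurationMs: results.length ? totalMs / results.length : 0,
    uniqueTools: [...tools].sort(),
  };
}

const summary = summarizeRun([
  { correct: true, durationMs: 5234, toolsUsed: ['search', 'browser'] },
  { correct: false, durationMs: 3000, toolsUsed: ['calculator'] },
]);
// → accuracy 0.5, avgDurationMs 4117,
//   uniqueTools ['browser', 'calculator', 'search']
```

The same shape works for comparing runs across providers or models, e.g. the accuracy figures in the Benchmark Results table.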
📈 Benchmark Results
Latest benchmark performance across different task categories:
| Benchmark Command | Timestamp | Results | Accuracy | Model | Providers | Details |
|-------------------|-----------|---------|----------|-------|-----------|---------|
| pnpm benchmark | 2025-11-26 08:33 | 22/53 | 41.51% | gpt-4o | Search: tavily, Sandbox: e2b, Browser: steel | View Details |
| pnpm benchmark:level1 | 2025-11-27 10:38 | 16/53 | 30.19% | gpt-4o | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | - |
| pnpm benchmark:level1 | 2025-12-03 04:12 | 21/53 | 39.62% | Claude Sonnet 4.5 | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | View Details |
📖 See detailed task-by-task results →
Note: Benchmark results are automatically updated after each benchmark run.
🧪 Testing
Run unit tests with Vitest:
```bash
pnpm test           # Run all tests
pnpm test:watch     # Watch mode
pnpm test:coverage  # Coverage report
```
🎯 Advanced Usage
Custom Tools
```ts
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { tool } from 'ai';
import { z } from 'zod';

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    weatherTool: tool({
      description: 'Get weather',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ temp: 72, condition: 'sunny' }),
    }),
  },
});
```
ToolSDK Integration
Integrate thousands of tools from ToolSDK.ai ecosystem:
```ts
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { ToolSDKApiClient } from 'toolsdk/api'; // npm install toolsdk

// Initialize ToolSDK client
const toolSDK = new ToolSDKApiClient({ apiKey: process.env.TOOLSDK_AI_API_KEY });

// Load tools from ToolSDK packages
const emailTool = await toolSDK.package('@toolsdk.ai/mcp-send-email', {
  RESEND_API_KEY: process.env.RESEND_API_KEY,
}).getAISDKTool('send-email');

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    emailTool,
  },
});

const result = await agent.generate({
  prompt: 'Help me search for the latest AI news and send it to [email protected]',
});
```
Extend GAIAAgent Class
```ts
import { GAIAAgent } from '@gaia-agent/sdk';

class ResearchAgent extends GAIAAgent {
  constructor() {
    super({
      instructions: 'Research assistant specialized in AI papers',
      additionalTools: { /* custom tools */ },
    });
  }
}
```
📖 Advanced usage guide →
📖 API reference →
📚 Documentation
📖 Guides
- Quick Start Guide - Get started in 5 minutes
- ReAct + Planning Guide ⭐ NEW - Enhanced reasoning & planning
- Reflection Guide ⭐ NEW - Step-by-step reflection (optional)
- Environment Variables - Complete configuration guide
- GAIA Benchmark - Requirements, setup, tips
- Improving GAIA Scores - Strategies for better performance & self-evolution
- Wrong Answers Collection - Error tracking and retry
- Provider Comparison - Detailed provider comparison
🔧 Reference
- API Reference - Complete API documentation
- Tools Reference - All available tools
- Advanced Usage - Extension examples, patterns
- Benchmark Module - Modular architecture
- Testing Guide - Unit tests with Vitest
🤝 Contributing
This project uses automated NPM publishing. When changes are merged to main:
- ✅ Tests run automatically
- 📦 Version bumps to the next patch (e.g., 0.1.0 → 0.1.1)
- 📝 Changelog entry created in changelog/
- 🚀 Published to NPM
- 🏷️ Git tag created
For manual version bumps (minor/major), see docs/NPM_PUBLISH_SETUP.md.
📄 License
Apache License 2.0
