@dastbal/nestjs-ai-agent
v1.5.0
Autonomous AI Agent for NestJS - Principal Software Engineer Level with RAG and SQLite persistence
NestJS AI Agent Lib
Built with ❤️ by David Balladares — Principal Software Engineer level autonomous agent for NestJS.
An autonomous AI agent framework designed specifically for NestJS projects. It analyzes, plans, writes, and verifies code with specialized subagents, all accessible via a premium streaming CLI. It runs on Google Gemini (Vertex AI) or on local Ollama models; local inference requires no API key.
Table of Contents
- Overview
- Key Features
- Getting Started with NestJS
- CLI — Interactive Streaming Sessions
- Agent Modes
- Architecture
- Core Concepts
- Project Structure
- Roadmap
- License
Overview
This library empowers your NestJS applications with an autonomous AI agent capable of:
- 📋 Intelligent Planning: Classifies task complexity (SMALL/MEDIUM/LARGE) and plans execution accordingly.
- 🔍 Codebase Analysis: Performs semantic search over your project using Retrieval-Augmented Generation (RAG) for deep understanding (X-Ray strategy).
- 💾 Safe Code Writing: Writes code with automatic backups before every file modification.
- 🧪 Automated Testing: Integrates with Jest and tsc --noEmit for TDD, self-correcting on failures.
- 🤖 Subagent Delegation: Spawns specialized "Researcher" and "Coder" subagents for complex tasks.
- ✋ Human-in-the-Loop (HITL): Prompts for approval on critical operations like file deletion or infrastructure changes.
- 💬 Persistent Memory: Maintains full conversation history via SQLite, enabling continuation across named sessions.
- 🧠 Context Compression: Automatically summarizes long conversations to prevent context overflow.
- 🎨 Beautiful Output: Renders responses in markdown with rich formatting (chalk, icons, code blocks).
- ⚙️ Autonomous Execution: Executes full plans without requiring manual yes/no confirmations.
- 🩹 Self-Healing: Recovers automatically from corrupted session states.
- 🦙 Local LLM Support: Full integration with Ollama, allowing use of models like Gemma4, Qwen3.6, Llama3.2 locally — free, offline, and no API key needed.
Key Features
- NestJS Native: Designed from the ground up for NestJS projects.
- Domain-Driven Design (DDD) Support: Understands and can generate code following DDD principles.
- Architecture Aware: Can analyze and refactor code while respecting architectural boundaries.
- TDD Workflow: Integrates seamlessly with Jest for Test-Driven Development.
- Multiple LLM Backends: Supports Google Gemini (cloud) and Ollama (local).
- Codebase Indexing (RAG): Enables the agent to understand your project's structure and code through semantic search.
- Safety First: Robust file system safety, HITL approvals for destructive actions.
- Efficient CLI: Real-time token streaming and interactive model switching.
- Skills System (v1.4): 12 keyword-triggered skills — the agent automatically loads the right guide for every task (DDD module, tests, refactor, security audit, architecture validation, and more). Base prompt stays lean regardless of how many skills exist.
- Mentor Mode (v1.4): Always-on lightweight mentoring (root cause + trade-off on every response) plus a deep /mentor toggle for Socratic dialogue, Forced Output Contract, and architectural decision explanations.
- AGENTS.md Context Tiering (v1.4): Separate context files (ANTIGRAVITY.md for the human, AGENTS.md for the agent), following OpenHands Context Tiering best practice.
Getting Started with NestJS
Option A — Ollama (Local, Free, No API Key) 🦙
The recommended and fastest way to start, running entirely on your machine.
1. Install Ollama and pull a model:
# Recommended model for a balance of quality and performance
ollama pull gemma4 # ~10 GB
# Or, for faster inference with lower RAM usage:
ollama pull gemma4:e2b # ~7 GB
# Or, for strong reasoning with a compact model:
ollama pull qwen3.6 # ~4 GB
2. Install project dependencies:
npm install
3. Configure your environment variables:
Create a .env.development file in the project root:
# .env.development
# Use the model you pulled with Ollama
AGENT_MODEL=ollama:gemma4
# Optional: set this only if Ollama is not reachable at its default address (http://localhost:11434)
# OLLAMA_BASE_URL=http://localhost:11434
4. Run the agent:
npm run agent -- deep
That's it! No Google account or API key needed.
Tip: Inside the agent session, type /model to interactively switch between Ollama models, or to Gemini cloud models if you have configured them.
Option B — Google Gemini (Cloud)
Leverages Google's powerful Vertex AI models. Requires authentication.
Option B1 — Your personal Google account (Recommended for local development) ✅
# 1. Install the Google Cloud SDK: https://cloud.google.com/sdk/docs/install
# 2. Authenticate and set your GCP project:
gcloud auth application-default login --project YOUR_GCP_PROJECT_ID
# 3. Configure your environment variables:
# Create or update .env.development:
# AGENT_MODEL=gemini-2.5-flash-lite
# 4. Run the agent:
npm run agent -- deep
Option B2 — Service Account (CI/CD, Production)
Required IAM Role: roles/aiplatform.user (Vertex AI User)
Assign this role to your service account in the GCP Console: IAM & Admin → IAM → Grant Access.
# .env.development
# Path to your service account key file
GOOGLE_APPLICATION_CREDENTIALS=/absolute/path/to/your/service-account.json
# Choose your Gemini model
AGENT_MODEL=gemini-2.5-flash-lite
CLI — Interactive Streaming Sessions
The agent provides an interactive CLI experience similar to other advanced chatbots, with real-time token streaming and clear status indicators for tool execution.
╭────────────────────────────────────────────────╮
│ │
│ NestJS AI Agent — Deep Mode │
│ Single autonomous agent with planning tools │
│ Model: ollama:gemma4 │
│ Session: auth-module (continuing) │
│ Type your task. Ctrl+C to exit. │
│ │
╰────────────────────────────────────────────────╯
You: Create a UsersModule following DDD principles.
⠋ Thinking...
╭─ 📋 write_todos
│ └─ Creating implementation plan...
╰─ ✓ done in 1.2s
╭─ 🔍 ask_codebase
│ └─ How is AuthModule structured for DDD?
╰─ ✓ done in 3.4s
Agent: I will create a UsersModule following the same DDD pattern as AuthModule...
(tokens stream as they are generated)
──────────────────────────────────────────────────
You: ▋
Session Management
Manage conversation history and context using session IDs.
| Command | Behavior |
| :-------------------------------------- | :-------------------------------------------------------------------- |
| npm run agent -- deep | Ephemeral — Starts a fresh session each time. |
| npm run agent -- deep --session auth | Persistent — Reopens or creates the auth session context. |
| npm run agent -- orchestrate --session feature-x | Same persistence for the orchestrator mode. |
| npm run agent -- deep "Your task" | Starts an ephemeral session with an initial human message. |
| npm run agent -- deep --session session-name "Your task" | Starts/resumes a named session with an initial message. |
Note: Session data is stored in .agent/deep_agent_history.db and .agent/orchestrator_history.db.
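The command forms in the table can be read as a small argument grammar: a mode, an optional `--session <name>`, and an optional initial message. The sketch below illustrates that reading only. `parseAgentArgs` and `historyDbPath` are hypothetical names, and the library's real CLI parser may behave differently (for example, in how it treats extra arguments).

```typescript
// Hypothetical sketch of the CLI grammar shown in the table above.
// Not the library's actual parser.
type AgentMode = "deep" | "orchestrate";

interface ParsedArgs {
  mode: AgentMode;
  sessionId?: string; // undefined means an ephemeral session
  initialMessage?: string;
}

function parseAgentArgs(argv: string[]): ParsedArgs {
  const mode = argv[0];
  if (mode !== "deep" && mode !== "orchestrate") {
    throw new Error(`Unknown mode: ${String(mode)}`);
  }
  const parsed: ParsedArgs = { mode };
  const rest = argv.slice(1);
  for (let i = 0; i < rest.length; i++) {
    if (rest[i] === "--session") {
      const name = rest[i + 1];
      if (name === undefined || name.startsWith("--")) {
        throw new Error("--session requires a name");
      }
      parsed.sessionId = name;
      i++; // skip the session name we just consumed
    } else if (parsed.initialMessage === undefined) {
      // Assumption: the first free-standing argument is the initial message.
      parsed.initialMessage = rest[i];
    }
  }
  return parsed;
}

// History file per mode, as listed in the note above.
function historyDbPath(mode: AgentMode): string {
  return mode === "deep"
    ? ".agent/deep_agent_history.db"
    : ".agent/orchestrator_history.db";
}
```

For instance, `parseAgentArgs(["deep", "--session", "auth", "Your task"])` yields a named session `auth` with `"Your task"` as the first human message, while `parseAgentArgs(["deep"])` leaves `sessionId` undefined, which this sketch treats as ephemeral.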
Switching Models — /model command
Interact with the agent and switch LLM models on-the-fly without losing your current session context.
You: /model
╭────────────────────────────────────────────────────╮
│ 🔧 Switch LLM Model │
│ Type the number and press Enter. │
│ Press 0 or Enter to cancel. │
╰────────────────────────────────────────────────────╯
Select Provider:
1. ⚡ Vertex AI (Gemini cloud — requires Google credentials)
2. 🦙 Ollama (Local models — free, no API key needed) ← active
Provider: 2
Detecting Ollama models... ✓ (4 found)
Select Ollama Model:
1. gemma4:26b (17 GB)
2. gemma4:e2b (7.2 GB)
3. gemma4:e4b (9.6 GB)
4. gemma4 (9.6 GB)
Model: 1
✅ Switching to ollama:gemma4:26b
💾 Saved to .env
🔄 Restarting agent with new model...
The selected model is automatically saved to your .env file for future sessions.
Slash Commands
| Command | Description | State |
|---|---|---|
| /model | Switch the active LLM model interactively (Ollama or Vertex AI) | — |
| /mentor | Toggle deep mentor mode — Forced Output Contract, trade-off analysis, Socratic gates | [ON] / [OFF] |
| /help | Show all available slash commands with their current state | — |
| Ctrl+C | Exit the session cleanly | — |
Mentor Mode in depth
The agent operates with two levels of mentoring:
Level 1 — Always ON (built into the base prompt)
Every fix, implementation, or architectural decision includes:
- Root Cause — why it broke (not just what)
- Why this approach — rationale over alternatives for significant decisions
- Trade-off — what's accepted or limited
For changes touching >5 files or public API contracts, the agent pauses and uses ask_human before implementing.
Level 2 — /mentor deep mode
Type /mentor to activate the full skills/mentor-mode.md:
- Forced Output Contract — explicit rationale + trade-offs before every code block
- Architectural Escalation Gate — presents alternatives rejected and why
- Ask-Before HITL Gate — confirms plan before big changes
- Socratic Check — asks if you want to go deeper before implementing concepts
- Pattern Name Callout — names the design pattern being applied (Repository, DDD, CQRS, etc.)
Type /mentor again to return to standard mode. The always-on Level 1 mentor remains active.
Type mentor, teach me, explain why, or trade-off naturally in a message to auto-trigger mentor mode via Progressive Disclosure.
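As a rough illustration of the keyword auto-trigger, the sketch below checks a message against the four phrases listed above. It assumes case-insensitive substring matching, which the README does not specify. `MENTOR_TRIGGERS`, `shouldTriggerMentor`, and `toggleMentor` are hypothetical names, not the library's API.

```typescript
// Hypothetical sketch: keyword detection for the mentor auto-trigger.
// The phrases come from the paragraph above; the matching rule is an assumption.
const MENTOR_TRIGGERS: readonly string[] = [
  "mentor",
  "teach me",
  "explain why",
  "trade-off",
];

function shouldTriggerMentor(message: string): boolean {
  const text = message.toLowerCase();
  return MENTOR_TRIGGERS.some((phrase) => text.includes(phrase));
}

// The /mentor slash command is a plain toggle between [ON] and [OFF].
interface MentorState {
  deepMode: boolean;
}

function toggleMentor(state: MentorState): MentorState {
  return { deepMode: !state.deepMode };
}
```

Substring matching is deliberately naive: a word such as "tormentor" would also match "mentor". A real implementation might prefer word-boundary matching.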
LLM Switching via env var
Alternatively, set the AGENT_MODEL environment variable before running the agent.
# Windows PowerShell
# ── Ollama (local, free) ──────────────────────────────────────────
# Balanced quality/performance
$env:AGENT_MODEL="ollama:gemma4"; npm run agent -- deep
# Fast, low RAM
$env:AGENT_MODEL="ollama:gemma4:e2b"; npm run agent -- deep
# High quality (large download)
$env:AGENT_MODEL="ollama:gemma4:26b"; npm run agent -- deep
# Strong reasoning, compact
$env:AGENT_MODEL="ollama:qwen3.6"; npm run agent -- deep
# General purpose offline
$env:AGENT_MODEL="ollama:llama3.2"; npm run agent -- deep
# ── Vertex AI (cloud) ─────────────────────────────────────────────
# Fast & cheap (default if no GOOGLE_APPLICATION_CREDENTIALS)
$env:AGENT_MODEL="gemini-2.5-flash-lite"; npm run agent -- deep
# Balanced speed + quality
$env:AGENT_MODEL="gemini-2.5-flash"; npm run agent -- deep
# Max capability (architecture, complex refactors)
$env:AGENT_MODEL="gemini-2.5-pro"; npm run agent -- orchestrate
# Linux / macOS
# Ollama examples
AGENT_MODEL=ollama:gemma4 npm run agent -- deep
AGENT_MODEL=ollama:qwen3.6 npm run agent -- deep
# Gemini examples
AGENT_MODEL=gemini-2.5-flash-lite npm run agent -- deep
AGENT_MODEL=gemini-2.5-pro npm run agent -- orchestrate
Available Model Tiers:
| Tier Alias | Model String | Provider | Best For |
| :--------- | :----------------------- | :--------- | :------------------------------------- |
| gemma | ollama:gemma4 | 🦙 Local | Best local model for general coding |
| gemma-2b | ollama:gemma4:e2b | 🦙 Local | Fast, low RAM (~7 GB) |
| gemma-4b | ollama:gemma4:e4b | 🦙 Local | Balance speed/quality (~9.6 GB) |
| gemma-26b| ollama:gemma4:26b | 🦙 Local | Max quality (~17 GB) |
| qwen | ollama:qwen3.6 | 🦙 Local | Strong reasoning, compact (~4 GB) |
| local | ollama:llama3.2 | 🦙 Local | General purpose offline |
| lite | gemini-3.1-flash-lite | ⚡ Cloud | Quick edits, Q&A (cheapest) |
| flash | gemini-3.5-flash | ⚡ Cloud | Balanced speed + quality (recommended) |
| pro | gemini-3.1-pro | ⚡ Cloud | Architecture, complex refactors |
Embeddings Note: For Retrieval-Augmented Generation (RAG), the agent consistently uses Vertex AI's text-embedding-004 model, regardless of the chat model selected. This ensures a stable and high-quality codebase index even when switching between local Ollama and cloud Gemini models.
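To make the tier table concrete, here is a minimal sketch of how an `AGENT_MODEL` value could be expanded and routed. It assumes that tier aliases are accepted wherever a model string is, and that any string without the `ollama:` prefix goes to Vertex AI. `TIER_ALIASES` and `resolveAgentModel` are hypothetical names, and the alias values are copied verbatim from the table. The library's own resolver is not shown in this README and may differ, for example by rejecting unsupported providers.

```typescript
// Hypothetical sketch: alias expansion plus provider routing.
// Alias values are copied from the tier table above.
type Provider = "ollama" | "vertex";

interface ResolvedModel {
  provider: Provider;
  model: string; // model name with any "ollama:" prefix removed
}

const TIER_ALIASES = new Map<string, string>([
  ["gemma", "ollama:gemma4"],
  ["gemma-2b", "ollama:gemma4:e2b"],
  ["gemma-4b", "ollama:gemma4:e4b"],
  ["gemma-26b", "ollama:gemma4:26b"],
  ["qwen", "ollama:qwen3.6"],
  ["local", "ollama:llama3.2"],
  ["lite", "gemini-3.1-flash-lite"],
  ["flash", "gemini-3.5-flash"],
  ["pro", "gemini-3.1-pro"],
]);

const OLLAMA_PREFIX = "ollama:";

function resolveAgentModel(value: string): ResolvedModel {
  // Expand a known alias; otherwise treat the value as a full model string.
  const full = TIER_ALIASES.get(value) ?? value;
  if (full.startsWith(OLLAMA_PREFIX)) {
    return { provider: "ollama", model: full.slice(OLLAMA_PREFIX.length) };
  }
  // Assumption: everything else is a Vertex AI (Gemini) model name.
  return { provider: "vertex", model: full };
}
```

Under these assumptions, `resolveAgentModel("gemma-26b")` routes to Ollama with model `gemma4:26b`, and `resolveAgentModel("gemini-2.5-flash")` passes through unchanged to Vertex AI.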
Agent Modes
Deep Agent (deep) — Single Autonomous Agent
Ideal for most day-to-day tasks: debugging, code analysis, single-file modifications, quick questions, and medium-complexity features.
# Start an ephemeral session
npm run agent -- deep
# Start a persistent session named "my-feature"
npm run agent -- deep --session my-feature
# Ask a specific question about a file
npm run agent -- deep "explain src/core/agent/deep-agent-factory.ts"
Task Sizing: The agent automatically classifies tasks before execution:
- SMALL: (1-2 files, straightforward changes) → Executes directly (Read → Write → Done). Max 3 tool calls.
- MEDIUM: (3+ files, new feature) → Creates a brief write_todos plan and executes it.
- LARGE: (Entire module, major refactor) → Follows a detailed, step-by-step plan using write_todos.
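The sizing rules above can be written down as a simple decision function. This is only a sketch of the thresholds as stated. The agent's actual classification is presumably done by the model itself, and `TaskEstimate`, `classifyTask`, and `needsTodoPlan` are hypothetical names.

```typescript
// Hypothetical sketch of the SMALL / MEDIUM / LARGE thresholds listed above.
type TaskSize = "SMALL" | "MEDIUM" | "LARGE";

interface TaskEstimate {
  filesTouched: number;
  // True for an entire module or a major refactor.
  isModuleOrMajorRefactor: boolean;
}

// SMALL tasks are capped at 3 tool calls, per the list above.
const MAX_SMALL_TOOL_CALLS = 3;

function classifyTask(estimate: TaskEstimate): TaskSize {
  if (estimate.isModuleOrMajorRefactor) return "LARGE";
  return estimate.filesTouched >= 3 ? "MEDIUM" : "SMALL";
}

// MEDIUM and LARGE tasks go through a write_todos plan; SMALL tasks do not.
function needsTodoPlan(size: TaskSize): boolean {
  return size !== "SMALL";
}
```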
Core Tools:
- write_todos: Plans and tracks multi-step tasks.
- list_files: Lists directory contents.
- safe_read_file: Reads file content safely.
- safe_write_file: Writes to files with automatic backups.
- ask_codebase: Performs semantic search over your codebase using RAG.
- refresh_project_index: Rebuilds the RAG index (e.g., after bulk file writes).
- run_integrity_check: Runs tsc --noEmit to ensure type safety.
- run_tests: Executes Jest test suites.
Orchestrator (orchestrate) — Multi-SubAgent Coordinator
Best suited for complex, large-scale tasks such as implementing entire modules, significant refactoring efforts, or adding major features that span multiple files and components.
# Start an ephemeral orchestrator session
npm run agent -- orchestrate
# Start a persistent session for a major refactor
npm run agent -- orchestrate --session big-refactor
Mandatory Workflow: The orchestrator strictly follows a predefined protocol to ensure thoroughness and quality:
1. write_todos: Creates a comprehensive plan covering analysis, implementation, and verification.
2. task(researcher): Delegates analysis to the researcher subagent, which examines the codebase and produces a detailed implementation plan.
3. task(coder): Delegates implementation to the coder subagent, which follows the researcher's plan, adheres to Test-Driven Development (TDD), and writes tests before implementation.
4. run_integrity_check: Verifies that the entire project is free of TypeScript errors after the coder finishes.
Subagents:
- Researcher: A read-only analyst. Uses tools like ask_codebase and safe_read_file to understand the project and generate structured plans.
- Coder: An implementation specialist. Uses safe_write_file, run_tests, and run_integrity_check. Writes .spec.ts files first, then implements the corresponding code. Self-corrects up to 3 times upon test failures.
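The Coder's "self-corrects up to 3 times" behavior amounts to a bounded retry loop around the test run. The sketch below reads the limit as three correction attempts after an initial failing run, which is an interpretation rather than something the README states. The callbacks are synchronous stand-ins for what would really be tool calls, and `runWithSelfCorrection` is a hypothetical name.

```typescript
// Hypothetical sketch of a bounded test-and-fix loop.
// runTests and applyFix stand in for the run_tests tool and an LLM-driven edit.
interface TestResult {
  passed: boolean;
  output: string;
}

interface CorrectionOutcome {
  passed: boolean;
  corrections: number; // how many fixes were attempted
}

function runWithSelfCorrection(
  runTests: () => TestResult,
  applyFix: (failureOutput: string) => void,
  maxCorrections = 3,
): CorrectionOutcome {
  let result = runTests();
  let corrections = 0;
  while (!result.passed && corrections < maxCorrections) {
    applyFix(result.output); // feed the failure back into the next attempt
    corrections++;
    result = runTests();
  }
  return { passed: result.passed, corrections };
}
```

With the default limit, a suite that never passes is run four times in total (one initial run plus three corrected runs) before the loop gives up and reports `passed: false`.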
Architecture
graph TD
subgraph CLI Interface
A[Interactive Stream]:::cli --> B(Session Management);
B --> C["/model Command"];
B --> D[Model Switching Logic];
B --> E[Agent Mode Selection];
end
subgraph Agent Core
E --> F(DeepAgentFactory);
F --> G[LLMProvider];
G -- Ollama --> H(OllamaChatAdapter);
G -- Gemini --> I(ChatVertexAI);
F -- Simple Agent --> J(createDeepAgent);
F -- Orchestrator --> K(createDeepAgent);
K --> L[Researcher Subagent];
K --> M[Coder Subagent];
end
subgraph Services & Tools
J --> N(Core Tools);
K --> N;
N --> O(SafeFilesystemBackend);
N --> P(RAG IndexerService);
N --> Q(Checkpointer SqliteSaver);
J --> Q;
K --> Q;
end
classDef cli fill:#4CAF50,stroke:#333,stroke-width:2px;
classDef session fill:#FFC107,stroke:#333,stroke-width:2px;
classDef modelcmd fill:#2196F3,stroke:#333,stroke-width:2px;
classDef mode fill:#FF9800,stroke:#333,stroke-width:2px;
classDef factory fill:#9C27B0,stroke:#333,stroke-width:2px;
classDef subagent fill:#00BCD4,stroke:#333,stroke-width:2px;
class A cli;
class B session;
class C modelcmd;
class E mode;
class F factory;
class L,M subagent;
- CLI: Handles user interaction, model switching (/model), and session management.
- Agent Core: DeepAgentFactory orchestrates agent creation, routing requests to either a simple DeepAgent or a multi-subagent Orchestrator.
- LLM Integration: LLMProvider routes requests to OllamaChatAdapter for local models or ChatVertexAI for cloud models.
- Services & Tools: Provides core functionalities like safe file operations, RAG indexing, conversation persistence (SQLite), and the underlying LLM tooling.
Core Concepts
NestJS Integration
This library is built with NestJS in mind. It understands NestJS conventions for project structure, modules, services, controllers, and DDD. When you ask the agent to perform tasks like "create a user module" or "add authentication to this service," it leverages its knowledge of NestJS patterns to generate appropriate, idiomatic code.
Safety Features
- safe_write_file: Before writing any file, the agent creates a timestamped backup in .agent/backups/. This ensures you can always revert to the previous version if the agent's changes are not as expected.
- Project Root Sandboxing: The agent operates strictly within the project's root directory. It cannot access or modify files outside this scope.
- Human-in-the-Loop (HITL): For potentially destructive operations (e.g., deleting files/directories, dropping database tables, modifying infrastructure files like docker-compose.yml or .env.production), the agent will pause and explicitly ask for your approval.
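Project-root sandboxing and timestamped backups can both be expressed as path computations. The sketch below is an assumption-laden illustration, not the library's SafeFilesystemBackend: `resolveInsideRoot` and `backupPathFor` are hypothetical names, the backup file naming scheme is invented for the example, and symlinks are not resolved.

```typescript
import * as path from "node:path";

// Hypothetical sketch: reject any target that resolves outside the project root.
function resolveInsideRoot(projectRoot: string, target: string): string {
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, target);
  const rel = path.relative(root, resolved);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes project root: ${target}`);
  }
  return resolved;
}

// Hypothetical naming scheme: <flattened relative path>.<timestamp>.bak
// The README only says backups are timestamped and live in .agent/backups/.
function backupPathFor(projectRoot: string, target: string, now: Date): string {
  const root = path.resolve(projectRoot);
  const rel = path.relative(root, resolveInsideRoot(root, target));
  const flat = rel.split(path.sep).join("__");
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return path.join(root, ".agent", "backups", `${flat}.${stamp}.bak`);
}
```

A write tool built on this would call `backupPathFor` and copy the existing file there before overwriting it, so a request such as `../outside.txt` fails before any file is touched.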
RAG X-Ray Strategy
The agent uses Retrieval-Augmented Generation (RAG) to understand your codebase:
- Indexing: The IndexerService scans your src/ directory on startup. This index is lazily updated: it only rebuilds if it's older than 5 minutes, ensuring fast agent startup times.
- Semantic Search: When you ask questions about your code, the ask_codebase tool performs a vector similarity search against the index.
- Contextual Understanding: The search results provide relevant code snippets and dependency information, giving the LLM a deep understanding of your project's structure and logic.
- Ollama Mode: If you're using Ollama without Google Cloud credentials, RAG indexing is gracefully skipped. The agent will still function but without the codebase-aware semantic search capabilities.
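Two pieces of the strategy above lend themselves to a short sketch: the 5-minute freshness check and the vector similarity lookup. The code assumes cosine similarity over in-memory embeddings, which the README does not confirm. `isIndexStale`, `cosineSimilarity`, `topK`, and the `IndexedChunk` shape are hypothetical and are not the IndexerService API.

```typescript
// Hypothetical sketch: index freshness plus a brute-force similarity search.
const MAX_INDEX_AGE_MS = 5 * 60 * 1000; // "older than 5 minutes" triggers a rebuild

function isIndexStale(lastIndexedAtMs: number | undefined, nowMs: number): boolean {
  // No recorded timestamp means the index was never built.
  return lastIndexedAtMs === undefined || nowMs - lastIndexedAtMs > MAX_INDEX_AGE_MS;
}

interface IndexedChunk {
  file: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are closest to the query embedding.
function topK(query: number[], chunks: IndexedChunk[], k: number): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

A linear scan like this is fine for a few thousand chunks. A larger index would usually call for an approximate nearest-neighbor structure instead.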
Project Structure
The library follows a clean, modular structure:
/nestjs-ai-agent-lib
├── src/
│ ├── bin/ # CLI entry points (deep, orchestrate)
│ │ └── cli.ts
│ ├── core/ # Core agent logic & services
│ │ ├── agent/ # Agent factories (DeepAgentFactory)
│ │ │ ├── factory.ts # Legacy ReAct agent
│ │ │ ├── graph-factory.ts # Legacy StateGraph agent
│ │ │ └── deep-agent-factory.ts # ⭐ Active: Creates DeepAgent & Orchestrator
│ │ ├── config/ # Configuration loading (env vars, model resolution)
│ │ │ ├── model-resolver.ts
│ │ │ └── model-switcher.ts
│ │ ├── llm/ # LLM provider routing & adapters
│ │ │ ├── provider.ts
│ │ │ └── ollama-adapter.ts # Handles Ollama's specific API requirements
│ │ ├── subagents/ # Specialized agents for orchestration
│ │ │ ├── coder.subagent.ts
│ │ │ └── researcher.subagent.ts
│ │ ├── rag/ # Retrieval-Augmented Generation
│ │ │ └── indexer.ts # Codebase indexing service
│ │ └── tools/ # Custom tool implementations (safe file ops, etc.)
│ │ └── index.ts
│ ├── presentation/ # CLI UI and presentation logic
│ │ ├── cli/
│ │ │ ├── chat-session.ts # Main interactive loop + slash command dispatcher
│ │ │ ├── model-menu.ts # Interactive model selection UI
│ │ │ ├── stream-renderer.ts # Output formatting (tokens, tools, Agent header)
│ │ │ ├── markdown-renderer.ts # Markdown → chalk styled terminal output
│ │ │ └── theme.ts # CLI styling and icons
│ │ └── index.ts
├── skills/ # ⭐ Agent skills — keyword-triggered, read-only
│ ├── create-ddd-module.md # DDD module creation protocol
│ ├── write-tests.md # TDD & Jest spec templates
│ ├── refactor-safely.md # Inside-out refactor, find callers first
│ ├── create-endpoint.md # REST endpoint + DTO + Swagger
│ ├── debug-typescript.md # TS error lookup table & fix protocol
│ ├── analyze-codebase.md # Read-only RAG analysis mode
│ ├── evaluate-own-work.md # Self-review checklist before "done"
│ ├── git-workflow.md # Conventional commits & version branching
│ ├── security-audit.md # OWASP API Top 10 for NestJS
│ ├── research-output-format.md # Structured Researcher→Coder handoff (MetaGPT SOP)
│ ├── validate-architecture-boundaries.md # DDD forbidden import detector
│ └── mentor-mode.md # Deep mentor: Forced Output Contract + Socratic gates
├── AGENTS.md # ⭐ Project context for AI agents (read-only)
├── ANTIGRAVITY.md # ADR log & work history for the human developer
├── .agent/ # Agent runtime data
│ ├── deep_agent_history.db # SQLite DB for named deep agent sessions
│ ├── orchestrator_history.db # SQLite DB for named orchestrator sessions
│ ├── index.meta.json # Timestamp for RAG index freshness
│ └── backups/ # Timestamped backups before each file write
├── .env.development # Example environment file
├── package.json
└── tsconfig.json
Roadmap
| Phase | Feature | Status |
|---|---|---|
| v1.0 | Foundational Deep Agent (deep mode) | ✅ Done |
| | Basic LLM switching (env var) | ✅ Done |
| | Core tools (filesystem, RAG basic) | ✅ Done |
| v1.1 | Orchestrator (orchestrate mode) | ✅ Done |
| | Researcher & Coder subagents | ✅ Done |
| | TDD workflow enforcement | ✅ Done |
| v1.2 | Advanced CLI Features | ✅ Done |
| | Interactive /model switching | ✅ Done |
| | Session persistence & management | ✅ Done |
| | Context compression | ✅ Done |
| | Safety: Auto-recovery, HITL | ✅ Done |
| v1.3 | Ollama Local Inference Support | ✅ Done |
| | Full multi-provider routing | ✅ Done |
| | OllamaChatAdapter for compatibility | ✅ Done |
| | Support for gemma4, qwen3.6, llama3.2 | ✅ Done |
| v1.4 | Skills System & Mentor Mode | ✅ Done |
| | 12 keyword-triggered skills (skills/*.md) | ✅ Done |
| | Progressive Disclosure — keyword map in base prompt | ✅ Done |
| | FILE PROTECTION LAW (skills + ANTIGRAVITY + AGENTS.md) | ✅ Done |
| | AGENTS.md Context Tiering (agent vs. human save state) | ✅ Done |
| | Always-on lightweight mentor (base prompt invariant) | ✅ Done |
| | /mentor deep mode (Forced Output Contract + Socratic gates) | ✅ Done |
| | Structured Researcher→Coder handoff (research-output-format.md) | ✅ Done |
| | DDD layer boundary validator (validate-architecture-boundaries.md) | ✅ Done |
| v1.4.2 | Stability & CLI Polish | ✅ Done |
| | Bug: MODEL_TIERS expansion — resolveModel() never used tiers (silent crash) | ✅ Fixed |
| | Bug: Fake Anthropic — claude removed from tiers, honest error on unsupported provider | ✅ Fixed |
| | Bug: Phase 2 proactive auto-compression — checkAndCompressContext() after every turn | ✅ Fixed |
| | Bug: reflect-metadata CLI crash — removed dead @Injectable() from 3 services | ✅ Fixed |
| | CLI: Richer markdown renderer — code blocks with full ╭──╮ borders, header icons (★, ❖, ›), 2-space global indent | ✅ Done |
| | CLI: ⬆ Agent ──── header replaces plain Agent: label | ✅ Done |
| v1.5.0 | LangSmith Observability | ✅ Done |
| | langsmith SDK integration via auto-instrumentation | ✅ Done |
| | Zero-code tracing: all LLM calls, tool calls, LangGraph steps | ✅ Done |
| | Configure with 3 env vars: LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_PROJECT | ✅ Done |
| | No-op when env vars are absent (safe for library consumers) | ✅ Done |
| Future | SSE HTTP API (/agent/stream) | ⏳ Planned |
| | Agent self-evolution — write new skills from patterns it discovers | ⏳ Planned |
| | Anthropic Claude support (@langchain/anthropic) | ⏳ Planned |
| | LangGraph state-level mentor toggle (true per-session stateful mode) | ⏳ Planned |
License
This project is licensed under the MIT License - see the LICENSE file for details.
