# @kb-labs/mind-orchestrator
Agent query orchestration for KB Labs Mind system.
The Mind Orchestrator coordinates complex multi-step queries with different execution strategies (instant, auto, thinking), providing intelligent decomposition, gathering, verification, and synthesis of RAG results.
## Features
- 🎯 Agent Query Modes - Instant, auto, and thinking modes for different query complexities
- 🧩 Query Decomposition - LLM-powered breakdown of complex queries into sub-queries
- 📦 Chunk Gathering - Intelligent gathering and filtering of relevant chunks
- ✅ Completeness Checking - Validates if results answer the query fully
- 🔄 Synthesis - LLM-powered response generation from gathered chunks
- 🗜️ Compression - Response optimization for token efficiency
- 🔍 Source Verification - Anti-hallucination checks on sources
- 💾 Query Caching - Cache results for repeated queries
- 📊 Analytics - Track query performance and patterns
## Architecture

```
mind-orchestrator/
├── src/
│   ├── orchestrator.ts            # Main AgentQueryOrchestrator
│   ├── types.ts                   # Orchestrator types
│   │
│   ├── modes/                     # Query mode strategies
│   │   ├── instant-mode.ts        # Fast, no decomposition
│   │   ├── auto-mode.ts           # Complexity detection
│   │   └── thinking-mode.ts       # Deep analysis
│   │
│   ├── decomposer/                # Query decomposition
│   │   └── query-decomposer.ts    # LLM-powered decomposition
│   │
│   ├── gatherer/                  # Chunk gathering
│   │   └── chunk-gatherer.ts      # Gather & filter chunks
│   │
│   ├── checker/                   # Completeness validation
│   │   └── completeness-checker.ts
│   │
│   ├── synthesizer/               # Response synthesis
│   │   └── response-synthesizer.ts
│   │
│   ├── compressor/                # Response compression
│   │   └── response-compressor.ts
│   │
│   ├── verification/              # Verification layer
│   │   ├── source-verifier.ts     # Source verification
│   │   └── field-checker.ts       # Field completeness
│   │
│   ├── cache/                     # Query caching
│   │   └── query-cache.ts
│   │
│   └── analytics/                 # Analytics tracking
│       ├── mind-analytics.ts
│       └── types.ts
```

## Usage
### Creating the Orchestrator

```ts
import { AgentQueryOrchestrator } from '@kb-labs/mind-orchestrator';
import { usePlatform } from '@kb-labs/sdk';

const platform = usePlatform();

const orchestrator = new AgentQueryOrchestrator({
  llm: platform?.llm,
  analyticsAdapter: platform?.analytics,
});
```

### Query with Agent Modes
```ts
// Instant mode - fast, no decomposition (~30-40s, 1-2 LLM calls)
const instantResult = await orchestrator.query({
  text: 'What is VectorStore interface?',
  mode: 'instant',
  scope: 'default',
});

// Auto mode - balanced, automatic complexity detection (~60s, 3-4 LLM calls)
const autoResult = await orchestrator.query({
  text: 'How does hybrid search work?',
  mode: 'auto',
  scope: 'default',
});

// Thinking mode - deep analysis, multi-step reasoning (~60-90s, 4-5 LLM calls)
const thinkingResult = await orchestrator.query({
  text: 'Explain the anti-hallucination architecture end-to-end',
  mode: 'thinking',
  scope: 'default',
});
```

### Understanding the Agent Response
```ts
import type { AgentResponse } from '@kb-labs/sdk';

const response: AgentResponse = await orchestrator.query({
  text: 'How does authentication work?',
  mode: 'auto',
});

console.log('Answer:', response.answer);
console.log('Confidence:', response.confidence);
console.log('Sources:', response.sources.length);

// Check warnings (low confidence, missing chunks, etc.)
if (response.warnings && response.warnings.length > 0) {
  response.warnings.forEach(warning => {
    console.warn(`[${warning.code}] ${warning.message}`);
  });
}

// Debug information
if (response.debug) {
  console.log('LLM calls:', response.debug.llmCallCount);
  console.log('Tokens:', response.debug.totalTokens);
  console.log('Duration:', response.debug.durationMs, 'ms');
}
```

## Agent Query Modes
### Mode Selection Guide

| Mode | Use Case | Performance | LLM Calls | Tokens |
|------|----------|-------------|-----------|--------|
| `instant` | Simple lookups, known entities | ~30-40s | 1-2 | 500-1K |
| `auto` | General queries, let the system decide | ~60s | 3-4 | 3-4K |
| `thinking` | Complex architecture, deep analysis | ~60-90s | 4-5 | 4-5K |

### Breaking Changes (No Legacy Compatibility)

- `MindChunk` / `MindIntent` are the canonical public types for orchestrator boundaries.
- Legacy `Knowledge*`-named public contracts are removed from `mind-*` package surfaces.
- Update integrations to consume `Mind` terminology and `profiles[].products.mind` config.
### `instant` Mode

Best for:

- "What is [ClassName]?"
- "Where is [feature] located?"
- Quick reference checks

How it works:

- Direct engine search (no decomposition)
- Single synthesis pass
- Basic verification

Example:

```ts
const result = await orchestrator.query({
  text: 'What is the MindEngine class?',
  mode: 'instant',
});
```

### `auto` Mode (Recommended)
Best for:

- Medium-complexity questions
- Letting the system decide complexity
- Balanced performance/quality

How it works:

- Query complexity detection
- Adaptive decomposition (if needed)
- Multi-chunk gathering
- Completeness checking
- Synthesis with verification

Example:

```ts
const result = await orchestrator.query({
  text: 'How does Mind handle embeddings?',
  mode: 'auto', // System auto-selects strategy
});
```

### `thinking` Mode
Best for:

- Complex architectural questions
- Multi-step reasoning
- Deep analysis: "Explain how [system] works end-to-end"
- Comparing multiple implementations

How it works:

- Deep query decomposition (3-5 sub-queries)
- Exhaustive chunk gathering
- Multi-pass completeness checking
- Iterative synthesis
- Full verification pipeline

Example:

```ts
const result = await orchestrator.query({
  text: 'Explain the complete RAG pipeline from indexing to query response',
  mode: 'thinking',
});
```

## Key Concepts
### Query Decomposition

For complex queries, the orchestrator uses the LLM to break them into sub-queries.

Original query:

> "Explain how Mind handles authentication and authorization"

Decomposed into:

1. "What is the authentication mechanism in Mind?"
2. "How does Mind handle authorization?"
3. "What is the relationship between auth and authz?"

Each sub-query is executed, the results are gathered, and everything is synthesized into the final answer.
### Chunk Gathering
The gatherer collects chunks from search results with:
- Relevance filtering - Remove low-confidence chunks (< 0.5)
- Deduplication - Merge overlapping chunks
- Context expansion - Include surrounding code for better understanding
- Token budget - Respect LLM context limits (4K-8K tokens)
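The gathering rules above can be sketched as a single filter pass. The `RawChunk` type, the ~4-characters-per-token estimate, and the ID-based dedup are illustrative assumptions, not the package's actual internals:

```typescript
// Illustrative sketch of chunk gathering (hypothetical types, not the real implementation).
interface RawChunk {
  id: string;
  path: string;
  text: string;
  score: number; // relevance confidence in [0, 1]
}

// Rough token estimate: ~4 characters per token (assumption).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function gatherChunks(
  candidates: RawChunk[],
  maxContextTokens = 8000,
  minScore = 0.5,
): RawChunk[] {
  const seen = new Set<string>();
  const gathered: RawChunk[] = [];
  let budget = maxContextTokens;

  // Highest-relevance chunks first, so the budget is spent on the best evidence.
  for (const chunk of [...candidates].sort((a, b) => b.score - a.score)) {
    if (chunk.score < minScore) continue; // relevance filtering (< 0.5 dropped)
    if (seen.has(chunk.id)) continue;     // deduplication
    const cost = estimateTokens(chunk.text);
    if (cost > budget) continue;          // token budget
    seen.add(chunk.id);
    budget -= cost;
    gathered.push(chunk);
  }
  return gathered;
}
```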
### Completeness Checking

Before synthesis, the checker validates:

- ✅ Is the query fully answered?
- ✅ Are all key concepts covered?
- ✅ Is any critical information missing?
- ✅ Are additional chunks needed?

If the results are incomplete, the orchestrator gathers more chunks or attaches a warning.
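A naive lexical version of this check is sketched below, purely to illustrate the contract; the real checker is LLM-powered, and the `CompletenessResult` shape is an assumption:

```typescript
// Hypothetical shape of a completeness verdict (illustrative only).
interface CompletenessResult {
  complete: boolean;
  missingConcepts: string[];
}

// Naive lexical stand-in: every key concept from the query should appear in the
// gathered text. The real checker asks the LLM; this only sketches the contract.
function checkCompleteness(keyConcepts: string[], gatheredText: string): CompletenessResult {
  const haystack = gatheredText.toLowerCase();
  const missingConcepts = keyConcepts.filter(c => !haystack.includes(c.toLowerCase()));
  return { complete: missingConcepts.length === 0, missingConcepts };
}
```

When `complete` is `false`, the orchestrator's options match the text above: gather more chunks for the missing concepts, or proceed and attach a warning.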
### Response Synthesis
LLM-powered synthesis creates final answer:
- Context building - Compile relevant chunks
- Instruction prompting - Guide LLM to answer query
- Source attribution - Link answer to source files
- Markdown formatting - Clean, readable output
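The context-building and attribution steps can be sketched as a prompt assembler. The prompt wording and the `Chunk` type are assumptions, not the package's real template:

```typescript
// Sketch of synthesis context building: compile gathered chunks into a prompt
// with numbered source attribution. Illustrative only.
interface Chunk {
  path: string;
  text: string;
}

function buildSynthesisPrompt(query: string, chunks: Chunk[]): string {
  // Number each chunk so the answer can cite sources as [n].
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.path}\n${c.text}`)
    .join('\n\n');

  return [
    'Answer the query using ONLY the context below.',
    'Cite sources as [n] and format the answer in Markdown.',
    `Query: ${query}`,
    `Context:\n${context}`,
  ].join('\n\n');
}
```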
### Verification Pipeline
Anti-hallucination checks:
- Source verification - Ensure all sources exist
- Field completeness - Validate metadata
- Confidence scoring - Calculate reliability
- Warning generation - Alert on low confidence
Reference: ADR-0031: Anti-Hallucination System
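The core source check can be sketched as: every source the answer cites must exist in the gathered chunk set. The types and the `UNVERIFIED_SOURCE` warning code are assumptions for illustration:

```typescript
// Sketch of the anti-hallucination source check (hypothetical types and codes).
interface Source {
  path: string;
}
interface Warning {
  code: string;
  message: string;
}

// A cited source that was never gathered is a hallucination candidate: flag it.
function verifySources(cited: Source[], gatheredPaths: Set<string>): Warning[] {
  return cited
    .filter(s => !gatheredPaths.has(s.path))
    .map(s => ({
      code: 'UNVERIFIED_SOURCE',
      message: `Cited source not found in gathered chunks: ${s.path}`,
    }));
}
```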
### Query Caching

Cache query results for performance:

```ts
const orchestrator = new AgentQueryOrchestrator({
  engine,
  llm,
  cacheOptions: {
    enabled: true,
    ttl: 3600, // 1 hour
  },
});
```

Cache key: `hash(query.text + query.mode + query.scope)`
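A minimal TTL cache keyed as described could look like the sketch below. The SHA-256 hash, the `Map` store, and the class shape are assumptions; the package's actual implementation may differ:

```typescript
// Minimal sketch of a TTL query cache keyed on text + mode + scope (illustrative).
import { createHash } from 'node:crypto';

interface CachedEntry<T> {
  value: T;
  expiresAt: number; // epoch ms
}

class QueryCache<T> {
  private store = new Map<string, CachedEntry<T>>();

  constructor(private ttlSeconds: number) {}

  // hash(query.text + query.mode + query.scope), as described above.
  key(text: string, mode: string, scope: string): string {
    return createHash('sha256').update(`${text}|${mode}|${scope}`).digest('hex');
  }

  get(k: string): T | undefined {
    const entry = this.store.get(k);
    if (!entry || entry.expiresAt < Date.now()) return undefined; // missing or expired
    return entry.value;
  }

  set(k: string, value: T): void {
    this.store.set(k, { value, expiresAt: Date.now() + this.ttlSeconds * 1000 });
  }
}
```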
## Configuration

### Orchestrator Options

```ts
interface OrchestratorOptions {
  engine: KnowledgeEngine;
  llm: ILLM;
  analyticsAdapter?: IAnalytics;
  cacheOptions?: {
    enabled: boolean;
    ttl: number; // seconds
  };
  tokenBudget?: {
    maxContextTokens: number;  // Default: 8000
    maxResponseTokens: number; // Default: 2000
  };
  verification?: {
    enabled: boolean;    // Default: true
    strictMode: boolean; // Default: false
  };
}
```

### Environment Variables
```bash
# LLM provider
export OPENAI_API_KEY=sk-...

# Analytics (optional)
export KB_ANALYTICS_ENABLED=true

# Cache (optional)
export KB_QUERY_CACHE_TTL=3600

# Log level
export KB_LOG_LEVEL=debug
```

## Performance
### Mode Performance Comparison
| Mode | Avg Duration | LLM Calls | Tokens | Cost (GPT-4) |
|------|--------------|-----------|--------|--------------|
| `instant` | 30-40s | 1-2 | 500-1K | ~$0.01 |
| `auto` | ~60s | 3-4 | 3-4K | ~$0.03 |
| `thinking` | 60-90s | 4-5 | 4-5K | ~$0.04 |
### Optimization Tips

- Use `instant` for lookups - "What is X?" queries don't need decomposition
- Enable caching - repeated queries return instantly
- Tune the token budget - reduce `maxContextTokens` if you hit context limits
- Parallelize sub-queries - the orchestrator already does this automatically
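The parallelization mentioned in the last tip is just concurrent fan-out over sub-queries. A sketch with `Promise.all`, where `runSubQuery` is a hypothetical stand-in for a real engine call:

```typescript
// Sketch of parallel sub-query execution. `runSubQuery` is a hypothetical
// stand-in; the orchestrator does this fan-out internally.
async function runSubQuery(text: string): Promise<string> {
  return `answer for: ${text}`; // placeholder for a real engine/LLM call
}

async function runAll(subQueries: string[]): Promise<string[]> {
  // All sub-queries start at once; total latency is roughly that of the slowest one,
  // rather than the sum of all of them.
  return Promise.all(subQueries.map(runSubQuery));
}
```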
## Dependencies

```json
{
  "dependencies": {
    "@kb-labs/sdk": "^1.0.0"
  }
}
```

Note: Mind Orchestrator uses SDK-only imports - no internal packages.
## Testing

```bash
# Run unit tests
pnpm test

# Run with coverage
pnpm test:coverage

# Integration tests
pnpm test:integration
```

## Development

### Build

```bash
pnpm build
```

### Watch Mode

```bash
pnpm dev
```

### Type Check

```bash
pnpm typecheck
```

## Architecture Decisions
Key ADRs affecting Mind Orchestrator:

- ADR-0029: Agent Query Orchestration
- ADR-0031: Anti-Hallucination System
## Related Packages
- `@kb-labs/mind-engine` - Core RAG engine (indexing, search, reasoning)
- `@kb-labs/mind-cli` - CLI commands with orchestrator integration
## Examples

### Complete Example with All Features

```ts
import { AgentQueryOrchestrator } from '@kb-labs/mind-orchestrator';
import { usePlatform } from '@kb-labs/sdk';

// Setup
const platform = usePlatform();
const orchestrator = new AgentQueryOrchestrator({
  llm: platform?.llm,
  analyticsAdapter: platform?.analytics,
  cacheOptions: { enabled: true, ttl: 3600 },
  tokenBudget: { maxContextTokens: 8000, maxResponseTokens: 2000 },
  verification: { enabled: true, strictMode: false },
});

// Query
const response = await orchestrator.query({
  text: 'How does Mind implement hybrid search?',
  mode: 'auto',
  scope: 'default',
});

// Handle response
if (response.confidence >= 0.7) {
  console.log('✅ High confidence answer');
  console.log(response.answer);
} else {
  console.warn('⚠️ Low confidence, review sources manually');
}

// Show sources
response.sources.forEach(source => {
  console.log(`📄 ${source.path}:${source.range?.start.line}`);
});
```

## Contributing
### Code Quality Standards

- Single responsibility - each module focuses on one job
- Strategy pattern - mode selection via strategy objects
- Pipeline pattern - sequential orchestration steps
- Type safety - no `any` types
- Test coverage - integration tests for all modes

### Before Committing

```bash
pnpm build
pnpm test
```

## License
Private - KB Labs internal use only.
## Support
For questions, check:
- ADR-0029: Agent Query Orchestration
- Mind Engine README
- CLAUDE.md - Development guide
Last Updated: 2025-12-09 · Version: 0.1.0 · Status: 🟡 SDK Migration Pending (Phase 2)
