@intentsolutionsio/ai-ml-engineering-pack
v1.0.0
Published
Professional AI/ML Engineering toolkit: Prompt engineering, LLM integration, RAG systems, AI safety with 12 expert plugins
Maintainers
Readme
AI/ML Engineering Pack
Professional toolkit for building production-ready AI/ML systems with Claude Code
Master prompt engineering, LLM integration, RAG systems, and AI safety with 12 specialized plugins that accelerate AI development by 10x.
What's Included
12 specialized plugins across 4 AI/ML categories:
1. Prompt Engineering (3 plugins)
- prompt-architect (agent) - Expert in CoT reasoning, few-shot learning, and advanced prompt patterns
- prompt-optimizer (agent) - Reduce LLM costs by 60-90% while maintaining quality
- prompt-template-gen (command:
/ptg) - Generate production-ready prompt templates with type safety
2. LLM Integration (3 plugins)
- llm-integration-expert (agent) - Production API patterns, error handling, streaming, rate limiting
- model-selector (agent) - Choose optimal models based on cost, quality, latency requirements
- llm-api-scaffold (command:
/las) - Generate complete LLM API with FastAPI, Docker, monitoring
3. RAG Systems (3 plugins)
- rag-architect (agent) - Design RAG systems, chunking strategies, retrieval optimization
- vector-db-expert (agent) - Select and configure vector databases (Pinecone, Qdrant, Weaviate, etc.)
- rag-pipeline-gen (command:
/rpg) - Generate complete RAG pipeline with embeddings and retrieval
4. AI Safety (3 plugins)
- ai-safety-expert (agent) - Content filtering, PII detection, bias mitigation, compliance
- prompt-injection-defender (agent) - Defend against prompt injection and jailbreak attacks
- ai-monitoring-setup (command:
/ams) - Set up LLM monitoring, cost tracking, and alerts
Quick Start
Installation
# Add the marketplace (if not already added)
claude plugin marketplace add jeremylongshore/claude-code-plugins
# Install AI/ML Engineering Pack
claude plugin install ai-ml-engineering-pack@claude-code-plugins-plus
# Verify installation
claude plugin listFull installation guide: INSTALLATION.md
10-Minute Tutorial
Build your first AI feature in 10 minutes:
# Start Claude Code
claude
# Inside Claude, optimize a prompt
"Optimize this prompt for cost and quality:
'I would like you to create a detailed product description for...'"
# Claude uses prompt-optimizer agent to reduce tokens by 70%
# Generate a reusable prompt template
/ptg
# Build a production LLM API
/las
# Create a complete RAG system
/rpg
# Add AI safety guardrails
"Implement PII detection and toxicity filtering for my chatbot"Complete tutorial: QUICK_START.md
ROI & Value Proposition
Real-world results from production deployments:
| Use Case | Time Saved | Cost Savings | ROI | |----------|-----------|--------------|-----| | E-Commerce Recommendations | 12.5 hours | $249,250/year | 11,891% | | Legal Document Analysis | 12 hours | $781,500/year | 34,192% | | Customer Support Automation | 16 hours | $350,400/year | 11,283% | | Content Moderation | 19 hours | $1,872,000/year | 40,781% | | Code Documentation | 145 hours | $14,100 (one-time) | 2,565% | | Medical Diagnosis Assistant | 28 hours | $44,600,000/year | 75,392% |
Average ROI: 29,351% | Average payback period: 3 days
Detailed case studies: USE_CASES.md
Plugin Reference
Prompt Engineering
prompt-architect (Agent)
Expert in advanced prompt engineering techniques and patterns.
Capabilities:
- Chain-of-Thought (CoT) reasoning
- Few-shot and zero-shot learning
- Prompt composition patterns
- Meta-prompting and self-improvement
- Multi-modal prompts (text + images)
When to use:
- "Design a prompt for [complex task]"
- "Improve this prompt: [existing prompt]"
- "What's the best prompting technique for [use case]?"
Activation triggers: Prompt design, CoT, few-shot learning, prompt patterns
prompt-optimizer (Agent)
Optimize prompts for cost reduction (60-90% savings) while maintaining quality.
Capabilities:
- Token reduction techniques (remove verbosity, use abbreviations)
- Prompt caching strategies
- Model selection guidance (cheap vs expensive)
- Cost-quality trade-off analysis
- ROI calculation
When to use:
- "Reduce the cost of this prompt: [prompt]"
- "Optimize my prompts for $1000/month budget"
- "How can I reduce token usage by 70%?"
Example:
Before (52 tokens): "I would like you to please analyze..."
After (15 tokens): "Analyze and summarize main points."
Savings: 71% token reduction = $0.15/1000 calls (GPT-4)Activation triggers: Cost optimization, token reduction, prompt efficiency
/ptg - Prompt Template Generator (Command)
Generate production-ready prompt templates with type safety and validation.
Usage:
/ptg
# Claude asks:
# - Use case (e.g., product descriptions, customer support, code review)
# - Variables (e.g., product_name, features, tone)
# - Output format (Python, TypeScript)
# - Validation requirementsGenerated output:
- Python: Pydantic models with type safety
- TypeScript: Zod schemas with validation
- Usage examples
- Cost estimation
- Unit tests
Example output:
@dataclass
class ProductDescriptionInput:
product_name: str
features: List[str]
target_audience: str
tone: Literal["professional", "casual"] = "professional"
class ProductDescriptionGenerator:
TEMPLATE = """..."""
def generate(self, input: ProductDescriptionInput) -> str:
# Validates input, generates prompt, calls LLM
...LLM Integration
llm-integration-expert (Agent)
Production patterns for LLM API integration with error handling and reliability.
Capabilities:
- Multi-provider integration (OpenAI, Anthropic, Google, Cohere)
- Exponential backoff retry logic
- Rate limiting (token bucket, sliding window)
- Response streaming (Server-Sent Events)
- Fallback systems (multi-provider)
- Circuit breaker patterns
- Token counting and cost tracking
When to use:
- "Implement LLM API integration with retry logic"
- "Add streaming support to my chatbot"
- "Build multi-provider fallback system"
Code examples:
# Retry with exponential backoff
@retry_with_backoff(max_retries=3, base_delay=1.0)
async def complete(prompt: str):
return await llm.complete(prompt)
# Token bucket rate limiting
rate_limiter = TokenBucketRateLimiter(capacity=100, refill_rate=10)
await rate_limiter.wait_for_token()Activation triggers: LLM API, error handling, streaming, rate limiting, fallback
model-selector (Agent)
Guide model selection based on cost, quality, latency, and use case requirements.
Capabilities:
- Model comparison matrix (GPT-4, Claude 3, Gemini)
- Pricing analysis (per 1M tokens)
- Latency benchmarks
- Quality assessments by task type
- Model cascading strategies
- A/B testing frameworks
When to use:
- "Which model should I use for customer support?"
- "Compare GPT-4 vs Claude 3 Opus for code generation"
- "How can I reduce costs with model cascading?"
Model comparison: | Model | Input ($/1M) | Output ($/1M) | Latency | Best For | |-------|-------------|---------------|---------|----------| | GPT-4 Turbo | $10 | $30 | 3-5s | Complex reasoning | | GPT-3.5 Turbo | $0.50 | $1.50 | 1-2s | Simple tasks | | Claude 3 Opus | $15 | $75 | 4-6s | Highest quality | | Claude 3 Haiku | $0.25 | $1.25 | 0.5-1s | Speed & cost |
Activation triggers: Model selection, cost optimization, performance comparison
/las - LLM API Scaffold (Command)
Generate complete production-ready LLM API integration code.
Usage:
/las
# Claude asks:
# - Provider (OpenAI, Anthropic, Google)
# - Features (streaming, rate limiting, caching, error handling)
# - Framework (FastAPI, Express.js)
# - Deployment (Docker, Kubernetes)Generated files:
llm-api/
├── main.py # FastAPI application
├── llm_client.py # LLM client with retry logic
├── rate_limiter.py # Token bucket rate limiting
├── cache.py # Redis caching
├── monitoring.py # Prometheus metrics
├── Dockerfile # Production container
├── docker-compose.yml # Redis + app
├── requirements.txt # Dependencies
└── tests/ # Unit and integration testsFeatures included:
- Exponential backoff retry (3 attempts)
- Rate limiting (token bucket algorithm)
- Response caching (Redis, 5 min TTL)
- Streaming support (SSE)
- Cost tracking
- Prometheus metrics
- Docker deployment
RAG Systems
rag-architect (Agent)
Expert in designing and optimizing Retrieval-Augmented Generation systems.
Capabilities:
- RAG architecture patterns
- Chunking strategies (fixed, recursive, semantic)
- Embedding model selection
- Retrieval optimization (hybrid search, reranking)
- Query expansion techniques
- Evaluation metrics (MRR, NDCG)
When to use:
- "Design a RAG system for customer support knowledge base"
- "What chunking strategy should I use for legal documents?"
- "How can I improve retrieval accuracy?"
Chunking strategies:
# Fixed-size (simple, fast)
chunks = [text[i:i+512] for i in range(0, len(text), 512)]
# Recursive (respects structure)
splitter = RecursiveCharacterTextSplitter(
chunk_size=512,
chunk_overlap=50,
separators=["\n\n", "\n", ". ", " ", ""]
)
# Semantic (context-aware)
chunks = semantic_splitter.split_by_meaning(text)Activation triggers: RAG architecture, chunking, retrieval, embeddings
vector-db-expert (Agent)
Select and optimize vector databases for RAG systems.
Capabilities:
- Database comparison (Pinecone, Qdrant, Weaviate, ChromaDB, pgvector, Milvus)
- HNSW index tuning
- Scaling strategies (sharding, replication)
- Query optimization
- Migration planning
When to use:
- "Which vector database should I use for 10M documents?"
- "How do I tune HNSW parameters for better performance?"
- "Compare Pinecone vs Qdrant for my use case"
Database comparison: | Database | Best For | Pricing | Hosting | |----------|---------|---------|---------| | Pinecone | Managed, auto-scaling | $0.096/GB/month | Cloud only | | Qdrant | Performance, self-hosted | Open source | Self/cloud | | Weaviate | GraphQL, hybrid search | Open source | Self/cloud | | ChromaDB | Local development | Open source | Local only | | pgvector | Existing PostgreSQL | Open source | Self-hosted |
Activation triggers: Vector database, HNSW, scaling, performance
/rpg - RAG Pipeline Generator (Command)
Generate complete RAG pipeline with all components.
Usage:
/rpg
# Claude asks:
# - Document types (PDFs, docs, web pages)
# - Vector database (Pinecone, Qdrant, Weaviate)
# - Embedding model (OpenAI, open-source)
# - LLM (GPT-4, Claude, Gemini)
# - Features (reranking, hybrid search, caching)Generated files:
rag-system/
├── document_loader.py # PDF/DOCX/TXT loaders
├── chunker.py # Recursive text splitter
├── embedder.py # OpenAI embeddings
├── vector_store.py # Qdrant integration
├── retriever.py # Hybrid search + reranking
├── generator.py # LLM response generation
├── pipeline.py # End-to-end orchestration
├── api.py # FastAPI endpoints
├── docker-compose.yml # Vector DB + app
└── example_usage.py # Complete examplesFeatures included:
- Multi-format document loading (PDF, DOCX, TXT, MD)
- Recursive chunking (512 tokens, 50 overlap)
- Vector similarity search
- Cohere reranking (optional)
- Source attribution with page numbers
- Query expansion
- Caching
- FastAPI REST endpoints
- Docker deployment
AI Safety
ai-safety-expert (Agent)
Comprehensive AI safety with content filtering, PII protection, and bias mitigation.
Capabilities:
- Toxicity detection (BERT-based classification)
- PII detection and redaction (Presidio)
- Bias detection (gender, racial, age)
- Content moderation (OpenAI Moderation API)
- Safety guardrails (input/output filtering)
- GDPR/CCPA/HIPAA compliance
When to use:
- "Implement PII detection for user inputs"
- "Add toxicity filtering to my chatbot"
- "Detect and mitigate bias in LLM outputs"
- "Ensure HIPAA compliance for medical data"
Safety pipeline:
class SafetyGuardrails:
async def safe_completion(self, user_input: str, llm):
# 1. Input checks
if not await self.check_input(user_input):
return {"error": "Input blocked"}
# 2. Redact PII
safe_input = self.pii_detector.redact(user_input)
# 3. Generate response
response = await llm.complete(safe_input)
# 4. Output checks
safe_response = await self.check_output(response)
return safe_responsePII detection:
- Email addresses, phone numbers, SSN
- Credit card numbers
- IP addresses
- Names, addresses
- Medical record numbers (for HIPAA)
Activation triggers: AI safety, PII, toxicity, bias, content moderation
prompt-injection-defender (Agent)
Defend against prompt injection attacks and jailbreaks.
Capabilities:
- Pattern-based detection (regex for common attacks)
- ML classification (fine-tuned BERT model)
- Input sanitization
- Output validation
- System prompt protection
- Jailbreak detection (DAN, Developer Mode, etc.)
When to use:
- "Protect my chatbot from prompt injection"
- "Detect jailbreak attempts"
- "Validate user inputs for manipulation"
Attack patterns detected:
ATTACK_PATTERNS = [
r'ignore\s+(all\s+)?(previous|prior|above)\s+instructions',
r'(repeat|print|show)\s+(your\s+)?(system\s+)?prompt',
r'(pretend|act)\s+(you\'?re|to\s+be)',
r'(DAN|Developer\s+Mode|Jailbreak)',
r'(new\s+role|you\s+are\s+now)',
]Defense strategies:
- Detection: Identify attack patterns
- Sanitization: Remove/escape dangerous inputs
- Validation: Verify outputs don't leak system prompts
- Monitoring: Log and alert on suspicious activity
Activation triggers: Prompt injection, jailbreak, security, input validation
/ams - AI Monitoring Setup (Command)
Set up comprehensive LLM monitoring with cost tracking and alerting.
Usage:
/ams
# Claude asks:
# - Metrics (latency, cost, tokens, errors)
# - Dashboards (Grafana, custom)
# - Alerts (Slack, PagerDuty, email)
# - Budget ($1000/month)Generated files:
monitoring/
├── metrics.py # Prometheus metrics
├── cost_tracker.py # Cost tracking with budget alerts
├── grafana_dashboard.json # Pre-built dashboard
├── alerting_rules.yml # Alert rules
├── prometheus.yml # Prometheus config
├── docker-compose.yml # Prometheus + Grafana
└── README.md # Setup instructionsMetrics collected:
- Request count (by model, status)
- Latency (p50, p95, p99)
- Token usage (input, output)
- Cost per request
- Error rate
- Cache hit rate
Alerts configured:
- Budget threshold (80%, 90%, 100%)
- High error rate (>5%)
- Slow responses (>10s)
- Token limit approaching
Dashboards:
- Real-time request monitoring
- Cost tracking (daily, weekly, monthly)
- Model performance comparison
- Error analysis
Documentation
- Installation Guide - Prerequisites, setup, verification
- Quick Start - 10-minute tutorial with examples
- Use Cases - Real-world applications with ROI
- Troubleshooting - Common issues and solutions
Example Workflows
Build a Customer Support Bot (10 minutes)
claude
# 1. Generate RAG pipeline for knowledge base
/rpg
Requirements: Support docs, Qdrant, GPT-4
# 2. Add safety guardrails
"Implement PII detection and toxicity filtering"
# 3. Set up monitoring
/ams
Requirements: Prometheus, Slack alerts, $5K budget
# 4. Deploy
"Create Docker deployment with all components"Result: Production-ready support bot with 65% ticket automation, 30s response time, comprehensive safety.
Optimize Prompts to Reduce Costs (5 minutes)
claude
# 1. Analyze current prompts
"Analyze my prompts for cost optimization opportunities"
# 2. Optimize individual prompts
"Reduce this prompt to 50% of tokens:
'I would like you to carefully analyze the following customer feedback...'"
# 3. Generate reusable templates
/ptg
Use case: Customer feedback analysis
# 4. Calculate savings
"Calculate ROI if I process 10,000 requests/month"Result: 60-90% cost reduction while maintaining quality.
Build RAG System for Legal Documents (15 minutes)
claude
# 1. Design RAG architecture
"Design RAG system for legal document search with:
- 10,000 contracts
- Clause extraction
- Precedent search
- GDPR compliance"
# 2. Generate complete pipeline
/rpg
Requirements: Legal docs (PDF), Qdrant (self-hosted), GPT-4
# 3. Add PII protection
"Implement PII detection for attorney-client privilege"
# 4. Set up monitoring
/ams
Track: accuracy, retrieval time, cost per queryResult: Legal document analysis system with 94% accuracy, 82ms latency, PII protection.
Learning Resources
Video Tutorials (Coming Soon)
- Prompt Engineering Masterclass (30 min)
- Building Production RAG Systems (45 min)
- AI Safety Best Practices (20 min)
Blog Posts
Community
- Discord - #claude-code channel
- GitHub Discussions
- Stack Overflow -
claude-codetag
Pricing
One-time purchase: $79
What's included:
- All 12 plugins (lifetime access)
- Free updates and new plugins
- Email support
- Community Discord access
- Documentation and examples
Compare to alternatives:
- Manual implementation: 40+ hours ($4,000 at $100/hour)
- Consultants: $150-300/hour × 40 hours = $6,000-12,000
- AI/ML Engineering Pack: $79 (99% cost savings)
Average payback period: 3 days
Buy Now on Gumroad | [Volume Licensing](mailto:[email protected])
🆘 Support
Email: [email protected]
GitHub Issues: https://github.com/jeremylongshore/claude-code-plugins/issues
Response time: Within 24 hours (usually faster)
Community: Join Discord for community support
Updates
Current version: 1.0.0
Update policy: Free updates for life, including new plugins and features
Changelog:
- v1.0.0 (2025-10-10) - Initial release with 12 plugins
To update:
claude plugin update ai-ml-engineering-packLicense
MIT License - See LICENSE for details
Commercial use permitted - Use in commercial projects, redistribute, modify
Acknowledgments
Built with:
- Claude Code - AI-powered development CLI
- LangChain - LLM framework
- Presidio - PII detection
- Qdrant - Vector database
- FastAPI - Modern Python framework
Ready to Get Started?
- Install the pack - 5-minute setup
- Complete Quick Start - Build your first AI feature in 10 minutes
- Explore use cases - See real-world ROI examples
- Join the community - Connect with other AI/ML engineers
Questions? Email [email protected] or open a GitHub issue.
Built by AI engineers, for AI engineers.
