@redline-ai/cli
v0.121.12
AI security, red teaming & eval platform
Redline — AI Security, Red Teaming & LLM Evaluation
What is Redline?
Redline is a CLI and library for evaluating, red-teaming, and securing LLM applications. Stop guessing whether your AI app is safe and reliable — start shipping with confidence.
- Run automated evals to test prompts, models, and RAG pipelines
- Red team your LLM app to find jailbreaks, prompt injection, and data leakage before attackers do
- Compare models side-by-side across every major provider
- Gate deployments with CI/CD integration — fail the build before bad prompts ship
Quick Start
Requires Node.js ^20.20.0 or >=22.22.0.
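To make the version requirement concrete, here is a minimal sketch of the range check in TypeScript. It assumes standard semver range semantics, where `^20.20.0` means at least 20.20.0 and below 21.0.0, and `>=22.22.0` has no upper bound. The function names are illustrative and not part of Redline.

```typescript
// Illustrative helper, not part of Redline: checks a Node.js version string
// against the documented range "^20.20.0 or >=22.22.0".
type Version = [number, number, number];

// Parse "v20.20.1" or "20.20.1"; returns null if the string is malformed.
// Pre-release suffixes such as "-nightly" are ignored in this sketch.
function parseVersion(raw: string): Version | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(raw.trim());
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

function satisfiesNodeRequirement(raw: string): boolean {
  const v = parseVersion(raw);
  if (v === null) return false;
  // ^20.20.0: same major version, at least 20.20.0
  const inCaretRange = v[0] === 20 && compareVersions(v, [20, 20, 0]) >= 0;
  // >=22.22.0: any later version, including future majors
  const inOpenRange = compareVersions(v, [22, 22, 0]) >= 0;
  return inCaretRange || inOpenRange;
}
```

In a Node.js script, passing `process.version` to `satisfiesNodeRequirement` would report whether the running interpreter falls inside the range. Note that Node 21.x and Node 22 releases before 22.22.0 fall outside it.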
npm install -g @redline-ai/cli
redline init --example getting-started
Set your LLM provider API key:
export OPENAI_API_KEY=sk-...
# or ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.
Run an eval and open the results viewer:
cd getting-started
redline eval
redline view
Features
Evaluation
- Automated scoring — exact match, regex, JSON schema, cost/latency thresholds, LLM-as-judge
- Multi-provider — OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Ollama, DeepSeek, and 50+ more
- Side-by-side model comparison — see which model wins on your specific use case
- RAG & agent testing — evaluate retrieval quality and multi-turn agent behavior
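As a rough illustration of what automated scoring involves, the sketch below models three of the check types listed above (exact match, regex, and a latency threshold) as plain TypeScript. The `Check` and `ModelResponse` shapes are hypothetical and chosen for this example only; Redline's actual configuration format and types may differ.

```typescript
// Hypothetical shapes for illustration; not Redline's real API.
type Check =
  | { kind: "equals"; expected: string }
  | { kind: "regex"; pattern: string }
  | { kind: "maxLatencyMs"; limit: number };

interface ModelResponse {
  text: string;
  latencyMs: number;
}

// Evaluate a single check against one model response.
function runCheck(check: Check, response: ModelResponse): boolean {
  switch (check.kind) {
    case "equals":
      // Exact match after trimming surrounding whitespace
      return response.text.trim() === check.expected;
    case "regex":
      return new RegExp(check.pattern).test(response.text);
    case "maxLatencyMs":
      return response.latencyMs <= check.limit;
  }
}

// A test case passes only if every one of its checks passes.
function passesAll(checks: Check[], response: ModelResponse): boolean {
  return checks.every((check) => runCheck(check, response));
}
```

Modeling checks as a discriminated union keeps each check declarative, which is in the same spirit as a YAML-first configuration: the data describes what to verify, and a small interpreter decides pass or fail.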
Security & Red Teaming
- Automated red teaming — adaptive adversarial probes: jailbreaks, prompt injection, PII leakage, BOLA
- Vulnerability reports — severity-ranked findings with remediation guidance
- MCP security proxy — audit and control tool calls from AI agents
- Code scanning — PR-level review for LLM security and compliance issues
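The idea of severity-ranked findings can be sketched in a few lines of TypeScript. The severity names and the `Finding` shape below are assumptions made for this example, not Redline's report schema.

```typescript
// Hypothetical finding shape; severity names are an assumption.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  id: string;
  severity: Severity;
  title: string;
}

// Lower number means more severe, so it sorts first.
const SEVERITY_ORDER: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// Return a new array ordered from most to least severe.
// Array.prototype.sort is stable, so findings of equal severity
// keep their original relative order. The input is not mutated.
function rankFindings(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity],
  );
}
```

Sorting a copy rather than the input keeps the raw probe results intact, which is useful if the same findings are later exported in discovery order.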
Developer Experience
- CI/CD integration — fail builds on regressions before they reach production
- Declarative config — YAML-first, no code required for most use cases
- Fast — concurrent execution, result caching, live reload
- 100% local by default — your prompts and data never leave your machine
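Failing a build on regressions usually comes down to turning eval results into a process exit code. The sketch below shows one way that gate could work, assuming a hypothetical `EvalSummary` with pass and fail counts and a caller-chosen minimum pass rate. It does not describe how Redline itself computes or reports results.

```typescript
// Hypothetical summary shape; not Redline's output format.
interface EvalSummary {
  passed: number;
  failed: number;
}

// Return 0 (success) when the pass rate meets the threshold, 1 otherwise.
// An empty run is treated as a failure so a misconfigured pipeline
// cannot pass silently.
function gateExitCode(summary: EvalSummary, minPassRate: number): number {
  const total = summary.passed + summary.failed;
  if (total === 0) return 1;
  return summary.passed / total >= minPassRate ? 0 : 1;
}
```

In a Node.js CI step, assigning the result to `process.exitCode` would make the job fail whenever the pass rate drops below the threshold, since CI systems generally treat a non-zero exit code as a failed step.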
Results & Collaboration
- Web UI — interactive results viewer with diff views between prompt versions
- Share results with your team
- Export to JSON, CSV, SARIF
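For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF 2.1.0 log from a list of findings. The `ReportedIssue` shape and the severity-to-level mapping are assumptions made for this example; the exact fields Redline emits may differ. Only the top-level SARIF structure (`version`, `runs`, `tool.driver.name`, `results` with `ruleId`, `level`, and `message.text`) follows the SARIF specification.

```typescript
// Hypothetical input shape; not Redline's internal type.
interface ReportedIssue {
  ruleId: string;
  severity: "critical" | "high" | "medium" | "low";
  message: string;
}

// SARIF result levels used in this sketch.
type SarifLevel = "error" | "warning" | "note";

// Assumed mapping from severity to SARIF level.
function toSarifLevel(severity: ReportedIssue["severity"]): SarifLevel {
  if (severity === "critical" || severity === "high") return "error";
  if (severity === "medium") return "warning";
  return "note";
}

// Build a minimal SARIF 2.1.0 log with a single run.
function toSarif(toolName: string, issues: ReportedIssue[]) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: toolName } },
        results: issues.map((issue) => ({
          ruleId: issue.ruleId,
          level: toSarifLevel(issue.severity),
          message: { text: issue.message },
        })),
      },
    ],
  };
}
```

Serializing the returned object with `JSON.stringify` produces a file that SARIF-aware tools, such as code scanning dashboards, can typically ingest.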
Why Redline?
| | Redline | Others |
| --- | --- | --- |
| Vendor-neutral | Works with every model and provider | Often tied to one ecosystem |
| Independent | Not owned by any LLM vendor | Redline acquired by OpenAI (Mar 2026) |
| Security-first | Red teaming is a core feature, not an add-on | Most eval tools treat security as secondary |
| 100% local | Data stays on your machine by default | Many require cloud upload |
| Open source | MIT licensed, always | Some have open-core limitations |
| CI/CD native | First-class pipeline integration | Often bolted on |
Eval in action
[Image: results matrix for comparing models and prompts at a glance]
[Image: command-line output]
[Image: red team vulnerability report]
Supported Providers
OpenAI · Anthropic · Google Gemini · Azure OpenAI · AWS Bedrock · Ollama · DeepSeek · Mistral · Cohere · HuggingFace · Groq · Together AI · Replicate · and 40+ more
Contributing
Contributions welcome. Open an issue or pull request.
Development setup:
git clone https://github.com/RahulSinghai606/redline-ai.git
cd redline-ai
npm install
npm run build
npm run dev
Run tests:
npm test
License
MIT — free to use, fork, and build on. See LICENSE.
