@redline-ai/cli

v0.121.12

Published a month ago

AI security, red teaming & eval platform

rahulsingh_ai

Redline — AI Security, Red Teaming & LLM Evaluation

What is Redline?

Redline is a CLI and library for evaluating, red-teaming, and securing LLM applications. Stop guessing whether your AI app is safe and reliable — start shipping with confidence.

Run automated evals to test prompts, models, and RAG pipelines
Red team your LLM app to find jailbreaks, prompt injection, and data leakage before attackers do
Compare models side-by-side across every major provider
Gate deployments with CI/CD integration — fail the build before bad prompts ship

Quick Start

Requires Node.js ^20.20.0 or >=22.22.0.
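
To check an installed Node.js version against this range before installing, a small shell function can help. This is a minimal sketch: `node_version_ok` is a hypothetical helper name (not part of Redline), and it reads `^20.20.0` the way npm's semver caret ranges work, as 20.20.0 or later within major version 20.

```shell
# Hypothetical helper: succeeds (exit 0) if the given Node.js version
# satisfies "^20.20.0 || >=22.22.0", and fails (exit 1) otherwise.
# Pre-release suffixes such as "-rc.1" are ignored in this sketch.
node_version_ok() {
  local version major rest minor
  version="${1#v}"                      # drop the leading "v" printed by `node -v`
  case "$version" in *.*) ;; *) return 1 ;; esac
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  # Reject anything whose major or minor part is not a plain number.
  case "$major" in ''|*[!0-9]*) return 1 ;; esac
  case "$minor" in ''|*[!0-9]*) return 1 ;; esac

  if [ "$major" -eq 20 ]; then
    [ "$minor" -ge 20 ]                 # ^20.20.0: at least 20.20.0, below 21.0.0
  elif [ "$major" -eq 22 ]; then
    [ "$minor" -ge 22 ]                 # >=22.22.0 within major 22
  else
    [ "$major" -gt 22 ]                 # any later major also satisfies >=22.22.0
  fi
}

# Usage:
#   node_version_ok "$(node -v)" || echo "Unsupported Node.js version" >&2
```

Note that Node.js 21.x and 22.0 through 22.21 fall outside the range, which is easy to miss when reading the requirement quickly.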

npm install -g @redline-ai/cli
redline init --example getting-started

Set your LLM provider API key:

export OPENAI_API_KEY=sk-...
# or ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.
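
Before running an eval, it can help to confirm that at least one provider key is set in the current shell. A minimal sketch, assuming only the three variable names shown above; `has_provider_key` is a hypothetical helper, not a Redline command.

```shell
# Hypothetical helper: prints the name of the first provider key that is
# set to a non-empty value and succeeds; fails if none of them is set.
has_provider_key() {
  local name value
  for name in OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

# Usage:
#   has_provider_key >/dev/null || echo "No provider API key set" >&2
```

The function reports only the variable name, never the key itself, so its output is safe to show in logs.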

Run an eval and open the results viewer:

cd getting-started
redline eval
redline view

Features

Evaluation

Automated scoring — exact match, regex, JSON schema, cost/latency thresholds, LLM-as-judge
Multi-provider — OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Ollama, DeepSeek, and 50+ more
Side-by-side model comparison — see which model wins on your specific use case
RAG & agent testing — evaluate retrieval quality and multi-turn agent behavior
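
To make the scoring types above concrete, here is a conceptual sketch of what an exact-match check, a regex check, and a latency threshold each decide. It is an illustration in plain shell, not Redline's implementation or config syntax, and all function names are hypothetical.

```shell
# Exact match: the model output must equal the expected string exactly.
score_exact() { [ "$1" = "$2" ]; }

# Regex: some line of the output must match an extended regular expression.
score_regex() { printf '%s\n' "$1" | grep -Eq -- "$2"; }

# Latency threshold: the measured milliseconds must not exceed the limit.
score_latency() { [ "$1" -le "$2" ]; }

# Usage:
#   score_exact "Paris" "Paris" && echo "pass"
#   score_regex "Order #4821 confirmed" '#[0-9]+' && echo "pass"
#   score_latency 850 1000 && echo "pass"
```

Deterministic checks like these are cheap and repeatable, whereas LLM-as-judge scoring fits outputs with no single correct string.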

Security & Red Teaming

Automated red teaming — adaptive adversarial probes: jailbreaks, prompt injection, PII leakage, BOLA (broken object-level authorization)
Vulnerability reports — severity-ranked findings with remediation guidance
MCP security proxy — audit and control tool calls from AI agents
Code scanning — PR-level review for LLM security and compliance issues

Developer Experience

CI/CD integration — fail builds on regressions before they reach production
Declarative config — YAML-first, no code required for most use cases
Fast — concurrent execution, result caching, live reload
100% local by default — your prompts and data never leave your machine
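
The CI/CD gating described above can be approximated with a small wrapper script. This is a sketch under one loud assumption: that `redline eval` exits with a non-zero status when an eval fails, which this README does not state. `ci_gate` is a hypothetical helper name.

```shell
# Hypothetical wrapper: run a command and turn its exit status into a
# clear pass/fail message for CI logs, preserving the original status.
ci_gate() {
  if "$@"; then
    echo "gate passed: $*"
  else
    local rc=$?                          # exit status of the gated command
    echo "gate failed (exit $rc): $*" >&2
    return "$rc"
  fi
}

# In a CI step (ASSUMES `redline eval` exits non-zero on failures):
#   ci_gate redline eval || exit 1
```

Passing the original exit status through lets the CI system distinguish failure kinds if the underlying command uses more than one code.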

Results & Collaboration

Web UI — interactive results viewer with diff views between prompt versions
Share results with your team
Export to JSON, CSV, SARIF

Why Redline?

| | Redline | Others |
| --- | --- | --- |
| Vendor-neutral | Works with every model and provider | Often tied to one ecosystem |
| Independent | Not owned by any LLM vendor | Some have been acquired by an LLM vendor |
| Security-first | Red teaming is a core feature, not an add-on | Most eval tools treat security as secondary |
| 100% local | Data stays on your machine by default | Many require cloud upload |
| Open source | MIT licensed, always | Some have open-core limitations |
| CI/CD native | First-class pipeline integration | Often bolted on |

Eval in action

[Screenshot: results matrix for comparing models and prompts at a glance]

[Screenshot: command-line output]

[Screenshot: red team vulnerability report]

Supported Providers

OpenAI · Anthropic · Google Gemini · Azure OpenAI · AWS Bedrock · Ollama · DeepSeek · Mistral · Cohere · HuggingFace · Groq · Together AI · Replicate · and 40+ more

Contributing

Contributions welcome. Open an issue or pull request.

Development setup:

git clone https://github.com/RahulSinghai606/redline-ai.git
cd redline-ai
npm install
npm run build
npm run dev

Run tests:

npm test

License

MIT — free to use, fork, and build on. See LICENSE.