@microsoft/m365-copilot-eval
v1.0.1-preview.1
Zero-config Node.js wrapper for M365 Copilot Agent Evaluations CLI (Python-based Azure AI Evaluation SDK)
M365 Copilot Agent Evaluations
🔒 PRIVATE PREVIEW: This tool is currently in private preview, and the instructions below apply to the private preview release.
A zero-configuration CLI for evaluating M365 Copilot agents. Send prompts to your agent, get responses, and automatically score them with Azure AI Evaluation metrics (relevance, coherence, groundedness).
- Send a batch (or interactive set) of prompts to a configured chat API endpoint.
- Collect agent responses and evaluate them locally using Azure AI Evaluation SDK.
- Metrics produced per prompt:
- Relevance (1–5)
- Coherence (1–5)
- Groundedness (1–5)
- Multiple input modes: command‑line list, JSON file, interactive.
- Multiple output formats: console (colorized), JSON, CSV, HTML (auto‑opens report).
📋 Prerequisites
- M365 Copilot Agent deployed to your tenant (can be created with M365 Agents Toolkit or any other method)
- Node.js 24.12.0+ (check with node --version)
- Environment file with your credentials and agent ID (see Environment Setup below)
- Your Tenant ID, Azure OpenAI endpoint, and API key (see Getting Variables below)
Note: Authentication is currently supported on Windows only. Support for other operating systems is coming soon.
🔧 Environment Setup
Install the Tool
- Go to the releases → https://github.com/microsoft/M365-Copilot-Agent-Evals/releases and click on the latest release.
- Click on Source code (tar.gz). This should download the package to your device.
- Go to the folder where this tar.gz file is. This will now be your project root folder.
- Run npm install -g <filename.tar.gz>, e.g., npm install -g M365-Copilot-Agent-Evals-<version>.tar.gz
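To verify the install, you can print the CLI version (the -V, --version flag is documented in the Command Reference below):
runevals --version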
Setup Steps
Now, set up where you'll store your environment variables:
Are you using M365 Agents Toolkit (ATK)?
- ✅ Yes → You already have .env.local in your project with M365_TITLE_ID. You'll add Azure OpenAI variables to this file.
- ✅ No → Create a new env/.env.dev file in your project directory. You'll add all variables there.
The CLI loads environment variables from multiple sources (in order of precedence):
- .env.local in the current directory (auto-detected, ideal for ATK projects)
- env/.env.{environment} via the --env flag (e.g., --env dev loads env/.env.dev)
- System environment variables
Option 1: For M365 Agents Toolkit (ATK) Projects
If you're working in an ATK project, you already have .env.local with M365_TITLE_ID. Just add your Azure credentials and tenant ID:
# .env.local (existing ATK project file)
# Already present from ATK:
M365_TITLE_ID="T_your-title-id-here" # Auto-generated by ATK
# You'll add these (see Getting Variables section below):
AZURE_AI_OPENAI_ENDPOINT="<your-azure-openai-endpoint>"
AZURE_AI_API_KEY="<your-api-key-from-azure-portal>"
TENANT_ID="<your-tenant-id>"Option 2: For Non-ATK Projects
Create env/.env.dev in your project directory:
# env/.env.dev (new file you create)
# Your agent ID (Optional):
M365_AGENT_ID="your-agent-id" # e.g., U_0dc4a8a2-b95f-edac-91c8-d802023ec2d4
# You'll add these (see Getting Variables section below):
AZURE_AI_OPENAI_ENDPOINT="<your-azure-openai-endpoint>"
AZURE_AI_API_KEY="<your-api-key-from-azure-portal>"
TENANT_ID="<your-tenant-id>"Optional Overrides
AZURE_AI_API_VERSION="2024-12-01-preview" # default
AZURE_AI_MODEL_NAME="gpt-4o-mini" # default

You can also override the agent ID at runtime: runevals --agent-id "custom-id"
🔑 Getting Variables
Now that you know what's needed, here's how to get the required values:
1. Tenant ID
Your Azure Active Directory (AAD) tenant ID.
How to obtain:
- Go to Azure Portal
- Search for "Azure Active Directory" or "Microsoft Entra ID"
- In the Overview section, you'll see Tenant ID
- Copy this value - this is your TENANT_ID
Alternatively, if you have the Azure CLI installed:
az account show --query tenantId

2. Agent ID (Only for MSIT)
- If you created your agent with the Agents Toolkit, the agent ID is the M365_TITLE_ID in the .env.local file
- If you did not, you can get your agent ID as follows:
1. Open aka.ms/devui in your browser
2. Click on `Configuration`
3. In the dialog that opens, click on `Untitled Config`
4. Click on the `Payload` tab
5. If you scroll down on this tab, you will see `DA (Declarative Agent)`
6. In this dropdown, you will see all the agents that are installed for you.
7. Select the agent that you want to evaluate.
8. Copy the portion of the `gpts.id` value that comes before `.declarativeAgent`.
9. This is your `agent-id`. It would look like `U_0dc4a8a2-b95f-edac-91c8-d802023ec2d4`

3. Azure OpenAI Endpoint and API Key
You need both the endpoint URL and API key from your Azure OpenAI resource for "LLM as a Judge" evaluations.
How to obtain:
- Go to Azure Portal
- Navigate to your Azure OpenAI service
  - Path: Portal → All Services → Search "OpenAI" → Select your resource
  - Or create new: Portal → Create a resource → Search "OpenAI"
- In the Overview section, copy the Endpoint value
  - Format: https://YOUR-RESOURCE-NAME.openai.azure.com/
  - This is your AZURE_AI_OPENAI_ENDPOINT
- In the left sidebar, click Keys and Endpoint
  - Copy KEY 1 or KEY 2
  - This is your AZURE_AI_API_KEY
- Add both values to your .env.dev file as shown in the Setup Steps above
Required model: Ensure you have gpt-4o-mini (or similar) deployed in your Azure OpenAI resource.
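If you have the Azure CLI installed, one way to confirm a suitable deployment exists is to list the deployments on your Azure OpenAI resource (the resource and resource group names below are placeholders):
# List model deployments on your Azure OpenAI resource
az cognitiveservices account deployment list --name <your-openai-resource> --resource-group <your-resource-group> --output table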
Security tip: Store keys and endpoints securely and never commit to source control.
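One simple way to follow this advice is to make sure your env files are ignored by git; a minimal sketch, assuming the file names used in this guide:
# Keep credential files out of version control
echo ".env.local" >> .gitignore
echo "env/.env.*" >> .gitignore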
🚀 Quick Start
Now that you have your environment variables set up, you're ready to run evaluations!
Important: Run this tool FROM your M365 agent project directory (where your agent code lives), not from this repository. You don't need to clone or download this repo.
# Navigate to YOUR agent project directory
cd /path/to/your-agent-project
# Run evaluations (auto-discovers .env.local for ATK projects)
runevals
# Or specify an environment file
runevals --env dev

No prompts file? If you don't have a prompts file yet, the tool will offer to create a starter file with example prompts for you.
Environment file lookup:
- Checks .env.local first (ATK projects)
- Then checks env/.env.{name} if --env {name} is specified (see the example below)
- Prompts file auto-discovery works the same for all projects
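The same pattern works for any environment name you create (prod here is just an illustrative name):
# Loads env/.env.prod, assuming you created that file
runevals --env prod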
📝 Creating Prompts Files
The CLI auto-discovers prompts files in your project:
Auto-Discovery
When you run runevals, it searches:
- Current directory: prompts.json, evals.json, tests.json
- ./evals/ subdirectory: prompts.json, evals.json, tests.json
Example project structure:
my-agent/
├── .env.local                     # Your credentials
├── evals/
│   └── evals.json                 # Your test prompts (auto-discovered!)
└── .evals/
    └── 2025-12-03_14-30-45.html   # Generated reports

Starter File Creation
If no file is found:
⚠️ No prompts file found in current directory or ./evals/
Create a starter evals file with sample prompts? (Y/n):

Answering "Y" creates ./evals/evals.json with 2 starter prompts:
[
{
"prompt": "What is Microsoft 365?",
"expected_response": "Microsoft 365 is a cloud-based productivity suite..."
},
{
"prompt": "How can I share a file in Teams?",
"expected_response": "You can share a file in Teams by uploading it..."
}
]

Edit this file with your own prompts and run again!
Manual Creation
Create ./evals/prompts.json:
[
{
"prompt": "Your test prompt here",
"expected_response": "Expected agent response"
}
]

🎯 Usage Examples
Remember: All commands below assume you're running them FROM your agent project directory, not from this repository.
What to Expect
When you run an evaluation from your agent project directory, you'll see:
🚀 M365 Copilot Agent Evaluations CLI
📂 Loading environment: dev
🤖 Agent ID (from M365_TITLE_ID): T_my-agent.declarativeAgent
📄 Using prompts file: ./evals/evals.json
📊 Running evaluations...
─────────────────────────────────────────────────────────────
✓ Evals completed successfully!
Results saved to: ./evals/2025-12-03_14-30-45.html

Commands to run from your project root:
# Use .env.local (checked in current dir, then env/ folder)
runevals
# Use env/.env.dev configuration
runevals --env dev
# Use specific prompts file in your project
runevals --prompts-file ./evals/my-tests.json
# Inline prompts (no file needed, useful for quick tests)
runevals --prompts "What is Microsoft Graph?" --expected "Gateway to M365 data"
# Interactive mode (enter prompts interactively)
runevals --interactive
# Custom output location in your project
runevals --output ./reports/results.html

Optional: Add Shortcuts to package.json
You can add shortcuts (npm scripts) to your agent project's package.json:
{
"scripts": {
"eval": "runevals",
"eval:local": "runevals --env local",
"eval:dev": "runevals --env dev"
}
}

Then use shorter commands:
# Uses .env.local (ATK default)
npm run eval
# Uses env/.env.local
npm run eval:local
# Uses env/.env.dev
npm run eval:dev

Production note: For production environments, use CI/CD pipelines instead of local npm run commands. See CICD_CACHE_GUIDE.md for examples.
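As a rough sketch, a CI job could pre-warm the Python runtime cache and then run the suite non-interactively; the exact steps and caching configuration depend on your pipeline (see CICD_CACHE_GUIDE.md):
# Pre-warm the Python runtime cache without running evaluations
runevals --init-only
# Run the evaluations and write a machine-readable report
runevals --env dev --output results.json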
📊 Output Formats
Results are automatically saved to ./evals/YYYY-MM-DD_HH-MM-SS.html with:
- Relevance score (1-5)
- Coherence score (1-5)
- Groundedness score (1-5)
- Per-prompt details and aggregate metrics
Other formats:
# JSON output
runevals --output results.json
# CSV output
runevals --output results.csv

🔧 Command Reference
Options:
  -V, --version               output version number
  -v, --verbose               show detailed processing steps
  -q, --quiet                 minimal output
  --prompts <prompts...>      inline prompts to evaluate
  --expected <responses...>   expected responses (with --prompts)
  --prompts-file <file>       JSON file with prompts
  -o, --output <file>         output file (JSON, CSV, or HTML)
  -i, --interactive           interactive prompt entry mode
  --agent-id <id>             override agent ID
  --env <environment>         environment name (default: dev)
  --init-only                 just set up, don't run evals
  -h, --help                  display help

Cache Commands:
  cache-info                  show cache statistics
  cache-clear                 remove cached Python runtime
  cache-dir                   print cache directory path

❓ Troubleshooting
Pre-cache Python Environment (Optional)
If you want to set up the Python environment ahead of time without running evaluations:
runevals --init-only

This is useful for:
- Pre-warming the cache in CI/CD pipelines
- Testing the setup without running evaluations
- Troubleshooting installation issues
Cache Issues
# View cache info
runevals cache-info
# Clear and rebuild
runevals cache-clear
runevals --init-only --verbose

Network/Proxy Issues
# Set proxy
export HTTPS_PROXY=http://proxy:8080
# Retry with verbose output
runevals --init-only --verbose

Permission Issues
# Check cache directory
runevals cache-dir
# Fix permissions (Unix/macOS)
chmod -R u+w $(runevals cache-dir)

📚 Advanced Documentation
- CI/CD Integration - GitHub Actions, Azure DevOps caching
- Testing Guide - Cross-platform testing procedures
- Python CLI Guide - Direct Python usage (without Node.js)
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit Contributor License Agreements.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
