@root-signals/scorable-cli

v0.5.0, published 3 days ago

CLI for Scorable


The scorable CLI is a command-line tool for interacting with the Scorable API. It lets you manage and execute Judges and Evaluators, view execution logs, and run prompt testing experiments directly from the terminal.

Requires Node.js 20 or higher.
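You can check this before installing. The sketch below is a minimal POSIX shell check. `node_version_ok` is a hypothetical helper and is not part of the CLI. It only compares the major version against the documented minimum of 20.

```shell
# Hypothetical helper: succeeds when a `node --version` string (e.g. "v20.11.1")
# has a major version of at least 20, the documented minimum.
node_version_ok() {
  major=${1#v}          # strip the leading "v"
  major=${major%%.*}    # keep only the major component
  # A non-numeric value makes `[` fail, which also counts as "not ok".
  [ "$major" -ge 20 ] 2>/dev/null
}

# Usage:
# node_version_ok "$(node --version)" || echo "Node.js 20 or higher is required" >&2
```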

Installation

curl -sSL https://scorable.ai/cli/install.sh | sh

Or install directly with npm:

npm install -g @root-signals/scorable-cli

Or run without installing via npx:

npx @root-signals/scorable-cli judge list

Authentication

Option 1 — Free demo key (no registration required):

scorable auth demo-key

Creates a temporary key and saves it to ~/.scorable/settings.json.

Option 2 — Permanent key from scorable.ai/register:

# Interactively
scorable auth set-key

# From argument
scorable auth set-key sk-your-api-key

Option 3 — Environment variable (takes precedence over saved key):

export SCORABLE_API_KEY="sk-your-api-key"

The key lookup order is: SCORABLE_API_KEY env var → api_key in ~/.scorable/settings.json → temporary_api_key in ~/.scorable/settings.json.
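A small shell function can mirror this precedence, for example to confirm which key a script will use. This is a sketch under assumptions. `effective_scorable_key` is a hypothetical helper. It only applies the documented order, and the caller must supply the two values stored in settings.json.

```shell
# Hypothetical helper: print the first non-empty key in the documented order:
# SCORABLE_API_KEY env var, then api_key, then temporary_api_key.
#   $1 = api_key from ~/.scorable/settings.json
#   $2 = temporary_api_key from ~/.scorable/settings.json
effective_scorable_key() {
  for candidate in "${SCORABLE_API_KEY:-}" "${1:-}" "${2:-}"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # no key configured anywhere
}

# Usage (assumes jq is installed and both keys are top-level fields):
# settings=~/.scorable/settings.json
# effective_scorable_key "$(jq -r '.api_key // empty' "$settings")" \
#   "$(jq -r '.temporary_api_key // empty' "$settings")"
```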

Scorable Skills for AI Coding Agents

Install Scorable skills into your project so your AI coding agent (Claude Code, Cursor, etc.) can integrate evaluators automatically:

scorable skills-add

After installing the skills, open your coding agent in your AI-powered project and use the prompt:

"Integrate scorable evaluators"

Judge Management

List judges

scorable judge list

Options: --page-size, --cursor, --search, --name, --ordering

Get a judge

scorable judge get <judge_id>

Create a judge

scorable judge create --name "My Judge" --intent "Evaluate response quality."

Options: --name (required), --intent (required), --stage, --evaluator-references (JSON string, e.g. '[{"id": "eval-id"}]')
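When a judge references several evaluators, quoting the JSON by hand becomes error-prone. The sketch below uses a hypothetical `evaluator_refs` helper to build the array in the `[{"id": "..."}]` shape shown above. It assumes the IDs contain no double quotes or backslashes. With no arguments it prints `[]`, which is the value the update command uses to clear references.

```shell
# Hypothetical helper: build the --evaluator-references JSON array from IDs.
# Assumes IDs contain no double quotes or backslashes (UUID-style IDs are fine).
evaluator_refs() {
  out="["
  sep=""
  for id in "$@"; do
    out="$out$sep{\"id\": \"$id\"}"
    sep=", "
  done
  printf '%s]\n' "$out"
}

# Usage:
# scorable judge create --name "My Judge" --intent "Evaluate response quality." \
#   --evaluator-references "$(evaluator_refs <evaluator_id_1> <evaluator_id_2>)"
```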

Update a judge

scorable judge update <judge_id> --name "Updated Name"

Options: --name, --stage, --evaluator-references (use "[]" to clear)

Delete a judge

scorable judge delete <judge_id>

Prompts for confirmation. Use --yes to skip.
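Because delete asks for confirmation, scripts need --yes. The sketch below uses a hypothetical `delete_judges` helper to remove several judges non-interactively. It stops at the first deletion that fails.

```shell
# Hypothetical helper: delete each judge ID given as an argument.
# --yes skips the confirmation prompt so the loop never blocks on input.
delete_judges() {
  for judge_id in "$@"; do
    scorable judge delete "$judge_id" --yes || return 1
  done
}

# Usage:
# delete_judges <judge_id_1> <judge_id_2>
```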

Duplicate a judge

scorable judge duplicate <judge_id>

Judge Execution

Execute by ID

scorable judge execute <judge_id> --request "What is the capital of France?" --response "Paris"

Options: --request, --response, --turns (JSON array of conversation turns), --contexts (JSON list), --expected-output, --tag (repeatable), --user-id, --session-id, --system-prompt

Pipe a response via stdin:

echo "Paris" | scorable judge execute <judge_id> --request "What is the capital of France?"
cat response.txt | scorable judge execute <judge_id>
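Stdin piping makes batch scoring straightforward. The sketch below uses a hypothetical `score_responses` helper and a hypothetical directory of .txt files. It runs one judge over every file and sends each file's contents as the response. It prints a header line per file and passes the CLI's own output through unchanged.

```shell
# Hypothetical helper: score every .txt file in a directory with one judge.
#   $1 = judge ID, $2 = request text, $3 = directory of response files
score_responses() {
  judge_id=$1 request=$2 dir=$3
  for file in "$dir"/*.txt; do
    [ -e "$file" ] || continue   # the glob matched nothing
    printf '== %s\n' "$file"
    # The file's contents become the response via stdin.
    scorable judge execute "$judge_id" --request "$request" < "$file"
  done
}

# Usage:
# score_responses <judge_id> "What is the capital of France?" ./responses
```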

For multi-turn conversations, pass the full history as a JSON array:

scorable judge execute <judge_id> --turns '[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]'
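Writing --turns by hand means escaping quotes inside JSON. The sketch below uses hypothetical `json_escape` and `turns_json` helpers to build the array from alternating role/content arguments. It escapes only backslashes and double quotes. Content that contains newlines or other control characters would still need a real JSON tool.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: turns_json ROLE CONTENT [ROLE CONTENT ...]
# Fails if a role is given without its content.
turns_json() {
  out="["
  sep=""
  while [ "$#" -ge 2 ]; do
    out="$out$sep{\"role\":\"$(json_escape "$1")\",\"content\":\"$(json_escape "$2")\"}"
    sep=","
    shift 2
  done
  [ "$#" -eq 0 ] || return 1
  printf '%s]\n' "$out"
}

# Usage:
# scorable judge execute <judge_id> --turns "$(turns_json user 'Hi' assistant 'Hello!')"
```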

Execute by name

scorable judge execute-by-name "My Judge" --request "What is the capital of France?" --response "Paris"

Accepts the same options as execute. Stdin piping and --turns work the same way.

Evaluator Management

List evaluators

scorable evaluator list

Options: --page-size, --cursor, --search, --name, --ordering

Get an evaluator

scorable evaluator get <evaluator_id>

Create an evaluator

scorable evaluator create \
  --name "My Evaluator" \
  --scoring-criteria "Does the {{ response }} directly answer the user's question?" \
  --intent "Evaluate response relevance"

Options: --name (required), --scoring-criteria (required — must contain {{ request }} and/or {{ response }}), --intent or --objective-id (one required), --system-message, --models (JSON array), --overwrite, --objective-version-id
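A local pre-flight check can catch criteria that lack the required placeholder before the create call is sent. The sketch below uses a hypothetical `has_required_placeholder` helper. It accepts the placeholder with or without spaces inside the braces, because both spellings appear in this README. The API's exact rule is an assumption here.

```shell
# Hypothetical helper: succeed if the text contains {{ request }} or
# {{ response }}, with optional spaces inside the braces.
has_required_placeholder() {
  printf '%s\n' "$1" | grep -Eq '[{][{] *(request|response) *[}][}]'
}

# Usage:
# criteria="Does the {{ response }} directly answer the user's question?"
# has_required_placeholder "$criteria" &&
#   scorable evaluator create --name "My Evaluator" \
#     --scoring-criteria "$criteria" --intent "Evaluate response relevance"
```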

Update an evaluator

scorable evaluator update <evaluator_id> --name "Updated Name"

Options: --name, --scoring-criteria, --system-message, --models (JSON array), --objective-id, --objective-version-id

Delete an evaluator

scorable evaluator delete <evaluator_id>

Prompts for confirmation. Use --yes to skip.

Duplicate an evaluator

scorable evaluator duplicate <evaluator_id>

Evaluator Execution

Execute by ID

scorable evaluator execute <evaluator_id> --request "What is 2+2?" --response "4"

Options: --request, --response, --turns (JSON array of conversation turns), --contexts (JSON list), --expected-output, --tag (repeatable), --user-id, --session-id, --system-prompt, --variables (JSON object of extra template variables)

Stdin piping and --turns work the same way as for judge execution.

For evaluators whose templates use placeholders beyond {{request}} and {{response}}, supply the extra values as a JSON object with --variables:

scorable evaluator execute <evaluator_id> --request "Hello" --variables '{"lang":"EN","topic":"science"}'
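You can also assemble the --variables object from key=value pairs. The sketch below uses hypothetical helpers `json_escape` and `variables_json`. It assumes keys are plain names such as lang or topic. Values are escaped only for backslashes and double quotes, and a pair without = makes the function fail.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: variables_json key=value [key=value ...]
# Only the first "=" splits key from value, so values may contain "=".
variables_json() {
  out="{"
  sep=""
  for pair in "$@"; do
    case $pair in
      *=*) ;;
      *) return 1 ;;   # not a key=value pair
    esac
    key=${pair%%=*}
    value=${pair#*=}
    out="$out$sep\"$key\":\"$(json_escape "$value")\""
    sep=","
  done
  printf '%s}\n' "$out"
}

# Usage:
# scorable evaluator execute <evaluator_id> --request "Hello" \
#   --variables "$(variables_json lang=EN topic=science)"
```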

Execute by name

scorable evaluator execute-by-name "My Evaluator" --request "What is 2+2?" --response "4"

Accepts the same options as execute, including --variables.

Execution Logs

List execution logs

scorable execution-log list

Options: --page-size, --cursor, --search, --evaluator-id, --judge-id, --model, --tags, --score-min, --score-max, --created-at-after, --created-at-before, --owner-email
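You can combine several filters in one call. The sketch below wraps three of the documented options in a hypothetical `low_scoring_logs` helper. This README does not state the score scale, whether --score-max is inclusive, or the date format that --created-at-after accepts. Treat the values in the usage line as assumptions and check them against your own logs.

```shell
# Hypothetical helper: list one judge's execution logs filtered by a maximum
# score and a creation-date lower bound (value formats are assumptions).
#   $1 = judge ID, $2 = --score-max value, $3 = --created-at-after value
low_scoring_logs() {
  judge_id=$1 max_score=$2 since=$3
  scorable execution-log list \
    --judge-id "$judge_id" \
    --score-max "$max_score" \
    --created-at-after "$since"
}

# Usage (0.5 and the ISO date are assumed formats):
# low_scoring_logs <judge_id> 0.5 2025-01-01
```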

Get an execution log

scorable execution-log get <log_id>

Prompt Testing

Initialize a config file and run experiments:

scorable pt init
scorable pt run

Use a custom config path:

scorable pt run --config path/to/prompt-tests.yaml

The prompt-test command is an alias for pt.

Config file format

prompts:
  - "Extract info from: {{text}}"

inputs:
  - vars:
      text: "John Doe, [email protected]"

# Or use a dataset instead of inline inputs:
# dataset_id: "<uuid>"

models:
  - gpt-4o-mini
  - gemini-2.5-flash-lite

evaluators:
  - name: Precision
  - name: Confidentiality

# Optional: enforce structured output
# response_schema:
#   type: object
#   properties:
#     name: { type: string }

Results are displayed in a table, and a browser link to the full comparison view is printed.
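To run the same experiment against a dataset, you can generate the config and pass it with --config. The sketch below uses a hypothetical `write_dataset_config` helper. It reuses the prompts, models, and evaluators from the example above and replaces inputs with dataset_id. It assumes the dataset provides the `text` variable that the prompt refers to.

```shell
# Hypothetical helper: write a prompt-test config that uses dataset_id
# instead of inline inputs.
#   $1 = output path, $2 = dataset UUID
write_dataset_config() {
  config_path=$1 dataset_id=$2
  cat > "$config_path" <<EOF
prompts:
  - "Extract info from: {{text}}"

dataset_id: "$dataset_id"

models:
  - gpt-4o-mini
  - gemini-2.5-flash-lite

evaluators:
  - name: Precision
  - name: Confidentiality
EOF
}

# Usage:
# write_dataset_config prompt-tests.yaml "<uuid>"
# scorable pt run --config prompt-tests.yaml
```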

Development

npm install
npm run build       # compile TypeScript
npm test            # run tests
npm run typecheck   # type-check without emitting
npm run lint        # lint with oxlint
npm run fmt         # format with oxfmt
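Before you open a pull request, you can chain the scripts above so the run stops at the first failure. The sketch below uses a hypothetical `check_all` function. The order is a suggestion, not a project convention. `npm run fmt` is left out because a formatter may rewrite files.

```shell
# Hypothetical helper: type-check, lint, test, then build; stop at first failure.
check_all() {
  npm run typecheck &&
    npm run lint &&
    npm test &&
    npm run build
}

# Usage:
# check_all
```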
