@sanity/ailf-studio
⚠️ Internal package. This package is published publicly for convenience but is intended for internal Sanity use only. APIs and schemas may change without notice. No support is provided for external consumers.
Sanity Studio dashboard plugin for the AI Literacy Framework. Visualizes evaluation reports, score trends, comparisons, and content impact — directly inside Sanity Studio with no external backend.
All data is read from the Sanity Content Lake via GROQ.
Installation
Install the plugin into any Sanity Studio that has access to the dataset where AILF reports are stored.
1. Add the dependency
```sh
pnpm add @sanity/ailf-studio
```

Within the monorepo:

```sh
pnpm add @sanity/ailf-studio@workspace:*
```

2. Register the plugin
The recommended approach registers both the document schemas and the dashboard tool in one call:
```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfPlugin } from "@sanity/ailf-studio"
export default defineConfig({
// ... your existing config
plugins: [
ailfPlugin(),
// ... other plugins
],
})
```

This registers:
- The ailf.report document type (read-only evaluation reports)
- The ailf.webhookConfig document type (webhook-triggered evaluation settings)
- The ailf.task document type (evaluation task definitions)
- The ailf.featureArea document type (feature area groupings)
- The ailf.referenceSolution document type (gold-standard reference implementations)
- The ailf.evalRequest document type (evaluation request triggers)
- The AI Literacy Framework dashboard tool in the Studio sidebar
Document Actions
The plugin registers two document actions for triggering evaluations directly from Studio:
- Run Task Eval (on ailf.task documents) — evaluates a single task. Click ▶ in the document actions menu to run all test cases for the task against the current documentation. The button shows the score when complete (~10–15 min). No secrets needed — it creates an ailf.evalRequest document that a server-side webhook dispatches to the pipeline.
- Run AI Eval (on content releases) — evaluates all tasks affected by a content release. Appears in the release detail page's action bar. Answers "did my doc changes help or hurt AI agent performance?" Shows the score and delta vs. baseline when complete.
Both actions use the same mechanism: they create an ailf.evalRequest document
in the Content Lake with status: "pending". A server-side Sanity webhook picks
up the document and dispatches the pipeline via GitHub Actions. The Studio
component polls for the resulting report and updates the button label with the
score.
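For illustration, here is a minimal sketch of the kind of document those actions create. The _type and status values come from the description above; the client configuration and the task reference field name are assumptions made for this example, not the plugin's actual implementation.

```ts
// Sketch only: the document the Studio actions create, which triggers the
// server-side webhook that dispatches the evaluation pipeline.
import { createClient } from "@sanity/client"

const client = createClient({
  projectId: "<your-project-id>",
  dataset: "my-report-dataset",
  apiVersion: "2024-01-01",
  token: process.env.SANITY_WRITE_TOKEN, // needs create permission
  useCdn: false,
})

await client.create({
  _type: "ailf.evalRequest",
  status: "pending",
  // Hypothetical field name — the actual schema defines how the target task
  // or release is referenced.
  task: { _type: "reference", _ref: "<ailf.task document id>" },
})
```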
3. Alternative: tool-only installation
If you only want the dashboard tool without the document schemas (e.g., the schemas are already registered elsewhere):
```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfTool } from "@sanity/ailf-studio"
export default defineConfig({
// ... your existing config
tools: [ailfTool()],
})
```

4. Alternative: schema-only installation
If you want the document schemas without the dashboard (e.g., to query reports programmatically):
```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import {
reportSchema,
webhookConfigSchema,
taskSchema,
featureAreaSchema,
referenceSolutionSchema,
} from "@sanity/ailf-studio"
export default defineConfig({
// ... your existing config
schema: {
types: [
reportSchema,
webhookConfigSchema,
taskSchema,
featureAreaSchema,
referenceSolutionSchema,
],
},
})
```

Task Execution Workflows
Tasks created in Studio are automatically included in every pipeline run — no registration step needed. There are four ways to execute tasks:
| Method | Trigger | Scope |
| ------------------------------ | ----------------------------------------------------------------- | ----------------------------- |
| Run Task Eval action | Click ▶ on any ailf.task document | Single task |
| Run AI Eval release action | Click button on a content release page | Tasks affected by the release |
| CLI pipeline | ailf pipeline (with optional --area/--task/--tag filters) | All enabled tasks |
| Scheduled pipeline | GitHub Actions cron (daily + weekly) | All enabled tasks |
See the CONTRIBUTING_TASKS guide for the full execution flow and details on each method.
Dashboard Views
The plugin provides three tab views plus a detail drill-down, accessible from the AI Literacy Framework tool in the Studio sidebar.
Latest Reports
A card list of the most recent evaluation reports. Each card shows:
- Overall score, doc lift, and lowest-scoring area
- Evaluation mode, source, and trigger type
- Git metadata (branch, PR number, origin repo) when available
- Auto-comparison delta against the previous run
Click any card to navigate to the Report Detail view.
The view includes a search bar for filtering reports by document slug, area, or content release perspective.
Score Timeline
A line chart of overall and per-area scores over time. Filterable by:
- Source — which documentation source was evaluated (e.g., production, branch deploy)
- Mode — evaluation mode (baseline, observed, agentic)
Data points are interactive — click to jump to the full report.
Compare
Side-by-side comparison of any two reports. Select a baseline and experiment report from dropdowns, then view:
- Overall score and doc-lift deltas
- Per-area deltas (improved / regressed / unchanged)
- Per-model deltas (when both reports include per-model breakdowns)
- Noise threshold classification
Report Detail
Full drill-down into a single report (navigated from Latest Reports or a direct URL):
- Overview stats — composite score, doc lift, cost, duration
- Per-area score table with all dimensions (task completion, code correctness, doc coverage, lift from docs)
- Three-layer table — floor / ceiling / actual decomposition (when available)
- Per-model breakdowns with cost-per-quality-point
- Judgment list — individual grader verdicts with reasoning
- Recommendations — gap analysis remediation suggestions (when available)
- Provenance card — trigger, git info (branch, PR, origin repo), grader model, context hash, eval fingerprint
- Auto-comparison summary against the previous comparable run
Filtering
The Dashboard and Score Timeline views share global filters:
- Source filter — values are auto-populated from distinct provenance.source.name values across all reports
- Mode filter — values are auto-populated from distinct provenance.mode values
Filters are applied via GROQ query parameters, so only matching reports are fetched.
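As a rough sketch (not the plugin's actual query), a parameterized filter over those fields could look like the following. The provenance field paths come from the bullets above; the ordering and page size are illustrative only.

```ts
import { groq } from "sanity"

// Illustrative only: the 20 most recent reports matching both filters.
export const filteredReportsQuery = groq`
  *[_type == "ailf.report"
    && provenance.source.name == $source
    && provenance.mode == $mode]
  | order(_createdAt desc)[0...20]
`

// Usage:
// client.fetch(filteredReportsQuery, { source: "production", mode: "baseline" })
```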
Dataset Configuration
The plugin reads reports from whatever dataset the Studio is configured to use. To point it at a dedicated report dataset, configure the Studio's dataset:
```ts
export default defineConfig({
projectId: "<your-project-id>",
dataset: "my-report-dataset",
plugins: [ailfPlugin()],
})
```

Reports are written by the evaluation pipeline (ailf pipeline --publish).
Exported API
The plugin exports building blocks for custom views or extensions.
Plugin & Tool
| Export | Description |
| ------------ | ---------------------------- |
| ailfPlugin | Full plugin (schemas + tool) |
| ailfTool | Dashboard tool only |
Schemas
| Export | Description |
| ------------------------- | ------------------------------------------------- |
| reportSchema | ailf.report document type definition |
| webhookConfigSchema | ailf.webhookConfig document type |
| taskSchema | ailf.task document type definition |
| featureAreaSchema | ailf.featureArea document type definition |
| referenceSolutionSchema | ailf.referenceSolution document type definition |
| evalRequestSchema | ailf.evalRequest document type definition |
Components
| Export | Description |
| ------------------- | --------------------------------------------------------------------------------------------- |
| AssertionInput | Custom input for task assertions with contextual type descriptions and monospace code styling |
| CanonicalDocInput | Custom input for canonical doc references with polymorphic resolution type help |
| ReleasePicker | Content release perspective picker for evaluation scoping |
| MirrorBanner | Banner showing repo source, sync status, and provenance for mirrored tasks |
| SyncStatusBadge | Colored badge (green/yellow/red) showing sync freshness of mirrored tasks |
Document Actions
| Export | Description |
| --------------------------- | -------------------------------------------------------------------------------------------- |
| GraduateToNativeAction | Converts a mirrored (read-only) task to a native (editable) task by removing origin |
| RunTaskEvaluationAction | Triggers a pipeline evaluation scoped to a single task (registered on ailf.task documents) |
| createRunEvaluationAction | Factory for creating a Studio action that triggers release-scoped evaluations |
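If you use the tool-only or schema-only setup, you can wire these actions yourself with Sanity's standard document-action resolver. A minimal sketch, assuming the exports are ordinary Studio document action components (when you use ailfPlugin(), this wiring happens for you):

```ts
// sanity.config.ts — manual wiring sketch; not needed when using ailfPlugin()
import { defineConfig } from "sanity"
import {
  GraduateToNativeAction,
  RunTaskEvaluationAction,
} from "@sanity/ailf-studio"

export default defineConfig({
  // ... your existing config
  document: {
    actions: (prev, context) =>
      context.schemaType === "ailf.task"
        ? [...prev, RunTaskEvaluationAction, GraduateToNativeAction]
        : prev,
  },
})
```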
Glossary
| Export | Description |
| ---------- | ------------------------------------------------------------------------ |
| GLOSSARY | Centralized tooltip descriptions for all evaluation metrics and concepts |
GROQ Queries
| Export | Description |
| ------------------------------ | ------------------------------------------ |
| latestReportsQuery | N most recent reports (filterable) |
| scoreTimelineQuery | Score data points over time |
| reportDetailQuery | Full report with all fields |
| comparisonPairQuery | Two reports for side-by-side comparison |
| contentImpactQuery | Reports related to a document ID |
| recentDocumentEvalsQuery | Recent evaluations for a specific document |
| articleSearchQuery | Full-text search across article documents |
| distinctSourcesQuery | All unique source names |
| distinctModesQuery | All unique evaluation modes |
| distinctAreasQuery | All unique feature areas |
| distinctModelsQuery | All unique model identifiers |
| distinctPerspectivesQuery | All unique content release perspectives |
| distinctTargetDocumentsQuery | All unique target document slugs |
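For example, a custom filter dropdown could load its options from one of the distinct-value queries. A sketch, assuming distinctSourcesQuery takes no parameters and resolves to an array of strings (inferred from its description; verify against the actual export):

```ts
import { useEffect, useState } from "react"
import { useClient } from "sanity"
import { distinctSourcesQuery } from "@sanity/ailf-studio"

// Sketch: load all unique source names for a custom filter control.
export function useSources(): string[] {
  const client = useClient({ apiVersion: "2024-01-01" })
  const [sources, setSources] = useState<string[]>([])
  useEffect(() => {
    client.fetch<string[]>(distinctSourcesQuery).then(setSources)
  }, [client])
  return sources
}
```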
Types
| Export | Description |
| ---------------------------- | --------------------------------------------------------------------- |
| ReportListItem | Shape returned by latestReportsQuery |
| ReportDetail | Shape returned by reportDetailQuery |
| TimelineDataPoint | Shape returned by scoreTimelineQuery |
| ComparisonData | Auto-comparison data embedded in reports |
| ContentImpactItem | Shape returned by contentImpactQuery |
| ProvenanceData | Report provenance metadata |
| SummaryData | Score summary (overall + per-area + per-model) |
| ScoreItem | Individual area score entry |
| RecommendationGap | Single gap analysis recommendation |
| RecommendationsData | Full recommendations payload |
| JudgmentData | Individual grader judgment with reasoning |
| DocumentRef | Canonical document reference (re-exported from @sanity/ailf-shared) |
| ScoreGrade | Letter grade type (re-exported from @sanity/ailf-shared) |
| scoreGrade | Function to compute letter grade from numeric score |
| RunEvaluationActionOptions | Options for createRunEvaluationAction factory |
Utility Functions
| Export | Description |
| -------------------- | --------------------------------------------------------- |
| formatPercent | Format a number as a percentage string |
| formatRelativeTime | Format an ISO timestamp as relative time (e.g., "2h ago") |
| formatDelta | Format a score delta with +/− sign |
| formatDuration | Format milliseconds as human-readable duration |
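Put together, a small custom view might combine these helpers. A sketch, assuming single-argument signatures inferred from the descriptions above; the component and its props are hypothetical:

```tsx
import { Card, Stack, Text } from "@sanity/ui"
import { formatPercent, formatRelativeTime, scoreGrade } from "@sanity/ailf-studio"

// Hypothetical component: renders an overall score with its letter grade
// and a relative "evaluated ... ago" timestamp.
export function ScoreBadge(props: { score: number; evaluatedAt: string }) {
  return (
    <Card padding={3} radius={2} shadow={1}>
      <Stack space={2}>
        <Text size={2}>
          {formatPercent(props.score)} ({scoreGrade(props.score)})
        </Text>
        <Text size={1} muted>
          evaluated {formatRelativeTime(props.evaluatedAt)}
        </Text>
      </Stack>
    </Card>
  )
}
```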
Development
```sh
# Build the plugin
pnpm --filter @sanity/ailf-studio build
# Watch mode (rebuilds on file changes)
pnpm --filter @sanity/ailf-studio dev
# Build everything (from repo root)
turbo build
```

The plugin uses tsup for bundling. The consuming Studio's bundler (Vite) handles the final bundle.
Related Documentation
- Report Store Design — full architecture and implementation plan
- Visibility & Workflows — design rationale for the dashboard views
- Report Store Architecture — Sanity Content Lake as the system of record
