@sanity/ailf-studio
⚠️ Internal package. This package is published publicly for convenience but is intended for internal Sanity use only. APIs and schemas may change without notice. No support is provided for external consumers.
Sanity Studio dashboard plugin for the AI Literacy Framework. Visualizes evaluation reports, score trends, comparisons, and content impact — directly inside Sanity Studio with no external backend.
All data is read from the Sanity Content Lake via GROQ.
Installation
Install the plugin into any Sanity Studio that has access to the dataset where AILF reports are stored.
1. Add the dependency
```sh
pnpm add @sanity/ailf-studio
```

Within the monorepo:

```sh
pnpm add @sanity/ailf-studio@workspace:*
```

2. Register the plugin
The recommended approach registers both the document schemas and the dashboard tool in one call:
```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfPlugin } from "@sanity/ailf-studio"
export default defineConfig({
// ... your existing config
plugins: [
ailfPlugin(),
// ... other plugins
],
})
```

This registers:
- The ailf.report document type (read-only evaluation reports)
- The ailf.webhookConfig document type (webhook-triggered evaluation settings)
- The ailf.task document type (evaluation task definitions)
- The ailf.featureArea document type (feature area groupings)
- The ailf.referenceSolution document type (gold-standard reference implementations)
- The ailf.evalRequest document type (evaluation request triggers)
- The AI Literacy Framework dashboard tool in the Studio sidebar
Document Actions
The plugin registers two document actions for triggering evaluations directly from Studio:
- Run Task Eval (on ailf.task documents) — evaluates a single task. Click ▶ in the document actions menu to run all test cases for the task against the current documentation. The button shows the score when complete (~10–15 min). No secrets needed — it creates an ailf.evalRequest document that a server-side webhook dispatches to the pipeline.
- Run AI Eval (on content releases) — evaluates all tasks affected by a content release. Appears in the release detail page's action bar. Answers "did my doc changes help or hurt AI agent performance?" Shows the score and delta vs. baseline when complete.
Both actions use the same mechanism: they create an ailf.evalRequest document
in the Content Lake with status: "pending". A server-side Sanity webhook picks
up the document and dispatches the pipeline via GitHub Actions. The Studio
component polls for the resulting report and updates the button label with the
score.
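For illustration, here is a minimal sketch of the kind of document those actions create. The _type and status values come from the description above; the client configuration and the task reference field name are assumptions made for this example, not the plugin's actual implementation.

```ts
// Sketch only: the document the Studio actions create, which triggers the
// server-side webhook that dispatches the evaluation pipeline.
import { createClient } from "@sanity/client"

const client = createClient({
  projectId: "<your-project-id>",
  dataset: "my-report-dataset",
  apiVersion: "2024-01-01",
  token: process.env.SANITY_WRITE_TOKEN, // needs create permission
  useCdn: false,
})

await client.create({
  _type: "ailf.evalRequest",
  status: "pending",
  // Hypothetical field name — the actual schema defines how the target task
  // or release is referenced.
  task: { _type: "reference", _ref: "<ailf.task document id>" },
})
```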
3. Alternative: tool-only installation
If you only want the dashboard tool without the document schemas (e.g., the schemas are already registered elsewhere):
```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfTool } from "@sanity/ailf-studio"
export default defineConfig({
// ... your existing config
tools: [ailfTool()],
})
```

4. Alternative: schema-only installation
If you want the document schemas without the dashboard (e.g., to query reports programmatically):
```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import {
reportSchema,
webhookConfigSchema,
taskSchema,
featureAreaSchema,
referenceSolutionSchema,
} from "@sanity/ailf-studio"
export default defineConfig({
// ... your existing config
schema: {
types: [
reportSchema,
webhookConfigSchema,
taskSchema,
featureAreaSchema,
referenceSolutionSchema,
],
},
})
```

Task Execution Workflows
Tasks created in Studio are automatically included in every pipeline run — no registration step needed. There are four ways to execute tasks:
| Method | Trigger | Scope |
| ------------------------------ | ----------------------------------------------------------------- | ----------------------------- |
| Run Task Eval action | Click ▶ on any ailf.task document | Single task |
| Run AI Eval release action | Click button on a content release page | Tasks affected by the release |
| CLI pipeline | ailf pipeline (with optional --area/--task/--tag filters) | All enabled tasks |
| Scheduled pipeline | GitHub Actions cron (daily + weekly) | All enabled tasks |
See the CONTRIBUTING_TASKS guide for the full execution flow and details on each method.
Dashboard Views
The plugin provides three tab views plus a detail drill-down, accessible from the AI Literacy Framework tool in the Studio sidebar.
Latest Reports
A card list of the most recent evaluation reports. Each card shows:
- Overall score, doc lift, and lowest-scoring area
- Evaluation mode, source, and trigger type
- Git metadata (branch, PR number, origin repo) when available
- Auto-comparison delta against the previous run
Click any card to navigate to the Report Detail view.
The view includes a search bar for filtering reports by document slug, area, or content release perspective.
Score Timeline
A line chart of overall and per-area scores over time. Filterable by:
- Source — which documentation source was evaluated (e.g., production, branch deploy)
- Mode — evaluation mode (baseline, observed, agentic)
Data points are interactive — click to jump to the full report.
Compare
Side-by-side comparison of any two reports. Select a baseline and experiment report from dropdowns, then view:
- Overall score and doc-lift deltas
- Per-area deltas (improved / regressed / unchanged)
- Per-model deltas (when both reports include per-model breakdowns)
- Noise threshold classification
Report Detail
Full drill-down into a single report (navigated from Latest Reports or a direct URL):
- Overview stats — composite score, doc lift, cost, duration
- Per-area score table with all dimensions (task completion, code correctness, doc coverage, lift from docs)
- Three-layer table — floor / ceiling / actual decomposition (when available)
- Per-model breakdowns with cost-per-quality-point
- Judgment list — individual grader verdicts with reasoning
- Recommendations — gap analysis remediation suggestions (when available)
- Provenance card — trigger, git info (branch, PR, origin repo), grader model, context hash, eval fingerprint
- Auto-comparison summary against the previous comparable run
Filtering
The Dashboard and Score Timeline views share global filters:
- Source filter — values are auto-populated from distinct provenance.source.name values across all reports
- Mode filter — values are auto-populated from distinct provenance.mode values
Filters are applied via GROQ query parameters, so only matching reports are fetched.
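As a rough sketch (not the plugin's actual query), a parameterized filter over those fields could look like the following. The provenance field paths come from the bullets above; the ordering and page size are illustrative only.

```ts
import { groq } from "sanity"

// Illustrative only: the 20 most recent reports matching both filters.
export const filteredReportsQuery = groq`
  *[_type == "ailf.report"
    && provenance.source.name == $source
    && provenance.mode == $mode]
  | order(_createdAt desc)[0...20]
`

// Usage:
// client.fetch(filteredReportsQuery, { source: "production", mode: "baseline" })
```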
Dataset Configuration
The plugin reads reports from whatever dataset the Studio is configured to use. To point it at a dedicated report dataset, configure the Studio's dataset:
```ts
export default defineConfig({
projectId: "<your-project-id>",
dataset: "my-report-dataset",
plugins: [ailfPlugin()],
})
```

Reports are written by the evaluation pipeline (ailf pipeline --publish).
Exported API
The plugin exports building blocks for custom views or extensions.
Plugin & Tool
| Export | Description |
| ------------ | ---------------------------- |
| ailfPlugin | Full plugin (schemas + tool) |
| ailfTool | Dashboard tool only |
Schemas
| Export | Description |
| ------------------------- | ------------------------------------------------- |
| reportSchema | ailf.report document type definition |
| webhookConfigSchema | ailf.webhookConfig document type |
| taskSchema | ailf.task document type definition |
| featureAreaSchema | ailf.featureArea document type definition |
| referenceSolutionSchema | ailf.referenceSolution document type definition |
| evalRequestSchema | ailf.evalRequest document type definition |
Components
| Export | Description |
| ------------------- | --------------------------------------------------------------------------------------------- |
| AssertionInput | Custom input for task assertions with contextual type descriptions and monospace code styling |
| CanonicalDocInput | Custom input for canonical doc references with polymorphic resolution type help |
| ReleasePicker | Content release perspective picker for evaluation scoping |
| MirrorBanner | Banner showing repo source, sync status, and provenance for mirrored tasks |
| SyncStatusBadge | Colored badge (green/yellow/red) showing sync freshness of mirrored tasks |
Document Actions
| Export | Description |
| --------------------------- | -------------------------------------------------------------------------------------------- |
| GraduateToNativeAction | Converts a mirrored (read-only) task to a native (editable) task by removing origin |
| RunTaskEvaluationAction | Triggers a pipeline evaluation scoped to a single task (registered on ailf.task documents) |
| createRunEvaluationAction | Factory for creating a Studio action that triggers release-scoped evaluations |
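If you use the tool-only or schema-only setup, you can wire these actions yourself with Sanity's standard document-action resolver. A minimal sketch, assuming the exports are ordinary Studio document action components (when you use ailfPlugin(), this wiring happens for you):

```ts
// sanity.config.ts — manual wiring sketch; not needed when using ailfPlugin()
import { defineConfig } from "sanity"
import {
  GraduateToNativeAction,
  RunTaskEvaluationAction,
} from "@sanity/ailf-studio"

export default defineConfig({
  // ... your existing config
  document: {
    actions: (prev, context) =>
      context.schemaType === "ailf.task"
        ? [...prev, RunTaskEvaluationAction, GraduateToNativeAction]
        : prev,
  },
})
```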
Glossary
| Export | Description |
| ---------- | ------------------------------------------------------------------------ |
| GLOSSARY | Centralized tooltip descriptions for all evaluation metrics and concepts |
GROQ Queries
| Export | Description |
| ------------------------------ | ------------------------------------------ |
| latestReportsQuery | N most recent reports (filterable) |
| scoreTimelineQuery | Score data points over time |
| reportDetailQuery | Full report with all fields |
| comparisonPairQuery | Two reports for side-by-side comparison |
| contentImpactQuery | Reports related to a document ID |
| recentDocumentEvalsQuery | Recent evaluations for a specific document |
| articleSearchQuery | Full-text search across article documents |
| distinctSourcesQuery | All unique source names |
| distinctModesQuery | All unique evaluation modes |
| distinctAreasQuery | All unique feature areas |
| distinctModelsQuery | All unique model identifiers |
| distinctPerspectivesQuery | All unique content release perspectives |
| distinctTargetDocumentsQuery | All unique target document slugs |
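For example, a custom filter dropdown could load its options from one of the distinct-value queries. A sketch, assuming distinctSourcesQuery takes no parameters and resolves to an array of strings (inferred from its description; verify against the actual export):

```ts
import { useEffect, useState } from "react"
import { useClient } from "sanity"
import { distinctSourcesQuery } from "@sanity/ailf-studio"

// Sketch: load all unique source names for a custom filter control.
export function useSources(): string[] {
  const client = useClient({ apiVersion: "2024-01-01" })
  const [sources, setSources] = useState<string[]>([])
  useEffect(() => {
    client.fetch<string[]>(distinctSourcesQuery).then(setSources)
  }, [client])
  return sources
}
```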
Types
| Export | Description |
| ---------------------------- | --------------------------------------------------------------------- |
| ReportListItem | Shape returned by latestReportsQuery |
| ReportDetail | Shape returned by reportDetailQuery |
| TimelineDataPoint | Shape returned by scoreTimelineQuery |
| ComparisonData | Auto-comparison data embedded in reports |
| ContentImpactItem | Shape returned by contentImpactQuery |
| ProvenanceData | Report provenance metadata |
| SummaryData | Score summary (overall + per-area + per-model) |
| ScoreItem | Individual area score entry |
| RecommendationGap | Single gap analysis recommendation |
| RecommendationsData | Full recommendations payload |
| JudgmentData | Individual grader judgment with reasoning |
| DocumentRef | Canonical document reference (re-exported from @sanity/ailf-shared) |
| ScoreGrade | Letter grade type (re-exported from @sanity/ailf-shared) |
| scoreGrade | Function to compute letter grade from numeric score |
| RunEvaluationActionOptions | Options for createRunEvaluationAction factory |
Utility Functions
| Export | Description |
| -------------------- | --------------------------------------------------------- |
| formatPercent | Format a number as a percentage string |
| formatRelativeTime | Format an ISO timestamp as relative time (e.g., "2h ago") |
| formatDelta | Format a score delta with +/− sign |
| formatDuration | Format milliseconds as human-readable duration |
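Put together, a small custom view might combine these helpers. A sketch, assuming single-argument signatures inferred from the descriptions above; the component and its props are hypothetical:

```tsx
import { Card, Stack, Text } from "@sanity/ui"
import { formatPercent, formatRelativeTime, scoreGrade } from "@sanity/ailf-studio"

// Hypothetical component: renders an overall score with its letter grade
// and a relative "evaluated ... ago" timestamp.
export function ScoreBadge(props: { score: number; evaluatedAt: string }) {
  return (
    <Card padding={3} radius={2} shadow={1}>
      <Stack space={2}>
        <Text size={2}>
          {formatPercent(props.score)} ({scoreGrade(props.score)})
        </Text>
        <Text size={1} muted>
          evaluated {formatRelativeTime(props.evaluatedAt)}
        </Text>
      </Stack>
    </Card>
  )
}
```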
Development
```sh
# Build the plugin
pnpm --filter @sanity/ailf-studio build
# Watch mode (rebuilds on file changes)
pnpm --filter @sanity/ailf-studio dev
# Build everything (from repo root)
turbo build
```

The plugin uses tsup for bundling. The consuming Studio's bundler (Vite) handles the final bundle.
Related Documentation
- Report Store Design — full architecture and implementation plan
- Visibility & Workflows — design rationale for the dashboard views
- Report Store Architecture — Sanity Content Lake as the system of record
