@galileodev/meta

v0.9.4

**Cognitive strategy layer — structured prompt composition, self-consistency validation, and autonomous prompt evolution.**

Overview
@galileodev/meta is the meta-prompting engine for the Galileo ecosystem. While @galileodev/core handles what the pipeline does (generate, reflect, curate), meta handles how prompts are composed and improved over time. It replaces ad-hoc string concatenation with a structured template system, validates that LLM outputs respect declared constraints, and autonomously evolves templates to improve pipeline performance.
The package implements three middleware components: PromptBuilder for template-driven prompt composition, ConsistencyValidator for two-tier post-generation checking, and RatchetOptimizer for autonomous prompt evolution through controlled experiments.
Architecture
```
┌─────────────────────────────────────────────────┐
│                @galileodev/meta                 │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │              PromptBuilder               │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
│  │  │ Template │ │ Registry │ │Tokenizer │  │   │
│  │  │ Defaults │ │ (JSONL)  │ │(tiktoken)│  │   │
│  │  └──────────┘ └──────────┘ └──────────┘  │   │
│  └──────────────────────────────────────────┘   │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │           ConsistencyValidator           │   │
│  │  ┌──────────────┐ ┌──────────────────┐   │   │
│  │  │  Rule-Based  │ │   LLM-Judged     │   │   │
│  │  │  Constraints │ │   Constraints    │   │   │
│  │  └──────────────┘ └──────────────────┘   │   │
│  └──────────────────────────────────────────┘   │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │             RatchetOptimizer             │   │
│  │  ┌───────────┐ ┌──────────┐ ┌───────┐    │   │
│  │  │ Experiment│ │ Evaluator│ │Ratchet│    │   │
│  │  │  Runner   │ │ (Metrics)│ │ Guard │    │   │
│  │  └───────────┘ └──────────┘ └───────┘    │   │
│  └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
```

Core Modules
PromptBuilder
Replaces ad-hoc prompt string building with a declarative template system. Templates use {{slot}} placeholders, declare their own constraints, are versioned, and track lineage via parentId.
PromptTemplate — The core data structure:
- `id` / `version` — Unique identifier and version number
- `stage` — Which pipeline stage this template serves (generator, reflector, curator, decomposer, project-planner)
- `slots` — Typed slot definitions (`string`, `entries`, `artifacts`, `lessons`) with required/optional flags
- `sections` — Ordered content blocks with `{{slot}}` placeholders
- `constraints` — Declared validation rules (`output-format`, `content-rule`, `language`, `consistency`)
- `metadata.parentId` — Lineage tracking for template evolution
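The fields above can be pictured with a minimal sketch. The interface and literal below are illustrative assumptions based on the field descriptions, not the package's actual types, and the naive `fill` helper is invented here to show how `{{slot}}` placeholders get substituted:

```typescript
// Hypothetical shape of a PromptTemplate — field details are assumptions.
interface SketchTemplate {
  id: string;
  version: number;
  stage: string;
  slots: { name: string; type: string; required: boolean }[];
  sections: { title: string; body: string }[]; // bodies contain {{slot}} placeholders
  constraints: { kind: string; value: string }[];
  metadata: { parentId: string | null };
}

const template: SketchTemplate = {
  id: 'tpl_generator_demo',
  version: 1,
  stage: 'generator',
  slots: [{ name: 'instruction', type: 'string', required: true }],
  sections: [{ title: 'Task', body: 'Implement the following: {{instruction}}' }],
  constraints: [{ kind: 'language', value: 'typescript' }],
  metadata: { parentId: null }, // a derived variant would point at its parent here
};

// Naive slot filling for illustration: replace each {{name}} with its value.
function fill(body: string, slots: Record<string, string>): string {
  return body.replace(/\{\{(\w+)\}\}/g, (_, name) => slots[name] ?? '');
}

console.log(fill(template.sections[0].body, { instruction: 'add rate limiting' }));
// → Implement the following: add rate limiting
```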
PromptBuilder — The rendering engine:
- `build(stage, slots)` → `RenderedPrompt` — Fills all slots, validates required fields, assembles sections in order
- `getTemplate(stage)` → Active template for a stage
- `registerTemplate(template)` — Used by the optimizer to install new template versions
TemplateRegistry — JSONL-backed persistent storage:
- Append-only `.galileo/templates.jsonl` for full history
- `.galileo/templates-active.json` maps stage → active template ID
- Supports listing all templates, querying by stage, and activating specific versions
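The append-only semantics can be sketched in memory. This is not the package's `TemplateRegistry` (which persists JSONL to disk); the class and method names below are invented for illustration:

```typescript
// In-memory sketch of append-only registry semantics: history is never
// rewritten, and a separate map tracks the active template per stage.
type Tpl = { id: string; stage: string; version: number };

class SketchRegistry {
  private history: Tpl[] = [];                 // append-only, full version history
  private active = new Map<string, string>();  // stage → active template id

  register(t: Tpl): void {
    this.history.push(t);                      // never overwrite earlier versions
  }
  activate(stage: string, id: string): void {
    this.active.set(stage, id);
  }
  getActive(stage: string): Tpl | undefined {
    const id = this.active.get(stage);
    return this.history.find((t) => t.id === id);
  }
  listByStage(stage: string): Tpl[] {
    return this.history.filter((t) => t.stage === stage);
  }
}

const reg = new SketchRegistry();
reg.register({ id: 'gen_v1', stage: 'generator', version: 1 });
reg.register({ id: 'gen_v2', stage: 'generator', version: 2 });
reg.activate('generator', 'gen_v2');
console.log(reg.getActive('generator')?.id); // gen_v2 — but gen_v1 stays in history
```

Keeping superseded versions in the history file is what makes `parentId` lineage queries possible later.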
Default Templates
Five built-in templates ship with the package, covering all pipeline stages:
| Template | Stage | Purpose |
|----------|-------|---------|
| DEFAULT_GENERATOR_TEMPLATE | generator | XML-structured prompt for reasoning trajectories + code artifacts |
| DEFAULT_REFLECTOR_TEMPLATE | reflector | Lesson extraction with confidence scoring |
| DEFAULT_CURATOR_TEMPLATE | curator | Utility/harmfulness scoring and delta production |
| DEFAULT_DECOMPOSER_TEMPLATE | decomposer | Breaking user requests into independent, testable steps |
| DEFAULT_PROJECT_PLANNER_TEMPLATE | project-planner | Multi-phase project planning from user goals |
All templates are accessible via ALL_DEFAULT_TEMPLATES for bulk registration.
Token Counting
Local token estimation via js-tiktoken (GPT-4o tokenizer). Used for pre-call budget enforcement so the system doesn't need an API call just to measure cost.
```typescript
import { countTokens } from '@galileodev/meta';

const estimate = countTokens(promptText); // number of tokens
```

ConsistencyValidator
Two-tier post-generation validation that checks LLM outputs against the template's declared constraints:
Tier 1 — Rule-based checks (fast, no LLM call):
- `checkLanguageConstraint` — Verifies output uses the expected programming language
- `checkRangeConstraint` — Validates numeric scores fall within declared bounds (e.g., utility ∈ [0, 1])
- Zod schema validation — Parses structured output against declared schemas (e.g., `zod:ReflectionResponseSchema`)
Tier 2 — LLM-judged checks (deeper, requires LLM call):
- `consistency` constraints — An LLM judge verifies that the output doesn't contradict itself or the prompt's intent
- Only triggered when rule-based checks pass, to avoid wasting tokens on obviously invalid outputs
RatchetOptimizer
Autonomous prompt evolution through controlled experiments. The optimizer generates template variants, evaluates them against historical pipeline inputs, and keeps only improvements — a monotonic ratchet that never regresses.
Optimization flow:
1. Load the current active template for the target stage
2. Cold-start guard — requires ≥3 playbook entries (aborts with a helpful error if not met)
3. Ask an LLM to generate a variant using a configurable directive
4. Run both templates against a test suite (past inputs from the playbook)
5. Evaluate the composite metric with the accuracy gatekeeper
6. If the variant scores higher → commit the new template and register it as active
7. If equal or lower → discard (no commit)
8. Feed a sliding window of the last 3 failures to the next variant generation
9. Repeat up to `maxExperiments`, respecting `tokenBudget`
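The loop above can be sketched generically. This is a simplified stand-in, not the real optimizer: the `proposeVariant` and `score` callbacks, and the representation of templates as strings, are all assumptions made for illustration:

```typescript
// Sketch of the monotonic ratchet: a variant replaces the champion only
// when it scores strictly higher, so the result never regresses.
async function ratchet(
  champion: string,
  proposeVariant: (current: string, recentFailures: string[]) => Promise<string>,
  score: (template: string) => Promise<number>,
  maxExperiments: number,
): Promise<string> {
  let best = champion;
  let bestScore = await score(best);
  const failures: string[] = [];
  for (let i = 0; i < maxExperiments; i++) {
    const variant = await proposeVariant(best, failures.slice(-3)); // last-3 failure window
    const s = await score(variant);
    if (s > bestScore) {
      best = variant;         // commit: variant becomes the new active template
      bestScore = s;
    } else {
      failures.push(variant); // discard, but remember it to steer the next attempt
    }
  }
  return best; // never worse than the starting champion
}
```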
Composite scoring:

```
score = 0                                             if accuracy < threshold
score = accuracy.weight × a + efficiency.weight × e   if accuracy ≥ threshold
```

Evaluators:

- `computeScore` — Composite metric calculation with accuracy gatekeeper
- `computeEfficiency` — Token efficiency relative to a baseline
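A worked sketch of the gatekeeper formula (the real `computeScore` signature may differ; the function name and defaults here are illustrative, using the example weights and threshold from the Usage section below):

```typescript
// Gatekeeper scoring sketch: accuracy below the threshold zeroes the score
// regardless of efficiency, so a fast-but-wrong template can never win.
function compositeScore(
  accuracy: number,
  efficiency: number,
  weights = { accuracy: 0.7, efficiency: 0.3 },
  threshold = 0.7,
): number {
  if (accuracy < threshold) return 0; // gatekeeper: inaccurate templates never win
  return weights.accuracy * accuracy + weights.efficiency * efficiency;
}

console.log(compositeScore(0.6, 1.0)); // gated to 0 despite perfect efficiency
console.log(compositeScore(0.8, 0.5)); // 0.7 × 0.8 + 0.3 × 0.5 ≈ 0.71
```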
Experiment tracking:
- `.galileo/experiments.jsonl` — Full audit trail of every variant tried and scored
- Each `ExperimentResult` records: `templateId`, `parentId`, `score`, `accuracy`, `efficiency`, `tokensUsed`, `commitSha`, `reverted`
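For a sense of what one audit-trail entry looks like, here is a made-up record with the fields listed above (the values are invented for illustration, not real output):

```typescript
// Illustrative ExperimentResult record; every value here is fabricated.
const record = {
  templateId: 'tpl_generator_v2',
  parentId: 'tpl_generator_v1',
  score: 0.71,
  accuracy: 0.8,
  efficiency: 0.5,
  tokensUsed: 1840,
  commitSha: null as string | null, // set when a winning variant is committed
  reverted: true,                   // this variant lost and was discarded
};

// JSONL = one JSON object per line, appended to the experiments file.
const jsonlLine = JSON.stringify(record);
console.log(jsonlLine.includes('\n')); // false — a record never spans lines
```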
API Surface
```typescript
// Builder
export { PromptBuilder, TemplateRegistry, countTokens };
export { DEFAULT_GENERATOR_TEMPLATE, DEFAULT_REFLECTOR_TEMPLATE,
         DEFAULT_CURATOR_TEMPLATE, DEFAULT_DECOMPOSER_TEMPLATE,
         DEFAULT_PROJECT_PLANNER_TEMPLATE, ALL_DEFAULT_TEMPLATES };

// Validator
export { ConsistencyValidator, checkLanguageConstraint, checkRangeConstraint };

// Optimizer
export { RatchetOptimizer, runExperiment, computeScore, computeEfficiency, validateMetric };

// Types & Schemas
export type { PromptTemplate, SlotDefinition, TemplateSection, Constraint, RenderedPrompt };
export type { ValidationResult, ConstraintViolation, ValidationOptions };
export type { ExperimentConfig, ExperimentResult, MetricDefinition };
export { PromptTemplateSchema, SlotDefinitionSchema, TemplateSectionSchema, ConstraintSchema };
```

Dependencies
| Dependency | Purpose |
|------------|---------|
| @galileodev/core | Pipeline types, LLMProvider interface, playbook types |
| zod | Schema validation for templates and experiment configs |
| js-tiktoken | Local token counting (GPT-4o tokenizer) |
| ulid | Unique ID generation for templates and experiments |
Usage
Building prompts
```typescript
import { PromptBuilder, TemplateRegistry, DEFAULT_GENERATOR_TEMPLATE } from '@galileodev/meta';

const registry = new TemplateRegistry('.galileo');
await registry.register(DEFAULT_GENERATOR_TEMPLATE);

const builder = new PromptBuilder(registry);
const rendered = builder.build('generator', {
  instruction: 'Add rate limiting to the API',
  playbookContext: selectedEntries,
});

console.log(rendered.text);          // The assembled prompt string
console.log(rendered.tokenEstimate); // Pre-call token count
console.log(rendered.constraints);   // Constraints for validation
```

Validating outputs
```typescript
import { ConsistencyValidator } from '@galileodev/meta';

const validator = new ConsistencyValidator(llm);
const result = await validator.validate(llmOutput, rendered.constraints);

if (!result.valid) {
  console.log(result.violations); // Array of ConstraintViolation
}
```

Evolving templates
```typescript
import { RatchetOptimizer } from '@galileodev/meta';

const optimizer = new RatchetOptimizer(registry, llm, '.galileo');
const results = await optimizer.run(
  {
    targetStage: 'generator',
    tokenBudget: 50000,
    maxExperiments: 5,
    directive: 'Improve code quality and reduce verbosity',
    accuracyThreshold: 0.7,
    concurrency: 4,
    metric: {
      accuracy: { weight: 0.7, evaluator: 'schema-pass-rate' },
      efficiency: { weight: 0.3, baseline: 2000 },
    },
  },
  historicalInputs,
);

const wins = results.filter(r => !r.reverted).length;
console.log(`${wins}/${results.length} experiments improved the template`);
```

Testing
```
npm test -w packages/meta
```

Tests cover template rendering, slot validation, registry persistence, token counting, constraint checking (both rule-based and LLM-judged), experiment execution, and metric evaluation.
License
See the root LICENSE file.
