# Committee

Design by Committee™ except it's just you and LLMs
## Overview
Committee lets you assemble surgical, precisely scoped context and build templated, iterative prompts that chain together into multi-step LLM workflows.
## Core Example: Service Analysis
Let's illustrate the core workflow with an example designed to analyze different microservices based on their specific documentation and source code.
### 1. `workflow.yaml`

Defines file collections, global context, and the structured `services` object intended for iteration.
name: "service-analysis-workflow"
description: "Analyze multiple services using their specific docs and code"
outputPath: "_output/service-analysis"
# Define file collections
files:
# General Docs
architectureDoc: "docs/ARCHITECTURE.md"
# Auth Service Files
authConfigDoc: "docs/AUTH-CONFIG.md"
authCode: ["src/auth/**/*.js", "!src/auth/legacy/**"]
# Data Service Files
dataModelsDoc: "docs/DATA-MODELS.md"
dataCode: "src/data/**/*.js"
# Define universally accessible global variables
global_variables:
# General context available to all tasks
overallArchitecture: "{{ files.architectureDoc }}"
# Define data structures for set iteration
iterable_objects:
# Structured object containing service-specific context
services: # Target for 'for_each: services' in a set
auth: # Key becomes 'item.key' during iteration
# Value becomes 'item.value'
description: "Authentication and Authorization Service"
contact: "[email protected]"
# Embed CONTENT of auth-specific files
configDocContent: "{{ files.authConfigDoc }}"
codeContent: "{{ files.authCode }}"
data: # Key becomes 'item.key'
# Value becomes 'item.value'
description: "Data Processing and Storage Service"
contact: "[email protected]"
# Embed CONTENT of data-specific files
modelsDocContent: "{{ files.dataModelsDoc }}"
codeContent: "{{ files.dataCode }}"
# Define the sequence of sets
sets:
- useSet: analyze-service # Iterate over 'services' defined in iterable_objects
for_each: services(Note: The {{ files.collectionName }} syntax within global_variables or iterable_objects embeds the formatted content of the files.)
### 2. `sets/analyze-service.set.yaml`

Defines a set that iterates over the `services` object defined in the workflow's `iterable_objects`.
name: "analyze-service"
description: "Run analysis tasks for each service defined in the context"
# Iterates over the 'services' object from workflow.yaml's iterable_objects
# Each item will be { key: serviceName, value: serviceObject }
for_each: services
tasks:
# These tasks run in parallel for each service
- useTask: identify-service-patterns
# Task context automatically includes 'item', 'item.key', 'item.value'
# and variables from 'global_variables' like 'overallArchitecture'
- useTask: suggest-service-improvements3. tasks/analyze-service.md:
A task template showing how to access the context provided by the iteration and global variables.
Analyze the service: **{{ item.key }}**
**Service Description:** {{ item.value.description }}
**Contact:** {{ item.value.contact }}
**Overall Architecture Context:**
```
{{ overallArchitecture }} # Accessing a global_variable
```
**Service-Specific Configuration Documentation:**
```
{{ item.value.configDocContent }} # Accessing data from item.value
```
**Service-Specific Code:**
```
{{ item.value.codeContent }} # Accessing data from item.value
```
**Analysis Request:**
Based on the overall architecture and the specific documentation and code for the `{{ item.key }}` service, please perform the analysis requested by the calling task (e.g., identify patterns, suggest improvements).
This example demonstrates how to:

- Define multiple file sources.
- Define `global_variables` accessible everywhere.
- Structure data for iteration under `iterable_objects`.
- Iterate over this structured data using `for_each`.
- Access the iteration key (`item.key`), iteration value (`item.value.*`), and global variables within a task template.
## Key Concepts
Now let's dive deeper into the core components illustrated above.
### Workflows
A workflow is the top-level container defined in `workflow.yaml`, as seen in the Core Example. It specifies:

- `global_variables` accessible throughout the workflow. These form the base context.
- `iterable_objects` defining data structures (arrays/objects) intended for set iteration via `for_each`.
- Named file collections (`files:`) to gather context using glob patterns. File content is typically embedded into `global_variables` or `iterable_objects`.
- An ordered sequence of `sets` to be executed.
### Sets
Sets group related tasks. Sets defined in the workflow's `sets:` list are executed sequentially, in the order they appear.
Within a single set, the listed tasks are executed in parallel. Sets can optionally iterate over arrays or objects defined in the workflow's `iterable_objects` using `for_each`.
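As a minimal sketch of this sequencing (the set and task names below are hypothetical, not part of the framework):

```yaml
# workflow.yaml (excerpt): sets run sequentially, top to bottom
sets:
  - useSet: gather-facts    # runs first
  - useSet: write-summary   # starts only after gather-facts finishes

# sets/gather-facts.set.yaml: tasks within one set run in parallel
name: "gather-facts"
tasks:
  - useTask: extract-entities
  - useTask: extract-dates
```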
### Tasks
Tasks are templated prompts (stored as `.md` files) that perform a specific action using an LLM, like the `analyze-service.md` template in the Core Example. Each task runs with a context including:

- `global_variables` (from `workflow.yaml`).
- Iteration variables (`item`, `item.key`, `item.value` if the set uses `for_each`).
- Outputs from tasks in previous sets, accessed via `prior_outputs` defined in the set file.
**Important:** Due to parallel execution within a set, a task cannot access the output of another task running in the same set. Input/output dependencies must be managed by sequencing tasks across different sets.
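For instance, a task template can draw on all three context sources at once. In this sketch, `overallArchitecture` and `item.key` come from the Core Example above, while `analysis_result` is a hypothetical name that would be mapped under `prior_outputs` in the set file:

```markdown
Summarize the **{{ item.key }}** service.

Architecture context (a global variable):
{{ overallArchitecture }}

Findings from a previous set (a prior output):
{{ analysis_result }}
```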
**Escaping Template Syntax:** If you need to include literal `{{` or `}}` characters in your template without them being interpreted as variables, you can escape them with a backslash: `\{{` will render as `{{`, and `\}}` will render as `}}`.
### Referencing Output from Iterated Sets
When dealing with outputs from previous iterated sets, there are two main scenarios:
1. **Accessing Corresponding Iteration Output:** When both the previous set (e.g., `set1`) and the current set (e.g., `set2`) iterate over the same `for_each` target, you often need to access the output from the previous set's task corresponding to the current item being processed.

   - Syntax: `setName.taskName[this].output`
   - Use Case: An iterated set needs the specific output from the same iteration of a previous iterated set.
   - Result: Resolves to the single output value for the current iteration.

   Example (`set2` iterated, needs corresponding output from iterated `set1`):

   ```yaml
   # In sets/set2.set.yaml (for_each: services)
   prior_outputs:
     # Get the analyze-service output for the current service
     analysis_result: "{{ set1.analyze-service[this].output }}"
   ```

2. **Collecting All Iteration Outputs:** When a subsequent set (often a non-iterated set, e.g., `setB`) needs to gather all the individual outputs generated by a task within a previous iterated set (e.g., `setA`).

   - Syntax: `setName.taskName[*].output`
   - Use Case: A later set needs to aggregate or process the results from all iterations of a previous iterated task.
   - Result: Resolves to an array containing all the output values generated across all iterations of the specified task.

   Example (`setB` non-iterated, needs all outputs from iterated `setA`):

   ```yaml
   # In sets/setB.set.yaml (NOT iterated)
   prior_outputs:
     # Gather all results from setA's analyze-item task into an array
     all_analysis_results: "{{ setA.analyze-item[*].output }}"
   ```

   (Note: The task template using `{{ all_analysis_results }}` will receive these outputs as a newline-separated string by default. Handle accordingly in your prompt.)
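A downstream task template could then consume the aggregated, newline-separated results directly. A sketch, assuming `all_analysis_results` is mapped as above:

```markdown
Below is one analysis result per item, collected from every iteration:

{{ all_analysis_results }}

Synthesize these into a single summary, calling out patterns that
appear across multiple items.
```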
### Referencing Output from Non-Iterated Sets
If the previous set was not iterated, you simply reference its task output directly:
- `setName.taskName.output`: Output from a non-iterated task in a previous set. The `taskName` used here must match the `useTask` value from the task definition in the previous set's YAML file.
Example (`set2` non-iterated, needs output from non-iterated `set1`):

```yaml
# In sets/set2.set.yaml (NOT iterated)
prior_outputs:
  taskA_result: "{{ set1.taskA.output }}"
```

Why other syntaxes fail:

- `"{{ set1.analyze-service.output }}"`: Refers to the entire array of outputs from the iterated task, not the specific one needed.
- `"{{ set1.analyze-service[item.key].output }}"`: The `prior_outputs` resolver doesn't evaluate `{{item.key}}` within the reference string; it looks for a literal key `item.key`.
**Important Convention:** Task outputs are always stored and referenced using the exact name specified in the `useTask` field. There is no option to rename outputs.
Example Set Configuration (`*.set.yml`):

If `set1` (non-iterated) contains a task `useTask: taskA`, and `set2` (non-iterated) needs its output:
```yaml
name: set2
tasks:
  - useTask: process-output
    prior_outputs:
      # Map the reference to a local variable name for use in the task template
      taskA_result: "{{ set1.taskA.output }}"  # Reference uses the original task name 'taskA'
```

Example Task Template (`tasks/process-output.md`):

```markdown
Processing output for file {{ item.path }}.

Result from Task A in Set 1:
{{ taskA_result }}  # Access the output via the name defined in prior_outputs
```

Note: Referencing outputs from tasks within the same parallel set execution is unreliable and should be avoided. Structure your workflow with sequential sets for dependencies.
## Where Data Comes From: Defining Your Context
Understanding where different types of data are defined and accessed is important for using Committee. The framework uses the following structure:
- **Global Variables:** Defined in the top-level `global_variables:` block of your `workflow.yaml`. These are accessible to all sets and tasks throughout the workflow execution.
- **File Collections & Content:** File sources are defined in the `files:` block of `workflow.yaml`. To make file content available for LLM analysis, embed it into variables within the `workflow.yaml` `global_variables:` or `iterable_objects:` blocks using `{{ files.collectionName }}`. Task templates (`.md`) can reference `{{ files.collectionName }}` to get a list of paths.
- **Iteration Data (`item`):** Data structures (arrays or objects) intended for iteration using `for_each` are defined in the `iterable_objects:` block of `workflow.yaml`. The `for_each: objectName` directive within a `*.set.yml` file targets one of these workflow iterable objects. Tasks within that set then access the current iteration's data via the `item` object (or `item.key`/`item.value` for object iteration).
- **Task Outputs (via Prior Outputs):** Outputs from previous tasks are made available to a subsequent task via the `prior_outputs:` block defined under that task in its `*.set.yml` file. This block maps a local name (used in the task template) to the structured output reference string (e.g., `setName.taskName[iterationKey].output`).
Essentially, `workflow.yaml` is the primary location for defining the initial context (`global_variables`), data sources (`files`), and data for iteration (`iterable_objects`), while `*.set.yml` files orchestrate the execution flow and manage dependencies on previously generated task outputs via `prior_outputs`.
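Put together, the division of responsibilities looks roughly like this (the file, set, and task names here are illustrative, not part of the framework):

```yaml
# workflow.yaml -- defines context and data
files:
  specs: "docs/**/*.md"
global_variables:
  specContext: "{{ files.specs }}"   # embedded file content
iterable_objects:
  modules:                           # target for 'for_each: modules'
    api: { description: "API module" }
    web: { description: "Web module" }

# sets/report.set.yaml -- orchestrates execution and dependencies
name: "report"
tasks:
  - useTask: write-report
    prior_outputs:
      # collect every iteration's output from a hypothetical earlier set
      module_findings: "{{ review.inspect-module[*].output }}"
```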
## File Collection Handling
You define named file collections in workflow.yaml using file paths or glob patterns (include/exclude):
```yaml
# workflow.yaml
name: "code-review-workflow"
files:
  sourceCode:
    include: ["src/**/*.js"]
    exclude: ["src/vendor/**"]
  testFiles: "test/**/*.test.js"
  docs: ["README.md", "CONTRIBUTING.md"]
# ... global_variables, iterable_objects, and sets follow ...
```

These collections are primarily used to inject context into your workflow. The way you reference a collection using `{{ files.collectionName }}` has two behaviors depending on where it is used:
1. **In `workflow.yaml` (`global_variables:` or `iterable_objects:`):**

   - Behavior: Embeds the full content of each file within the collection directly into the variable's string value. Each file's content is automatically prefixed with a Markdown header indicating its path (e.g., `# path/to/file.js`).
   - Purpose: This is the primary mechanism for injecting substantial file content (like source code, documentation) into the context, making it available to subsequent sets and tasks for direct LLM analysis.
   - Example (`workflow.yaml`):

     ```yaml
     global_variables:
       # Embeds the content of all files matching src/**/*.js,
       # each block prefixed with '# filepath'
       sourceContext: "{{ files.sourceCode }}"
       # Embeds content of README.md and CONTRIBUTING.md
       docsContext: "{{ files.docs }}"
     ```

2. **In Task Templates (`*.md` files):**

   - Behavior: Renders a newline-separated list of the file paths belonging to that collection. It does not embed the file content here.
   - Purpose: Useful for providing informational context within a task prompt, such as listing related files for the LLM's reference, without including their potentially large content directly in that specific prompt.
   - Example (`tasks/review-code.md`):

     ````markdown
     Review the following source code file `{{ item.path }}`:

     ```javascript
     {{ item.content }}  # Assuming iteration over a file collection
     ```

     Consider related test files (paths listed below):
     {{ files.testFiles }}  # Lists paths from the 'testFiles' collection
     ````
**Key Distinction:** Use `{{ files.collectionName }}` in `workflow.yaml` (`global_variables` or `iterable_objects`) to provide the *content* needed for LLM analysis. Use it in task templates (`.md`) when you only need to reference the *paths* of the files.
(Note: Advanced pattern filtering within the template tag, like `{{ files.collectionName:*.js }}`, is not currently implemented.)
## Two-Phase Thinking
Tasks can optionally perform a preliminary "thinking" step before generating the final response. This is useful for complex analysis or reasoning tasks. Configure this using YAML frontmatter at the top of your task's .md file:
```markdown
---
name: "complex-analysis-task"                  # Optional: Task name for clarity
thinking: true                                 # REQUIRED: Enables the thinking phase
thinking_prompt: "path/to/thinking-prompt.md"  # Optional: Use a separate prompt file for the thinking phase
thinking_instruction: "Analyze the input step-by-step..."  # Optional: Specific instruction for the thinking phase
thinking_params:
  temperature: 0.2                             # Optional: LLM parameters specifically for the thinking phase
---

# Main Task Prompt

Based on the preceding analysis, provide the final answer.

Context:
{{ context }}
```

- If `thinking: true`, the framework first runs the thinking phase (using the main prompt, or `thinking_prompt` if provided, potentially guided by `thinking_instruction`).
- The output of the thinking phase is then automatically prepended to the context provided to the main task prompt for generating the final response.
- You can control LLM parameters specifically for the thinking step using `thinking_params`.
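As a sketch of what a separate thinking prompt file might contain (the path, the wording, and the assumption that it can reference the same `{{ context }}` variable as the main prompt are all illustrative, not confirmed by the framework):

```markdown
<!-- path/to/thinking-prompt.md (hypothetical) -->
Before drafting the final answer, reason step-by-step:

1. List the key facts present in the context.
2. Note any gaps or ambiguities.
3. Outline the structure the final answer should take.

Context:
{{ context }}
```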
## Using the Framework

### Installation
```bash
# Navigate to the project root directory

# Install globally (recommended for CLI use)
npm install -g .

# Or install locally
npm install .
```

### Basic Usage
1. Create a workflow directory (e.g., `my-workflow/`) containing:

   - `workflow.yaml` (workflow definition)
   - `sets/` directory (with `.set.yaml` or `.set.yml` set definitions)
   - `tasks/` directory (with `.md` task prompt files)

2. Configure your environment variables (e.g., in a `.env` file in your project or system):

   ```bash
   # Required for using Anthropic API (if not using --local)
   ANTHROPIC_API_KEY=your_api_key_here

   # Optional: Specify default model (defaults exist, e.g., Claude 3 Haiku for --lite, Sonnet otherwise)
   # DEFAULT_MODEL=claude-3-sonnet-20240229

   # Optional: Set maximum tokens for LLM responses (default: 10000)
   # MAX_TOKENS=100000

   # Optional: Set maximum number of concurrent API requests (default: 10)
   # MAX_PARALLEL_REQUESTS=15

   # Optional: Set minimum delay between starting parallel API requests (in seconds, default: 0.1)
   # Useful for proactively avoiding rate limits based on request frequency.
   # REQUEST_DELAY_SECONDS=0.5

   # Optional: Set maximum number of retries for failed API calls (default: 20)
   # LLM_MAX_RETRIES=10

   # Optional: For using a local LLM (requires --local flag)
   # Needs a running server compatible with OpenAI API spec (e.g., Ollama, LM Studio)
   # Default Ollama URL example:
   LOCAL_LLM_URL=http://localhost:11434

   # Optional: Specify model served by local URL (required if server hosts multiple)
   # LOCAL_LLM_MODEL=llama3
   ```

3. Run the workflow from your terminal:
