markdown-agent-studio

v0.6.9

Published

3 months ago

A visual IDE for building, orchestrating, and observing autonomous AI agent teams.

robthepcguy

ai-agents multi-agent markdown react vite visualization orchestration

Markdown Agent Studio (MAS)

Stop writing boilerplate Python. Stop wiring together visual spaghetti graphs. Start building AI teams that actually learn from their mistakes.

Markdown Agent Studio is a local-first, browser-based IDE for building self-improving AI agent systems. An agent is not a stateless API call - it is a living document that can work, remember, collaborate, and evolve.

The Problem

Every AI agent built on the same model starts with the same intelligence. The industry tries to differentiate them through prompt engineering (telling them what to be) and fine-tuning (showing them what others have done). Neither is actual learning. A prompted agent doesn't get better at writing stories by writing stories. Every run begins from the same static starting point, with nothing carried over from the last one.

Humans don't work this way. We learn by doing, failing, reflecting, and carrying that experience forward. AI agents have had no equivalent - until now.

What MAS Does

You give an agent a task. It runs, produces output, and reflects on what it did. On the next run, its memory from the previous session feeds back in. It sees what it tried, what fell flat, what worked. It spawns sub-agents to research or review. When context fills up, a summarizer compresses working memory into long-term knowledge - deduplicating what it already knows, preserving what's new.

Run after run, the agent's accumulated knowledge grows deeper and more refined. Not because a human engineered the right prompt, but because the agent earned its expertise through iterative practice.

Why Markdown

Most agent tooling forces a choice: write code (powerful but inaccessible) or use a visual builder (accessible but opaque). Agents defined in Markdown sit in the middle. They're plain text files you can read, edit, version-control, and share. The YAML frontmatter configures behavior; the body is the system prompt. No framework lock-in, no proprietary format, no deployment step.

---
name: Story Writer
model: gemini-2.5-flash
safety_mode: balanced
reads: ["**"]
writes: [artifacts/**, memory/**]
permissions:
  spawn_agents: true
  web_access: true
autonomous:
  max_cycles: 20
  resume_mission: true
---

You are a story writer developing your craft through practice.
Read your memory for lessons from previous sessions before starting.
Write drafts to files. Reflect on what works and what doesn't.
Spawn a critic agent to review your output. Incorporate feedback.
Record what you learned to memory before finishing.

That file is the agent.

How the Learning Loop Works

Run - The agent executes its task, using tools to research, write, and collaborate with sub-agents.
Reflect - Before the session ends, the agent records what it accomplished, what failed, and what to try next.
Compress - When context fills up, a summarizer distills working memory into long-term knowledge. Duplicates are discarded; new insights are preserved.
Resume - On the next run, accumulated memory feeds back in. The agent picks up where it left off, building on everything it has learned.

Each cycle makes the agent more capable at its specific task. Not because the model changed, but because the agent's experiential knowledge grew.

Getting Started

MAS runs entirely locally. No backend infrastructure required.

Prerequisites

Git: https://git-scm.com/downloads
Node.js 20.19+: https://nodejs.org/

Setup

git clone https://github.com/RobThePCGuy/markdown-agent-studio.git
cd markdown-agent-studio
npm install
npm run dev

Open http://localhost:5173. Pick an agent, enter a prompt, click Run.

No API key? That's fine - the app ships with a scripted demo provider so you can explore the full experience first.

Provider Keys

cp .env.example .env.local

Add your provider keys (any or all):

VITE_GEMINI_API_KEY=your_key_here
VITE_OPENAI_API_KEY=your_key_here
VITE_ANTHROPIC_API_KEY=your_key_here

If no key is set, demo mode runs automatically. Select your provider and model in Settings.

The Sample Project

On first launch, a six-agent team is loaded to demonstrate multi-agent orchestration. The task: build a portfolio website from scratch.

| Agent | Role | Safety Mode |
|-------|------|-------------|
| Project Lead | Plans the project, delegates to specialists, writes the final summary | balanced |
| UX Researcher | Searches the web for current design trends, writes research report | safe |
| Designer | Reads research findings, produces a design spec with tokens and layout | balanced |
| HTML Developer | Builds semantic HTML from the design spec | safe |
| CSS Developer | Creates responsive CSS with custom properties (works in parallel with HTML Dev) | safe |
| QA Reviewer | Audits HTML/CSS using a custom design_review tool, produces a scored report | gloves_off |

Hit Run with the Project Lead selected and watch the team coordinate: delegation, parallel execution, signaling, and consolidation - all visualized on the graph in real time.

The demo produces real output: site/index.html, site/styles.css, artifacts/design-spec.md, artifacts/qa-report.md, and artifacts/summary.md.

Agent File Reference

Agent files live in agents/*.md. The YAML frontmatter configures behavior; everything below the closing --- is the system prompt.
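
As a rough illustration of that split, the sketch below separates the raw frontmatter from the system prompt. The name splitAgentFile and its return shape are assumptions made for this example, not MAS's actual parser, and the frontmatter string would still need a real YAML library to become a config object.

```typescript
// Hypothetical helper (not part of MAS): split an agent file into its two parts.
interface AgentFileParts {
  frontmatter: string; // raw YAML between the opening and closing --- lines
  systemPrompt: string; // everything below the closing ---
}

function splitAgentFile(source: string): AgentFileParts {
  const lines = source.split(/\r?\n/);
  if (lines.length === 0 || lines[0].trim() !== "---") {
    throw new Error("Agent file must start with a --- line");
  }
  // The first --- after line 0 closes the frontmatter.
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) {
    throw new Error("Missing closing --- line");
  }
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    systemPrompt: lines.slice(close + 1).join("\n").trim(),
  };
}

// Usage with a trimmed-down version of the Story Writer agent.
const storyWriterParts = splitAgentFile(
  [
    "---",
    "name: Story Writer",
    "safety_mode: balanced",
    "---",
    "",
    "You are a story writer developing your craft through practice.",
  ].join("\n")
);
```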

Frontmatter Schema

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | required | Display name for the agent |
| model | string | Settings default | LLM model override (e.g. gemini-2.5-flash, gpt-4o, claude-sonnet-4-20250514) |
| safety_mode | string | gloves_off | Permission tier: safe, balanced, or gloves_off |
| reads | string[] | mode default | Glob patterns the agent can read (e.g. ["agents/**", "memory/**"]) |
| writes | string[] | mode default | Glob patterns the agent can write (e.g. ["artifacts/**"]) |
| permissions | object or string[] | mode default | Fine-grained permission overrides (see Safety Modes) |
| allowed_tools | string[] | all | Whitelist of built-in tools this agent can use |
| blocked_tools | string[] | none | Blacklist of built-in tools this agent cannot use |
| gloves_off_triggers | string[] | none | Keywords in the mission prompt that auto-escalate to gloves_off |
| tools | object[] | none | Custom tool definitions (see Custom Tools) |
| autonomous | object | none | Autonomous cycle config (see Autonomous Mode) |
| mcp_servers | object[] | none | MCP server connections (see MCP Server Integration) |
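
To make the reads and writes patterns concrete, here is a minimal glob check. How MAS actually matches patterns is not specified in this document. The sketch assumes that ** crosses directory separators, that * stays within one path segment, and that every other character is literal. The names globToRegExp and canAccess are hypothetical.

```typescript
// Assumed semantics: "**" matches across "/", "*" matches within one segment.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*"; // "**": any characters, including "/"
        i++; // skip the second star
      } else {
        pattern += "[^/]*"; // "*": any characters except "/"
      }
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + pattern + "$");
}

// True when the path matches at least one of the agent's patterns.
function canAccess(path: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(path));
}

// Usage: the Story Writer's writes list allows artifacts but not agent files.
const writerCanWriteSpec = canAccess("artifacts/design-spec.md", ["artifacts/**", "memory/**"]);
const writerCanWriteAgent = canAccess("agents/lead.md", ["artifacts/**", "memory/**"]);
```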

Safety Modes

Every agent runs under one of three safety modes that control what it's allowed to do. Set safety_mode in the frontmatter, or let it default to gloves_off.

| Permission | safe | balanced | gloves_off |
|------------|--------|------------|--------------|
| Spawn agents | - | ✓ | ✓ |
| Edit agents | - | - | ✓ |
| Delete files | - | - | ✓ |
| Web access | - | ✓ | ✓ |
| Signal parent | ✓ | ✓ | ✓ |
| Custom tools | - | ✓ | ✓ |
| Default reads | agents/**, memory/**, artifacts/** | ** | ** |
| Default writes | memory/**, artifacts/** | memory/**, artifacts/** | ** |

Aliases: street maps to safe; autonomous, track, gloves-off, and glovesoff map to gloves_off.
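
The alias rule can be pictured as a small lookup. The alias list comes from the sentence above. Trimming, lower-casing, and throwing on unknown values are assumptions of this sketch, and normalizeSafetyMode is a hypothetical name.

```typescript
type SafetyMode = "safe" | "balanced" | "gloves_off";

// Canonical names plus the documented aliases.
const SAFETY_MODE_ALIASES = new Map<string, SafetyMode>([
  ["safe", "safe"],
  ["street", "safe"],
  ["balanced", "balanced"],
  ["gloves_off", "gloves_off"],
  ["autonomous", "gloves_off"],
  ["track", "gloves_off"],
  ["gloves-off", "gloves_off"],
  ["glovesoff", "gloves_off"],
]);

function normalizeSafetyMode(raw?: string): SafetyMode {
  // A missing safety_mode falls back to the documented default.
  if (raw === undefined) return "gloves_off";
  // Assumption: matching ignores case and surrounding whitespace.
  const mode = SAFETY_MODE_ALIASES.get(raw.trim().toLowerCase());
  if (mode === undefined) {
    // Assumption: unknown values are rejected rather than silently defaulted.
    throw new Error(`Unknown safety_mode: ${raw}`);
  }
  return mode;
}
```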

You can override individual permissions regardless of mode:

safety_mode: safe
permissions:
  web_access: true    # grant web access even in safe mode
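
One way to think about overrides is as a merge on top of the mode's defaults. The defaults below are transcribed from the Safety Modes table. The key names spawn_agents and web_access appear in the frontmatter examples, while edit_agents, delete_files, signal_parent, and custom_tools are guessed for this sketch, as is the resolvePermissions function itself.

```typescript
type SafetyTier = "safe" | "balanced" | "gloves_off";

interface AgentPermissions {
  spawn_agents: boolean;
  edit_agents: boolean; // hypothetical key name
  delete_files: boolean; // hypothetical key name
  web_access: boolean;
  signal_parent: boolean; // hypothetical key name
  custom_tools: boolean; // hypothetical key name
}

// Defaults per mode, taken from the Safety Modes table.
const TIER_DEFAULTS: Record<SafetyTier, AgentPermissions> = {
  safe: {
    spawn_agents: false, edit_agents: false, delete_files: false,
    web_access: false, signal_parent: true, custom_tools: false,
  },
  balanced: {
    spawn_agents: true, edit_agents: false, delete_files: false,
    web_access: true, signal_parent: true, custom_tools: true,
  },
  gloves_off: {
    spawn_agents: true, edit_agents: true, delete_files: true,
    web_access: true, signal_parent: true, custom_tools: true,
  },
};

// Explicit frontmatter overrides win over the mode's defaults.
function resolvePermissions(
  tier: SafetyTier,
  overrides: Partial<AgentPermissions> = {}
): AgentPermissions {
  return { ...TIER_DEFAULTS[tier], ...overrides };
}

// Usage: the example above, safe mode with web access granted.
const safeWithWeb = resolvePermissions("safe", { web_access: true });
```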

Trigger-based escalation: If you set gloves_off_triggers, the agent automatically escalates to gloves_off when any trigger keyword appears in the mission prompt:

safety_mode: safe
gloves_off_triggers:
  - "delete"
  - "modify agents"
  - "full access"
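
The escalation check can be sketched as a keyword scan over the mission prompt. This document does not say whether matching is case-sensitive or word-bounded. The sketch assumes a case-insensitive substring match, which also means "delete" would match "undeleted". The name escalateIfTriggered is hypothetical.

```typescript
type DeclaredMode = "safe" | "balanced" | "gloves_off";

// Returns gloves_off when any trigger keyword appears in the mission prompt.
function escalateIfTriggered(
  declared: DeclaredMode,
  triggers: string[],
  missionPrompt: string
): DeclaredMode {
  const prompt = missionPrompt.toLowerCase();
  const hit = triggers.some((trigger) => {
    const keyword = trigger.trim().toLowerCase();
    // Assumption: empty triggers are ignored; matching is plain substring.
    return keyword !== "" && prompt.includes(keyword);
  });
  return hit ? "gloves_off" : declared;
}

// Usage with the triggers from the example above.
const exampleTriggers = ["delete", "modify agents", "full access"];
const cleanupMode = escalateIfTriggered("safe", exampleTriggers, "Please delete the old drafts");
const summaryMode = escalateIfTriggered("safe", exampleTriggers, "Write a summary of the research");
```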

Built-in Tools

Agents access tools based on their safety mode and permission settings. The full tool inventory:

File System

| Tool | Description |
|------|-------------|
| vfs_read | Read file contents from the virtual file system |
| vfs_write | Write or overwrite a file |
| vfs_list | List files by directory prefix |
| vfs_delete | Delete a file (requires delete permission) |

Agent Orchestration

| Tool | Description |
|------|-------------|
| spawn_agent | Create and queue a new agent for execution with a task |
| delegate | Hand off a task to an existing agent with context |
| signal_parent | Send a message back to the agent that spawned you |

Web

| Tool | Description |
|------|-------------|
| web_search | Search the web via Gemini's grounding API. Requires a Gemini API key regardless of which provider is selected. |
| web_fetch | Fetch and parse a URL's content. Subject to browser CORS restrictions - most third-party URLs will be blocked unless they serve permissive CORS headers. |

Memory

| Tool | Description |
|------|-------------|
| memory_write | Write an entry to working memory with tags |
| memory_read | Search working memory by query or tags |

Knowledge Base

| Tool | Description |
|------|-------------|
| knowledge_query | Semantic search across all agents' long-term memory (requires vector memory) |
| knowledge_contribute | Add typed knowledge as tagged working memory (skill, fact, procedure, observation, mistake, preference) |

Messaging

| Tool | Description |
|------|-------------|
| publish | Broadcast a message to a named channel |
| subscribe | Listen to a channel and check for pending messages |
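
The sketch below shows one way publish and subscribe could behave, assuming the delivery rule described under Inter-Agent Communication: subscribers only see messages published after they subscribe, and acknowledged messages are not delivered twice. The MessageBus class, its method names, and the per-subscriber cursor are all assumptions of this example, not the MAS store.

```typescript
interface ChannelMessage {
  id: number;
  channel: string;
  authorId: string;
  timestamp: number;
  payload: string;
}

// Hypothetical in-memory bus; MAS's real implementation may differ.
class MessageBus {
  private messages: ChannelMessage[] = [];
  private nextId = 1;
  // subscriber -> channel -> id of the last message already delivered
  private cursors = new Map<string, Map<string, number>>();

  publish(channel: string, authorId: string, payload: string): void {
    this.messages.push({ id: this.nextId++, channel, authorId, timestamp: Date.now(), payload });
  }

  subscribe(subscriberId: string, channel: string): void {
    const channels = this.cursors.get(subscriberId) ?? new Map<string, number>();
    // Start at the latest id so messages published earlier are never delivered.
    if (!channels.has(channel)) channels.set(channel, this.nextId - 1);
    this.cursors.set(subscriberId, channels);
  }

  // Returns pending messages and acknowledges them by advancing the cursor.
  poll(subscriberId: string, channel: string): ChannelMessage[] {
    const channels = this.cursors.get(subscriberId);
    const cursor = channels?.get(channel);
    if (channels === undefined || cursor === undefined) return [];
    const pending = this.messages.filter((m) => m.channel === channel && m.id > cursor);
    if (pending.length > 0) channels.set(channel, pending[pending.length - 1].id);
    return pending;
  }
}
```

Keeping a cursor per subscriber and channel is one simple way to get both properties at once; a real implementation might track acknowledgments per message instead.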

Shared State

| Tool | Description |
|------|-------------|
| blackboard_write | Write a key-value pair visible to all agents in the current run |
| blackboard_read | Read from the shared blackboard (omit key to list all) |

Task Management (autonomous mode only)

| Tool | Description |
|------|-------------|
| task_queue_write | Add, update, or remove tasks (actions: add, update, remove). Only registered during autonomous runs. |
| task_queue_read | Query the task queue with filters (pending, in_progress, done, blocked, all). Only registered during autonomous runs. |

Custom Tools

You can define custom tools in the agent's frontmatter. Each custom tool spawns a temporary sub-agent when invoked, with template variables substituted from the caller's arguments.

tools:
  - name: design_review
    description: Evaluate HTML and CSS against a design specification
    parameters:
      html_path:
        type: string
        description: Path to the HTML file
      css_path:
        type: string
        description: Path to the CSS file
    prompt: |
      Review the HTML at {{html_path}} and CSS at {{css_path}}.
      Score accessibility, responsiveness, performance, and design fidelity.
      Return a structured report with scores out of 100.
    model: gemini-2.5-flash          # optional: override model for this tool
    result_schema:                    # optional: guide output shape (not validated)
      type: object
      properties:
        overall_score:
          type: number
        breakdown:
          type: object

When an agent calls design_review, a temporary agent is created at agents/_custom_design_review_<timestamp>.md, runs the prompt with parameters injected, and returns the result to the caller.
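
Parameter injection can be pictured as a plain placeholder replacement over the prompt template. The sketch assumes {{name}} placeholders made of word characters and leaves unknown placeholders untouched. Both choices, and the name renderToolPrompt, are assumptions of this example.

```typescript
// Replace {{name}} placeholders with the caller's arguments.
function renderToolPrompt(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    // Only use the caller's own keys, never inherited object properties.
    const value = Object.prototype.hasOwnProperty.call(args, name) ? args[name] : undefined;
    // Assumption: a placeholder with no matching argument is left as written.
    return value ?? match;
  });
}

// Usage with the first line of the design_review prompt.
const reviewPrompt = renderToolPrompt(
  "Review the HTML at {{html_path}} and CSS at {{css_path}}.",
  { html_path: "site/index.html", css_path: "site/styles.css" }
);
// reviewPrompt is "Review the HTML at site/index.html and CSS at site/styles.css."
```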

Workflow Files

Workflow files live in workflows/*.md and define multi-step pipelines with dependency ordering.

---
name: Research Pipeline
steps:
  - id: research
    agent: agents/researcher.md
    prompt: "Research {topic}"
    outputs: [findings, sources]
  - id: synthesis
    agent: agents/synthesizer.md
    depends_on: [research]
    prompt: "Synthesize {research.findings} with sources from {research.sources}"
    outputs: [synthesis]
  - id: review
    agent: agents/reviewer.md
    depends_on: [synthesis]
    prompt: "Review {synthesis.synthesis} for accuracy"
---

Steps execute in topological order. Steps whose dependencies have all completed are eligible to run in parallel, up to the Workflow Parallel Steps setting (default 1, which means sequential execution). Circular dependencies are detected and rejected. Variables use {step_id.output_name} syntax for upstream data access.
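
The ordering and cycle check can be illustrated with a small scheduler that groups steps into waves, where every step in a wave has all of its dependencies in earlier waves. This is a generic topological sort written for this document, not the MAS workflow engine; planWaves and WorkflowStep are hypothetical names. The sketch also rejects dependencies on unknown step ids, which is an assumption.

```typescript
interface WorkflowStep {
  id: string;
  depends_on?: string[];
}

// Groups step ids into waves; steps within one wave do not depend on each other.
function planWaves(steps: WorkflowStep[]): string[][] {
  const ids = new Set(steps.map((s) => s.id));
  for (const step of steps) {
    for (const dep of step.depends_on ?? []) {
      if (!ids.has(dep)) {
        throw new Error(`Step ${step.id} depends on unknown step ${dep}`);
      }
    }
  }

  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...steps];
  while (remaining.length > 0) {
    // A step is ready once every dependency has been scheduled in an earlier wave.
    const ready = remaining.filter((s) => (s.depends_on ?? []).every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can make progress, so the remaining steps form a cycle.
      throw new Error("Circular dependency among: " + remaining.map((s) => s.id).join(", "));
    }
    waves.push(ready.map((s) => s.id));
    ready.forEach((s) => done.add(s.id));
    remaining = remaining.filter((s) => !done.has(s.id));
  }
  return waves;
}

// Usage: the Research Pipeline above is a strict chain, one step per wave.
const pipelineWaves = planWaves([
  { id: "research" },
  { id: "synthesis", depends_on: ["research"] },
  { id: "review", depends_on: ["synthesis"] },
]);
```

Under this reading, a concurrency limit such as Workflow Parallel Steps would cap how many steps of a single wave run at the same time.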

Autonomous Mode

Autonomous mode runs an agent through multiple cycles, with memory carrying forward between each one.

autonomous:
  max_cycles: 20              # 1-1000, default: 10
  stop_when_complete: true    # stop early if agent assesses task is done
  resume_mission: true        # load previous mission state and continue
  seed_task_when_idle: true   # auto-generate follow-up tasks when queue empties

| Option | What it does |
|--------|-------------|
| max_cycles | Hard limit on iteration count. Each cycle is a full run → reflect → compress loop. |
| stop_when_complete | The agent self-assesses completion after each cycle. If satisfied, it stops without exhausting all cycles. |
| resume_mission | Loads the previous mission state from _mission_state_<agentId>.json in the VFS. The agent picks up where it left off with full task queue and cycle history. |
| seed_task_when_idle | When the task queue empties mid-run, the agent generates continuation tasks to keep progressing. Prevents premature stops on open-ended missions. |

Mission state - including task queue, cycle notes (last 12), and token totals - persists across browser sessions.
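
As a rough picture of how such state might be updated after each cycle, the sketch below keeps only the 12 most recent cycle notes and accumulates token totals. The MissionState shape and its field names are invented for this example; the actual contents of the mission state file are not documented here.

```typescript
// Hypothetical shape; the real mission state file may be structured differently.
interface MissionState {
  cycle: number;
  cycleNotes: string[];
  totalTokens: number;
}

const MAX_CYCLE_NOTES = 12; // "cycle notes (last 12)" from the text above

// Returns a new state with the note appended and older notes trimmed.
function recordCycle(state: MissionState, note: string, tokensUsed: number): MissionState {
  return {
    cycle: state.cycle + 1,
    cycleNotes: [...state.cycleNotes, note].slice(-MAX_CYCLE_NOTES),
    totalTokens: state.totalTokens + tokensUsed,
  };
}
```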

Memory System

Memory is what makes agents learn. The system operates in three layers.

Working memory is written during a run via memory_write. It holds observations, plans, and intermediate results scoped to the current session.

Post-run summarization happens automatically after each completed run. A summarizer agent reviews everything - files written, working memory entries, conversation history - and extracts structured memories typed as skill, fact, procedure, observation, mistake, or preference. Mistakes are prioritized because they prevent repeated failures.

Long-term memory stores extracted knowledge across runs. When new memories are consolidated with existing ones, the system operates in capacity tiers:

| Tier | Condition | Behavior |
|------|-----------|----------|
| Generous | < 30% of budget used | Freely add new memories; only skip exact duplicates |
| Selective | 30-50% of budget | Add only high-value knowledge; merge duplicates via UPDATE |
| Heavy cut | > 50% of budget | Aggressively compress; target 10-20% reduction; merge related memories |
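
The tier thresholds translate directly into a small classifier. The sketch assumes usage is measured in tokens against the memory token budget and that both boundary values (exactly 30% and exactly 50%) fall in the Selective tier, as the table's ranges suggest. The name capacityTier is hypothetical.

```typescript
type CapacityTier = "generous" | "selective" | "heavy_cut";

// Classifies current long-term memory usage against the budget.
function capacityTier(usedTokens: number, budgetTokens: number): CapacityTier {
  if (budgetTokens <= 0) {
    throw new Error("budgetTokens must be positive");
  }
  const ratio = usedTokens / budgetTokens;
  if (ratio < 0.3) return "generous"; // under 30% of budget
  if (ratio <= 0.5) return "selective"; // 30-50% of budget, boundaries included
  return "heavy_cut"; // over 50% of budget
}

// Usage with a 2000-token budget.
const lightUse = capacityTier(400, 2000); // 20% of budget
const heavyUse = capacityTier(1200, 2000); // 60% of budget
```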

Each memory is tagged, timestamped, and access-counted. Frequently accessed memories are prioritized for retention. Vector memory (opt-in via Settings) enables semantic retrieval using Transformers.js embeddings backed by IndexedDB, so agents can find related knowledge even with different phrasing.

Shared memory is visible to all agents in a project. Private memory is scoped to a single agent.

Inter-Agent Communication

Agents coordinate through four communication primitives.

Signal parent - A spawned agent sends a message back to its creator when it finishes or needs attention. The simplest coordination pattern.

Pub/sub messaging - Agents publish messages to named channels and subscribe to receive them. Messages include timestamps and author IDs. Subscribers only receive messages published after their subscription, with acknowledgment tracking to prevent duplicates.

Blackboard - A shared key-value store visible to all agents in the current run. Useful for coordination flags, shared config, and status tracking between parallel agents. Cleared when the run ends.

Task queue (autonomous mode only) - A priority-based task list that survives across autonomous cycles. These tools are not part of the default built-in registry; they are injected when an agent runs in autonomous mode. Agents can add, update, and remove tasks with statuses (pending, in_progress, done, blocked). Lower priority numbers execute first.
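
The priority rule can be sketched as picking the pending task with the lowest priority number. The QueuedTask shape and the nextTask name are assumptions for illustration. Breaking ties by insertion order is also an assumption; it follows from JavaScript's stable sort.

```typescript
type TaskStatus = "pending" | "in_progress" | "done" | "blocked";

// Hypothetical task shape; field names are not taken from MAS.
interface QueuedTask {
  id: string;
  title: string;
  priority: number; // lower numbers execute first
  status: TaskStatus;
}

// Returns the next task to work on, or undefined when nothing is pending.
function nextTask(tasks: QueuedTask[]): QueuedTask | undefined {
  return tasks
    .filter((t) => t.status === "pending") // filter copies, so sorting leaves the input untouched
    .sort((a, b) => a.priority - b.priority)[0];
}
```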

Observability

MAS is designed to make agent thinking visible, not buried in terminal output.

Graph Visualization

The graph view shows agents as color-coded nodes connected by activity edges:

| Node border color | Meaning |
|-------------------|---------|
| Green (pulsing) | Running |
| Cyan | Completed |
| Yellow | Paused |
| Orange | Aborted |
| Red | Error |
| Gray | Idle |

Activity nodes appear as agents work, colored by type: green for thinking, blue for web search, cyan for web fetch, orange for signals, yellow for spawns, purple for file system operations, and teal for tool calls.

The HUD overlay (top-left) shows live stats: agent count, running/thinking/web activity counts, spawns, signals, and errors. Total token consumption is shown in workflow mode.

Inspector Panel

Three tabs on the right side:

Chat - Streaming output from the selected agent's session, with session picker for multi-session agents.
Events - Timeline of all events (activation, tool calls, file changes, spawns, signals, errors, workflow steps, MCP connections, pub/sub, blackboard operations). Includes checkpoint restore and replay controls.
Memory - Working memory entries (current run), long-term memories (cross-run), and shared knowledge.

A policy banner above the tabs shows the selected agent's safety mode and permissions at a glance.

Run Timeline

A horizontal timeline below the graph shows the duration and overlap of all agent activations. Each bar represents one agent's execution, colored by agent identity.

Audio Feedback

MAS includes sonification - distinct audio signals for agent events so you can hear your system working even while focused elsewhere. Spawn triggers a rising chime, tool calls get a soft click, signals are a double blip, completion plays a C-E-G chord, and errors sound a warning tone with vibrato. Toggle with MUTE in the top bar.

Agent Templates

The template picker offers seven starting points:

| Template | Description |
|----------|-------------|
| Blank Agent | Minimal skeleton - empty sections with default permissions |
| Autonomous Learner | Persistent multi-cycle missions with task queue and memory |
| Researcher | Web search and sub-agent delegation for deep investigation |
| Writer | Safe-mode agent that reads artifacts and writes refined prose |
| Orchestrator | Gloves-off coordinator that breaks tasks into sub-agent work |
| Critic | Safe-mode reviewer that reads output and signals feedback |
| Tool Builder | Demonstrates custom tool definitions with parameters and prompts |

You can also save any agent as a template with Save as Template, and create new agents from saved templates. User templates are stored in templates/*.md.

Configuration

Settings Reference

Open Settings (⚙ in the top bar) to configure:

API - Provider (Gemini, Anthropic, OpenAI), API key, model selection.

Kernel limits - Max Concurrency (1-10, default 3), Max Depth (1-20, default 5), Max Fanout (1-20, default 8), Token Budget (default 500,000), Workflow Parallel Steps (1-10, default 1).

Agent persistence - Min Turns Before Stop (0-25, default 5), Force Reflection (auto-inject reflection prompt), Auto-Record Failures (write tool failures to memory).

Memory - Enable Memory, Use Vector Memory (vector embeddings vs JSON-based), Memory Token Budget (500-8000, default 2000).

Autonomous defaults - Default Max Cycles, Resume Previous Mission, Stop When Complete, Seed Continuation Tasks.

Danger zone - Reset to Sample Project, Clear Workspace.

Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| Ctrl/Cmd+K | Command palette |
| Ctrl/Cmd+Enter | Run once |
| Ctrl/Cmd+Shift+Enter | Run autonomous |
| Ctrl/Cmd+Shift+P | Pause / resume |
| Ctrl/Cmd+Shift+K | Kill all |
| Ctrl/Cmd+Shift+L | Focus prompt box |

The command palette supports scope prefixes: agent:, file:, action:, nav:.

MCP Server Integration

Connect external tools via the Model Context Protocol. Supported transports: http, sse, and stdio.

Configure in agent frontmatter:

mcp_servers:
  - name: docs
    transport: http
    url: http://localhost:3000/mcp
  - name: local-tools
    transport: sse
    url: http://localhost:3001/sse
  - name: cli-tools
    transport: stdio
    command: npx
    args: [my-mcp-server]
    gatewayUrl: http://localhost:3002/mcp

Stdio servers can't run directly in the browser. Use gatewayUrl to point to an HTTP bridge that wraps the stdio process. MCP tools are dynamically registered and appear alongside built-in tools.
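
The transport rule can be expressed as a small resolver that decides which URL the browser should connect to. The field names mirror the frontmatter example above. The browserEndpoint function and its error messages are assumptions of this sketch, not the MAS MCP client.

```typescript
type McpTransport = "http" | "sse" | "stdio";

interface McpServerConfig {
  name: string;
  transport: McpTransport;
  url?: string;
  command?: string;
  args?: string[];
  gatewayUrl?: string;
}

// Returns the URL a browser client would connect to for this server.
function browserEndpoint(server: McpServerConfig): string {
  if (server.transport === "stdio") {
    // A browser cannot spawn processes, so stdio needs an HTTP bridge.
    if (!server.gatewayUrl) {
      throw new Error(`${server.name}: stdio transport needs a gatewayUrl HTTP bridge`);
    }
    return server.gatewayUrl;
  }
  if (!server.url) {
    throw new Error(`${server.name}: ${server.transport} transport needs a url`);
  }
  return server.url;
}

// Usage with the cli-tools entry from the example above.
const cliToolsEndpoint = browserEndpoint({
  name: "cli-tools",
  transport: "stdio",
  command: "npx",
  args: ["my-mcp-server"],
  gatewayUrl: "http://localhost:3002/mcp",
});
```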

Running as an npm Package

npx markdown-agent-studio

Or install globally:

npm install -g markdown-agent-studio
markdown-agent-studio

Or import the dist path programmatically:

import distPath from 'markdown-agent-studio';

CLI options: --port 4173, --host 127.0.0.1, --no-open

Architecture

src/
├── core/           Execution engine: kernel, providers, memory, summarizer,
│                   autonomous runner, workflow engine, MCP client, plugins
├── stores/         Zustand state: sessions, VFS, memory, events, pub/sub,
│                   blackboard, task queue, project metadata
├── components/     React UI: graph visualization, Monaco editor, inspector,
│                   workspace explorer, command palette, settings
├── hooks/          React hooks: useKernel, useGraphData, useOnboarding
├── types/          TypeScript definitions: agent, session, memory, events
├── utils/          Helpers: agent parser, validator, templates, diff engine
└── styles/         CSS modules

Tech stack: React, TypeScript, Vite, Zustand, React Flow, Monaco Editor, MCP SDK, Transformers.js, IndexedDB.

Development

npm run dev          # local dev server
npm run lint         # lint checks
npm test             # test suite (52 test files)
npm run build        # typecheck + production build
npm run check:all    # lint + test + build + bundle guard

CI runs lint, tests, build, bundle-size guard, and npm dry-run on every push and PR.

Release

npm run release:patch
npm run release:minor
npm run release:major

Troubleshooting

| Problem | Solution |
|---------|----------|
| App does not start | Confirm Node version is 20.19+ with node -v |
| No AI responses | Add your API key to .env.local and select the matching provider in Settings |
| Demo mode won't activate | Clear browser storage and reload - the sample project loads on first visit |
| MCP stdio server unavailable | Stdio can't run in the browser directly; configure a gatewayUrl HTTP bridge |
| Slow first vector search | Expected - the embedding model downloads and warms up on first use |
| Agent can't write files | Check writes patterns and safety_mode permissions in the agent frontmatter |
| Workflow steps won't parallelize | Increase Workflow Parallel Steps in Settings (default is 1 = sequential) |
| Agent stops too early | Increase Min Turns Before Stop in Settings or set stop_when_complete: false |