hemagent

v1.5.10

Published

2 months ago

Bioinformatics AI Agent Plugin for OpenCode

liyc.stjude

HemAgent

Bioinformatics AI Agent Plugin for OpenCode.

HemAgent transforms any LLM into a domain-aware bioinformatics agent with multi-agent orchestration, skill-first workflows, knowledge retrieval, structured planning, and agentic loop enforcement. It is built as a pure TypeScript plugin for OpenCode.

Installation

For Humans

Paste this into your LLM agent session:

Install and configure HemAgent by following the instructions here:
https://raw.githubusercontent.com/YichaoOU/hemagent_dev/main/docs/guide/installation.md

For LLM Agents

Fetch the installation guide and follow it:

curl -fsSL https://raw.githubusercontent.com/YichaoOU/hemagent_dev/main/docs/guide/installation.md

Architecture

Multi-Agent System

HemAgent presents a single-agent UX backed by internal multi-agent orchestration: the user talks to one agent, while complex analyses are parallelized behind the scenes.

| Agent | Role | When |
|-------|------|------|
| HemAgent (primary) | Orchestrator. Classifies intent, loads skills, plans, executes, delegates. | Every bioinformatics request |
| Bio-Worker (subagent) | Parallel executor. Runs independent analysis subtasks concurrently. | Complex multi-step analyses |

User → HemAgent (plan + orchestrate)
              ├→ Bio-Worker 1 (QC)           ← parallel
              ├→ Bio-Worker 2 (Alignment)     ← parallel
              └→ Bio-Worker 3 (Counting)      ← parallel
       HemAgent (synthesize + report)

The orchestration follows the coordinator pattern: HemAgent synthesizes worker findings itself before directing follow-up work. It never delegates blindly.
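The coordinator pattern described above can be sketched as a fan-out followed by a fan-in. This is only an illustration under stated assumptions: `Subtask`, `Finding`, `runWorkers`, and `synthesize` are hypothetical names, not part of the HemAgent source, and real Bio-Worker subagents are dispatched through OpenCode rather than through plain promises.

```typescript
// Hypothetical types: none of these names come from the HemAgent source.
type Subtask = { name: string; run: () => Promise<string> };
type Finding = { name: string; ok: boolean; detail: string };

// Fan out: start every independent subtask at once and wait for all of them.
// allSettled keeps one failing worker from discarding the others' results.
async function runWorkers(subtasks: Subtask[]): Promise<Finding[]> {
  const settled = await Promise.allSettled(subtasks.map((t) => t.run()));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { name: subtasks[i].name, ok: true, detail: result.value }
      : { name: subtasks[i].name, ok: false, detail: String(result.reason) },
  );
}

// Fan in: the coordinator reads every finding itself before deciding
// on follow-up work, instead of forwarding raw worker output.
function synthesize(findings: Finding[]): { summary: string; followUps: string[] } {
  const failed = findings.filter((f) => !f.ok);
  const summary = findings
    .map((f) => `${f.name}: ${f.ok ? f.detail : "FAILED (" + f.detail + ")"}`)
    .join("; ");
  return { summary, followUps: failed.map((f) => `retry ${f.name}`) };
}
```

Using `Promise.allSettled` rather than `Promise.all` is a design choice of this sketch: it lets the coordinator see partial results and plan a targeted retry.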

Agent Orchestration Flow

User Message
    │
    ▼
┌─ Phase 0: Intent Gate ─────────────────────┐
│  • 300+ bioinformatics keywords             │
│  • 3-tier classification (keyword→regex→    │
│    confidence scoring)                      │
│  • Domain detection (transcriptomics,       │
│    epigenomics, genomics, proteomics, etc.) │
│  • Skill matching (semantic, not name-only) │
└─────────────────────────────────────────────┘
    │
    ▼
┌─ Phase 1: Skill-First Routing ─────────────┐
│  Priority chain:                            │
│  1. User skills (.opencode/skills/)         │
│  2. Built-in skills (hemtools)              │
│  3. Documentation search (WebFetch)         │
│  4. Free execution (Bash)   ← BLOCKED      │
│     until steps 1-3 exhausted               │
└─────────────────────────────────────────────┘
    │
    ▼
┌─ Phase 2: Knowledge Retrieval (RAG) ───────┐
│  Selects relevant context by domain:        │
│  • Best practices (RNA-seq, scRNA, variant  │
│    calling protocols)                       │
│  • Software reference (tools per assay)     │
│  • Database reference (NCBI, KEGG, etc.)    │
│  • Python library reference                 │
│  • User defaults (genome, species, aligner) │
│  • External tools catalog                   │
│  • Data lake catalog                        │
│  All injected into system prompt.           │
└─────────────────────────────────────────────┘
    │
    ▼
┌─ Phase 3: Structured Planning ─────────────┐
│  Checklist format with tracking:            │
│  1. [✓] Step completed                      │
│  2. [ ] Step pending                        │
│  3. [✗] Step failed (reason)                │
│                                             │
│  Agent MUST complete all steps or mark them │
│  as explicitly failed with a reason.        │
└─────────────────────────────────────────────┘
    │
    ▼
┌─ Phase 4: Agentic Loop Enforcement ────────┐
│  After each tool execution:                 │
│  • Parse checklist state from output        │
│  • Count completed/pending/failed steps     │
│  • If pending > 0: inject continuation      │
│    prompt → force agent to keep going       │
│  • If 3+ consecutive failures: inject       │
│    recovery prompt → re-evaluate approach   │
│  • Hard cap at 200 iterations (safety)      │
│                                             │
│  Skill-First Guard:                         │
│  • Intercepts Bash calls for bio commands   │
│  • Blocks if no skill was loaded first      │
│  • Agent must load skill → then retry       │
└─────────────────────────────────────────────┘
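Phases 3 and 4 above amount to parsing checklist markers and choosing the next prompt. The following is a minimal sketch, not the plugin's actual code: `parseChecklist` and `nextAction` are hypothetical names, and the precedence between the rules (iteration cap first, then recovery, then continuation) is an assumption, since the diagram does not state it.

```typescript
type ChecklistState = { completed: number; pending: number; failed: number };

// Count "[✓]", "[ ]" and "[✗]" markers on numbered lines such as "1. [✓] Check input".
function parseChecklist(output: string): ChecklistState {
  const state: ChecklistState = { completed: 0, pending: 0, failed: 0 };
  for (const line of output.split("\n")) {
    const match = /^\s*\d+\.\s*\[(✓| |✗)\]/.exec(line);
    if (!match) continue; // not a checklist line
    if (match[1] === "✓") state.completed++;
    else if (match[1] === "✗") state.failed++;
    else state.pending++;
  }
  return state;
}

type Action = "continue" | "recover" | "stop";

// Decide which prompt to inject next, using the thresholds quoted in the diagram.
// The ordering of the checks is an assumption of this sketch.
function nextAction(
  state: ChecklistState,
  consecutiveFailures: number,
  iteration: number,
  maxIterations = 200,
): Action {
  if (iteration >= maxIterations) return "stop"; // hard safety cap
  if (consecutiveFailures >= 3) return "recover"; // re-evaluate the approach
  if (state.pending > 0) return "continue"; // plan not finished yet
  return "stop"; // every step is completed or explicitly failed
}
```

For example, a plan with one pending step and no recent failures yields `"continue"`, while the same plan after three consecutive failures yields `"recover"`.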

State Management

HemAgent tracks state across the session lifecycle via hooks:

| State | Tracked In | Purpose |
|-------|-----------|---------|
| Bio intent detected | skill-first-guard.ts | Know when to enforce skill-first |
| Skills loaded | skill-first-guard.ts | Know when to allow Bash execution |
| Last user message | system-transform-hook.ts | Enable system prompt injection |
| Plan progress | tool-execute-after-hook.ts | Track checklist completion |
| Iteration count | continuation-enforcer.ts | Safety cap at 200 iterations |
| Session lifecycle | event-hook.ts | Cleanup on session end |
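One plausible way to hold this state is a map keyed by session ID that is cleared when the session ends. The sketch below is an assumption about the general shape only: `SessionState`, `getSession`, and `endSession` are hypothetical names, and the real plugin spreads these fields across the hook files listed in the table.

```typescript
// Hypothetical shape; the real fields live in the hook files listed above.
interface SessionState {
  bioIntentDetected: boolean;
  skillsLoaded: Set<string>;
  lastUserMessage: string;
  iterationCount: number;
}

const sessions = new Map<string, SessionState>();

// Return the state for a session, creating it on first use.
function getSession(sessionId: string): SessionState {
  let state = sessions.get(sessionId);
  if (!state) {
    state = {
      bioIntentDetected: false,
      skillsLoaded: new Set<string>(),
      lastUserMessage: "",
      iterationCount: 0,
    };
    sessions.set(sessionId, state);
  }
  return state;
}

// Called on session end so stale state never leaks into a later session.
function endSession(sessionId: string): void {
  sessions.delete(sessionId);
}
```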

Knowledge Retrieval (RAG)

When a bioinformatics request is detected, HemAgent retrieves and injects relevant domain knowledge into the system prompt:

// Domain detection: "run hemtools rna-seq" → transcriptomics
// Retrieves (trailing "..." comments mark entries elided in this example):
const retrieved = {
  knowHow: ["RNA-seq Best Practices", "Single-cell RNA-seq Best Practices"],
  software: ["STAR/HISAT2 (alignment)", "DESeq2/edgeR (DE)" /* ... */],
  databases: ["NCBI/RefSeq", "Ensembl/GENCODE", "Gene Ontology"],
  libraries: ["scanpy", "pandas", "numpy", "matplotlib"],
  defaults: { genome: "hg38", species: "human" },  // from user config
  externalTools: ["query_pubmed.py", "query_gene.py" /* ... */],
  dataLake: ["DepMap_CRISPRGeneDependency.csv" /* ... */],
};

This retrieval is domain-aware — an epigenomics request gets ChIP-seq/ATAC-seq knowledge, while a transcriptomics request gets RNA-seq/scRNA-seq knowledge.
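Domain-aware selection can be pictured as a lookup keyed by the detected domain, followed by rendering into prompt text. This is a hedged sketch: `catalog`, `retrieve`, and `renderContext` are hypothetical names, the epigenomics entries are placeholders that do not come from the HemAgent source, and the rendered layout is invented for illustration.

```typescript
type Domain = "transcriptomics" | "epigenomics";
type Knowledge = { knowHow: string[]; software: string[] };
type Retrieved = Knowledge & { defaults: Record<string, string> };

// Illustrative catalog. Transcriptomics entries mirror the example above;
// epigenomics entries are placeholders, not taken from the source.
const catalog: Record<Domain, Knowledge> = {
  transcriptomics: {
    knowHow: ["RNA-seq Best Practices", "Single-cell RNA-seq Best Practices"],
    software: ["STAR/HISAT2 (alignment)", "DESeq2/edgeR (DE)"],
  },
  epigenomics: {
    knowHow: ["ChIP-seq Best Practices", "ATAC-seq Best Practices"],
    software: ["MACS2 (peak calling)"],
  },
};

// Select only the knowledge that matches the detected domain.
function retrieve(domain: Domain, defaults: Record<string, string>): Retrieved {
  return { ...catalog[domain], defaults };
}

// Render the selection as plain text that could be appended to a system prompt.
function renderContext(k: Retrieved): string {
  return [
    "Know-how:",
    ...k.knowHow.map((item) => `- ${item}`),
    "Software:",
    ...k.software.map((item) => `- ${item}`),
    "Defaults:",
    ...Object.entries(k.defaults).map(([key, value]) => `- ${key}: ${value}`),
  ].join("\n");
}
```

Keeping selection (`retrieve`) separate from rendering (`renderContext`) means the same catalog could feed a differently formatted prompt without touching the lookup.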

Hook System

HemAgent plugs into OpenCode via five hooks:

| Hook | Phase | What It Does |
|------|-------|-------------|
| chat.message | Intent | Classifies bio intent, matches skills, injects <system-reminder> |
| experimental.chat.system.transform | Knowledge | Injects domain knowledge, defaults, tools, data lake into system prompt |
| tool.execute.before | Guard | Skill-First Guard: blocks bio Bash commands if skill not loaded |
| tool.execute.after | Loop | Tracks plan progress, injects continuation/recovery prompts |
| event | Lifecycle | Cleans up session state |

Quick Start

Install

cd hemagent
bun install && bun run build

# Install into OpenCode's plugin cache
cd ~/.cache/opencode && bun add file:///path/to/hemagent

Configure OpenCode

Add to ~/.config/opencode/opencode.jsonc:

{
  "plugin": [
    "hemagent"
  ]
}

Configure HemAgent (optional)

Create .opencode/hemagent.json in your project:

{
  "defaults": {
    "genome": "hg38",
    "species": "human",
    "aligner": "HISAT2"
  },
  "hemagent_tools_dir": "/path/to/hemagent-tools",
  "data_lake_dir": "/path/to/data_lake"
}

Example: "run hemtools rna-seq"

User: "run hemtools rna-seq"

→ Intent Gate: bio=true, domain=transcriptomics, confidence=0.9
→ Skill Match: hemtools (built-in)
→ System Prompt: +4700 chars of RNA-seq knowledge injected

→ Agent loads hemtools skill
→ Skill: search hemtools.readthedocs.io → find top 3 programs
→ Skill: confident? use it. unsure? ask user.
→ Skill: read usage page → check parameters
→ Skill: genome not set? ASK (unless defaults configured)

→ Executes with checklist:
  1. [✓] Check hemtools installed
  2. [✓] Verify input files
  3. [✓] Run hemtools rna-seq pipeline
  4. [ ] Verify output ← continuation enforcer: "Continue with step 4"
  5. [ ] Report results

Example: "run rnaseq" (no hemtools)

→ Intent Gate: bio=true, domain=transcriptomics
→ Skill Match: NONE (no hemtools mentioned, no rnaseq skill)
→ System Prompt: RNA-seq best practices injected (DESeq2, STAR, QC thresholds...)
→ Agent uses domain knowledge + Bash to execute (free Bash is reachable only after steps 1-3 of the priority chain are exhausted)
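The Intent Gate lines in both examples come from the 3-tier classification (keyword, then regex, then confidence scoring). A toy version of that idea might look like the sketch below. Everything here is an assumption for illustration: the keyword and pattern lists are tiny stand-ins for the 300+ keyword list, the weights and the 0.5 threshold are made up, and `classifyIntent` is a hypothetical name, so the confidence values will not match those shown in the examples.

```typescript
type Intent = { bio: boolean; domain: string | null; confidence: number };

// Tiny stand-ins for the real keyword list and domain rules.
const KEYWORDS: Record<string, string> = {
  "rna-seq": "transcriptomics",
  rnaseq: "transcriptomics",
  "chip-seq": "epigenomics",
  "atac-seq": "epigenomics",
};
const PATTERNS: Array<{ re: RegExp; domain: string }> = [
  { re: /\.(fastq|fq)(\.gz)?\b/, domain: "genomics" },
  { re: /\b(bam|vcf)\b/, domain: "genomics" },
];

function classifyIntent(message: string, threshold = 0.5): Intent {
  const text = message.toLowerCase();
  let score = 0;
  let domain: string | null = null;

  // Tier 1: exact keyword hits carry the most weight.
  for (const [keyword, d] of Object.entries(KEYWORDS)) {
    if (text.includes(keyword)) {
      score += 0.6;
      domain = domain ?? d;
    }
  }
  // Tier 2: regex patterns catch file names and formats that keywords miss.
  for (const { re, domain: d } of PATTERNS) {
    if (re.test(text)) {
      score += 0.3;
      domain = domain ?? d;
    }
  }
  // Tier 3: clamp the accumulated evidence into a confidence and threshold it.
  const confidence = Math.min(1, score);
  const bio = confidence >= threshold;
  return { bio, domain: bio ? domain : null, confidence };
}
```

With these made-up weights, a single regex hit alone (0.3) stays below the threshold, so a stray mention of a file format does not switch the agent into bio mode by itself.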

Skill-First Priority Chain

1. User-defined skills (.opencode/skills/)    ← highest priority
2. Built-in skills (hemtools)
3. Search documentation (WebFetch/WebSearch)
4. Free Bash                                   ← BLOCKED until 1-3 exhausted

The Skill-First Guard intercepts Bash calls for bioinformatics commands (hemtools, samtools, STAR, GATK, etc.) and blocks them if no skill has been loaded for the session.
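A guard of this kind can be sketched as a check on the program name of each segment of a shell command. This is an illustration, not the code in skill-first-guard.ts: `guardBash`, `programsIn`, and `GuardDecision` are hypothetical names, the command list contains only the four names mentioned above, and the shell splitting is deliberately naive (it ignores quoting and subshells).

```typescript
// Command names come from the sentence above; the real list is presumably longer.
const BIO_COMMANDS = ["hemtools", "samtools", "STAR", "GATK"];

type GuardDecision = { allow: true } | { allow: false; reason: string };

// Extract the program name of each segment in "a | b && c; d".
function programsIn(command: string): string[] {
  return command
    .split(/\|\||&&|[|;]/)
    .map((segment) => segment.trim().split(/\s+/)[0] ?? "")
    .map((program) => program.split("/").pop() ?? "") // "/usr/bin/STAR" -> "STAR"
    .filter((program) => program.length > 0);
}

// Block bioinformatics commands until at least one skill is loaded for the session.
function guardBash(command: string, skillsLoaded: number): GuardDecision {
  const bioProgram = programsIn(command).find((program) =>
    BIO_COMMANDS.some((known) => known.toLowerCase() === program.toLowerCase()),
  );
  if (bioProgram !== undefined && skillsLoaded === 0) {
    return { allow: false, reason: `Load a skill before running ${bioProgram}.` };
  }
  return { allow: true };
}
```

Non-bio commands such as `ls -la` pass through untouched, so the guard only adds friction where a skill could have supplied better parameters.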

Adding Your Own Skills

Create .opencode/skills/my-pipeline/SKILL.md:

---
name: my-pipeline
description: "My custom analysis pipeline"
when-to-use: "When user mentions my-pipeline or analysis X"
allowed-tools: ["Bash", "Read", "Write"]
---

<skill-instruction>
Your pipeline instructions here.
Use checklist format for multi-step tasks.
</skill-instruction>

HemAgent discovers skills automatically from:

.opencode/skills/ (project-level)
~/.opencode/skills/ (user-level)
~/.claude/skills/ (user-level)
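The front matter in the SKILL.md example above can be read with a small parser. The sketch below handles only the subset used in that example (one `key: value` pair per line, with quoted strings and bracketed arrays that happen to be valid JSON) and is not a full YAML parser. `parseSkillFile` and `SkillMeta` are hypothetical names, and how HemAgent itself parses skill files is not stated in this README.

```typescript
type SkillMeta = Record<string, string | string[]>;

// Split a SKILL.md file into its front matter fields and its instruction body.
function parseSkillFile(text: string): { meta: SkillMeta; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) return { meta: {}, body: text }; // no front matter present

  const meta: SkillMeta = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not "key: value"
    const key = line.slice(0, colon).trim();
    const raw = line.slice(colon + 1).trim();
    // Quoted strings and bracketed arrays in the example are valid JSON.
    if (raw.startsWith("[") || raw.startsWith('"')) {
      try {
        meta[key] = JSON.parse(raw);
        continue;
      } catch {
        // Not valid JSON after all: fall through and keep the raw text.
      }
    }
    meta[key] = raw;
  }
  return { meta, body: match[2].trim() };
}
```

The `when-to-use` field is the natural input for skill matching, and `allowed-tools` for restricting what a loaded skill may call, though both uses are assumptions here.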

External Tools

hemagent-tools/ contains standalone Python CLI scripts callable via Bash:

| Tool | Usage |
|------|-------|
| query_pubmed.py | python3 query_pubmed.py "TP53 AML" --max 5 |
| query_arxiv.py | python3 query_arxiv.py "single cell deep learning" |
| query_gene.py | python3 query_gene.py TP53 |
| query_uniprot.py | python3 query_uniprot.py BRCA1 |
| query_kegg.py | python3 query_kegg.py "cell cycle" |
| query_enrichment.py | python3 query_enrichment.py "TP53,BRCA1,EGFR" --db GO_Biological_Process_2021 |
| data_catalog.py | python3 data_catalog.py --search "cancer" |

Set hemagent_tools_dir in config to make them available to the agent.

Plugin Structure

hemagent/
├── src/                          # Pure TypeScript plugin (zero Python)
│   ├── index.ts                  # Plugin entry (@opencode-ai/plugin)
│   ├── config.ts                 # Zod config schema + loader
│   ├── types.ts                  # Shared types
│   ├── plugin-interface.ts       # Hook handler wiring
│   ├── create-tools.ts           # Tool registration + dynamic skill discovery
│   ├── create-hooks.ts           # Hook factory
│   ├── agents/
│   │   ├── hemagent.ts           # Primary agent (orchestrator)
│   │   ├── bio-worker.ts         # Worker subagent (parallel tasks)
│   │   └── builtin-agents.ts     # Agent factory
│   ├── domain/
│   │   ├── bio-keywords.ts       # 300+ keywords + domain rules
│   │   ├── bio-intent-classifier.ts  # 3-tier intent classification
│   │   ├── bio-skill-matcher.ts  # Semantic skill matching
│   │   ├── bio-catalogs.ts       # Domain knowledge + know-how docs
│   │   ├── bio-context-builder.ts    # System prompt builder
│   │   └── bio-resource-retriever.ts # Domain-aware knowledge selection
│   ├── hooks/
│   │   ├── chat-message-hook.ts      # Intent + skill injection
│   │   ├── system-transform-hook.ts  # Knowledge retrieval + defaults + tools
│   │   ├── skill-first-guard.ts      # Blocks Bash until skill loaded
│   │   ├── tool-execute-after-hook.ts # Plan tracking + continuation
│   │   └── event-hook.ts            # Session lifecycle
│   ├── completion/
│   │   ├── checklist-tracker.ts      # Parse [✓]/[ ]/[✗]
│   │   ├── continuation-enforcer.ts  # Force plan completion
│   │   └── stop-hooks.ts            # Prevent premature stop
│   └── skills/hemtools/SKILL.md  # Built-in hemtools skill
├── hemagent-tools/               # External Python CLI tools
├── package.json
└── tsconfig.json

Configuration Reference

| Field | Default | Description |
|-------|---------|-------------|
| bio_mode | "auto" | "auto" / "always" / "never" |
| defaults.genome | (none) | Default genome build (hg38, mm10, etc.) |
| defaults.species | (none) | Default species |
| defaults.aligner | (none) | Preferred aligner |
| defaults.strandedness | (none) | Library strandedness |
| hemagent_tools_dir | (none) | Path to external Python tools |
| data_lake_dir | (none) | Path to curated datasets |
| plan_completion_enforcement | true | Track + enforce plan completion |
| max_plan_iterations | 200 | Safety cap on iterations |
| custom_keywords | [] | Additional bio keywords |
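The defaults in this table can be expressed as a typed object merged with the user's .opencode/hemagent.json. The plugin structure lists a Zod schema in config.ts; the sketch below uses plain TypeScript types instead so that it has no dependencies, and `HemAgentConfig`, `DEFAULT_CONFIG`, and `resolveConfig` are hypothetical names rather than the plugin's exports.

```typescript
type BioMode = "auto" | "always" | "never";

interface HemAgentConfig {
  bio_mode: BioMode;
  defaults: { genome?: string; species?: string; aligner?: string; strandedness?: string };
  hemagent_tools_dir?: string;
  data_lake_dir?: string;
  plan_completion_enforcement: boolean;
  max_plan_iterations: number;
  custom_keywords: string[];
}

// Default values copied from the Configuration Reference table.
const DEFAULT_CONFIG: HemAgentConfig = {
  bio_mode: "auto",
  defaults: {},
  plan_completion_enforcement: true,
  max_plan_iterations: 200,
  custom_keywords: [],
};

// Merge user settings over the defaults; the nested "defaults" object is
// merged separately so a partial override keeps the remaining fields.
function resolveConfig(user: Partial<HemAgentConfig>): HemAgentConfig {
  return {
    ...DEFAULT_CONFIG,
    ...user,
    defaults: { ...DEFAULT_CONFIG.defaults, ...user.defaults },
  };
}
```

A config that sets only `defaults.genome` therefore still resolves `bio_mode` to "auto" and `max_plan_iterations` to 200.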

License

MIT