yiyan-browser-agent

v1.10.2

Published

14 days ago

AI coding agent powered by Yiyan (文心一言) via browser automation — no API key needed. Performance-optimized (30-40% faster). New --headless parameter for interactive mode. Enhanced with comprehensive security.


ai agent yiyan wenxin baidu browser-automation coding-agent cli llm security command-validation safe-execution path-protection file-backup

🤖 Yiyan Browser Agent (文心一言)

An autonomous AI coding agent that runs entirely for free — no API key required.

It drives a real browser to talk to Yiyan (文心一言), giving you a Claude Code / Cursor-style coding agent powered by Baidu's AI models at zero cost.

Installation · Quick Start · Usage · HTTP API · Configuration · Tools · Contributing

⚠️ This project is currently in active development. Core functionality works, but you may encounter rough edges. Bug reports and contributions are very welcome — see Contributing.

🧠 How It Works

Most AI coding agents talk to a paid API. This one doesn't.

Instead, it uses Playwright to control a real Chromium browser: it navigates to yiyan.baidu.com, sends your task, waits for the response, and parses it to extract tool calls, all automatically. Your local files and terminal are wired up as tools the AI can use, so it can read code, write files, run commands, and build complete projects step by step.

Your Terminal
     │
     ▼
 Agent Core          ← orchestrates the loop
     │
     ├──► Browser (Playwright)  ← talks to yiyan.baidu.com
     │         │
     │    Yiyan AI (文心一言)  ← thinks, decides what tool to use
     │         │
     └──► Tool Executor  ← reads/writes files, runs commands
              │
         Your Project
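The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's code: `run_agent`, `ask_model`, the `TOOLS` table, and the JSON tool-call format are all hypothetical stand-ins. The real agent sends messages through Playwright and parses Yiyan's replies from the page.

```python
import json

# Hypothetical in-memory tools; the real agent works on files and the shell.
def read_file(args, files):
    return files.get(args["path"], "")

def write_file(args, files):
    files[args["path"]] = args["content"]
    return "ok"

TOOLS = {"read_file": read_file, "write_file": write_file}

def run_agent(task, ask_model, files, max_iterations=40):
    """Send -> wait -> parse -> execute, until the model stops calling tools."""
    message = task
    for _ in range(max_iterations):
        reply = ask_model(message)          # stands in for the browser round trip
        try:
            call = json.loads(reply)        # assumed format: {"tool": ..., "args": {...}}
        except json.JSONDecodeError:
            return reply                    # plain text: treat it as the final answer
        if not isinstance(call, dict) or call.get("tool") not in TOOLS:
            return reply
        result = TOOLS[call["tool"]](call.get("args", {}), files)
        message = f"Tool result: {result}"  # feed the tool output back to the model
    return "stopped: iteration limit reached"
```

Passing a scripted `ask_model` (for example, a function that returns canned replies) makes the loop easy to exercise without a browser.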

📦 Installation

Windows

# Install Node.js (if you don't have it)
# Download the installer from https://nodejs.org, or use winget:
winget install OpenJS.NodeJS.LTS

# Install yiyan-browser-agent globally
npm install -g yiyan-browser-agent

# Install the Chromium browser (runs automatically after the first install, about 150 MB)
npx playwright install chromium

Ubuntu / Linux

# Install Node.js (if you don't have it)
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

# Install yiyan-browser-agent globally
npm install -g yiyan-browser-agent

# Install Chromium and its dependencies
npx playwright install chromium
npx playwright install-deps chromium  # install system dependencies

Requirements: Node.js ≥ 18

🚀 Quick Start

Windows

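1. First run: log in to Yiyan (文心一言):

yiyan-agent -i

When the browser window opens, log in to your Baidu account, then return to the terminal and press Enter. The session is saved, so you only need to log in once.

2. Give it a task:

# The task text means "Create an Express REST API with user authentication"
yiyan-agent "创建一个 Express REST API，带用户认证"

3. Send a task to a server that is already running:

# Terminal 1: start interactive mode (acts as an HTTP server)
yiyan-agent -i

# Terminal 2: send a task (forwarded to the server; no new browser is started)
# The task text means "Shanghai weather, in 20 characters"
yiyan-agent "上海天气，20个字"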

Ubuntu / Linux

1. First run: log in to Yiyan (文心一言):

yiyan-agent -i

2. Give it a task:

yiyan-agent "build a REST API in Express with user authentication"

3. Use the short alias ya from any directory:

cd ~/my-project
ya "add input validation to all my API routes"

💻 Usage

yiyan-agent [OPTIONS] [TASK]

  -t, --task <task>    Task to run (or just type it as the last argument)
  -i, --interactive    Keep browser open, run multiple tasks (starts HTTP server)
  -d, --dir <path>     Set working directory (default: current directory)
  --debug              Print raw AI responses to the terminal
  --show-browser       Show browser window (non-interactive mode)
  --calibrate          Auto-detect DOM selectors (run if agent breaks)
  -h, --help           Show help

Aliases:
  ya                   Short form of yiyan-agent

Examples

# Single task — runs and exits
yiyan-agent "create a Python script that scrapes Hacker News"

# Interactive mode — keeps browser open, starts HTTP server on port 9527
yiyan-agent -i

# Run on a specific project
ya --dir ~/projects/my-app "refactor all callbacks to async/await"

# Debug mode (shows what Yiyan is actually outputting)
ya --debug "build a calculator"

# In interactive mode, type 'quit' or 'q' to exit:
❯ quit

🌐 HTTP API (v1.5.0+)

When interactive mode (-i) is running, an HTTP server starts on port 9527, allowing external services to send tasks.

Task Queue (v1.5.2+)

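Multi-client concurrency: all requests go into a queue and are processed serially, so every client receives the response to its own request.

Client A → requests "北京天气" (Beijing weather) → queue position 1 → processed → returns the Beijing answer
Client B → requests "上海天气" (Shanghai weather) → queue position 2 → waits → processed → returns the Shanghai answer
Client C → requests "广州天气" (Guangzhou weather) → queue position 3 → waits → processed → returns the Guangzhou answer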
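The queueing behavior described here can be illustrated with a small sketch. It assumes nothing about the server's actual implementation: `start_worker`, `submit`, and `stop` are hypothetical names, and a single worker thread stands in for the single browser session.

```python
import queue
import threading

def start_worker(handle):
    """Start one worker thread that processes submitted tasks strictly in order."""
    tasks = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:             # sentinel: shut the worker down
                break
            task, reply_box = item
            reply_box.put(handle(task))  # the answer goes only to this caller's box

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def submit(task):
        reply_box = queue.Queue(maxsize=1)
        tasks.put((task, reply_box))
        return reply_box.get()           # block until this task's own answer arrives

    def stop():
        tasks.put(None)
        thread.join()

    return submit, stop
```

Because each caller waits on its own reply box, concurrent clients cannot receive each other's answers even though only one task runs at a time.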

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /task | POST | Submit a task |
| /status | GET | Get queue status |
| /queue | GET | Get queue details |
| /task/:id | GET | Get the status of a single task |
| / | GET | Service information |

Process Communication

# Terminal 1: Start interactive mode (HTTP server)
yiyan-agent -i
# → Server listening on port 9527

# Terminal 2: Send task (forwarded to server, no new browser)
yiyan-agent "北京天气，15个字"
# → Found running server on port 9527, forwarding task...
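This README does not describe how the second process detects the running server. One plausible approach, shown here purely as an assumption, is to probe the port before deciding whether to forward the task; `server_is_running` is a hypothetical helper, and the real CLI may rely on the lock file instead.

```python
import socket

def server_is_running(port=9527, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False
```

A port probe only shows that some process is listening; it does not prove the listener is yiyan-agent, so a real check would likely also call the `/` service-information endpoint.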

HTTP POST API

Endpoint: POST http://localhost:9527/task

Request Body:

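{
  "task": "your task description",
  "newChat": true  // optional; whether to start a new chat (default: true)
}

The newChat parameter:

| Value | Behavior | When to use |
|---|---|---|
| true (default) | Clicks the "新对话" (New chat) button and starts a fresh conversation; the AI has no memory of earlier turns | New tasks, standalone questions |
| false | Continues in the current conversation; the AI remembers earlier content | Multi-turn conversations, tasks that depend on context |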

Response:

{
  "question": "上海天气，20个字",
  "answer": "上海今日晴，气温25°C...",
  "duration": 5234,
  "status": "success"
}

newChat usage example

To verify the effect of the newChat parameter (continuing a chat vs. starting a new one):

Windows CMD:

REM Request 1: tell the AI your name ("我叫小明" means "My name is Xiao Ming")
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"我叫小明\"}"

REM Request 2: newChat=false, so the AI should remember the name ("我叫什么名字" means "What is my name?")
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"我叫什么名字\",\"newChat\":false}"

REM Request 3: newChat=true, so the AI should not remember the name (new chat)
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"我叫什么名字\",\"newChat\":true}"

Ubuntu / Linux:

# Request 1: tell the AI your name ("我叫小明" means "My name is Xiao Ming")
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"我叫小明"}'

# Request 2: newChat=false, so the AI should remember the name ("我叫什么名字" means "What is my name?")
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"我叫什么名字","newChat":false}'

# Request 3: newChat=true, so the AI should not remember the name (new chat)
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"我叫什么名字","newChat":true}'

Expected results:

Request 2 (newChat=false): the AI answers "小明" (Xiao Ming)
Request 3 (newChat=true): the AI answers that it does not know, or that you have not told it
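The same three-request experiment can be scripted. The sketch below uses only the standard library and the request body documented above; `build_task_request` and `send_task` are hypothetical helper names, and the 180-second timeout is an assumption chosen to exceed the default RESPONSE_TIMEOUT.

```python
import json
import urllib.request

def build_task_request(task, new_chat=True, base_url="http://localhost:9527"):
    """Build (but do not send) a POST /task request."""
    payload = {"task": task, "newChat": new_chat}
    return urllib.request.Request(
        f"{base_url}/task",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_task(task, new_chat=True, timeout=180):
    """Send the task and return the decoded JSON response."""
    request = build_task_request(task, new_chat)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_task("我叫小明")` followed by `send_task("我叫什么名字", new_chat=False)` reproduces requests 1 and 2.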

GET /status - Queue status

curl http://localhost:9527/status

Response:

{
  "queueLength": 2,
  "isProcessing": true,
  "currentTask": {
    "id": "abc123",
    "task": "北京天气",
    "elapsed": 3000
  },
  "pendingTasks": [
    { "id": "def456", "task": "上海天气", "waitTime": 2000 },
    { "id": "ghi789", "task": "广州天气", "waitTime": 1000 }
  ],
  "stats": {
    "totalProcessed": 10,
    "averageProcessTime": 5000
  }
}
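A client can turn this payload into a rough estimate of how long the queue will take to drain. The sketch below assumes that `elapsed`, `waitTime`, and `averageProcessTime` are all in milliseconds, which the README does not state explicitly; `summarize_status` is a hypothetical helper.

```python
def summarize_status(status):
    """Summarize a /status payload and estimate the time until the queue is empty."""
    current = status.get("currentTask")
    pending = status.get("pendingTasks", [])
    average = status.get("stats", {}).get("averageProcessTime", 0)
    # Remaining time for the running task, never negative.
    remaining = max(average - current["elapsed"], 0) if current else 0
    return {
        "running": current["id"] if current else None,
        "pending_ids": [task["id"] for task in pending],
        # One average processing time per pending task, plus the remainder above.
        "estimated_drain_ms": remaining + average * len(pending),
    }
```

The estimate is only as good as the running average, so treat it as a hint rather than a deadline.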

GET /queue - Queue details

curl http://localhost:9527/queue

GET /task/:id - Single task status

curl http://localhost:9527/task/abc123

Response (processing):

{
  "id": "abc123",
  "task": "北京天气",
  "status": "processing",
  "startedAt": 1735001234,
  "elapsed": 3000
}

Response (pending):

{
  "id": "def456",
  "task": "上海天气",
  "status": "pending",
  "queuePosition": 2,
  "waitTime": 5000
}

Windows CMD (curl)

curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"上海天气，20个字\"}"

PowerShell

Invoke-RestMethod -Uri "http://localhost:9527/task" -Method POST -Body '{"task":"上海天气"}' -ContentType "application/json"

Ubuntu / Linux (curl)

curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"上海天气，20个字"}'

From Other Programming Languages

Python:

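import requests

# POST the task. The response carries the answer, so the call blocks
# until the agent is done; allow more than the default 120 s RESPONSE_TIMEOUT.
response = requests.post(
    'http://localhost:9527/task',
    json={'task': '上海天气，20个字'},  # "Shanghai weather, in 20 characters"
    timeout=180,
)
response.raise_for_status()  # raise on HTTP error status codes
result = response.json()
print(result)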

Node.js:

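const http = require('http');

// "Shanghai weather, in 20 characters"
const body = JSON.stringify({ task: '上海天气，20个字' });

const req = http.request({
  hostname: 'localhost',
  port: 9527,
  path: '/task',
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }
}, (res) => {
  // Collect the response body, then parse it once it is complete
  let data = '';
  res.on('data', chunk => data += chunk);
  res.on('end', () => console.log(JSON.parse(data)));
});

// Without this handler, a refused connection would crash the process
req.on('error', (err) => console.error('Request failed:', err.message));

req.write(body);
req.end();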

Lock file: ~/.yiyan-agent/server.lock

⚙️ Configuration

Global config — applies everywhere

Create ~/.yiyan-agent/config.json:

{
  "HEADLESS": true,
  "MAX_ITERATIONS": 50,
  "STABLE_DELAY": 3000,
  "DEBUG": false
}

Per-project config — overrides global

Drop yiyan-agent.config.json in your project root:

{
  "MAX_ITERATIONS": 60,
  "MAX_OUTPUT_LENGTH": 12000
}
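The layering of the two files can be pictured as a key-by-key merge in which later layers win. This is a sketch under that assumption (the README says only that per-project config overrides global); `read_json`, `merge_config`, and `load_config` are hypothetical names, and `DEFAULTS` lists just a few of the documented default values.

```python
import json
from pathlib import Path

# A subset of the documented defaults.
DEFAULTS = {
    "HEADLESS": True,
    "MAX_ITERATIONS": 40,
    "RESPONSE_TIMEOUT": 120000,
    "STABLE_DELAY": 1500,
    "MAX_OUTPUT_LENGTH": 8000,
    "DEBUG": False,
}

def read_json(path):
    """Return the parsed file, or {} when it does not exist."""
    path = Path(path).expanduser()
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))

def merge_config(global_cfg, project_cfg, defaults=DEFAULTS):
    """Later layers win: defaults < global < per-project."""
    return {**defaults, **global_cfg, **project_cfg}

def load_config(project_dir="."):
    return merge_config(
        read_json("~/.yiyan-agent/config.json"),
        read_json(Path(project_dir) / "yiyan-agent.config.json"),
    )
```

With the two example files above, the merged result would have MAX_ITERATIONS 60 (project), STABLE_DELAY 3000 (global), and RESPONSE_TIMEOUT 120000 (default).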

All settings

| Setting | Default | Description |
|---|---|---|
| HEADLESS | true | Hide the browser window (performance optimized) |
| MAX_ITERATIONS | 40 | Max agent steps per task before stopping |
| RESPONSE_TIMEOUT | 120000 | Max ms to wait for a response (120s, performance optimized) |
| STABLE_DELAY | 1500 | Ms of silence that means Yiyan is done (performance optimized) |
| SEND_DELAY | 100 | Ms between typing and pressing Enter (optimized) |
| MAX_OUTPUT_LENGTH | 8000 | Truncate long command outputs sent to AI |
| DEBUG | false | Print raw AI responses to terminal |
| SESSION_DIR | ~/.yiyan-agent/session | Where browser cookies are saved |
| COMMAND_SECURITY_ENABLED | true | Enable command execution security validation |
| COMMAND_MODE | strict | Security mode: strict, moderate, or permissive |
| COMMAND_WHITELIST_ONLY | true | Only allow whitelisted commands |
| COMMAND_LOG_ENABLED | true | Audit log all command executions |
| PATH_TRAVERSAL_PROTECTION | true | Block path traversal attacks (../../../etc/passwd) |
| FILE_OVERWRITE_PROTECTION | true | Warn and backup before overwriting files |
| FILE_BACKUP_ENABLED | true | Auto-backup overwritten files to ~/.yiyan-agent/session/backups/ |
| ALLOW_SYSTEM_FILE_ACCESS | false | Block access to system directories (/etc, /usr, /System, etc.) |
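To give a feel for what whitelist-only command validation (COMMAND_WHITELIST_ONLY) might involve, here is a deliberately simple sketch. It is not the project's validator: the `ALLOWED` set, the metacharacter list, and `is_command_allowed` are all hypothetical, and the differences between the strict, moderate, and permissive modes are not modeled.

```python
import shlex

ALLOWED = {"ls", "cat", "node", "npm", "git", "python"}  # hypothetical whitelist
SHELL_META = set(";&|`$><")                              # chaining and redirection

def is_command_allowed(command, allowed=ALLOWED):
    """Allow a command only if its program is whitelisted and it has no shell tricks."""
    if any(ch in SHELL_META for ch in command):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # for example, unbalanced quotes
        return False
    return bool(parts) and parts[0] in allowed
```

Rejecting metacharacters outright is stricter than most real validators, but it keeps a whitelisted program from being used as a springboard for a second command.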

⚡ Performance Note: Default configuration is now optimized for speed (30-40% faster than previous version).

- STABLE_DELAY reduced from 2500ms to 1500ms
- RESPONSE_TIMEOUT reduced from 180s to 120s
- HEADLESS enabled by default for faster rendering

If you experience stability issues (incomplete responses), restore conservative settings in your config file:

{
  "STABLE_DELAY": 2500,
  "RESPONSE_TIMEOUT": 180000,
  "HEADLESS": false
}
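The trade-off behind STABLE_DELAY is easier to see in code. The sketch below shows one way "silence means done" detection could work: poll the response text and accept it once it has not changed for the stable period. `wait_for_stable_text` and its parameters are hypothetical, the units are seconds rather than the milliseconds used in the config file, and the real browser controller may work differently.

```python
import time

def wait_for_stable_text(read_text, stable_delay=1.5, timeout=120.0, poll=0.1,
                         clock=time.monotonic, sleep=time.sleep):
    """Return the text once it has stayed unchanged for stable_delay seconds."""
    start = clock()
    last_text = read_text()
    last_change = start
    while True:
        now = clock()
        if last_text and now - last_change >= stable_delay:
            return last_text                      # quiet for long enough: done
        if now - start >= timeout:
            raise TimeoutError("response did not stabilize in time")
        sleep(poll)
        text = read_text()
        if text != last_text:                     # still streaming: reset the timer
            last_text, last_change = text, clock()
```

A shorter stable period returns sooner but risks cutting off a reply that pauses mid-stream, which is why raising STABLE_DELAY helps with incomplete responses.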

🛠️ Available Tools

The agent can use these tools autonomously to complete your task:

| Tool | Description |
|---|---|
| read_file | Read file contents with path traversal protection |
| write_file | Write file with path validation and overwrite protection |
| append_to_file | Append text to an existing file |
| replace_in_file | Find and replace text in a file (regex supported) |
| delete_file | Permanently delete a file |
| list_directory | List directory contents, optionally recursive |
| create_directory | Create a directory and all parents |
| move_file | Move or rename a file or directory |
| copy_file | Copy a file to a new location |
| get_file_info | Get file metadata (size, line count, dates) |
| run_command | Execute shell commands with security validation |
| find_files | Find files by name pattern (e.g. *.ts) |
| search_in_files | Search text inside files (like grep -r) |
| read_url | Fetch and read the content of a URL |
| write_files | Batch write files with security validation |

🔒 Security Note: All file operations now include comprehensive security validation:

- Path traversal protection: Blocks ../../../etc/passwd attacks
- System file protection: Blocks access to /etc, /usr, /System, etc.
- File overwrite protection: Automatic backup for large files (>10KB)
- Command validation: Dangerous commands blocked before execution
- Audit logging: All operations logged to ~/.yiyan-agent/logs/

See Security Guide for configuration details.
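Path traversal protection usually comes down to resolving the requested path and checking that it still lies inside the working directory. The following is a generic sketch of that technique, not the project's implementation; `resolve_inside` is a hypothetical helper.

```python
import os

def resolve_inside(base_dir, user_path):
    """Resolve user_path against base_dir; raise if the result escapes base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before the check.
    target = os.path.realpath(os.path.join(base, user_path))
    try:
        inside = os.path.commonpath([base, target]) == base
    except ValueError:  # for example, paths on different Windows drives
        inside = False
    if not inside:
        raise PermissionError(f"path escapes working directory: {user_path}")
    return target
```

Checking the resolved path, rather than searching the raw string for "..", also catches absolute paths and symlinks that point outside the project.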

📂 Where Data is Stored

Everything lives in ~/.yiyan-agent/ in your home directory:

~/.yiyan-agent/
├── session/        ← Browser cookies (login once, runs forever)
├── logs/           ← Session logs (only saved with --save-log)
├── server.lock     ← HTTP server process lock
└── config.json     ← Your global settings

🔧 Troubleshooting

Agent responds but creates no files

The browser DOM rendered the AI's response in a way the parser didn't catch. Run with --debug to see exactly what's being received:

yiyan-agent --debug "build a calculator"

Agent stops responding / loops

Yiyan's UI may have changed. Run the calibration tool — it inspects the live DOM and prints updated selectors:

yiyan-agent --calibrate

Login session expired

Just run without --headless — the browser opens and you log in again:

yiyan-agent --interactive

Chromium didn't download automatically

Windows:

npx playwright install chromium

Ubuntu / Linux:

npx playwright install chromium
npx playwright install-deps chromium

Response times out on long tasks

Increase the timeout in your config:

{ "RESPONSE_TIMEOUT": 300000, "STABLE_DELAY": 4000 }

HTTP server not detected by other processes

The lock file may be stale. Kill the old process and restart:

# Check which process holds the lock
cat ~/.yiyan-agent/server.lock

# Kill it if needed (Linux)
kill <PID>

# Kill it if needed (Windows)
taskkill /PID <PID> /F

# Restart
yiyan-agent -i
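Before killing anything, it can help to confirm whether the PID shown in the lock file still belongs to a live process. The sketch below is POSIX-only and takes the PID as an argument, because the lock file's exact format is not documented here; `pid_is_alive` is a hypothetical helper. Do not run it on Windows, where `os.kill` with signal 0 does not act as a harmless probe.

```python
import os

def pid_is_alive(pid):
    """POSIX only: signal 0 checks that a process exists without signaling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process: the lock is stale
    except PermissionError:
        return True    # the process exists but belongs to another user
    return True
```

If this returns False for the PID in server.lock, the lock is stale and restarting `yiyan-agent -i` is the next step.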

🗂️ Project Structure

yiyan-browser-agent/
├── src/
│   ├── index.js          ← CLI entry point and argument parsing
│   ├── agent.js          ← Core agent loop (send → wait → parse → execute)
│   ├── browser.js        ← Playwright controller for yiyan.baidu.com
│   ├── server.js         ← HTTP server for process communication (v1.5.0+)
│   ├── client.js         ← HTTP client to forward tasks (v1.5.0+)
│   ├── tools.js          ← All 15 filesystem and shell tools
│   ├── parser.js         ← Extracts tool calls from AI responses (6 strategies)
│   ├── prompt.js         ← System prompt and conversation history manager
│   ├── config.js         ← Configuration loader (global + per-project)
│   ├── logger.js         ← ANSI-colored terminal output
│   ├── calibrate.js      ← DOM selector inspector / auto-fix tool
│   └── postinstall.js    ← Auto-downloads Chromium after npm install
├── LICENSE
├── README.md
└── package.json
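The note on parser.js mentions several extraction strategies. The README does not list them, so the sketch below only illustrates the general pattern of trying strategies in order until one yields a valid tool call. The two strategies, the JSON call format, and every name here are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit in a fenced block
FENCED_JSON = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(\{.*?\})\s*" + re.escape(FENCE), re.DOTALL
)

def from_fenced_json(text):
    """Strategy 1: a JSON object inside a fenced code block."""
    match = FENCED_JSON.search(text)
    return match.group(1) if match else None

def from_bare_json(text):
    """Strategy 2: everything from the first '{' to the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    return text[start:end + 1] if 0 <= start < end else None

STRATEGIES = [from_fenced_json, from_bare_json]

def extract_tool_call(text):
    """Try each strategy in order; return the first valid tool call, else None."""
    for strategy in STRATEGIES:
        candidate = strategy(text)
        if candidate is None:
            continue
        try:
            call = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "tool" in call:
            return call
    return None
```

Ordering strategies from strictest to loosest keeps the parser from grabbing stray braces in prose when a well-formed block is available.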

🤝 Contributing

Contributions are very welcome — this project is in active development and there's plenty of room to grow.

Setting up locally

git clone https://github.com/YOUR_USERNAME/yiyan-browser-agent
cd yiyan-browser-agent
npm install
npx playwright install chromium
node src/index.js --interactive

Areas that need work

🧪 Tests — there are currently no automated tests; a test suite would be a great contribution
🎨 UI selector resilience — Yiyan updates their UI occasionally; better selector strategies are welcome
🔌 More tools — image generation, browser control, database tools, etc.
🌐 Other AI frontends — adapting the browser layer to work with other free AI chats
📝 Better error messages — making failures easier to diagnose

How to contribute

1. Fork the repo
2. Create a branch: git checkout -b feature/my-improvement
3. Make your changes
4. Open a Pull Request with a clear description

Please keep PRs focused — one feature or fix per PR makes review much faster.

Reporting bugs

Open an issue on GitHub with:

- What you ran
- What you expected
- What actually happened
- Output of yiyan-agent --debug "your task" if relevant

⚠️ Disclaimer

This project automates a web browser to interact with yiyan.baidu.com. Automating web UIs may violate the terms of service of the website being automated. Use this tool for personal and development purposes only. The authors take no responsibility for account suspensions or other consequences of use.

📄 License

MIT — see LICENSE for details.

Built with Playwright · Powered by Yiyan (文心一言) · Free forever

If this project helped you, consider giving it a ⭐ on GitHub!
