yiyan-browser-agent
v1.10.2
AI coding agent powered by Yiyan (文心一言) via browser automation — no API key needed. Performance-optimized (30-40% faster). New --headless parameter for interactive mode. Enhanced with comprehensive security.
🤖 Yiyan Browser Agent (文心一言)
An autonomous AI coding agent that runs entirely for free — no API key required.
It drives a real browser to talk to Yiyan (文心一言), giving you a Claude Code / Cursor-style coding agent powered by Baidu's AI models at zero cost.
Installation · Quick Start · Usage · HTTP API · Configuration · Tools · Contributing
⚠️ This project is currently in active development. Core functionality works, but you may encounter rough edges. Bug reports and contributions are very welcome — see Contributing.
🧠 How It Works
Most AI coding agents talk to a paid API. This one doesn't.
Instead, it uses Playwright to control a real Chromium browser: it navigates to yiyan.baidu.com, sends your task, waits for the response, and parses it to extract tool calls, all automatically. Your local files and terminal are wired up as tools the AI can use, so it can read code, write files, run commands, and build complete projects step by step.
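The loop described above (send, wait, parse, execute) can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the JSON tool-call format, `parse_tool_call`, `run_agent`, and the `ask_model` callback are all hypothetical stand-ins, and the real parser is described elsewhere in this README as using several strategies.

```python
import json
import re
from typing import Callable, Dict, Optional

# Assumed tool-call format: the model replies with a JSON object such as
# {"tool": "write_file", "args": {...}}. The real agent's format may differ.
JSON_SPAN = re.compile(r"\{.*\}", re.DOTALL)


def parse_tool_call(reply: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' as a tool call, or return None."""
    match = JSON_SPAN.search(reply)
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return call if isinstance(call, dict) and "tool" in call else None


def run_agent(task: str, ask_model: Callable[[str], str],
              tools: Dict[str, Callable[..., str]], max_iterations: int = 40) -> str:
    """Send -> wait -> parse -> execute, until the model stops calling tools."""
    message = task
    for _ in range(max_iterations):
        reply = ask_model(message)            # a browser round trip in the real agent
        call = parse_tool_call(reply)
        if call is None:
            return reply                      # no tool call: treat as the final answer
        tool = tools.get(call["tool"])
        if tool is None:
            message = f"Unknown tool: {call['tool']}"
            continue
        message = tool(**call.get("args", {}))  # feed the tool result back to the model
    return "Stopped: iteration limit reached"
```

Passing a fake `ask_model` that returns canned replies is enough to exercise the loop without a browser. The `max_iterations` cap plays the same role as the MAX_ITERATIONS setting.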
  Your Terminal
       │
       ▼
  Agent Core  ← orchestrates the loop
       │
       ├──► Browser (Playwright)  ← talks to yiyan.baidu.com
       │         │
       │    Yiyan AI (文心一言)  ← thinks, decides what tool to use
       │         │
       └──► Tool Executor  ← reads/writes files, runs commands
                 │
            Your Project
📦 Installation
Windows
# Install Node.js (if not already installed)
# Download it from https://nodejs.org, or use winget:
winget install OpenJS.NodeJS.LTS
# Install yiyan-browser-agent globally
npm install -g yiyan-browser-agent
# Install the Chromium browser (normally runs automatically on first install; about 150 MB)
npx playwright install chromium
Ubuntu / Linux
# Install Node.js (if not already installed)
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
# Install yiyan-browser-agent globally
npm install -g yiyan-browser-agent
# Install Chromium and its dependencies
npx playwright install chromium
npx playwright install-deps chromium # Install system dependencies
Requirements: Node.js ≥ 18
🚀 Quick Start
Windows
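1. First run: log in to Yiyan (文心一言):
yiyan-agent -i
A browser window opens. Log in to your Baidu account, then return to the terminal and press Enter. The session is saved, so you only need to log in once.
2. Give it a task:
yiyan-agent "Create an Express REST API with user authentication"
3. Send a task to an already-running server:
# Terminal 1: start interactive mode (acts as an HTTP server)
yiyan-agent -i
# Terminal 2: send a task (forwarded to the server; no new browser is launched)
yiyan-agent "Shanghai weather, in 20 characters"
Ubuntu / Linux
1. First run: log in to Yiyan (文心一言):
yiyan-agent -i
2. Give it a task:
yiyan-agent "build a REST API in Express with user authentication"
3. Use the short alias ya from any directory:
cd ~/my-project
ya "add input validation to all my API routes"
💻 Usage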
yiyan-agent [OPTIONS] [TASK]
-t, --task <task> Task to run (or just type it as the last argument)
-i, --interactive Keep browser open, run multiple tasks (starts HTTP server)
-d, --dir <path> Set working directory (default: current directory)
--debug Print raw AI responses to the terminal
--show-browser Show browser window (non-interactive mode)
--calibrate Auto-detect DOM selectors (run if agent breaks)
-h, --help Show help
Aliases:
ya Short form of yiyan-agent
Examples
# Single task — runs and exits
yiyan-agent "create a Python script that scrapes Hacker News"
# Interactive mode — keeps browser open, starts HTTP server on port 9527
yiyan-agent -i
# Run on a specific project
ya --dir ~/projects/my-app "refactor all callbacks to async/await"
# Debug mode (shows what Yiyan is actually outputting)
ya --debug "build a calculator"
# In interactive mode, type 'quit' or 'q' to exit:
❯ quit
🌐 HTTP API (v1.5.0+)
When interactive mode (-i) is running, an HTTP server starts on port 9527, allowing external services to send tasks.
Task Queue (v1.5.2+)
Multi-client concurrency: all requests enter a queue and are processed serially, so each client receives the response to its own request.
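One way to get this behavior is a single worker that drains a FIFO queue while each caller waits on its own result handle. The sketch below illustrates that idea in Python. It is not the server's implementation, and `SerialTaskQueue` and `handler` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import Future


class SerialTaskQueue:
    """Run submitted tasks one at a time, in arrival order."""

    def __init__(self, handler):
        self._handler = handler               # callable(task) -> answer
        self._tasks = queue.Queue()           # FIFO, safe to use across threads
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, task):
        """Enqueue a task; the returned Future resolves to that task's answer."""
        future = Future()
        self._tasks.put((task, future))
        return future

    def _worker(self):
        # A single worker thread means tasks never overlap.
        while True:
            task, future = self._tasks.get()
            try:
                future.set_result(self._handler(task))
            except Exception as exc:          # deliver the failure to the caller that submitted it
                future.set_exception(exc)
```

Because each answer is attached to the Future created for its own request, a slow task for one client cannot be mistaken for another client's response.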
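Client A → requests "Beijing weather" → queue position 1 → processed → returns the Beijing weather answer
Client B → requests "Shanghai weather" → queue position 2 → waits → processed → returns the Shanghai weather answer
Client C → requests "Guangzhou weather" → queue position 3 → waits → processed → returns the Guangzhou weather answer
API Endpoints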
| Endpoint | Method | Description |
|----------|--------|-------------|
| /task | POST | Submit a task |
| /status | GET | Get queue status |
| /queue | GET | Get queue details |
| /task/:id | GET | Get the status of a single task |
| / | GET | Service info |
Process Communication
# Terminal 1: Start interactive mode (HTTP server)
yiyan-agent -i
# → Server listening on port 9527
# Terminal 2: Send task (forwarded to server, no new browser)
yiyan-agent "Beijing weather, in 15 characters"
# → Found running server on port 9527, forwarding task...
HTTP POST API
Endpoint: POST http://localhost:9527/task
Request Body:
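{
  "task": "your task description",
  "newChat": true // optional: whether to start a new conversation (default: true)
}
newChat parameter:
| Value | Behavior | When to use |
|---|---|---|
| true (default) | Clicks the "新对话" (New chat) button and starts a fresh conversation; the AI has no memory of earlier messages | New tasks, standalone questions |
| false | Continues the current conversation; the AI remembers earlier content | Multi-turn conversations, context-dependent tasks |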
Response:
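{
  "question": "Shanghai weather, in 20 characters",
  "answer": "Shanghai is sunny today, 25°C...",
  "duration": 5234,
  "status": "success"
}
newChat usage example
To verify the effect of the newChat parameter (continuing a conversation vs. starting a new one):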
Windows CMD:
REM Request 1: tell the AI your name
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"My name is Xiao Ming\"}"
REM Request 2: newChat=false, so the AI should remember that your name is Xiao Ming
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"What is my name?\",\"newChat\":false}"
REM Request 3: newChat=true, so the AI should not remember your name (new conversation)
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"What is my name?\",\"newChat\":true}"
Ubuntu / Linux:
# Request 1: tell the AI your name
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"My name is Xiao Ming"}'
# Request 2: newChat=false, so the AI should remember that your name is Xiao Ming
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"What is my name?","newChat":false}'
# Request 3: newChat=true, so the AI should not remember your name (new conversation)
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"What is my name?","newChat":true}'
Expected results:
- Request 2 (newChat=false): the AI answers "Xiao Ming"
- Request 3 (newChat=true): the AI answers that it does not know, or that you have not told it
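The same experiment can be scripted. Below is a minimal Python sketch that uses only the standard library. `build_task_request` and `send_task` are hypothetical helper names, and the 180-second client timeout is an assumption rather than a documented value. The URL, port, and JSON fields come from the API description above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9527"   # default port given in this README


def build_task_request(task, new_chat=True, base_url=BASE_URL):
    """Build the POST /task request; newChat defaults to true, as documented."""
    body = json.dumps({"task": task, "newChat": new_chat}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_task(task, new_chat=True, timeout=180):
    """Send a task to a running interactive-mode server and return the decoded JSON."""
    request = build_task_request(task, new_chat)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, calling `send_task("My name is Xiao Ming")` followed by `send_task("What is my name?", new_chat=False)` should reproduce the second request above.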
GET /status - queue status
curl http://localhost:9527/status
Response:
{
  "queueLength": 2,
  "isProcessing": true,
  "currentTask": {
    "id": "abc123",
    "task": "Beijing weather",
    "elapsed": 3000
  },
  "pendingTasks": [
    { "id": "def456", "task": "Shanghai weather", "waitTime": 2000 },
    { "id": "ghi789", "task": "Guangzhou weather", "waitTime": 1000 }
  ],
  "stats": {
    "totalProcessed": 10,
    "averageProcessTime": 5000
  }
}
GET /queue - queue details
curl http://localhost:9527/queue
GET /task/:id - single task status
curl http://localhost:9527/task/abc123
Response (processing):
{
  "id": "abc123",
  "task": "Beijing weather",
  "status": "processing",
  "startedAt": 1735001234,
  "elapsed": 3000
}
Response (pending):
{
  "id": "def456",
  "task": "Shanghai weather",
  "status": "pending",
  "queuePosition": 2,
  "waitTime": 5000
}
Windows CMD (curl)
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"Shanghai weather, in 20 characters\"}"
PowerShell
Invoke-RestMethod -Uri "http://localhost:9527/task" -Method POST -Body '{"task":"Shanghai weather"}' -ContentType "application/json"
Ubuntu / Linux (curl)
curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"Shanghai weather, in 20 characters"}'
From Other Programming Languages
Python:
import requests

# POST the task; requests serializes the dict to JSON and sets Content-Type
response = requests.post(
    'http://localhost:9527/task',
    json={'task': 'Shanghai weather, in 20 characters'}
)
result = response.json()
print(result)
Node.js:
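const http = require('http');

// Serialize the task as JSON
const body = JSON.stringify({ task: 'Shanghai weather, in 20 characters' });
const req = http.request({
  hostname: 'localhost',
  port: 9527,
  path: '/task',
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }
}, (res) => {
  // Collect the response body, then parse it as JSON
  let data = '';
  res.on('data', chunk => data += chunk);
  res.on('end', () => console.log(JSON.parse(data)));
});
// Report connection failures (for example, when no server is running)
req.on('error', (err) => console.error('Request failed:', err.message));
req.write(body);
req.end();
Lock file: ~/.yiyan-agent/server.lock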
⚙️ Configuration
Global config — applies everywhere
Create ~/.yiyan-agent/config.json:
{
  "HEADLESS": true,
  "MAX_ITERATIONS": 50,
  "STABLE_DELAY": 3000,
  "DEBUG": false
}
Per-project config (overrides global)
Drop yiyan-agent.config.json in your project root:
{
  "MAX_ITERATIONS": 60,
  "MAX_OUTPUT_LENGTH": 12000
}
All settings
| Setting | Default | Description |
|---|---|---|
| HEADLESS | true | Hide the browser window (performance optimized) |
| MAX_ITERATIONS | 40 | Max agent steps per task before stopping |
| RESPONSE_TIMEOUT | 120000 | Max ms to wait for a response (120s, performance optimized) |
| STABLE_DELAY | 1500 | Ms of silence that means Yiyan is done (performance optimized) |
| SEND_DELAY | 100 | Ms between typing and pressing Enter (optimized) |
| MAX_OUTPUT_LENGTH | 8000 | Truncate long command outputs sent to AI |
| DEBUG | false | Print raw AI responses to terminal |
| SESSION_DIR | ~/.yiyan-agent/session | Where browser cookies are saved |
| COMMAND_SECURITY_ENABLED | true | Enable command execution security validation |
| COMMAND_MODE | strict | Security mode: strict, moderate, or permissive |
| COMMAND_WHITELIST_ONLY | true | Only allow whitelisted commands |
| COMMAND_LOG_ENABLED | true | Audit log all command executions |
| PATH_TRAVERSAL_PROTECTION | true | Block path traversal attacks (../../../etc/passwd) |
| FILE_OVERWRITE_PROTECTION | true | Warn and backup before overwriting files |
| FILE_BACKUP_ENABLED | true | Auto-backup overwritten files to ~/.yiyan-agent/session/backups/ |
| ALLOW_SYSTEM_FILE_ACCESS | false | Block access to system directories (/etc, /usr, /System, etc.) |
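The precedence described in this section (built-in defaults, then the global file, then the per-project file) amounts to a layered dictionary merge. The Python sketch below is an illustration under that assumption, not the project's config.js. `load_config` and `read_json` are hypothetical names, and only a few defaults from the table are included.

```python
import json
from pathlib import Path

# A few defaults copied from the settings table; the rest are omitted for brevity.
DEFAULTS = {"HEADLESS": True, "MAX_ITERATIONS": 40, "STABLE_DELAY": 1500, "DEBUG": False}


def read_json(path):
    """Return the parsed JSON object at path, or {} if the file does not exist."""
    path = Path(path)
    return json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}


def load_config(project_dir, home_dir=None):
    """Merge settings so that per-project values override global ones, which override defaults."""
    home = Path(home_dir) if home_dir is not None else Path.home()
    config = dict(DEFAULTS)
    config.update(read_json(home / ".yiyan-agent" / "config.json"))
    config.update(read_json(Path(project_dir) / "yiyan-agent.config.json"))
    return config
```

With the two example files shown earlier in this section, MAX_ITERATIONS would resolve to 60 (per-project) and STABLE_DELAY to 3000 (global).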
⚡ Performance Note: The default configuration is now tuned for speed (30-40% faster than the previous version).
- STABLE_DELAY reduced from 2500ms to 1500ms
- RESPONSE_TIMEOUT reduced from 180s to 120s
- HEADLESS enabled by default for faster rendering
If you experience stability issues (incomplete responses), restore conservative settings in your config file:
{ "STABLE_DELAY": 2500, "RESPONSE_TIMEOUT": 180000, "HEADLESS": false }
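To see why a shorter STABLE_DELAY can cut a response short, it helps to picture how a "silence means done" check might work. The Python sketch below is an assumption about the general technique, not the agent's browser code. `wait_for_stable_text` and `read_text` are hypothetical names, and the clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time


def wait_for_stable_text(read_text, stable_delay_ms=1500, timeout_ms=120000,
                         poll_ms=100, clock=time.monotonic, sleep=time.sleep):
    """Return the text once it has been non-empty and unchanged for stable_delay_ms.

    Raises TimeoutError if that does not happen within timeout_ms.
    """
    start = clock()
    last_text = read_text()
    last_change = start
    while True:
        now = clock()
        if last_text and (now - last_change) * 1000 >= stable_delay_ms:
            return last_text                  # quiet for long enough: assume the reply is complete
        if (now - start) * 1000 >= timeout_ms:
            raise TimeoutError("response did not stabilize in time")
        sleep(poll_ms / 1000)
        text = read_text()
        if text != last_text:                 # still streaming: restart the quiet period
            last_text, last_change = text, clock()
```

If the model pauses mid-answer for longer than the delay, a check like this returns a truncated reply, which is why the conservative settings above raise STABLE_DELAY.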
🛠️ Available Tools
The agent can use these tools autonomously to complete your task:
| Tool | Description |
|---|---|
| read_file | Read file contents with path traversal protection |
| write_file | Write file with path validation and overwrite protection |
| append_to_file | Append text to an existing file |
| replace_in_file | Find and replace text in a file (regex supported) |
| delete_file | Permanently delete a file |
| list_directory | List directory contents, optionally recursive |
| create_directory | Create a directory and all parents |
| move_file | Move or rename a file or directory |
| copy_file | Copy a file to a new location |
| get_file_info | Get file metadata (size, line count, dates) |
| run_command | Execute shell commands with security validation |
| find_files | Find files by name pattern (e.g. *.ts) |
| search_in_files | Search text inside files (like grep -r) |
| read_url | Fetch and read the content of a URL |
| write_files | Batch write files with security validation |
🔒 Security Note: All file operations now include security validation:
- Path traversal protection: Blocks ../../../etc/passwd attacks
- System file protection: Blocks access to /etc, /usr, /System, etc.
- File overwrite protection: Automatic backup for large files (>10KB)
- Command validation: Dangerous commands blocked before execution
- Audit logging: All operations logged to ~/.yiyan-agent/logs/
See Security Guide for configuration details.
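As an illustration of the path traversal check, a common approach is to resolve the requested path and confirm that it still lies inside the working directory. The Python sketch below shows that general technique. It is not taken from the project's tools.js, and `resolve_inside` is a hypothetical name.

```python
import os


def resolve_inside(base_dir, user_path):
    """Resolve user_path against base_dir; raise if the result escapes base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before the comparison.
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise PermissionError(f"path escapes working directory: {user_path}")
    return target
```

A request for `../../../etc/passwd` resolves outside the base directory and is rejected, while `src/app.js` is accepted. On Windows, `os.path.commonpath` raises ValueError for paths on different drives, so a caller there would need to treat that as a rejection too.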
📂 Where Data is Stored
Everything lives in ~/.yiyan-agent/ in your home directory:
~/.yiyan-agent/
├── session/ ← Browser cookies (login once, runs forever)
├── logs/ ← Session logs (only saved with --save-log)
├── server.lock ← HTTP server process lock
└── config.json ← Your global settings
🔧 Troubleshooting
Agent responds but creates no files
The browser DOM rendered the AI's response in a way the parser didn't catch. Run with --debug to see exactly what's being received:
yiyan-agent --debug "build a calculator"
Agent stops responding / loops
Yiyan's UI may have changed. Run the calibration tool — it inspects the live DOM and prints updated selectors:
yiyan-agent --calibrate
Login session expired
Run interactive mode without --headless so the browser window opens and you can log in again:
yiyan-agent --interactive
Chromium didn't download automatically
Windows:
npx playwright install chromium
Ubuntu / Linux:
npx playwright install chromium
npx playwright install-deps chromium
Response times out on long tasks
Increase the timeout in your config:
{ "RESPONSE_TIMEOUT": 300000, "STABLE_DELAY": 4000 }
HTTP server not detected by other processes
The lock file may be stale. Kill the old process and restart:
# Check process
cat ~/.yiyan-agent/server.lock
# Kill if needed
kill <PID> # Linux
taskkill /PID <PID> /F # Windows
# Restart
yiyan-agent -i
🗂️ Project Structure
yiyan-browser-agent/
├── src/
│ ├── index.js ← CLI entry point and argument parsing
│ ├── agent.js ← Core agent loop (send → wait → parse → execute)
│ ├── browser.js ← Playwright controller for yiyan.baidu.com
│ ├── server.js ← HTTP server for process communication (v1.5.0+)
│ ├── client.js ← HTTP client to forward tasks (v1.5.0+)
│ ├── tools.js ← All 15 filesystem and shell tools
│ ├── parser.js ← Extracts tool calls from AI responses (6 strategies)
│ ├── prompt.js ← System prompt and conversation history manager
│ ├── config.js ← Configuration loader (global + per-project)
│ ├── logger.js ← ANSI-colored terminal output
│ ├── calibrate.js ← DOM selector inspector / auto-fix tool
│ └── postinstall.js ← Auto-downloads Chromium after npm install
├── LICENSE
├── README.md
└── package.json
🤝 Contributing
Contributions are very welcome — this project is in active development and there's plenty of room to grow.
Setting up locally
git clone https://github.com/YOUR_USERNAME/yiyan-browser-agent
cd yiyan-browser-agent
npm install
npx playwright install chromium
node src/index.js --interactive
Areas that need work
- 🧪 Tests — there are currently no automated tests; a test suite would be a great contribution
- 🎨 UI selector resilience — Yiyan updates their UI occasionally; better selector strategies are welcome
- 🔌 More tools — image generation, browser control, database tools, etc.
- 🌐 Other AI frontends — adapting the browser layer to work with other free AI chats
- 📝 Better error messages — making failures easier to diagnose
How to contribute
- Fork the repo
- Create a branch: git checkout -b feature/my-improvement
- Make your changes
- Open a Pull Request with a clear description
Please keep PRs focused — one feature or fix per PR makes review much faster.
Reporting bugs
Open an issue on GitHub with:
- What you ran
- What you expected
- What actually happened
- Output of yiyan-agent --debug "your task", if relevant
⚠️ Disclaimer
This project automates a web browser to interact with yiyan.baidu.com. Automating web UIs may violate the terms of service of the website being automated. Use this tool for personal and development purposes only. The authors take no responsibility for account suspensions or other consequences of use.
📄 License
MIT — see LICENSE for details.
Built with Playwright · Powered by Yiyan (文心一言) · Free forever
If this project helped you, consider giving it a ⭐ on GitHub!
