vision-navigator
v1.0.0
AI-powered browser testing and automation framework
Vision Navigator
A raw CDP (Chrome DevTools Protocol) browser automation tool powered by local LLMs (Ollama) or OpenRouter. It natively handles navigation, simplified DOM extraction, and browser interaction via a custom CDP driver, avoiding heavy dependencies like Puppeteer or Playwright.
Features
- Custom CDP Driver: Communicates directly with Chromium over WebSockets.
- Local AI Powered: Uses qwen2.5:3b via Ollama by default, capable of running entirely locally.
- Web UI Dashboard: Includes a sleek dashboard to submit YAML workflows and view step-by-step results, screenshots, and logs.
- AI Diagnostics: Automatically monitors console errors, network issues, and performance metrics, and uses the AI to provide usability/performance suggestions.
- CLI Interface: Can be run as an NPM package/CLI to execute workflows headlessly and get a pass/fail report.
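To make the Custom CDP Driver item more concrete: CDP commands are JSON messages carrying an `id`, a `method`, and optional `params`, and each response echoes the `id` of the command it answers. The sketch below shows one way to frame commands and match responses. It is an illustration only: `CdpSession`, `call`, and `handleMessage` are hypothetical names, not this package's API, and the WebSocket itself is abstracted as a `send` callback.

```typescript
// Hypothetical types and class for illustration; not the package's actual driver.
interface CdpCommand {
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface CdpResponse {
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// Tracks in-flight commands so each response can be matched to its request by id.
class CdpSession {
  private nextId = 1;
  private pending = new Map<number, (res: CdpResponse) => void>();
  private send: (payload: string) => void;

  // `send` is whatever writes a string to the WebSocket (an assumption of this sketch).
  constructor(send: (payload: string) => void) {
    this.send = send;
  }

  // Serialize a command, send it, and resolve once the matching response arrives.
  call(method: string, params?: Record<string, unknown>): Promise<unknown> {
    const id = this.nextId++;
    const command: CdpCommand = { id, method, params };
    return new Promise((resolve, reject) => {
      this.pending.set(id, (res) =>
        res.error ? reject(new Error(res.error.message)) : resolve(res.result)
      );
      this.send(JSON.stringify(command));
    });
  }

  // Feed every incoming WebSocket message here.
  handleMessage(raw: string): void {
    const msg = JSON.parse(raw) as Partial<CdpResponse>;
    if (typeof msg.id !== "number") return; // events carry no id; ignored in this sketch
    const settle = this.pending.get(msg.id);
    if (!settle) return;
    this.pending.delete(msg.id);
    settle(msg as CdpResponse);
  }
}
```

With a session wired to a socket, a navigation would look like `session.call("Page.navigate", { url: "http://quotes.toscrape.com/" })`, where `Page.navigate` is a standard CDP method.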
🚀 How to Run (3 Ways)
Vision Navigator is flexible and can be run in three different modes depending on your needs.
Environment Configuration (.env)
You can configure the model, LLM provider, and storage settings using environment variables. Create a .env file in the root of the vision_navigator_ts directory (or wherever you are running the CLI from):
# Default is qwen2.5:3b via local Ollama
OLLAMA_URL=http://localhost:11434
MODEL=qwen2.5:3b
# If using OpenRouter instead of local Ollama
OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_MODEL=google/gemini-2.0-flash-exp:free
# Storage configurations (optional)
POCKETBASE_URL=http://127.0.0.1:8091
MINIO_ENDPOINT=127.0.0.1
MINIO_PORT=9002
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=store-runs
1. As a Standalone NPM CLI
If you just want to run tests locally via your terminal, you can install and use it as an NPM package.
# Install dependencies and build
npm install
npm run build
# Link the package globally
npm link
# Run a single workflow
vision-navigator run ./workflows/riskely-test.yaml
# Run a directory of tests
vision-navigator test ./workflows
# Start the built-in web server (runs on port 8000)
vision-navigator serve
2. Standalone Docker Container
If you want an isolated environment without installing Node.js or Chromium on your host machine, you can run the tool as a standalone Docker container.
# Build the Docker image
docker build -t vision-navigator .
# Run a single workflow (mount your local workflows directory)
docker run --rm -v "$(pwd)/workflows:/usr/src/app/workflows" vision-navigator npm run start -- run workflows/riskely-test.yaml
# Start the Web UI only
docker run -p 8000:8000 vision-navigator
Note: If you use a local Ollama instance on your host, make sure to pass the correct OLLAMA_URL environment variable (e.g. -e OLLAMA_URL=http://host.docker.internal:11434).
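A wrong OLLAMA_URL is easy to miss until a workflow fails, so it can help to check reachability first. The following is a minimal sketch, not part of Vision Navigator: it queries Ollama's `GET /api/tags` endpoint, which lists locally available models, and checks that the configured model is present. The function names are hypothetical, and a runtime with global `fetch` (Node 18 or later) is assumed.

```typescript
// Relevant part of the GET /api/tags response; other fields are omitted.
interface OllamaTags {
  models?: { name: string }[];
}

// Build the tags endpoint from a base URL such as the value of OLLAMA_URL.
function tagsUrl(baseUrl: string): string {
  return new URL("/api/tags", baseUrl).toString();
}

// True when the model (for example "qwen2.5:3b") appears in the tags listing.
function hasModel(tags: OllamaTags, model: string): boolean {
  return (tags.models ?? []).some((m) => m.name === model);
}

// Hypothetical helper: resolves to false if Ollama is unreachable or the model is missing.
async function checkOllama(baseUrl: string, model: string): Promise<boolean> {
  try {
    const res = await fetch(tagsUrl(baseUrl));
    if (!res.ok) return false;
    return hasModel((await res.json()) as OllamaTags, model);
  } catch {
    return false; // connection refused, DNS failure, and similar errors
  }
}
```

Inside the container this could be called as `checkOllama("http://host.docker.internal:11434", "qwen2.5:3b")` before a run starts.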
3. Full Stack with Docker Compose (Recommended for Self-Hosting)
This is the most comprehensive way to run Vision Navigator. It spins up the full environment: the main server, a dedicated Ollama container (pre-configured with qwen2.5:3b), PocketBase (for test run history), and Minio (for storing screenshot artifacts).
- Ensure you have Docker and Docker Compose installed.
- Run the following command in the root directory:
docker-compose up -d --build
This will automatically:
- Start Vision Navigator Web UI at http://localhost:8000
- Start PocketBase at http://localhost:8091
- Start Minio at http://localhost:9002
- Start Ollama and automatically pull the qwen2.5:3b model.
Running CLI tests within Docker Compose:
# Run a single workflow
docker exec -it vision-navigator npm run start -- run workflows/riskely-test.yaml
# Run all workflows in a directory
docker exec -it vision-navigator npm run start -- test workflows
📝 Writing Workflows
Workflows are written in YAML: a top-level steps list in which each entry carries a natural-language instruction.
steps:
- instruction: "Navigate to http://quotes.toscrape.com/"
- instruction: "Click the 'Login' link near the top right"
- instruction: "Type 'admin' into the username field"
- instruction: "Type 'password123' into the password field"
- instruction: "Click the Login button"
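The workflow shape above (a `steps` list whose entries each have an `instruction` string) can be checked before a run. The sketch below validates an already-parsed document. It is an assumption-laden illustration: `Workflow`, `WorkflowStep`, and `validateWorkflow` are hypothetical names, the real schema may allow more fields, and YAML parsing is left to whatever parser the project uses.

```typescript
// Hypothetical types mirroring the example workflow; the real schema may differ.
interface WorkflowStep {
  instruction: string;
}

interface Workflow {
  steps: WorkflowStep[];
}

// Validate a parsed YAML document and return it as a typed Workflow.
// Throws with a message naming the first problem found.
function validateWorkflow(doc: unknown): Workflow {
  if (typeof doc !== "object" || doc === null) {
    throw new Error("workflow must be a mapping with a `steps` key");
  }
  const steps = (doc as { steps?: unknown }).steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error("workflow needs a non-empty `steps` list");
  }
  return {
    steps: steps.map((step, i) => {
      const instruction = (step as { instruction?: unknown } | null)?.instruction;
      if (typeof instruction !== "string" || instruction.trim() === "") {
        throw new Error(`step ${i + 1} needs a non-empty \`instruction\` string`);
      }
      return { instruction };
    }),
  };
}
```

Failing early with a step number tends to be cheaper than discovering a malformed step partway through a browser session.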