
browserground

v0.3.0


Local UI-grounding specialist for hybrid AI agents. Screenshot + text target → strict JSON bbox. Qwen3-VL-2B LoRA, MLX 4-bit + GGUF + Ollama builds. Daemon, HTTP server, batch, confidence, eval. Drop-in for Claude Code, Codex, browser-use, Skyvern. Cuts G



Why this exists — the hybrid AI argument

Today, most AI agents route every screenshot to a cloud frontier model (GPT-4V, Claude Vision, Gemini) just to find click coordinates. That's a $0.01–0.05 multimodal call adding 800ms–2s of latency, repeated 20–50× per agent run. Cost and latency compound. Screenshots full of private UI leave your machine.

A general 200B-parameter LLM is overkill for "where is the Submit button?", which is a narrow vision task. The right shape is hybrid: cheap, fast, local specialist models for the dedicated tasks they handle better, and a cloud LLM only for the planning and reasoning it's uniquely good at.

That's exactly what browserground is — the click-grounding specialist you drop in next to your Claude / GPT-5 / Codex agent.

| | Pure-cloud agent | Hybrid (+ browserground) |
|---|---|---|
| Per-screenshot cost | $0.01–0.05 | $0 |
| Latency | 800ms–2s round-trip | ~1.5s MLX / ~1.8s transformers |
| Tokens billed by cloud | 1500+ multimodal | ~40 text |
| Screenshots leave machine | yes | no |
| Rate limits | yes | no |
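To make the compounding concrete, here is a back-of-the-envelope sketch that uses only the ranges quoted above (20 to 50 screenshots per run, $0.01 to $0.05 and 0.8 to 2 s per cloud call). The function name is illustrative, and the latency figure assumes the calls happen one after another.

```python
def per_run_range(calls, unit):
    """Multiply a (low, high) call-count range by a (low, high) per-call range."""
    return (calls[0] * unit[0], calls[1] * unit[1])

CALLS_PER_RUN = (20, 50)        # screenshots per agent run
COST_PER_CALL = (0.01, 0.05)    # USD per cloud multimodal call
LATENCY_PER_CALL = (0.8, 2.0)   # seconds per cloud round-trip

cost = per_run_range(CALLS_PER_RUN, COST_PER_CALL)
latency = per_run_range(CALLS_PER_RUN, LATENCY_PER_CALL)

print(f"cloud grounding cost per run: ${cost[0]:.2f} to ${cost[1]:.2f}")
print(f"cloud grounding latency per run: {latency[0]:.0f}s to {latency[1]:.0f}s")
```

That works out to roughly $0.20 to $2.50 and 16 to 100 seconds per agent run spent on grounding alone.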

What you get

browserground parse screenshot.png --target "Submit button"
# {"bbox_2d": [344, 612, 478, 658]}

Strict-JSON bbox of the element to click. 100% format compliance on the eval set — no markdown fences, no <ref> tokens, parseable every time.
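Because the output is plain JSON, turning it into a click point takes only a few lines. The sketch below is one way to do it, not part of the package. It assumes `bbox_2d` is ordered `[x1, y1, x2, y2]` (as in the eval label format) and that the values are pixel coordinates of the input screenshot; neither is spelled out here, so check against your own screenshots. `bbox_center` is a hypothetical helper name.

```python
import json

def bbox_center(raw: str) -> tuple[int, int]:
    """Parse one browserground result and return the center of its bbox.

    Assumes bbox_2d is [x1, y1, x2, y2] in pixel coordinates.
    """
    result = json.loads(raw)
    x1, y1, x2, y2 = result["bbox_2d"]
    if x2 < x1 or y2 < y1:
        raise ValueError(f"malformed bbox: {result['bbox_2d']}")
    return ((x1 + x2) // 2, (y1 + y2) // 2)

print(bbox_center('{"bbox_2d": [344, 612, 478, 658]}'))  # (411, 635)
```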

Install

npm install -g browserground

On the first browserground parse, the model auto-downloads to ~/.cache/huggingface/. On Apple Silicon the MLX 4-bit build (1.8 GB) is preferred; elsewhere the LoRA on the Qwen3-VL-2B base (~4.3 GB) is used.

Use

Single-shot

browserground parse screen.png --target "Submit button"

Daemon mode (model stays loaded — recommended for agents)

browserground serve &
browserground parse a.png --target "Chrome icon"
browserground parse b.png --target "the back arrow"
browserground stop
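An agent written in Python could call the same CLI through `subprocess`. This is a minimal sketch: it uses only the `parse ... --target ...` form shown above and assumes the JSON result is the only thing written to stdout. `parse_command` and `ground` are hypothetical helper names, not part of the package.

```python
import json
import subprocess

def parse_command(image: str, target: str) -> list[str]:
    """Build the argv for `browserground parse <image> --target <target>`."""
    return ["browserground", "parse", image, "--target", target]

def ground(image: str, target: str, timeout: float = 30.0) -> dict:
    """Run the CLI and decode stdout as JSON.

    Assumption: the result JSON is the only output on stdout.
    Raises CalledProcessError on a non-zero exit code.
    """
    proc = subprocess.run(
        parse_command(image, target),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return json.loads(proc.stdout)

# Building the command needs no installed CLI; calling ground() does.
print(parse_command("a.png", "Chrome icon"))
```

With `browserground serve` running in the background, repeated `ground()` calls should avoid reloading the model each time, as in the shell session above.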

HTTP daemon (REST)

browserground serve --http :8401 &
curl -s -X POST localhost:8401/api/ground \
  -H 'Content-Type: application/json' \
  -d '{"image_path":"/abs/path/screen.png","target":"Submit button"}'
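The same request can be made from Python with only the standard library. This sketch mirrors the curl call above (same port, path, and JSON fields). It assumes the response body is JSON in the same shape as the CLI output, which is not confirmed here. `build_ground_request` and `ground_http` are hypothetical helper names.

```python
import json
import urllib.request

def build_ground_request(image_path: str, target: str,
                         base_url: str = "http://localhost:8401") -> urllib.request.Request:
    """Build the POST request for /api/ground without sending it."""
    body = json.dumps({"image_path": image_path, "target": target}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/ground",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ground_http(image_path: str, target: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (assumed response shape)."""
    req = build_ground_request(image_path, target)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Inspect the request; ground_http() needs the daemon to be running.
req = build_ground_request("/abs/path/screen.png", "Submit button")
print(req.get_method(), req.full_url)
```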

Batch mode

# Many targets on one image
browserground parse screen.png --targets queries.txt --jsonl

# JSON pairs file: [{"image":"a.png","target":"..."}, ...]
browserground parse --targets pairs.json --jsonl
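A pairs file is easy to generate from whatever list of steps your agent already has. The sketch below writes the `[{"image": ..., "target": ...}]` shape shown above and decodes JSONL text. It assumes `--jsonl` emits one JSON object per line; the exact fields and ordering of that output are not specified here. `write_pairs` and `read_jsonl` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path

def write_pairs(pairs, path):
    """Write (image, target) tuples as a pairs file and return the records."""
    records = [{"image": image, "target": target} for image, target in pairs]
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records

def read_jsonl(text):
    """Decode JSONL text: one JSON object per non-empty line (assumed format)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "pairs.json"
    write_pairs([("a.png", "Chrome icon"), ("b.png", "the back arrow")], out)
    print(out.read_text(encoding="utf-8"))
```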

Confidence + alternatives

browserground parse screen.png --target "Subscribe" --confidence --alternatives 2
# {"bbox_2d":[...], "confidence":0.92, "alternatives":[{"bbox_2d":[...]}, ...]}
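One way to use the confidence score in a hybrid setup is to accept the local answer above a threshold and hand the screenshot to the cloud model otherwise. The sketch below shows only that decision. The 0.8 threshold is an arbitrary illustration, the bbox values in the sample are made up, and `choose_bbox` is a hypothetical helper name.

```python
import json

def choose_bbox(raw: str, min_confidence: float = 0.8):
    """Return the primary bbox if confident enough, else None to signal fallback."""
    result = json.loads(raw)
    if result.get("confidence", 0.0) >= min_confidence:
        return result["bbox_2d"]
    return None

# Made-up sample in the output shape shown above.
sample = (
    '{"bbox_2d": [10, 20, 110, 60], "confidence": 0.92, '
    '"alternatives": [{"bbox_2d": [10, 200, 110, 240]}]}'
)
print(choose_bbox(sample))        # [10, 20, 110, 60]
print(choose_bbox(sample, 0.95))  # None
```

A result without a `confidence` field is treated as zero confidence here, so it always falls through to the fallback path.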

Eval on your labeled data

browserground eval ./screenshots ./eval-targets.json --out report.json
# eval-targets.json: [{"image":"a.png","target":"...","bbox":[x1,y1,x2,y2]}, ...]
# Report: accuracy, format-OK, p50/p95 latency.
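For intuition about the accuracy number, ScreenSpot-style point grounding counts a prediction as correct when the predicted click point lands inside the labeled box. The sketch below implements that rule using the center of the predicted bbox as the click point. Whether `browserground eval` scores predictions exactly this way is an assumption, and the function names are illustrative.

```python
def center(bbox):
    """Center point of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def point_in_bbox(point, bbox):
    """True if the point lies inside the box (edges inclusive)."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2

def point_accuracy(predicted, labeled):
    """Fraction of predictions whose center falls inside the labeled box."""
    if not labeled:
        return 0.0
    hits = sum(point_in_bbox(center(p), g) for p, g in zip(predicted, labeled))
    return hits / len(labeled)

predicted = [[344, 612, 478, 658], [0, 0, 10, 10]]
labeled = [[340, 600, 480, 660], [100, 100, 200, 200]]
print(point_accuracy(predicted, labeled))  # 0.5
```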

Hook into your agent stack

Claude Code

mkdir -p .claude/skills/browserground
curl -sL https://raw.githubusercontent.com/renezander030/browserground/main/plugins/claude-code/SKILL.md \
  > .claude/skills/browserground/SKILL.md

Codex CLI

See plugins/codex/AGENTS.md.

browser-use

Drop-in Controller action — see plugins/browser-use/.

Skyvern

Local-first grounding with cloud fallback — see plugins/skyvern/.

Ollama

ollama pull renezander030/browserground
ollama run renezander030/browserground "Locate: Submit button" /path/to/screen.png

Python (no Node)

pip install "browserground[mlx]"            # Apple Silicon
pip install "browserground[transformers]"   # CUDA / CPU / MPS

from browserground import click_xy
x, y = click_xy("screen.png", "the back arrow")

Benchmark

ScreenSpot-v2 point-grounding accuracy (300 items, 100/split):

| Model | Params | Overall | Mobile |
|---|---:|---:|---:|
| GPT-5.4 (cloud frontier) ¹ | — | 85.4% | — |
| browserground v0.3 | 2 B | 60.0% | 78.0% |
| SeeClick | 9.6 B | 55.1% | — |
| ShowUI-2B | 2 B | 75.5% | — |
| UI-TARS-2B-SFT | 2 B | 89.5% | — |

¹ The GPT-5.4 score is from the harder ScreenSpot-Pro benchmark, so it is not directly comparable to the ScreenSpot-v2 rows (no public v2 number exists for the 2026 cloud generation).

Why browserground can still be the better fit for your stack than UI-TARS-2B-SFT, even though UI-TARS scores higher overall:

  • Newer Qwen3-VL base
  • Strict-JSON output (100% parseable on the eval set, no regex)
  • Browser-focused training mix
  • CLI + npm + pip + Ollama distribution
  • Designed as a hybrid-AI piece (not a standalone agent toolkit)

Limitations

  • Icon UI accuracy (~41%) lags text UI (~74%) — icons need more visual exposure in training
  • English-only training data
  • No mouse-action prediction (only location — pair with an action predictor for full computer-use loops)

License

Apache 2.0.