gologin-web-access
v0.3.5
Unified web access CLI for developers and AI agents to read and interact with the web using the GoLogin Scraping API and Cloud Browser.
Gologin Web Access
Gologin Web Access lets developers and AI agents read and interact with the web using GoLogin Scraping API and Gologin Cloud Browser.
This is a unified web access layer, not just a scraping tool and not just a browser automation tool.
- Read the web through stateless extraction APIs
- Interact with the web through stateful cloud browser sessions
- Carry Gologin’s browser-side strengths into those workflows: profiles, identity-aware browser sessions, cloud browser infrastructure, and Gologin’s profile/proxy stack when you run against a configured profile
- Manage common GoLogin profile/proxy API operations without leaving the CLI: cloud usage, cloud profile start/stop, profile cookies, fingerprint refresh, managed proxies, and user-agent updates
Package name and binary are the same:
- npm package: gologin-web-access
- command: gologin-web-access
What It Unifies
Gologin Web Access combines two existing product surfaces behind one CLI:
- Scraping API: stateless read and extraction. Best when you want page content quickly without maintaining a browser session.
- Cloud Browser: stateful interaction. Best when you need navigation, clicks, typing, screenshots, or multi-step flows that persist across commands.
The point of the unified CLI is that both modes live in one product with one command surface and one config model, while still being honest about which credential powers which workflow. Recommended setup is still to configure both credentials up front so agents do not stop to ask for missing keys mid-task.
Command Groups
Quick Picks
- read for "read this docs page/article" or "tell me what is on this page"
- scrape-text for plain text from one known page when you do not need headings/links metadata
- scrape-json for structured title, description, headings, and links from one known page
- batch-scrape for many known URLs at once; add --output <path> when the JSON may be large and add --strict only if partial success should fail the command
Scraping / Read
These commands use GoLogin Scraping API:
gologin-web-access scrape <url>
gologin-web-access read <url> [--format text|markdown|html] [--source auto|scraping|browser]
gologin-web-access scrape-markdown <url> [--source auto|scraping|browser]
gologin-web-access scrape-text <url> [--source auto|scraping|browser]
gologin-web-access scrape-json <url> [--fallback none|browser]
gologin-web-access batch-scrape <url...> [--format html|markdown|text|json] [--fallback none|browser] [--source auto|scraping|browser] [--only-main-content] [--retry <n>] [--backoff-ms <ms>] [--summary] [--output <path>] [--strict]
gologin-web-access batch-extract <url...> --schema <schema.json> [--source auto|scraping|browser] [--retry <n>] [--backoff-ms <ms>] [--summary] [--output <path>]
gologin-web-access search <query> [--limit <n>] [--country <cc>] [--language <lang>] [--source auto|scraping|browser]
gologin-web-access map <url> [--limit <n>] [--max-depth <n>] [--concurrency <n>] [--strict]
gologin-web-access crawl <url> [--format html|markdown|text|json] [--limit <n>] [--max-depth <n>] [--only-main-content] [--strict]
gologin-web-access crawl-start <url> ...
gologin-web-access crawl-status <jobId>
gologin-web-access crawl-result <jobId>
gologin-web-access crawl-errors <jobId>
gologin-web-access extract <url> --schema <schema.json> [--source auto|scraping|browser]
gologin-web-access change-track <url> [--format html|markdown|text|json]
gologin-web-access batch-change-track <url...> [--format html|markdown|text|json] [--retry <n>] [--backoff-ms <ms>] [--summary] [--output <path>]
gologin-web-access parse-document <url-or-path>
gologin-web-access run <runbook.json>
gologin-web-access batch <runbook.json> --targets <targets.json>
gologin-web-access jobs
gologin-web-access job <jobId>
Use these when you want stateless page retrieval or extracted content.
Browser / Interact
These commands use Gologin Cloud Browser through the local daemon-backed agent layer:
gologin-web-access open <url> [--profile <id>]
gologin-web-access search-browser <query> [--profile <id>]
gologin-web-access scrape-screenshot <url> [path] [--profile <id>]
gologin-web-access tabs
gologin-web-access tabopen [url]
gologin-web-access tabfocus <index>
gologin-web-access tabclose [index]
gologin-web-access snapshot
gologin-web-access click <ref>
gologin-web-access dblclick <ref>
gologin-web-access focus <ref>
gologin-web-access type <ref> <text>
gologin-web-access fill <ref> <text>
gologin-web-access hover <ref>
gologin-web-access select <ref> <value>
gologin-web-access check <ref>
gologin-web-access uncheck <ref>
gologin-web-access press <key> [target]
gologin-web-access scroll <direction> [pixels]
gologin-web-access scrollintoview <ref>
gologin-web-access wait <target|ms>
gologin-web-access get <kind> [target]
gologin-web-access back
gologin-web-access forward
gologin-web-access reload
gologin-web-access find ...
gologin-web-access cookies [--output <path>] [--json]
gologin-web-access cookies-import <cookies.json>
gologin-web-access cookies-clear
gologin-web-access storage-export [path] [--scope <local|session|both>]
gologin-web-access storage-import <storage.json> [--scope <local|session|both>] [--clear]
gologin-web-access storage-clear [--scope <local|session|both>]
gologin-web-access eval <expression>
gologin-web-access upload <ref> <file...>
gologin-web-access pdf <path>
gologin-web-access screenshot <path>
gologin-web-access close
gologin-web-access sessions
gologin-web-access current
Use these when you need state, interaction, or multi-step browser flows.
GoLogin API Helpers
These commands use the GoLogin REST API directly through GOLOGIN_TOKEN. They do not require Scraping API and do not start the browser daemon:
gologin-web-access cloud-usage --profile <profileId> | --workspace <workspaceId> [--days <1-30>] [--json]
gologin-web-access profile-cloud start <profileId> [--json]
gologin-web-access profile-cloud stop <profileId> [--json]
gologin-web-access profile-cookies export <profileId> [--output <path>] [--json]
gologin-web-access profile-cookies import <profileId> <cookies.json> [--clean] [--json]
gologin-web-access profile-fingerprint refresh <profileId...> [--json]
gologin-web-access profile-proxy list [--page <n>] [--json]
gologin-web-access profile-proxy traffic
gologin-web-access profile-proxy add-gologin <profileId> --country <cc> [--city <city>] [--type residential|mobile|dc] [--json]
gologin-web-access profile-ua latest [--os lin|mac|win|android|android-cloud] [--json]
gologin-web-access profile-ua update <profileId...> [--all-profiles] [--workspace <id>] [--json]
Use these when an agent needs GoLogin account/profile operations and would otherwise drop into raw REST calls or SDK code.
When To Use scrape vs browser
- Use scrape commands when you need page content, extracted text, markdown, or simple structured output.
- Use read as the default for docs and article reading when you want one high-level main-content command rather than choosing HTML/text/markdown yourself.
- Use scrape-text when you already know you want plain text.
- Use scrape-json when you want structured metadata and headings instead of full prose.
- Use search when you need web discovery or SERP results before deciding what to scrape. It now tries multiple search paths automatically, validates that the response is a real SERP, and reuses a short local cache for repeated queries.
- Use map when you need internal link discovery or a site inventory.
- Use crawl when you need multi-page read-only extraction across a site.
- Use crawl-start plus crawl-status and crawl-result when the crawl should run detached.
- Use extract when you want deterministic structured output from CSS selectors rather than generic page summaries.
- Use batch-extract when the same selector schema should run across many known URLs.
- Use change-track when you want local change detection against the last stored snapshot of a page.
- Use batch-change-track when you want to monitor a watchlist of pages in one pass.
- Use parse-document when the source is a PDF, DOCX, XLSX, HTML, or local document path instead of a normal HTML page.
- Use browser commands when you need clicks, forms, navigation, screenshots, sessions, or logged-in/profile-backed flows.
- Use GoLogin API helper commands when you need to attach managed proxy traffic, export/import profile cookies, refresh fingerprints, update user agents, inspect usage, or start/stop a cloud profile.
- Use browser commands when you need ref-based interaction, uploads, PDFs, semantic find flows, keyboard control, or a browser-visible search journey.
- Use run and batch when you want reusable workflows or multi-target execution on top of the CLI surface.
- Use scrape when stateless speed matters more than interaction.
- Use browser commands when the site requires state, continuity, or real browser behavior.
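For scripts or agent wrappers, the decision list above can be condensed into a small lookup. The sketch below is a hypothetical helper: pick_gwa_command and its "need" keywords are illustrative assumptions, not part of the CLI. It only maps a stated need onto command names documented in this README.

```shell
#!/bin/sh
# Hypothetical helper: map a coarse "need" keyword to a documented command name.
# The keywords (article, plain-text, ...) are assumptions made for this example.
pick_gwa_command() {
  case "$1" in
    article)     echo "read" ;;
    plain-text)  echo "scrape-text" ;;
    metadata)    echo "scrape-json" ;;
    discovery)   echo "search" ;;
    inventory)   echo "map" ;;
    multi-page)  echo "crawl" ;;
    selectors)   echo "extract" ;;
    document)    echo "parse-document" ;;
    interaction) echo "open" ;;
    *)           echo "unknown need: $1" >&2; return 1 ;;
  esac
}
```

A wrapper could then call something like gologin-web-access "$(pick_gwa_command article)" <url>, falling back to a browser command when the read path returns weak output.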
Why This Is Not Just A Read-Only Crawler
The read layer matters, but this product is broader than a Firecrawl-like “read the page” use case.
What makes Gologin Web Access different is the ability to move from stateless extraction into stateful browser interaction without leaving the CLI:
- Browser sessions can run through Gologin Cloud Browser instead of a local one-off browser process.
- Browser workflows can use a Gologin profile via --profile or GOLOGIN_DEFAULT_PROFILE_ID.
- That gives the CLI access to Gologin’s identity/profile model and session layer, instead of stopping at raw fetches.
- When a configured profile carries proxy settings, those browser-side capabilities come from the Gologin browser stack rather than from a separate scraping-only pipeline.
This README only documents what the current CLI actually implements. It does not claim extra browser capabilities beyond the commands listed above.
Command Structure Choice
The current CLI keeps commands flat:
gologin-web-access scrape ...
gologin-web-access scrape-markdown ...
gologin-web-access open ...
gologin-web-access snapshot
This is clearer right now than introducing a browser namespace such as gologin-web-access browser open.
Why:
- The command surface is still compact.
- Flat commands are shorter for both humans and AI agents.
- The read vs interact split is already explicit through the command names and documentation.
If the browser surface grows substantially later, a nested namespace may become worth adding. For the current product, flat commands are simpler.
Credentials And Config
This CLI uses two different GoLogin credentials on purpose, because the underlying products are different.
- GOLOGIN_SCRAPING_API_KEY: Required for Scraping / Read commands.
- GOLOGIN_TOKEN: Required for gologin-web-access open, GoLogin API helper commands, and profile validation in gologin-web-access doctor.
- GOLOGIN_DEFAULT_PROFILE_ID: Optional default profile for browser flows.
- GOLOGIN_DAEMON_PORT: Optional local daemon port for browser workflows.
Recommended full setup for agents is to configure both GOLOGIN_SCRAPING_API_KEY and GOLOGIN_TOKEN before starting work, even if the current task looks read-only or browser-only.
Missing-key errors are command-group specific. Example:
Missing GOLOGIN_SCRAPING_API_KEY. This is required for scraping commands like `gologin-web-access scrape`.
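An agent can avoid hitting that error mid-task by checking both credentials before it starts. The function below is a minimal sketch: check_gologin_credentials is a hypothetical name, and it inspects environment variables only. Because the CLI also reads a config file and legacy aliases, an empty variable here does not necessarily mean the CLI has no credential.

```shell
#!/bin/sh
# Hypothetical preflight: print the name of each credential variable that is
# unset or empty, one per line, and return non-zero if any is missing.
check_gologin_credentials() {
  missing=0
  for name in GOLOGIN_SCRAPING_API_KEY GOLOGIN_TOKEN; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_gologin_credentials at the top of a task script surfaces both gaps at once instead of one command group at a time.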
Environment variables are the primary configuration mechanism:
export GOLOGIN_SCRAPING_API_KEY="wu_..."
export GOLOGIN_TOKEN="gl_..."
export GOLOGIN_DEFAULT_PROFILE_ID="profile_123"
export GOLOGIN_DAEMON_PORT="4590"
If you do not want to source ~/.zprofile in every shell, run:
gologin-web-access config init
Useful variants:
gologin-web-access config init --scraping-api-key wu_... --token gl_...
gologin-web-access config init --web-unlocker-key wu_... --token gl_... # legacy alias
That writes ~/.gologin-web-access/config.json once and the CLI will keep reading it on later runs.
By default config init also validates both keys immediately so you find bad credentials during setup instead of on the first real request. Use --no-validate only when you intentionally want an offline write.
You can also write a minimal config file at ~/.gologin-web-access/config.json:
{
  "scrapingApiKey": "wu_...",
  "cloudToken": "gl_...",
  "defaultProfileId": "profile_123",
  "daemonPort": 4590
}
Gologin Web Access will also read the older path ~/.gologin-web/config.json if it already exists, but new config writes go to ~/.gologin-web-access/config.json.
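For scripted provisioning, the same file can be generated from a shell function. This is a sketch under stated assumptions: write_gologin_config is a hypothetical helper, the JSON keys are the four shown above, values are written without JSON escaping, and the owner-only file permission is a precaution chosen here rather than documented CLI behavior. Unlike config init, it does not validate the keys.

```shell
#!/bin/sh
# Hypothetical helper: write a config.json with the four keys shown in this README.
# Usage: write_gologin_config <config-dir> <scraping-key> <token> <profile-id> <port>
write_gologin_config() {
  dir="$1"
  mkdir -p "$dir"
  (
    # Subshell keeps the stricter umask from leaking into the caller.
    umask 077
    printf '{\n  "scrapingApiKey": "%s",\n  "cloudToken": "%s",\n  "defaultProfileId": "%s",\n  "daemonPort": %s\n}\n' \
      "$2" "$3" "$4" "$5" > "$dir/config.json"
  )
}
```

Example call: write_gologin_config "$HOME/.gologin-web-access" "$GOLOGIN_SCRAPING_API_KEY" "$GOLOGIN_TOKEN" profile_123 4590.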
Backward-compatible aliases are also accepted for existing setups:
- GOLOGIN_WEBUNLOCKER_API_KEY
- GOLOGIN_CLOUD_TOKEN
- GOLOGIN_PROFILE_ID
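Scripts that wrap the CLI may want to normalize old and new variable names before running. The helper below is a generic sketch: first_nonempty is a hypothetical function, and both the pairing of each alias with a current name and the "current name wins" precedence are assumptions, since this README only states that the aliases are accepted.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first listed variable that is
# set and non-empty; return 1 if none of them is.
first_nonempty() {
  for name in "$@"; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}
```

For example, token="$(first_nonempty GOLOGIN_TOKEN GOLOGIN_CLOUD_TOKEN)" picks the current name when present and the legacy alias otherwise.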
Useful config commands:
gologin-web-access version
gologin-web-access config init
gologin-web-access config show
gologin-web-access doctor
doctor reports the embedded Cloud Browser runtime bundled inside this package, whether the local daemon is reachable, and whether the recommended two-key setup is complete.
Install
npm install -g gologin-web-access
Quickstart
Read A Page
export GOLOGIN_SCRAPING_API_KEY="wu_..."
gologin-web-access scrape https://example.com
gologin-web-access read https://docs.browserbase.com/features/stealth-mode
gologin-web-access scrape-markdown https://example.com/docs
gologin-web-access scrape-text https://docs.browserbase.com/features/stealth-mode
gologin-web-access scrape-json https://example.com --fallback browser
gologin-web-access batch-scrape https://docs.browserbase.com/features/contexts https://docs.browserbase.com/features/proxies --format text --only-main-content --summary
gologin-web-access batch-extract https://example.com https://www.iana.org/help/example-domains --schema ./schema.json --summary --output ./artifacts/extract.json
gologin-web-access search "gologin antidetect browser" --limit 5
gologin-web-access search "gologin antidetect browser" --limit 5 --source auto
gologin-web-access map https://example.com --limit 50 --max-depth 2
gologin-web-access crawl https://docs.browserbase.com --format text --limit 20 --max-depth 2 --only-main-content
gologin-web-access crawl-start https://example.com --limit 20 --max-depth 2
gologin-web-access extract https://example.com --schema ./schema.json
gologin-web-access change-track https://example.com --format markdown
gologin-web-access batch-change-track https://example.com https://example.org --format text --summary --output ./artifacts/watchlist.json
gologin-web-access parse-document ./example.pdf
Interact With A Site
export GOLOGIN_TOKEN="gl_..."
export GOLOGIN_DEFAULT_PROFILE_ID="profile_123"
gologin-web-access open https://example.com
gologin-web-access tabs
gologin-web-access snapshot
gologin-web-access click e3
gologin-web-access type e5 "search terms"
gologin-web-access wait 1500
gologin-web-access get title
gologin-web-access eval "document.title"
gologin-web-access cookies --output ./cookies.json
gologin-web-access storage-export ./storage.json
gologin-web-access screenshot ./page.png
gologin-web-access current
gologin-web-access close
Manage Profiles And Proxies
export GOLOGIN_TOKEN="gl_..."
gologin-web-access cloud-usage --profile profile_123
gologin-web-access profile-proxy add-gologin profile_123 --country us --type residential
gologin-web-access profile-proxy traffic
gologin-web-access profile-cookies export profile_123 --output ./cookies.json
gologin-web-access profile-fingerprint refresh profile_123
gologin-web-access profile-ua latest --os mac
gologin-web-access profile-ua update profile_123
Search In A Real Browser
export GOLOGIN_TOKEN="gl_..."
gologin-web-access search-browser "gologin antidetect browser"
gologin-web-access snapshot -i
Structured Output And Retry Controls
- scrape-markdown and scrape-text now default to --source auto: they start with Scraping API, isolate the most readable content block, and can auto-retry with Cloud Browser when the output still looks like JS-rendered docs chrome.
- read is the shortest path for "look at this docs page" work: it targets the most readable content block and defaults to --format text --source auto.
- scrape-markdown and scrape-text also accept --source scraping and --source browser when you want to force one path. --source unlocker remains as a legacy alias.
- extract now accepts --source auto|scraping|browser and returns renderSource, fallback flags, and request metadata with the extracted JSON.
- batch-extract reuses the same extraction path across many URLs and returns one structured result per URL, including request and fallback metadata. Add --output <path> to save the full array directly.
- scrape-json now returns both a flat headings array and headingsByLevel buckets for h1 through h6.
- scrape-json --fallback browser is available for JS-heavy pages where stateless extraction returns weak heading data.
- scrape-json now also classifies the page outcome as ok, empty, incomplete, authwall, challenge, blocked, or cookie_wall, and includes nextActionHint when the result is weak or gated.
- scrape, scrape-markdown, scrape-text, scrape-json, and batch-scrape accept --retry, --backoff-ms, and --timeout-ms.
- batch-scrape --only-main-content lets markdown, text, and html batch runs use the same readable-content isolation path as read.
- crawl --only-main-content uses the same readable-fragment extraction strategy for html, markdown, and text crawl output, but stays on the stateless Scraping API path.
- batch-scrape --summary prints a one-line success/failure summary to stderr after the JSON payload.
- batch-scrape now returns exit code 0 on partial success by default and only fails the command when every URL failed. Add --strict if any single failed URL should make the whole batch exit non-zero.
- batch-scrape --output <path> writes the full JSON to disk so shells and agent consoles cannot truncate a large payload silently.
- batch-scrape --format json now returns the same structured scrape envelope as scrape-json, including renderSource, fallbackAttempted, fallbackUsed, and request.attemptCount/retryCount/attempts.
- batch-scrape --only-main-content now propagates outcome, outcomeReason, nextActionHint, and fallback metadata per URL so agents can tell "weak page" from "gated page" without scraping log text.
- scrape-json now surfaces explicit BLOCKED_PAGE failures when structured output clearly matches a challenge or block page, instead of silently looking like a valid empty result.
- search now returns requestedLimit, returnedCount, warnings, cacheTtlMs, and per-result position.
- search may return fewer results than the requested --limit when the upstream SERP contains fewer valid results; inspect returnedCount, warnings, and attempts.
- change-track now accepts --retry, --backoff-ms, and --timeout-ms, and JSON output includes request metadata.
- batch-change-track tracks many pages in one pass and reports per-URL new|same|changed status plus a summary line when --summary is used. Add --output <path> to save the full watchlist result directly.
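The outcome values listed above lend themselves to a simple triage step in agent scripts. The sketch below is illustrative only: next_step_for_outcome and the step labels it prints are hypothetical, and the grouping into "weak" and "gated" pages is an assumption layered on the documented outcome names. When the CLI supplies nextActionHint, that hint is the better guide.

```shell
#!/bin/sh
# Hypothetical triage of the scrape-json "outcome" field.
# Outcome names come from this README; the suggested follow-ups are assumptions.
next_step_for_outcome() {
  case "$1" in
    ok)
      echo "use-result" ;;
    empty|incomplete)
      # Weak page: a browser-rendered retry may recover more content.
      echo "retry-with-fallback-browser" ;;
    authwall|challenge|blocked|cookie_wall)
      # Gated page: repeating the stateless request is unlikely to help.
      echo "switch-to-browser-session" ;;
    *)
      echo "inspect-manually" ;;
  esac
}
```

A caller would extract the outcome field from the JSON payload with whatever JSON tool it already uses and pass the string to this function.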
Reusable Workflows
gologin-web-access run ./examples/runbook.json --session s1
gologin-web-access batch ./examples/runbook.json --targets ./examples/targets.json --concurrency 2
gologin-web-access jobs
snapshot prints refs such as e1, e2, e3. Those refs stay valid until the page changes or you take a new snapshot.
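Because refs can go stale once an action changes the page, one defensive pattern is to take a fresh snapshot after every action. The function below is a minimal sketch of that habit: click_and_resnapshot is a hypothetical wrapper, and GWA_BIN is a hypothetical override introduced here so the flow can be dry-run with echo. Neither is part of the CLI.

```shell
#!/bin/sh
# Hypothetical wrapper: click a ref, then refresh the snapshot so the next
# command works from current refs. GWA_BIN is an assumed dry-run override.
click_and_resnapshot() {
  "${GWA_BIN:-gologin-web-access}" click "$1" &&
    "${GWA_BIN:-gologin-web-access}" snapshot
}
```

Running GWA_BIN=echo followed by click_and_resnapshot e3 prints the two commands that would run without contacting a browser session.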
map and crawl now return status: ok|partial|failed. By default, partial results stay usable and do not exit non-zero. Add --strict when any failed page should fail the command.
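The status and --strict rules can be summarized as a small decision table. The sketch below only models the behavior described above; it does not call the CLI. expected_exit_code is a hypothetical helper, the value 1 stands in for "non-zero" because the exact code is not documented here, and treating failed as non-zero is an assumption.

```shell
#!/bin/sh
# Model of the documented exit behavior for map and crawl.
# Usage: expected_exit_code <ok|partial|failed> [yes|no]   (second arg: --strict set?)
expected_exit_code() {
  result="$1"
  strict="${2:-no}"
  case "$result" in
    ok)      echo 0 ;;
    partial) if [ "$strict" = "yes" ]; then echo 1; else echo 0; fi ;;
    failed)  echo 1 ;;  # assumption: a fully failed run exits non-zero
    *)       echo "unknown status: $result" >&2; return 2 ;;
  esac
}
```

In practice this means a pipeline that must not proceed on incomplete data should pass --strict, while exploratory runs can keep the default and read the status field.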
Product Boundaries
Gologin Web Access still has two runtime layers:
- Scraping API for stateless read and extraction
- Cloud Browser for stateful interaction
But both are now shipped inside the same package and the same repository. One install gives you the full read layer and the full browser/session layer.
Development
npm install
npm run build
npm run typecheck
npm test
Publish
npm publish --access public
Prepublish checks run automatically through prepublishOnly.
