@aimount/browser

v0.1.0-alpha.0

Published 22 days ago

Browser snapshot and scroll tools for embedded aimount integrations

Vulnerabilities: 0 high, 0 medium, 0 low

akhmanov

@aimount/browser

Shared browser observation tools for embedded aimount integrations.

This README freezes the important stage-1 architecture decisions that led to the first package cut, so later work does not need to reconstruct them from chat history.

Why this package exists

aimount is an embedded assistant platform. Host apps already register project-specific browser tools next to domain tools, as seen in senler. The reusable value here is not a second browser-side agent loop but a shared browser engine plus standard tools that fit the existing aimount tool/runtime model.

_ref/page-agent/ is a donor, not a template. The part worth borrowing is the browser controller / snapshot extraction idea. The full page-agent loop, extension transport, and action lifecycle are intentionally not copied into aimount.

v1 public scope

The first public cut is intentionally small.

Export createBrowserEngine(...).
Export createPageSnapshotTool(engine).
Export createPageScrollTool(engine).
Do not ship navigate or reload in v1.

The validated reason for cutting navigate and reload is complexity versus shared value. They drag in page transition recovery, durable restart handling, and future runtime-session semantics across tabs. Those concerns remain important, but they do not belong in the first public package cut.

Public wiring

The host owns engine creation. Tools are explicit and separate.

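import {
  createBrowserEngine,
  createPageScrollTool,
  createPageSnapshotTool,
} from '@aimount/browser';

// The host creates exactly one engine and maps named layer ids to selectors.
const browserEngine = createBrowserEngine({
  layers: {
    content: { selectors: ['#content'] },
    assistant: { selectors: ['#assistant'] },
  },
  // Layers used when a tool call does not specify any.
  defaultLayers: ['content'],
});

// Both tools receive the same engine instance; the export keys become the tool ids.
export const page_snapshot = createPageSnapshotTool(browserEngine);
export const page_scroll = createPageScrollTool(browserEngine);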

Important: aimount tool ids come from export keys, not from an internal name field. If the host wants the tool ids to be page_snapshot and page_scroll, the host should export them under exactly those keys.
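
The export-key rule can be illustrated with a minimal sketch. The aimount runtime's real registration code is not shown in this README, so the `Tool` type, `hostToolsModule`, and `collectToolIds` below are hypothetical stand-ins; the object keys play the role of the module's export names.

```typescript
// Hypothetical tool shape; the real aimount tool type is not shown in this README.
type Tool = { description: string };

// Stand-in for a host tools module: each key plays the role of an export name.
const hostToolsModule: Record<string, Tool> = {
  page_snapshot: { description: 'Viewport snapshot' },
  page_scroll: { description: 'Vertical page scroll' },
};

// Derive tool ids from the export keys only; no internal name field is consulted.
function collectToolIds(moduleExports: Record<string, Tool>): string[] {
  return Object.keys(moduleExports);
}

const toolIds = collectToolIds(hostToolsModule);
```

Under this model, renaming an export key changes the tool id even though the tool object itself is unchanged.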

Shared engine contract

createBrowserEngine(...) exists because reads and actions must share one browser observer/controller state.

The engine is shared.
The host creates it once.
Each tool creator receives the same engine instance.
There is no hidden singleton.
AssistantWIProvider does not own browser engine lifecycle in v1.

This is the aimount analogue of borrowing the PageController boundary from page-agent without importing the rest of the agent runtime.

`page_snapshot` contract

Public API:

page_snapshot({ layers?: string[] })

Locked decisions:

View is viewport-only, not full-document.
Result shape is text + meta.
layers are named layer ids, not raw selectors.
Layer selector mapping is configured in the engine.
If layers are omitted, the engine uses configured default layers.
Unknown layer ids fail explicitly.
Visibility boundaries are integrator-provided. The package does not try to autodetect the assistant container or any other hidden zones.
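
The layer rules above can be sketched as a small resolver. This is an illustration, not the package's implementation: `EngineLayers` and `resolveLayerSelectors` are hypothetical names, and treating an empty array as an explicit request for no layers is an assumption.

```typescript
type LayerConfig = { selectors: string[] };
type EngineLayers = {
  layers: Record<string, LayerConfig>;
  defaultLayers: string[];
};

// Map requested layer ids to selectors. Omitted layers fall back to the
// configured defaults; unknown ids throw instead of being silently ignored.
function resolveLayerSelectors(config: EngineLayers, requested?: string[]): string[] {
  // Assumption: only `undefined` triggers the defaults; [] means "no layers".
  const ids = requested ?? config.defaultLayers;
  const selectors: string[] = [];
  for (const id of ids) {
    if (!Object.prototype.hasOwnProperty.call(config.layers, id)) {
      throw new Error(`Unknown layer id: ${id}`);
    }
    selectors.push(...config.layers[id].selectors);
  }
  return selectors;
}

const layerConfig: EngineLayers = {
  layers: {
    content: { selectors: ['#content'] },
    assistant: { selectors: ['#assistant'] },
  },
  defaultLayers: ['content'],
};
```

Callers pass layer ids such as `['assistant']`, never raw selectors, so the selector mapping stays in one place.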

Snapshot content rules:

Result shape stays text + meta, but text is now a structural browser-state snapshot.
The top-level text format is a "Page state:" section followed by a "Visible structure:" section.
Keep visible interactive nodes even when they do not have a reliable name.
Include nearby readable context, not only control names.
Add coarse 3x3 zones: top-left, top-center, top-right, middle-left, center, middle-right, bottom-left, bottom-center, bottom-right.
Do not emit semantic overlays; container-level text blobs were removed because they created noisy evidence for exact UI claims.
Include snapshotId and freshness metadata so later action flows do not have to invent identity after the fact.
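
As an illustration of the coarse 3x3 zones, the sketch below maps a viewport point to one of the nine zone names. The README does not say which point of an element is used, so passing the element's center is an assumption, and `zoneForPoint` is a hypothetical helper.

```typescript
type Zone =
  | 'top-left' | 'top-center' | 'top-right'
  | 'middle-left' | 'center' | 'middle-right'
  | 'bottom-left' | 'bottom-center' | 'bottom-right';

const ROWS = ['top', 'middle', 'bottom'] as const;
const COLS = ['left', 'center', 'right'] as const;

// Split the viewport into equal thirds on each axis and name the cell.
// Points on or beyond the far edge are clamped into the last cell.
function zoneForPoint(x: number, y: number, viewportWidth: number, viewportHeight: number): Zone {
  const col = Math.min(2, Math.max(0, Math.floor((x / viewportWidth) * 3)));
  const row = Math.min(2, Math.max(0, Math.floor((y / viewportHeight) * 3)));
  // The middle cell is named just "center" in the zone list.
  if (row === 1 && col === 1) return 'center';
  return `${ROWS[row]}-${COLS[col]}` as Zone;
}
```

In a browser, the point could come from the center of an element's bounding rectangle and the viewport size from `window.innerWidth` and `window.innerHeight`.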

Item-level evidence rules:

Every meta.items[] entry includes name, nameSource, and nameStatus.
nameSource is one of text | aria-label | title | placeholder | value | unknown.
nameStatus is one of strong | weak | unknown.
Unknown or weakly named controls remain visible in the snapshot instead of being dropped.
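
A minimal sketch of how item-level name evidence might be derived. The README fixes the allowed `nameSource` and `nameStatus` values but not the priority order or which sources count as strong; here the order follows the list above and only `text` and `aria-label` are treated as strong, both of which are assumptions. `resolveName` is a hypothetical helper.

```typescript
type NameSource = 'text' | 'aria-label' | 'title' | 'placeholder' | 'value' | 'unknown';
type NameStatus = 'strong' | 'weak' | 'unknown';
type KnownSource = Exclude<NameSource, 'unknown'>;
type NameEvidence = { name: string; nameSource: NameSource; nameStatus: NameStatus };

// Assumed priority order and strength mapping; not confirmed by the README.
const PRIORITY: KnownSource[] = ['text', 'aria-label', 'title', 'placeholder', 'value'];
const ASSUMED_STRONG = new Set<NameSource>(['text', 'aria-label']);

// Pick the first non-empty candidate. Controls with no usable name are
// still returned (as unknown) rather than dropped.
function resolveName(candidates: Partial<Record<KnownSource, string>>): NameEvidence {
  for (const source of PRIORITY) {
    const raw = candidates[source]?.trim();
    if (raw) {
      const nameStatus: NameStatus = ASSUMED_STRONG.has(source) ? 'strong' : 'weak';
      return { name: raw, nameSource: source, nameStatus };
    }
  }
  // Assumption: an unnamed control carries an empty name string.
  return { name: '', nameSource: 'unknown', nameStatus: 'unknown' };
}
```

An icon-only button with no text, label, or title would come back with `nameSource: 'unknown'` and remain in `meta.items[]`.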

Observation honesty rules:

meta.observation.totalInteractive counts all visible distinct actionable nodes.
meta.observation.weakInteractive counts items whose best name comes from a weak source.
meta.observation.unknownInteractive counts visible interactives with no reliable name.
meta.observation.exactUiClaims communicates whether the snapshot is safe for exact button/icon/label claims:
- safe when visible interactives are structurally preserved and meaningfully named,
- partial when rough guidance is okay but exact UI claims should be limited to strong items,
- unsafe when the page is too semantically weak for exact UI claims.
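
The honesty metadata can be sketched as a summary over item name statuses. The README does not state the thresholds that separate safe, partial, and unsafe, so the rules below are assumptions: safe when every item is strong, unsafe when no item is strong, partial otherwise. Treating weak and unknown as disjoint categories is also an assumption, and `summarizeObservation` is a hypothetical name.

```typescript
type NameStatus = 'strong' | 'weak' | 'unknown';
type ExactUiClaims = 'safe' | 'partial' | 'unsafe';
type Observation = {
  totalInteractive: number;
  weakInteractive: number;
  unknownInteractive: number;
  exactUiClaims: ExactUiClaims;
};

// Count visible interactives by name status and derive a claim-safety level.
function summarizeObservation(items: Array<{ nameStatus: NameStatus }>): Observation {
  const totalInteractive = items.length;
  const weakInteractive = items.filter((i) => i.nameStatus === 'weak').length;
  const unknownInteractive = items.filter((i) => i.nameStatus === 'unknown').length;
  const strongCount = totalInteractive - weakInteractive - unknownInteractive;

  // Assumed thresholds; a page with no interactives is trivially "safe" here.
  let exactUiClaims: ExactUiClaims;
  if (weakInteractive + unknownInteractive === 0) exactUiClaims = 'safe';
  else if (strongCount === 0) exactUiClaims = 'unsafe';
  else exactUiClaims = 'partial';

  return { totalInteractive, weakInteractive, unknownInteractive, exactUiClaims };
}
```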

Safety rules:

Minimal built-in redaction is required.
Password values and obvious secret/token-like values should not leak into snapshot text.
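
A minimal redaction sketch under stated assumptions. The README requires that password values and obvious token-like values stay out of snapshot text but does not define "token-like"; the pattern below (an unbroken run of 24 or more word characters containing both a letter and a digit) and the `[redacted]` placeholder are illustrative choices, and `redactFieldValue` is a hypothetical helper.

```typescript
const REDACTED = '[redacted]';

// Assumed heuristic for token-like values: 24+ word characters in a row
// that include at least one letter and at least one digit.
const TOKEN_LIKE = /\b(?=\w*[A-Za-z])(?=\w*\d)\w{24,}\b/g;

// Return a value that is safe to include in snapshot text.
function redactFieldValue(inputType: string, value: string): string {
  // Password fields never expose their value, whatever it looks like.
  if (inputType.toLowerCase() === 'password') {
    return value === '' ? '' : REDACTED;
  }
  // Other fields keep their text but lose token-like substrings.
  return value.replace(TOKEN_LIKE, REDACTED);
}
```

A heuristic like this will miss short secrets and may redact long harmless identifiers, which fits the "minimal built-in redaction" framing rather than a complete secret scanner.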

`page_scroll` v1 contract

Public API:

page_scroll({ direction, screens? })

Locked v1 decisions:

Scope is page-only vertical scroll.
Container scroll is out of v1.
Unit name is screens, both externally and internally.
Fractional values are allowed.
Default amount is 0.75 screens.
Scrolling should be smooth/animated.
Result shape is nested: action metadata plus snapshot.
Returned snapshot always uses engine default layers.
Action tools do not accept custom layers in v1.
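
The scroll amount rules can be sketched as a conversion from screens to pixels. The README does not list the allowed `direction` values, so `'up' | 'down'` is an assumption, as is rejecting non-positive amounts; `scrollDeltaPx` is a hypothetical helper.

```typescript
// Assumed direction values for page-only vertical scroll.
type ScrollDirection = 'up' | 'down';

const DEFAULT_SCREENS = 0.75;

// Convert a screens amount (fractional values allowed) into a signed pixel delta.
function scrollDeltaPx(
  direction: ScrollDirection,
  viewportHeight: number,
  screens: number = DEFAULT_SCREENS,
): number {
  // Assumption: zero, negative, and non-finite amounts are rejected.
  if (!Number.isFinite(screens) || screens <= 0) {
    throw new RangeError(`screens must be a positive number, got ${screens}`);
  }
  const magnitude = screens * viewportHeight;
  return direction === 'down' ? magnitude : -magnitude;
}
```

In a browser, the resulting delta could be applied with `window.scrollBy({ top: delta, behavior: 'smooth' })`, which matches the smooth/animated requirement.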

Readiness rules:

assistant-wi intentionally differs from page-agent here.
page-agent mostly does short waits and expects the model to call a later observe step.
In this package, the action tool itself waits heuristically, then returns the post-action snapshot publicly.
Readiness is heuristic + timeout.
Heuristics are action-specific, not one universal settle rule.
Budgets are per-action.
On timeout the tool returns best-effort state with an incompleteness flag rather than blocking forever.
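
The "heuristic + timeout" idea can be sketched as a pure decision function over sampled scroll positions. This is one possible scroll-specific heuristic, not the package's actual rule: the tool is assumed to poll the scroll position, and `judgeScrollReadiness`, `stableMs`, and `budgetMs` are hypothetical names.

```typescript
type ScrollSample = { timeMs: number; scrollY: number };
type ReadinessVerdict = {
  status: 'settled' | 'pending' | 'timeout';
  incomplete: boolean;
};

// Decide readiness from samples ordered by time. The scroll counts as settled
// once scrollY has stayed unchanged for stableMs; if the per-action budget
// runs out first, report a timeout flagged as incomplete.
function judgeScrollReadiness(
  samples: ScrollSample[],
  opts: { stableMs: number; budgetMs: number },
): ReadinessVerdict {
  let firstTime: number | undefined;
  let lastTime = 0;
  let lastY: number | undefined;
  let stableSince = 0;
  for (const sample of samples) {
    if (firstTime === undefined) firstTime = sample.timeMs;
    if (sample.scrollY !== lastY) {
      // Position changed: restart the stability window at this sample.
      lastY = sample.scrollY;
      stableSince = sample.timeMs;
    }
    lastTime = sample.timeMs;
  }
  if (firstTime === undefined) return { status: 'pending', incomplete: false };
  if (lastTime - stableSince >= opts.stableMs) return { status: 'settled', incomplete: false };
  if (lastTime - firstTime >= opts.budgetMs) return { status: 'timeout', incomplete: true };
  return { status: 'pending', incomplete: false };
}
```

A caller would keep sampling while the verdict is pending, then take the post-action snapshot and carry the `incomplete` flag into the result. Note that this sketch cannot distinguish a scroll that has not started yet from one that has finished, so a real heuristic would need an additional signal.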

Deferred and intentionally out of v1

These topics were discussed and are intentionally not in the first package cut:

`navigate` and `reload`

They were explored in detail, including a recovery design based on:

package-owned automatic recovery,
localStorage,
per-tab ownership,
toolCallId,
TTL expiry,
fail-closed cleanup.

That design context is still useful, but it is deferred rather than shipped in v1.

New-tab navigation

New-tab behavior is not the same thing as current-tab navigation. It likely needs explicit runtime session semantics such as fork/clone/handoff instead of silently reusing the same session.

This concern is tracked in the tasks backlog:

tasks/projects/browser-std-tools/tasks/model-new-tab-session-fork

Future `page_action`

The package name is browser, not page-snapshot, because the boundary should survive growth into future browser actions. Those actions are deliberately deferred until the shared snapshot model is stable.

Validation strategy

Playwright is mandatory for acceptance because the real browser is the source of truth for viewport, geometry, scroll, and overlay behavior.

The validation stack is:

fast unit tests for layer resolution, redaction, structural item preservation, honesty metadata, and nested action results;
Playwright acceptance tests for real browser behavior, including icon-only visible controls.

Donor notes from `page-agent`

Useful donor ideas:

browser-state extraction around a shared controller,
indexed interactive elements,
local readable context around actions,
visible-page guidance rather than raw DOM dumping,
preserving actionable structure even when semantics are weak.

Things intentionally not copied:

a second browser-side planner/agent loop,
extension-centric transport,
a separate public wait tool mental model,
weakly controlled execute-JS style behavior.
