@aimount/browser
v0.1.0-alpha.0
Browser snapshot and scroll tools for embedded aimount integrations
@aimount/browser
Shared browser observation tools for embedded aimount integrations.
This README freezes the important stage-1 architecture decisions that led to the first package cut, so later work does not need to reconstruct them from chat history.
Why this package exists
aimount is an embedded assistant platform. Host apps already register project-specific browser tools next to domain tools, as seen in senler. The reusable value here is not a second browser-side agent loop. The reusable value is a shared browser engine plus standard tools that fit the existing aimount tool/runtime model.
_ref/page-agent/ is a donor, not a template. The part worth borrowing is the browser controller / snapshot extraction idea. The full page-agent loop, extension transport, and action lifecycle are intentionally not copied into aimount.
v1 public scope
The first public cut is intentionally small.
- Export createBrowserEngine(...).
- Export createPageSnapshotTool(engine).
- Export createPageScrollTool(engine).
- Do not ship navigate or reload in v1.
The validated reason for cutting navigate and reload is complexity versus shared value. They drag in page transition recovery, durable restart handling, and future runtime-session semantics across tabs. Those concerns remain important, but they do not belong in the first public package cut.
Public wiring
The host owns engine creation. Tools are explicit and separate.
import {
  createBrowserEngine,
  createPageScrollTool,
  createPageSnapshotTool,
} from '@aimount/browser';

const browserEngine = createBrowserEngine({
  layers: {
    content: { selectors: ['#content'] },
    assistant: { selectors: ['#assistant'] },
  },
  defaultLayers: ['content'],
});

export const page_snapshot = createPageSnapshotTool(browserEngine);
export const page_scroll = createPageScrollTool(browserEngine);

Important: aimount tool ids come from export keys, not from an internal name field. If the host wants the tool ids to be page_snapshot and page_scroll, the host should export them under exactly those keys.
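The export-key rule can be illustrated with a minimal sketch. The `Tool` shape and `collectToolIds` helper below are hypothetical stand-ins, not the aimount runtime API; the only point is that the id is whatever key the host exports under.

```typescript
// Hypothetical tool shape; the real aimount tool type is not shown in this README.
type Tool = { run: (input: unknown) => unknown };

// Hypothetical helper: derive tool ids from the keys of a host module object.
function collectToolIds(moduleExports: Record<string, Tool>): string[] {
  return Object.keys(moduleExports);
}

// Stand-in for a host module that exports page_snapshot and page_scroll.
const hostModule: Record<string, Tool> = {
  page_snapshot: { run: () => 'snapshot' },
  page_scroll: { run: () => 'scrolled' },
};

const toolIds = collectToolIds(hostModule);
```

Renaming an export key in this sketch changes the resulting id, regardless of anything stored inside the tool object.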
Shared engine contract
createBrowserEngine(...) exists because reads and actions must share one browser observer/controller state.
- The engine is shared.
- The host creates it once.
- Each tool creator receives the same engine instance.
- There is no hidden singleton.
- AssistantWIProvider does not own browser engine lifecycle in v1.
This is the aimount analogue of borrowing the PageController boundary from page-agent without importing the rest of the agent runtime.
page_snapshot contract
Public API:
page_snapshot({ layers?: string[] })
Locked decisions:
- View is viewport-only, not full-document.
- Result shape is text + meta.
- layers are named layer ids, not raw selectors.
- Layer selector mapping is configured in the engine.
- If layers are omitted, the engine uses configured default layers.
- Unknown layer ids fail explicitly.
- Visibility boundaries are integrator-provided. The package does not try to autodetect the assistant container or any other hidden zones.
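The layer rules above can be sketched as a small resolver. This is an illustration under assumptions: `LayerConfig` mirrors the options passed to createBrowserEngine(...) in the wiring example, and `resolveLayerSelectors` is a hypothetical helper name, not a package export.

```typescript
// Hypothetical config shape, mirroring the createBrowserEngine(...) options.
interface LayerConfig {
  layers: Record<string, { selectors: string[] }>;
  defaultLayers: string[];
}

// Resolve named layer ids to selectors; use the defaults when none are requested.
function resolveLayerSelectors(config: LayerConfig, requested?: string[]): string[] {
  const ids = requested ?? config.defaultLayers;
  const selectors: string[] = [];
  for (const id of ids) {
    // Own-property check so inherited keys such as "toString" are not accepted.
    if (!Object.prototype.hasOwnProperty.call(config.layers, id)) {
      // Unknown layer ids fail explicitly instead of being skipped.
      throw new Error(`Unknown layer id: ${id}`);
    }
    selectors.push(...config.layers[id].selectors);
  }
  return selectors;
}

const exampleConfig: LayerConfig = {
  layers: {
    content: { selectors: ['#content'] },
    assistant: { selectors: ['#assistant'] },
  },
  defaultLayers: ['content'],
};
```

Calling the resolver without `requested` yields the selectors of the default layers only, which matches the omitted-layers rule.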
Snapshot content rules:
- Result shape stays text + meta, but text is now a structural browser-state snapshot.
- The top-level text format is Page state: plus Visible structure:.
- Keep visible interactive nodes even when they do not have a reliable name.
- Include nearby readable context, not only control names.
- Add coarse 3x3 zones: top-left, top-center, top-right, middle-left, center, middle-right, bottom-left, bottom-center, bottom-right.
- Do not emit semantic overlays; container-level text blobs were removed because they created noisy evidence for exact UI claims.
- Include snapshotId and freshness metadata so later action flows do not have to invent identity after the fact.
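One plausible way to assign the coarse 3x3 zones is to bucket the center point of an element's viewport rectangle. The sketch below assumes that approach; `Rect` and `zoneForRect` are hypothetical names, and the package may compute zones differently.

```typescript
type Zone =
  | 'top-left' | 'top-center' | 'top-right'
  | 'middle-left' | 'center' | 'middle-right'
  | 'bottom-left' | 'bottom-center' | 'bottom-right';

// Viewport-relative rectangle, e.g. as returned by getBoundingClientRect().
interface Rect { x: number; y: number; width: number; height: number }

const ZONE_GRID: Zone[][] = [
  ['top-left', 'top-center', 'top-right'],
  ['middle-left', 'center', 'middle-right'],
  ['bottom-left', 'bottom-center', 'bottom-right'],
];

// Map the rectangle's center point to one of nine viewport zones.
function zoneForRect(rect: Rect, viewportWidth: number, viewportHeight: number): Zone {
  const centerX = rect.x + rect.width / 2;
  const centerY = rect.y + rect.height / 2;
  // Clamp to 0..2 so centers on or beyond the viewport edge still get a zone.
  const col = Math.min(2, Math.max(0, Math.floor((centerX / viewportWidth) * 3)));
  const row = Math.min(2, Math.max(0, Math.floor((centerY / viewportHeight) * 3)));
  return ZONE_GRID[row][col];
}
```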
Item-level evidence rules:
- Every meta.items[] entry includes name, nameSource, and nameStatus.
- nameSource is one of text | aria-label | title | placeholder | value | unknown.
- nameStatus is one of strong | weak | unknown.
- Unknown or weakly named controls remain visible in the snapshot instead of being dropped.
Observation honesty rules:
- meta.observation.totalInteractive counts all visible distinct actionable nodes.
- meta.observation.weakInteractive counts items whose best name comes from a weak source.
- meta.observation.unknownInteractive counts visible interactives with no reliable name.
- meta.observation.exactUiClaims communicates whether the snapshot is safe for exact button/icon/label claims:
  - safe when visible interactives are structurally preserved and meaningfully named,
  - partial when rough guidance is okay but exact UI claims should be limited to strong items,
  - unsafe when the page is too semantically weak for exact UI claims.
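The item and observation fields can be written down as types, with the counters derived from the item list. The field names and enum values come from the rules above. The `summarizeObservation` helper and the threshold that separates partial from unsafe are assumptions made for illustration; the README does not state the actual cutoff.

```typescript
type NameSource = 'text' | 'aria-label' | 'title' | 'placeholder' | 'value' | 'unknown';
type NameStatus = 'strong' | 'weak' | 'unknown';
type ExactUiClaims = 'safe' | 'partial' | 'unsafe';

interface SnapshotItem {
  name: string;
  nameSource: NameSource;
  nameStatus: NameStatus;
}

interface Observation {
  totalInteractive: number;
  weakInteractive: number;
  unknownInteractive: number;
  exactUiClaims: ExactUiClaims;
}

// Assumption: treat the page as "unsafe" once half of the interactives are unnamed.
const UNSAFE_UNKNOWN_RATIO = 0.5;

// Hypothetical helper: derive honesty metadata from the visible items.
function summarizeObservation(items: SnapshotItem[]): Observation {
  const totalInteractive = items.length;
  const weakInteractive = items.filter((item) => item.nameStatus === 'weak').length;
  const unknownInteractive = items.filter((item) => item.nameStatus === 'unknown').length;

  let exactUiClaims: ExactUiClaims = 'safe';
  if (totalInteractive > 0 && unknownInteractive / totalInteractive >= UNSAFE_UNKNOWN_RATIO) {
    exactUiClaims = 'unsafe';
  } else if (weakInteractive + unknownInteractive > 0) {
    exactUiClaims = 'partial';
  }
  return { totalInteractive, weakInteractive, unknownInteractive, exactUiClaims };
}
```

Weak and unknown items stay in the list and only affect the counters, so nothing is dropped from the snapshot to make the summary look cleaner.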
Safety rules:
- Minimal built-in redaction is required.
- Password values and obvious secret/token-like values should not leak into snapshot text.
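A minimal redaction pass might look like the sketch below. The patterns, the `[redacted]` placeholder, and the `redactFieldValue` helper are all assumptions chosen for illustration; the README only requires that password values and obvious secret or token-like values stay out of snapshot text.

```typescript
const REDACTED = '[redacted]';

// Assumption: field names matching this pattern are treated as secret-like.
const SECRET_NAME_PATTERN = /pass(word)?|secret|token|api[-_]?key/i;
// Assumption: long unbroken base64/hex-like strings are treated as token-like.
const TOKEN_VALUE_PATTERN = /^[A-Za-z0-9_\-+\/=.]{32,}$/;

// Hypothetical description of a form field as seen by the snapshot extractor.
interface FieldInfo {
  type: string;
  name: string;
  value: string;
}

// Return the text that may appear in the snapshot for this field's value.
function redactFieldValue(field: FieldInfo): string {
  if (field.value === '') return '';
  if (field.type.toLowerCase() === 'password') return REDACTED;
  if (SECRET_NAME_PATTERN.test(field.name)) return REDACTED;
  if (TOKEN_VALUE_PATTERN.test(field.value)) return REDACTED;
  return field.value;
}
```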
page_scroll v1 contract
Public API:
page_scroll({ direction, screens? })
Locked v1 decisions:
- Scope is page-only vertical scroll.
- Container scroll is out of v1.
- Unit name is screens, both externally and internally.
- Fractional values are allowed.
- Default amount is 0.75 screens.
- Scrolling should be smooth/animated.
- Result shape is nested: action metadata plus snapshot.
- Returned snapshot always uses engine default layers.
- Action tools do not accept custom layers in v1.
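The screens unit translates to pixels through the viewport height. The sketch below assumes direction values of 'up' and 'down' and rejects non-positive amounts; neither detail is stated in this README, and `scrollDeltaPx` is a hypothetical helper name.

```typescript
// Assumption: v1 vertical scroll uses 'up' | 'down' as direction values.
type ScrollDirection = 'up' | 'down';

// Default amount from the locked v1 decisions.
const DEFAULT_SCREENS = 0.75;

// Convert a direction and a (possibly fractional) screens amount into a pixel delta.
function scrollDeltaPx(
  direction: ScrollDirection,
  viewportHeight: number,
  screens: number = DEFAULT_SCREENS,
): number {
  // Assumption: zero, negative, or non-finite amounts are rejected.
  if (!Number.isFinite(screens) || screens <= 0) {
    throw new Error(`screens must be a positive number, got ${screens}`);
  }
  const sign = direction === 'down' ? 1 : -1;
  return sign * screens * viewportHeight;
}

// In a browser, the delta could then drive a smooth page scroll:
// window.scrollBy({ top: scrollDeltaPx('down', window.innerHeight), behavior: 'smooth' });
```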
Readiness rules:
- assistant-wi intentionally differs from page-agent here.
- page-agent mostly does short waits and expects the model to call a later observe step.
- In this package, the action tool itself waits heuristically, then returns the post-action snapshot publicly.
- Readiness is heuristic + timeout.
- Heuristics are action-specific, not one universal settle rule.
- Budgets are per-action.
- On timeout the tool returns best-effort state with an incompleteness flag rather than blocking forever.
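The heuristic-plus-timeout idea can be sketched as a polling loop with a per-action budget. Everything here is hypothetical: `waitForSettle`, its option names, and the `incomplete` flag name are illustrative, and the injected `isSettled` check stands in for whatever action-specific heuristic the package uses.

```typescript
interface SettleResult {
  incomplete: boolean; // true when the budget ran out before the heuristic passed
  elapsedMs: number;
}

interface SettleOptions {
  budgetMs: number;              // per-action time budget
  pollMs: number;                // delay between heuristic checks
  isSettled: () => boolean;      // action-specific readiness heuristic
  now?: () => number;            // injectable clock, defaults to Date.now
  sleep?: (ms: number) => Promise<void>; // injectable delay, defaults to setTimeout
}

// Poll the heuristic until it passes or the budget is spent, then report best effort.
async function waitForSettle(options: SettleOptions): Promise<SettleResult> {
  const now = options.now ?? (() => Date.now());
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const start = now();

  while (now() - start < options.budgetMs) {
    if (options.isSettled()) {
      return { incomplete: false, elapsedMs: now() - start };
    }
    await sleep(options.pollMs);
  }
  // Budget exhausted: check once more, then flag incompleteness instead of blocking.
  const settled = options.isSettled();
  return { incomplete: !settled, elapsedMs: now() - start };
}
```

A scroll action might pass a check such as "scroll position unchanged since the last poll", while a different action would supply its own check and budget.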
Deferred and intentionally out of v1
These topics were discussed and are intentionally not in the first package cut:
navigate and reload
They were explored in detail, including a recovery design based on:
- package-owned automatic recovery,
- localStorage,
- per-tab ownership,
- toolCallId,
- TTL expiry,
- fail-closed cleanup.
That design context is still useful, but it is deferred rather than shipped in v1.
New-tab navigation
New-tab behavior is not the same thing as current-tab navigation. It likely needs explicit runtime session semantics such as fork/clone/handoff instead of silently reusing the same session.
This concern is tracked in the tasks backlog:
tasks/projects/browser-std-tools/tasks/model-new-tab-session-fork
Future page_action
The package name is browser, not page-snapshot, because the boundary should survive growth into future browser actions. Those actions are deliberately deferred until the shared snapshot model is stable.
Validation strategy
Playwright is mandatory for acceptance because the real browser is the source of truth for viewport, geometry, scroll, and overlay behavior.
The validation stack is:
- fast unit tests for layer resolution, redaction, structural item preservation, honesty metadata, and nested action results;
- Playwright acceptance tests for real browser behavior, including icon-only visible controls.
Donor notes from page-agent
Useful donor ideas:
- browser-state extraction around a shared controller,
- indexed interactive elements,
- local readable context around actions,
- visible-page guidance rather than raw DOM dumping,
- preserving actionable structure even when semantics are weak.
Things intentionally not copied:
- a second browser-side planner/agent loop,
- extension-centric transport,
- a separate public wait tool mental model,
- weakly controlled execute-JS style behavior.
