@clue-ai/frontend-sdk

v1.0.10

Published

a month ago

Clue Frontend SDK — OTel-based observation-source event capture for frontend apps


namikawa07

@clue-ai/frontend-sdk

Clue browser ingest SDK.

Minimal integration

After pnpm add @clue-ai/frontend-sdk, the intended Next.js integration is:

import clue from "@clue-ai/frontend-sdk";

clue.init({
  endpoint: process.env.NEXT_PUBLIC_CLUE_API_BASE_URL!,
  projectKey: process.env.NEXT_PUBLIC_CLUE_PROJECT_KEY!,
});

clue.identify(user.id, { name: user.name });
clue.group("organization", organization.id, { name: organization.name });

For Vite, use import.meta.env.VITE_CLUE_API_BASE_URL and import.meta.env.VITE_CLUE_PROJECT_KEY. For React apps that use the Create React App env contract, use process.env.REACT_APP_CLUE_API_BASE_URL and process.env.REACT_APP_CLUE_PROJECT_KEY. Other frontend frameworks should use the public env names generated by Clue setup.

projectKey is the public ingest key. The API resolves authoritative tenantId, projectId, and dev/prod ProjectService from that key. endpoint is the Clue API base URL. The SDK appends /api/v1/ingest/browser-tokens and the canonical browser observation batch path internally.
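As a rough illustration of how a base URL and the token path combine, here is a minimal sketch. The helper name browserTokenUrl and the trailing-slash handling are assumptions for illustration, not the SDK's internals; only the /api/v1/ingest/browser-tokens path comes from the text above.

```typescript
// Hypothetical helper, not part of the SDK's public API.
// Only the path constant is taken from the README text.
const BROWSER_TOKEN_PATH = "/api/v1/ingest/browser-tokens";

function browserTokenUrl(endpoint: string): string {
  // Drop trailing slashes so the join never produces "//api/...".
  const base = endpoint.replace(/\/+$/, "");
  return `${base}${BROWSER_TOKEN_PATH}`;
}
```

With this sketch, passing either "https://clue.example.com" or "https://clue.example.com/" as endpoint yields the same token URL, which is why endpoint should be the API base URL rather than a full ingest path.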

Responsibility boundary

SDK captures browser signals and applies local minimization where the signal is obviously high-risk.
API ingest privacy gate is authoritative for final allow/deny/unmask policy.
Worker-side sanitize is defense-in-depth only.
SDK sends projectKey, schemaVersion, sdkType, sdkVersion, stable keys, and minimized capture payloads.
SDK does not authoritatively stamp tenant_id, project_id, environment_id, or final archive-safe request/response payloads.

The frontend SDK does not own final archive-safe allowlist decisions.

Default privacy/minimization behavior

The frontend SDK avoids shipping obvious high-risk plaintext and keeps the MVP default event surface intentionally narrow.

input_change.value -> local value metadata only
selection_text -> metadata only
raw selector/path -> never serialized; stable key only

Current MVP defaults:

successful network emits one request_finished
failed network emits one request_failed
standard action capture emits element_clicked, form_submitted, input_committed, toggle_changed, selection_committed, file_selected, and drag_drop_completed
normal batching flushes when the queued payload reaches the 48KB send threshold, every 15 seconds by default, or when the page is leaving
event type does not change the delivery rule in MVP
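The batching rule above can be read as a single predicate over three triggers. The sketch below is illustrative only: QueueState and shouldFlush are hypothetical names, and it assumes "48KB" means 48 * 1024 bytes.

```typescript
// Illustrative constants. The 48KB threshold and 15 second default come
// from the README; interpreting 48KB as 48 * 1024 bytes is an assumption.
const SEND_THRESHOLD_BYTES = 48 * 1024;
const FLUSH_INTERVAL_MS = 15000;

// Hypothetical snapshot of the send queue.
interface QueueState {
  queuedBytes: number;
  msSinceLastFlush: number;
  pageLeaving: boolean;
}

// True when any of the three documented triggers fires.
// Event type is deliberately not an input: it does not change delivery in MVP.
function shouldFlush(state: QueueState): boolean {
  return (
    state.pageLeaving ||
    state.queuedBytes >= SEND_THRESHOLD_BYTES ||
    state.msSinceLastFlush >= FLUSH_INTERVAL_MS
  );
}
```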

Stable key precedence:

data-testid
data-qa
safe name
safe aria-label
structural fallback

data-clue-id and data-clue-key are not part of the MVP stable-key contract. Captured stable keys are best-effort evidence, not authoritative business meaning.
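To make the precedence order concrete, here is a minimal sketch that walks the five sources in order. Everything named here (pickStableKey, placeholderIsSafe, the attribute record shape) is hypothetical; in particular, the README does not define what makes a name or aria-label "safe", so the check below is a placeholder assumption.

```typescript
type Attrs = Record<string, string | undefined>;

// Placeholder "safe" check (assumption): short, simple characters only.
// The SDK's real safety rule is not documented here.
function placeholderIsSafe(value: string): boolean {
  return /^[A-Za-z0-9 _.-]{1,64}$/.test(value);
}

interface StableKey {
  source: "data-testid" | "data-qa" | "name" | "aria-label" | "structural";
  key: string;
}

// Returns the first available key in the documented precedence order.
function pickStableKey(
  attrs: Attrs,
  structuralFallback: string,
  isSafe: (value: string) => boolean = placeholderIsSafe
): StableKey {
  const testId = attrs["data-testid"];
  if (testId) return { source: "data-testid", key: testId };

  const qa = attrs["data-qa"];
  if (qa) return { source: "data-qa", key: qa };

  const name = attrs["name"];
  if (name && isSafe(name)) return { source: "name", key: name };

  const ariaLabel = attrs["aria-label"];
  if (ariaLabel && isSafe(ariaLabel)) return { source: "aria-label", key: ariaLabel };

  // data-clue-id and data-clue-key are intentionally not consulted.
  return { source: "structural", key: structuralFallback };
}
```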

Sampling and cost guards

sampling.sessionSampleRate controls whether a session emits normal capture
oversized events degrade in stages before final drop:
- no raw payload body
- metadata/schema only
- shell only
oversized batches stay below the configured payload hard max
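The staged degradation can be sketched as trying progressively smaller candidates until one fits. The event shape, the field names (body, extras, metadata, schema), and measuring size by JSON string length are all assumptions for illustration; the SDK's actual event model and size accounting may differ.

```typescript
// Hypothetical event shape used only for this sketch.
interface CaptureEvent {
  type: string;
  stableKey: string;
  extras: Record<string, string>;
  metadata: Record<string, string>;
  schema: Record<string, string>;
  body: string;
}

type DegradedEvent = Pick<CaptureEvent, "type" | "stableKey"> & Partial<CaptureEvent>;
type Stage = "full" | "no_body" | "metadata_schema_only" | "shell_only" | "dropped";

// Tries each stage in order and returns the first candidate that fits.
// Size is approximated by JSON string length (an assumption).
function degradeToFit(
  event: CaptureEvent,
  maxChars: number
): { stage: Stage; event: DegradedEvent | null } {
  const shell = { type: event.type, stableKey: event.stableKey };
  const candidates: Array<[Stage, DegradedEvent]> = [
    ["full", event],
    ["no_body", { ...shell, extras: event.extras, metadata: event.metadata, schema: event.schema }],
    ["metadata_schema_only", { ...shell, metadata: event.metadata, schema: event.schema }],
    ["shell_only", shell],
  ];
  for (const [stage, candidate] of candidates) {
    if (JSON.stringify(candidate).length <= maxChars) {
      return { stage, event: candidate };
    }
  }
  return { stage: "dropped", event: null };
}
```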

Flush reliability

The SDK flushes on:

visibilitychange -> hidden
pagehide
beforeunload

The page-leave flush is best effort. The SDK does not persist unsent events in localStorage or IndexedDB in MVP. Lifecycle session URLs strip query strings and hashes before they enter the raw SDK event payload.
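The URL stripping described above amounts to cutting the string at the first ? or #. A minimal sketch, with a hypothetical function name and no claim about how the SDK implements it:

```typescript
// Hypothetical helper: removes the query string and hash from a URL string.
function stripQueryAndHash(href: string): string {
  // Index of the first "?" or "#", or -1 when neither is present.
  const cut = href.search(/[?#]/);
  return cut === -1 ? href : href.slice(0, cut);
}
```

One consequence worth noting for apps that use hash-based routing: under this rule the route after # would not appear in lifecycle session URLs.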

Keepalive flush contract

unload / hidden flush uses authenticated fetch with the keepalive hint
unload flush only sends when a valid browser token is already cached
events are not sent through navigator.sendBeacon
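The contract above can be sketched as a pure function that either produces a keepalive fetch request or declines to send. The names, the token shape, treating "valid" as "not expired", and the Authorization: Bearer header are assumptions for illustration; the README does not specify the SDK's auth header or batch path, so the URL is passed in.

```typescript
// Hypothetical cached browser token.
interface CachedToken {
  value: string;
  expiresAtMs: number;
}

interface FlushRequest {
  url: string;
  init: {
    method: string;
    keepalive: boolean;
    headers: Record<string, string>;
    body: string;
  };
}

// Returns null when no valid token is cached: the unload path never
// starts a token fetch, it only uses what is already available.
function buildUnloadFlush(
  batchUrl: string,
  token: CachedToken | null,
  events: unknown[],
  nowMs: number
): FlushRequest | null {
  if (token === null || token.expiresAtMs <= nowMs) return null;
  if (events.length === 0) return null;
  return {
    url: batchUrl,
    init: {
      method: "POST",
      keepalive: true, // lets the request outlive the page where supported
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token.value}`, // assumed auth scheme
      },
      body: JSON.stringify({ events }),
    },
  };
}
```

A caller would pass a non-null result to fetch(request.url, request.init). The keepalive option is what makes header-based authentication possible here, which navigator.sendBeacon does not allow.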

Degraded event contract

Downstream ingest must accept these as normal input:

no raw payload body
metadata + schema only
shell-only event

Privacy and PII handling

The SDK applies a layered privacy model:

1. Hard-deny: PII / secrets are stripped before transport

The SDK strips a built-in set of property keys before the event leaves the browser. These keys cannot be unmasked through allowedValuePaths — even an explicit body.email allowlist does not bypass the hard-deny.

Hard-deny categories (case- and separator-insensitive — userEmail, user-email, USER_EMAIL, email_address all match):

Auth credentials: authorization, cookie, set-cookie, password, passwd, secret, token, access_token, refresh_token, session, session_token, api_key, apikey, private_key
PII categories: email, phone, credit_card, ssn

This list is a strict superset of the server-side ingest hard-deny (@clue/shared INGEST_HARD_DENY_KEYS), enforced by CI parity tests in all three SDKs.
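One way to get case- and separator-insensitive matching is to split each key into lowercase tokens and look for a deny entry's token sequence inside it. The sketch below uses the key list from this section, but the tokenization approach and the function names are assumptions, not the SDK's actual matcher.

```typescript
// Key list copied from the README; the matching strategy is an assumption.
const HARD_DENY_KEYS = [
  "authorization", "cookie", "set-cookie", "password", "passwd", "secret",
  "token", "access_token", "refresh_token", "session", "session_token",
  "api_key", "apikey", "private_key",
  "email", "phone", "credit_card", "ssn",
];

// Splits camelCase, snake_case, kebab-case and UPPER_CASE into lowercase tokens.
function tokenize(key: string): string[] {
  return key
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

const DENY_TOKEN_SEQUENCES = HARD_DENY_KEYS.map(tokenize);

// True when any deny entry appears as a contiguous token run in the key.
function isHardDenied(key: string): boolean {
  const tokens = tokenize(key);
  return DENY_TOKEN_SEQUENCES.some((sequence) => {
    for (let start = 0; start + sequence.length <= tokens.length; start++) {
      if (sequence.every((token, offset) => tokens[start + offset] === token)) {
        return true;
      }
    }
    return false;
  });
}
```

Token-based matching is chosen here over plain substring matching so that a key such as displayName is not caught by accident; whether the SDK draws the line in the same place is not stated in this README.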

2. Default-masked: arbitrary values are masked unless allowlisted

Properties that pass the hard-deny gate are still masked by default. The SDK emits a fingerprint or shape-only digest, not the raw value.

To capture a specific field in plaintext, two things must both be true:

The field path appears in the SDK call's allowedValuePaths (caller opt-in at SDK init time).
The same property key appears in the project's server-side project service observationAnalysisPropertyAllowlist.

If the SDK-side opt-in is present but the server-side allowlist entry is missing, the value is stored in the raw archive only: facts_json.analysisProperties.projected will be empty and the field will not appear in Clue's analysis UI.
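The two-condition rule can be summarized as a small decision function. This is a sketch of the documented outcomes only: the function and type names are hypothetical, and the hard-deny gate from step 1 is assumed to have run already.

```typescript
type CaptureOutcome = "projected_plaintext" | "raw_archive_only" | "masked";

// Hypothetical model of the rule: both allowlists must agree before a
// value is projected in plaintext for analysis.
function captureOutcome(
  fieldPath: string,
  propertyKey: string,
  allowedValuePaths: ReadonlySet<string>,
  serverAllowlist: ReadonlySet<string>
): CaptureOutcome {
  // Without the caller opt-in the SDK emits only a fingerprint or digest.
  if (!allowedValuePaths.has(fieldPath)) return "masked";
  // Opt-in without the server allowlist: kept in the raw archive, not projected.
  if (!serverAllowlist.has(propertyKey)) return "raw_archive_only";
  return "projected_plaintext";
}
```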

3. Operational consequence: configure server allowlist before launch

Customers who launch without setting observationAnalysisPropertyAllowlist on their resolved project service will see custom event names land in observation_type but no projected properties in dashboards. The backend source archive keeps masked or raw-only evidence for debugging, while analysis projection remains empty until the server allowlist includes the field. Run a privacy review to decide which property names you want to project for analytics, then configure the server allowlist via the Clue admin UI or the project service allowlist API.

Architecture boundary

The frontend SDK is a browser wrapper around capture, privacy minimization, correlation propagation, and transport. It sends source-signal envelopes such as sdk_source_signal_observed; the Clue backend ingest core performs final classification, unknown diagnostic handling, backend request correlation, and ClickHouse/worker input shaping.

browser capture / OTel network span
  -> frontend SDK source-signal envelope
  -> /api/v1/ingest/browser
  -> apps/api/src/modules/ingest core

Rules:

Keep browser code limited to raw facts, correlation context, standard OTel attributes, local minimization, and transport.
Do not add frontend-only classification, source-hint strategy, semantic DOM annotations, or custom fallback behavior.
Add classifier behavior in the backend ingest core so frontend-originated requests, Node.js, Python, and future SDK wrappers receive the same behavior.
Preserve interaction_id, request_span_id, request_id, and trace_id whenever they are available. User-action linkage is an ingest contract, not a display-only field.

Explicit Event Emit: `clue.track`

When DOM auto-capture cannot infer the operation (e.g. a custom keyboard shortcut, an API-only background sync, or a multi-step UI flow), the host app may emit an explicit semantic action:

import clue from "@clue-ai/frontend-sdk";

clue.track("report.create", { source: "shortcut", result: "success" });

clue.track emits a custom_emitted event for product actions that are not covered by DOM auto-capture. Keep properties privacy-safe and stable; the worker derives higher-level semantics downstream. Event names must match ^[A-Za-z0-9_.:-]{1,128}$; invalid names are dropped with a public warning.
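Host apps that build event names dynamically may want to check them before calling clue.track. The sketch below applies the documented pattern; the helper name isValidEventName is hypothetical and not an SDK export.

```typescript
// Pattern copied from the README: 1 to 128 characters from
// letters, digits, underscore, dot, colon, and hyphen.
const EVENT_NAME_PATTERN = /^[A-Za-z0-9_.:-]{1,128}$/;

// Hypothetical pre-check, not part of the SDK's public API.
function isValidEventName(name: string): boolean {
  return EVENT_NAME_PATTERN.test(name);
}
```

Names such as "report.create" or "billing:invoice-paid" pass this check, while names containing spaces or slashes, or names longer than 128 characters, do not.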

Identity Lifecycle: Organization

The frontend SDK exposes the MVP identity lifecycle APIs that the worker uses to preserve anonymous → identified user → organization evidence:

clue.identify("user_42", { plan: "pro" });
clue.group("organization", "org_3", { name: "Acme Inc." });
clue.reset();

MVP does not expose Clue account or workspace as public group types. If the customer product calls its company concept an account, workspace, tenant, or team, pass that stable company id as clue.group("organization", ...). Blank or non-string identify user ids and organization group ids are dropped with a public warning and do not update SDK identity state.
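The id rule above can be expressed as a type guard that a host app might apply before calling clue.identify or clue.group. The name isUsableId is hypothetical, and treating whitespace-only strings as blank is an assumption.

```typescript
// Hypothetical guard mirroring the documented drop rule:
// non-string or blank ids are rejected.
function isUsableId(id: unknown): id is string {
  // Assumption: whitespace-only strings count as blank.
  return typeof id === "string" && id.trim().length > 0;
}
```

Guarding at the call site avoids relying on the SDK's warning path when, for example, a user object has not finished loading and its id is still undefined.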
