cursor-buddy
AI-powered cursor companion for web apps
AI Agent that lives in your cursor, built for web apps. Push-to-talk voice assistant that can see your screen and point at things.
Customize its prompt, pass custom tools, choose between browser or server-side speech APIs, use any AI SDK models and customize the UI to fit your needs.
Features
- Push-to-talk voice input — Hold a hotkey to speak, release to send
- Browser-first live transcription — Realtime transcript while speaking, with server fallback
- DOM snapshot context — AI sees a token-efficient representation of your visible page structure
- Voice responses — Browser or server TTS, with optional streaming playback
- Cursor pointing — AI can point at UI elements it references
- Tool call bubbles — Visual feedback for tool execution with customizable display
- Voice interruption — Start talking again to cut off current response
- Framework agnostic — Core client written in TypeScript, adapter-based architecture
- Customizable — CSS variables, custom components, headless mode
- Configurable — Choose any AI SDK models, equip the agent with tools, or modify the system prompt
Installation
npm install cursor-buddy
# or
pnpm add cursor-buddy

Quick Start
1. Server Setup
Create an API route that handles chat, transcription, and TTS.
Keep transcriptionModel configured if you want browser transcription to fall
back to the server in auto mode. Keep speechModel configured if you want
server speech or browser speech fallback in auto mode.
// lib/cursor-buddy.ts
import { createCursorBuddyHandler } from "cursor-buddy/server"
import { openai } from "@ai-sdk/openai"
export const cursorBuddy = createCursorBuddyHandler({
model: openai("gpt-4o"),
speechModel: openai.speech("tts-1"),
transcriptionModel: openai.transcription("whisper-1"),
})

Next.js App Router
// app/api/cursor-buddy/[...path]/route.ts
import { toNextJsHandler } from "cursor-buddy/server/next"
import { cursorBuddy } from "@/lib/cursor-buddy"
export const { POST } = toNextJsHandler(cursorBuddy)

2. Client Setup
Add the <CursorBuddy /> component to your app.
// app/layout.tsx
import { CursorBuddy } from "cursor-buddy/react"
export default function RootLayout({ children }) {
return (
<html>
<body>
{children}
<CursorBuddy endpoint="/api/cursor-buddy" />
</body>
</html>
)
}

That's it! Hold Ctrl+Alt to speak, release to send.
How It Works
flowchart LR
subgraph Input
A[Hold hotkey] --> B[Mic + Speech Recognition]
A --> C[Screenshot + DOM Snapshot]
end
subgraph Transcription
B --> D{Browser transcript?}
D -->|Yes| E[Use browser transcript]
D -->|No| F[Server transcription]
end
subgraph Processing
E --> G[Send to AI with context]
F --> G
C --> G
G --> H[AI Response]
H -->|point tool called| I[Animate cursor to @ID]
end
subgraph Output
H --> J[Speak response via TTS]
end

- User holds the hotkey
- Microphone captures audio and browser speech recognition starts when available
- At the same time, a screenshot and token-efficient DOM snapshot of the viewport are captured in the background. This runs in parallel with speech capture to minimize latency
- User releases hotkey
- The client prefers the browser transcript; if it is unavailable or empty in auto mode, the recorded audio is transcribed on the server
- The already-captured screenshot + DOM snapshot are sent to the AI model. Each element has an @ID (e.g., @12) that the AI can reference
- AI responds with text and can optionally call the point tool to indicate an element on screen by its @ID from the DOM snapshot
- Response is spoken in the browser or on the server based on speech.mode, and can either wait for the full response or stream sentence-by-sentence based on speech.allowStreaming
- If the AI calls the point tool, the cursor animates to the target element's current position (it resolves the element from the snapshot registry and computes its center point)
- If user presses hotkey again at any point, current response is interrupted
DOM Snapshot
The DOM snapshot is a token-efficient representation of the visible page structure that gives the AI context about what's on screen.
When the user holds the hotkey, cursor-buddy traverses the visible DOM and builds a lightweight text representation. Each interactive or meaningful element is assigned a unique @ID that the AI can reference when pointing.
- Enables pointing — The AI can say "click the submit button @42" and the cursor will animate to that exact element
- Token efficient — Only visible, relevant elements are included (no hidden elements, scripts, or styles)
- Semantic context — The AI understands the page structure, not just pixels from the screenshot
For a simple login form, the snapshot might look like:
# viewport 1280x720
@20 body "Sign In Email Password Sign In Forgot password?" [x=0 y=0 w=1280 h=720]
@19 main "Sign In Email Password Sign In Forgot password?" [x=440 y=200 w=400 h=320]
@18 form "Sign In Email Password Sign In Forgot password?" [x=440 y=200 w=400 h=320]
@1 h1 "Sign In" [x=580 y=220 w=120 h=32]
@4 div "Email" [x=460 y=270 w=360 h=56]
@2 label "Email" [x=460 y=270 w=40 h=20]
@3 input [type="email"] [placeholder="Enter your email"] [x=460 y=294 w=360 h=32]
@7 div "Password" [x=460 y=340 w=360 h=56]
@5 label "Password" [x=460 y=340 w=64 h=20]
@6 input [type="password"] [placeholder="Enter your password"] [x=460 y=364 w=360 h=32]
@8 button "Sign In" [type="submit"] [x=460 y=420 w=360 h=40]
@9 a "Forgot password?" [href="/forgot"] [x=540 y=476 w=120 h=20]

Each line contains: @ID tag "text content" [attributes] [bounding box]
The AI sees this alongside the screenshot. When it wants to guide the user to enter their email, it can call point(@3) and the cursor will animate to that input field.
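Conceptually, pointing is a lookup from @ID to the element's current center point. The sketch below illustrates that idea only — the registry and helper names are hypothetical, not the library's actual internals, and the real implementation resolves live DOM elements rather than static boxes:

```typescript
// Hypothetical sketch of @ID → screen-point resolution (not cursor-buddy's real API).
type Box = { x: number; y: number; w: number; h: number }

// Registry built while the DOM snapshot is captured: numeric @ID → bounding box.
const registry = new Map<number, Box>([
  [3, { x: 460, y: 294, w: 360, h: 32 }], // the email input from the example above
  [8, { x: 460, y: 420, w: 360, h: 40 }], // the "Sign In" button
])

// Resolve an @ID reference (e.g. "@3") to the center point the cursor animates to.
function resolvePoint(id: string): { x: number; y: number } | null {
  const box = registry.get(Number(id.replace("@", "")))
  if (!box) return null
  return { x: box.x + box.w / 2, y: box.y + box.h / 2 }
}

console.log(resolvePoint("@3")) // center of the email input: { x: 640, y: 310 }
```

Resolving at point time (rather than baking coordinates into the snapshot) keeps the cursor accurate even if the page has scrolled or reflowed since capture.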
What Gets Captured
| Included | Excluded |
|----------|----------|
| Visible elements in viewport | Hidden elements (display: none, visibility: hidden) |
| Interactive elements (buttons, inputs, links) | Script and style tags |
| Text content (truncated if long) | Elements outside viewport |
| Element attributes (type, placeholder, href) | Inline styles and classes |
| Semantic structure | Comment nodes |
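The inclusion rules above can be approximated with a small predicate. This is a hedged sketch, not the library's actual traversal: the real code reads computed styles and bounding rects from the live DOM, while here they are passed in as plain objects and the function name is made up for illustration:

```typescript
// Hypothetical sketch of the capture filter described in the table above.
type Style = { display: string; visibility: string }
type Rect = { x: number; y: number; w: number; h: number }
type Viewport = { w: number; h: number }

const EXCLUDED_TAGS = new Set(["script", "style", "noscript"])

function shouldCapture(tag: string, style: Style, rect: Rect, vp: Viewport): boolean {
  if (EXCLUDED_TAGS.has(tag)) return false // script and style tags
  if (style.display === "none" || style.visibility === "hidden") return false // hidden
  // Outside the viewport: no overlap with [0, vp.w] x [0, vp.h]
  if (rect.x + rect.w <= 0 || rect.y + rect.h <= 0) return false
  if (rect.x >= vp.w || rect.y >= vp.h) return false
  return true
}
```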
Server Configuration
createCursorBuddyHandler({
  model: LanguageModel, // Required: AI SDK chat model
  speechModel: SpeechModel, // Optional: server TTS model
  transcriptionModel: TranscriptionModel, // Optional: server fallback for STT
// Optional
system: string | ((ctx) => string), // Custom system prompt
tools: Record<string, Tool>, // AI SDK tools
maxHistory: number, // Max conversation history (default: 10)
})

Custom System Prompt
createCursorBuddyHandler({
model: openai("gpt-4o"),
speechModel: openai.speech("tts-1"),
transcriptionModel: openai.transcription("whisper-1"),
// Extend the default prompt
system: ({ defaultPrompt }) => `
${defaultPrompt}
You are helping users navigate a project management dashboard.
The sidebar contains: Projects, Tasks, Calendar, Settings.
`,
})

Client Configuration
<CursorBuddy
// Required
endpoint="/api/cursor-buddy"
// Optional
hotkey="ctrl+alt" // Push-to-talk hotkey (default: "ctrl+alt")
container={element} // Portal container (default: document.body)
transcription={{ mode: "auto" }} // "auto" | "browser" | "server"
speech={{ mode: "server", allowStreaming: false }}
// mode: "auto" | "browser" | "server"
// allowStreaming: speak sentence-by-sentence while chat streams
// Custom components
cursor={(props) => <CustomCursor {...props} />}
speechBubble={(props) => <CustomBubble {...props} />}
waveform={(props) => <CustomWaveform {...props} />}
// Tool display configuration
toolDisplay={{
"*": { minDisplayTime: 1500 }, // Default for all tools
web_search: { label: "Searching..." }, // Custom label
internal_tool: { mode: "hidden" }, // Hide from UI
}}
renderToolBubble={(props) => <CustomToolBubble {...props} />}
// Callbacks
onTranscript={(text) => {}} // Called when speech is transcribed
onResponse={(text) => {}} // Called when AI responds
onPoint={(target) => {}} // Called when AI points at element
onStateChange={(state) => {}} // Called on state change
onError={(error) => {}} // Called on error
onToolCall={(event) => {}} // Called when a tool is invoked
onToolResult={(event) => {}} // Called when a tool completes
/>

Transcription Modes
"auto"— Try browser speech recognition first, then fall back to the server transcription route if needed."browser"— Require browser speech recognition. If it fails, the turn errors and no server fallback is attempted."server"— Skip browser speech recognition and always use the server transcription route.
Speech Modes
"auto"— Try browser speech synthesis first, then fall back to the server TTS route if browser speech is unavailable or fails."browser"— Require browser speech synthesis. If it fails, the turn errors and no server fallback is attempted."server"— Skip browser speech synthesis and always use the server TTS route.
Speech Streaming
- speech.allowStreaming: false — Wait for the full /chat response, then speak it once.
- speech.allowStreaming: true — Speak completed sentence segments as the chat stream arrives.
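For example, to prefer browser TTS and speak sentence-by-sentence as the response streams, using the options documented above (shown here via the core client; the React component accepts the same speech prop):

```typescript
import { CursorBuddyClient } from "cursor-buddy"

// Prefer browser speech synthesis, fall back to server TTS,
// and speak sentence segments while the chat response is still streaming.
const client = new CursorBuddyClient("/api/cursor-buddy", {
  speech: { mode: "auto", allowStreaming: true },
})
```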
Tool Display
When the AI uses tools (like web search), bubbles appear near the cursor showing the tool's status. Configure how tools are displayed:
<CursorBuddy
endpoint="/api/cursor-buddy"
toolDisplay={{
// Default settings for all tools
"*": {
minDisplayTime: 1500, // Minimum time to show bubble (ms)
},
// Per-tool configuration
web_search: {
label: "Searching the web...", // Static label
// Or dynamic label based on status:
// label: (args, status) => status === "completed" ? "Found results" : "Searching..."
},
// Hide internal tools from UI
internal_logging: {
mode: "hidden",
},
// Custom render for specific tool
data_fetch: {
render: (props) => (
<div className="my-custom-bubble">
{props.status === "pending" ? "Loading..." : "Done!"}
</div>
),
},
}}
/>

Tool Call States
| Status | Description |
|--------|-------------|
| pending | Tool called, waiting for result |
| awaiting_approval | Needs user consent (for tools with needsApproval) |
| approved | User approved, executing |
| denied | User denied the tool call |
| completed | Finished successfully |
| failed | Execution failed |
Approval Keyboard Shortcuts
When a tool requires approval, use these keyboard shortcuts:
| Key | Action |
|-----|--------|
| Y or Enter | Approve the tool call |
| N or Escape | Deny the tool call |
Shortcuts are automatically enabled when a tool is awaiting approval and disabled otherwise. They are ignored when focus is in an input field or textarea.
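On the server, a tool that triggers this approval flow might be defined like the sketch below. It assumes the AI SDK's tool helper and zod for the input schema; delete_task is a hypothetical tool name, and the needsApproval flag follows the name used in the states table above — check the package types for its exact placement:

```typescript
import { createCursorBuddyHandler } from "cursor-buddy/server"
import { openai } from "@ai-sdk/openai"
import { tool } from "ai"
import { z } from "zod"

export const cursorBuddy = createCursorBuddyHandler({
  model: openai("gpt-4o"),
  tools: {
    // Destructive action: surfaces an "awaiting_approval" bubble,
    // so the user must press Y/Enter before it executes.
    delete_task: tool({
      description: "Delete a task by its id",
      inputSchema: z.object({ taskId: z.string() }),
      needsApproval: true,
      execute: async ({ taskId }) => {
        // ... call your own backend here ...
        return { deleted: taskId }
      },
    }),
  },
})
```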
Customization
CSS Variables
Cursor buddy styles are customizable via CSS variables. Override them in your stylesheet:
:root {
/* Cursor colors by state */
--cursor-buddy-color-idle: #3b82f6;
--cursor-buddy-color-listening: #ef4444;
--cursor-buddy-color-processing: #eab308;
--cursor-buddy-color-responding: #22c55e;
/* Speech bubble */
--cursor-buddy-bubble-bg: #ffffff;
--cursor-buddy-bubble-text: #1f2937;
--cursor-buddy-bubble-radius: 8px;
--cursor-buddy-bubble-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
/* Waveform */
--cursor-buddy-waveform-color: #ef4444;
/* Tool bubbles */
--cursor-buddy-tool-bg: #ffffff;
--cursor-buddy-tool-text: #1f2937;
--cursor-buddy-tool-pending: #3b82f6;
--cursor-buddy-tool-approval: #f59e0b;
--cursor-buddy-tool-success: #22c55e;
--cursor-buddy-tool-error: #ef4444;
}

Custom Components
Replace default components with your own:
import { CursorBuddy, type CursorRenderProps } from "cursor-buddy/react"
function MyCursor({ state, rotation, scale }: CursorRenderProps) {
return (
<div style={{ transform: `rotate(${rotation}rad) scale(${scale})` }}>
{state === "listening" ? "Listening..." : "Point"}
</div>
)
}
<CursorBuddy
endpoint="/api/cursor-buddy"
cursor={(props) => <MyCursor {...props} />}
/>

Headless Mode
For full control, use the provider and hook directly:
import {
CursorBuddyProvider,
useCursorBuddy
} from "cursor-buddy/react"
function App() {
return (
<CursorBuddyProvider endpoint="/api/cursor-buddy">
<MyCustomUI />
</CursorBuddyProvider>
)
}
function MyCustomUI() {
const {
state, // "idle" | "listening" | "processing" | "responding"
liveTranscript, // In-progress transcript while speaking
transcript, // Latest user speech
response, // Latest AI response
audioLevel, // 0-1, for waveform visualization
isEnabled,
isPointing,
error,
// Tool state
toolCalls, // All tool calls in current turn
activeToolCalls, // Visible, non-expired tool calls
pendingApproval, // Tool awaiting user approval, or null
// Actions
startListening,
stopListening,
setEnabled,
pointAt, // Manually point at coordinates
dismissPointing,
reset,
// Tool actions
approveToolCall, // Approve a pending tool call
denyToolCall, // Deny a pending tool call
dismissToolCall, // Dismiss a tool call bubble
} = useCursorBuddy()
return (
<div>
<p>State: {state}</p>
<p>Live transcript: {liveTranscript}</p>
<button
onMouseDown={startListening}
onMouseUp={stopListening}
>
Hold to speak
</button>
{/* Render active tool calls */}
{activeToolCalls.map((tool) => (
<div key={tool.id}>
{tool.label}
{tool.status === "awaiting_approval" && (
<>
<button onClick={() => approveToolCall(tool.id)}>Yes</button>
<button onClick={() => denyToolCall(tool.id)}>No</button>
</>
)}
</div>
))}
</div>
)
}

Complete render props types:
interface CursorRenderProps {
state: "idle" | "listening" | "processing" | "responding"
isPointing: boolean
rotation: number // Radians, direction of travel
scale: number // 1.0 normal, up to 1.3 during flight
}
interface SpeechBubbleRenderProps {
text: string
isVisible: boolean
}
interface WaveformRenderProps {
audioLevel: number // 0-1
isListening: boolean
}

Framework-Agnostic Usage
For non-React environments, use the core client directly:
import { CursorBuddyClient } from "cursor-buddy"
const client = new CursorBuddyClient("/api/cursor-buddy", {
transcription: { mode: "auto" },
speech: { mode: "server", allowStreaming: false },
onStateChange: (state) => console.log("State:", state),
onTranscript: (text) => console.log("Transcript:", text),
onResponse: (text) => console.log("Response:", text),
onError: (err) => console.error("Error:", err),
})
// Subscribe to state changes
client.subscribe(() => {
const snapshot = client.getSnapshot()
console.log(snapshot)
})
// Trigger voice interaction
client.startListening()
// ... user speaks ...
client.stopListening()

API Reference
Core Exports (cursor-buddy)
| Export | Description |
|--------|-------------|
| CursorBuddyClient | Framework-agnostic client class |
| VoiceState | Type: "idle" \| "listening" \| "processing" \| "responding" |
| PointingTarget | Type: { x: number, y: number, label: string } |
| Point | Type: { x: number, y: number } |
Server Exports (cursor-buddy/server)
| Export | Description |
|--------|-------------|
| createCursorBuddyHandler | Create the main request handler |
| DEFAULT_SYSTEM_PROMPT | Default system prompt for reference |
| CursorBuddyHandlerConfig | Type for handler configuration |
| CursorBuddyHandler | Return type of createCursorBuddyHandler |
Server Adapters (cursor-buddy/server/next)
| Export | Description |
|--------|-------------|
| toNextJsHandler | Convert handler to Next.js App Router format |
React Exports (cursor-buddy/react)
| Export | Description |
|--------|-------------|
| CursorBuddy | Drop-in component with built-in UI |
| CursorBuddyProvider | Headless provider for custom UI |
| useCursorBuddy | Hook to access state and actions |
Types (cursor-buddy/react)
| Export | Description |
|--------|-------------|
| CursorBuddyProps | Props for <CursorBuddy /> |
| CursorBuddyProviderProps | Props for <CursorBuddyProvider /> |
| UseCursorBuddyReturn | Return type of useCursorBuddy() |
| CursorRenderProps | Props passed to custom cursor |
| SpeechBubbleRenderProps | Props passed to custom speech bubble |
| WaveformRenderProps | Props passed to custom waveform |
| ToolBubbleRenderProps | Props passed to custom tool bubble |
| ToolCallState | State of a tool call |
| ToolCallStatus | "pending" \| "awaiting_approval" \| "approved" \| "denied" \| "completed" \| "failed" |
| ToolDisplayConfig | Configuration for tool display |
| ToolDisplayOptions | Options for a single tool |
Security Best Practices
Since the cursor-buddy endpoints allow direct LLM communication, it is strongly recommended to configure CORS and rate limiting to prevent abuse, unauthorized access, and unexpected API costs.
Wrap the handler with CORS and rate limiting:
// app/api/cursor-buddy/[...path]/route.ts
import { toNextJsHandler } from "cursor-buddy/server/next"
import { cursorBuddy } from "@/lib/cursor-buddy"
const handler = toNextJsHandler(cursorBuddy)
export async function POST(request: Request) {
// Verify origin
const origin = request.headers.get("origin")
if (origin !== process.env.ALLOWED_ORIGIN) {
return new Response("Unauthorized", { status: 403 })
}
// Check rate limit (e.g., 10 requests per minute).
// "rateLimiter" is assumed to come from your own setup (e.g. @upstash/ratelimit).
const ip = request.headers.get("x-forwarded-for") || "unknown"
const { success } = await rateLimiter.limit(ip)
if (!success) {
return new Response("Rate limit exceeded", { status: 429 })
}
return handler(request)
}

License
MIT
