@rajnandan1/atticus v1.1.3
# Atticus

A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.
## Installation

### npm / yarn

```bash
npm install @rajnandan1/atticus
```

### CDN (Vanilla HTML/JS)

```html
<!-- Use the IIFE build via unpkg or jsdelivr -->
<script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>

<!-- Or pin a specific version -->
<script src="https://unpkg.com/@rajnandan1/atticus@1.1.3/dist/index.global.js"></script>

<!-- jsdelivr alternative -->
<script src="https://cdn.jsdelivr.net/npm/@rajnandan1/atticus@latest/dist/index.global.js"></script>
```

The script tag exposes `Atticus` globally - see Vanilla HTML/JS Usage below.
## Quick Start

```js
import { Atticus } from "@rajnandan1/atticus";

// Get a client secret from your backend (which calls OpenAI's API)
const clientSecret = await fetchClientSecret();

const agent = new Atticus({
    clientSecret,
    voice: "shimmer", // Optional: alloy, ash, ballad, coral, echo, sage, shimmer, verse
    language: "en", // Optional: supports 40+ languages
    agent: {
        name: "Assistant",
        instructions: "You are a helpful assistant.",
    },
    ui: {
        enabled: true,
        rootElement: formContainer,
    },
});

// Listen to events
agent.on("connected", () => console.log("Connected!"));
agent.on("message", (msg) => console.log("Message:", msg));
agent.on("error", (err) => console.error("Error:", err));

// Connect and start talking
await agent.connect();

// Disconnect when done
agent.disconnect();
```

## Vanilla HTML/JS Usage
Atticus works perfectly with vanilla HTML/JS using a script tag:
```html
<!DOCTYPE html>
<html>
    <head>
        <title>Atticus Voice Demo</title>
    </head>
    <body>
        <button id="connectBtn">Connect</button>
        <div id="status">Idle</div>

        <!-- Include Atticus -->
        <script src="https://unpkg.com/@rajnandan1/atticus@latest/dist/index.global.js"></script>

        <script>
            // Atticus is now available globally
            let agent = null;

            document
                .getElementById("connectBtn")
                .addEventListener("click", async () => {
                    if (agent && agent.isConnected) {
                        agent.disconnect();
                        return;
                    }

                    // Get client secret from your backend
                    const response = await fetch("/api/session", {
                        method: "POST",
                    });
                    const { clientSecret } = await response.json();

                    agent = new Atticus.Atticus({
                        clientSecret,
                        agent: {
                            name: "Assistant",
                            instructions: "You are a helpful voice assistant.",
                        },
                        voice: "shimmer",
                        language: "en",
                        ui: {
                            enabled: true,
                            rootElement: document.body,
                        },
                    });

                    agent.on("connected", () => {
                        document.getElementById("status").textContent = "Connected!";
                    });
                    agent.on("message", (msg) => {
                        console.log("Message:", msg);
                    });

                    await agent.connect();
                });
        </script>
    </body>
</html>
```

See `index.html` for a complete example.
## UI-Aware Mode
Enable UI awareness to let users control your interface with voice. Actions are automatically executed by default:
```ts
const agent = new Atticus({
    clientSecret,
    agent: {
        name: "UI Assistant",
        instructions: "Help users fill out the form on this page.",
    },
    ui: {
        enabled: true,
        rootElement: document.getElementById("app")!,
        autoUpdate: true, // Auto-refresh DOM context
    },
});

// Actions are auto-executed! Just listen for logging/feedback
agent.on("action", (action) => {
    console.log("Action executed:", action.outputText);
    console.log("Code:", action.outputCode);
});

await agent.connect();

// Now say: "Fill the name field with John Doe"
// The library will automatically execute the action!
```

### Manual Action Execution
If you want to handle actions yourself:
```js
const agent = new Atticus({
    clientSecret,
    agent: { name: "Assistant", instructions: "..." },
    doNotExecuteActions: true, // Disable auto-execution
    ui: { enabled: true, rootElement: document.body },
});

agent.on("action", async (action) => {
    // Validate or modify action before execution
    if (action.actionType === "click") {
        const result = await agent.executeAction(action);
        console.log("Result:", result);
    }
});
```

## Configuration
```ts
interface AtticusConfig {
    // Required: OpenAI client secret (ephemeral key)
    clientSecret: string;

    // Required: Agent configuration
    agent: {
        name: string;
        instructions: string;
    };

    // Optional: Voice for the agent (default: 'alloy')
    // Options: 'alloy', 'ash', 'ballad', 'coral', 'echo', 'sage', 'shimmer', 'verse'
    voice?: AtticusVoice;

    // Optional: Language code (default: 'en')
    // Supports: en, es, fr, de, it, pt, ru, ja, ko, zh, hi, ar, and 30+ more
    language?: string;

    // Optional: OpenAI model (default: 'gpt-4o-realtime-preview')
    model?: string;

    // Optional: Auto-greet on connect (default: true)
    autoGreet?: boolean;

    // Optional: Greeting message (default: language-specific greeting)
    greetingMessage?: string;

    // Optional: Debug logging (default: false)
    debug?: boolean;

    // Optional: Disable auto-execution of UI actions (default: false)
    doNotExecuteActions?: boolean;

    // Optional: UI awareness configuration
    ui?: {
        enabled: boolean;
        rootElement: Element;
        autoUpdate?: boolean;
        autoUpdateInterval?: number; // ms, default: 5000
        d2SnapOptions?: {
            maxTokens?: number; // default: 4096
            assignUniqueIDs?: boolean; // default: true
        };
    };
}
```

## Voice Options
| Voice | Description |
| --------- | --------------------------- |
| alloy | Neutral, balanced (default) |
| ash | Soft, gentle |
| ballad | Warm, expressive |
| coral | Clear, friendly |
| echo | Smooth, conversational |
| sage | Calm, wise |
| shimmer | Bright, energetic |
| verse | Articulate, professional |
## Supported Languages

Atticus supports 40+ languages with native greetings. Set the `language` option:
```js
const agent = new Atticus({
    clientSecret,
    language: "hi", // Hindi - will greet with "नमस्ते!"
    agent: { name: "Assistant", instructions: "..." },
});
```

| Code | Language | Code | Language | Code | Language |
| ---- | ---------- | ---- | -------- | ---- | --------- |
| en | English | ja | Japanese | pl | Polish |
| hi | Hindi | ko | Korean | nl | Dutch |
| es | Spanish | zh | Chinese | sv | Swedish |
| fr | French | ar | Arabic | da | Danish |
| de | German | bn | Bengali | no | Norwegian |
| it | Italian | ta | Tamil | fi | Finnish |
| pt | Portuguese | te | Telugu | tr | Turkish |
| ru | Russian | th | Thai | uk | Ukrainian |
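Atticus picks a language-appropriate greeting automatically, but the `greetingMessage` option lets you override it. A minimal sketch of a per-locale override table, assuming you want custom text per language code; the greetings for languages other than Hindi (whose built-in greeting is documented above) are illustrative, not the library's built-ins:

```js
// Illustrative greetings keyed by the language codes above. Atticus ships
// its own built-in greetings, so a table like this is only needed when
// you want to override them via the greetingMessage option.
const GREETINGS = {
    en: "Hello! How can I help?",
    hi: "नमस्ते! मैं आपकी कैसे मदद कर सकता हूँ?",
    es: "¡Hola! ¿En qué puedo ayudarte?",
    fr: "Bonjour ! Comment puis-je vous aider ?",
};

function greetingFor(language) {
    // Fall back to English for codes with no custom override.
    return GREETINGS[language] ?? GREETINGS.en;
}

console.log(greetingFor("hi")); // Hindi override
console.log(greetingFor("sv")); // no Swedish entry: falls back to English
```

You would then pass `language: "hi"` together with `greetingMessage: greetingFor("hi")` in the config.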
## Events
| Event | Payload | Description |
| ------------------------- | ------------------- | ------------------------------------------------------------- |
| connected | - | Successfully connected |
| disconnected | - | Disconnected |
| error | string | Error occurred |
| statusChange | AtticusStatus | Connection status changed |
| conversationStateChange | ConversationState | Conversation state changed |
| message | Message | New message received |
| historyChange | Message[] | Conversation history updated |
| stateChange | AtticusState | Any state changed |
| agentStart | - | Agent started speaking |
| agentEnd | - | Agent stopped speaking |
| userAudio | - | User audio detected |
| action | UIAction | UI action executed (or requested if doNotExecuteActions=true) |
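The events above can be wired up from a single table. This sketch uses a tiny inline emitter as a stand-in for an Atticus instance, assuming only that the agent exposes the `on(event, handler)` shape shown throughout this README; the `msg.content` field on the `Message` payload is a hypothetical example, not a documented shape:

```js
// Stand-in emitter mimicking the agent's on(event, handler) surface.
const listeners = {};
const agent = {
    on(event, handler) {
        (listeners[event] ??= []).push(handler);
    },
    emit(event, payload) {
        (listeners[event] ?? []).forEach((h) => h(payload));
    },
};

const log = [];

// One handler per documented event, registered from one table.
const handlers = {
    connected: () => log.push("connected"),
    disconnected: () => log.push("disconnected"),
    error: (err) => log.push(`error: ${err}`),
    statusChange: (status) => log.push(`status: ${status}`),
    message: (msg) => log.push(`message: ${msg.content}`), // content field is assumed
    action: (action) => log.push(`action: ${action.actionType}`),
};

for (const [event, handler] of Object.entries(handlers)) {
    agent.on(event, handler);
}

// Simulate the lifecycle the real agent would drive.
agent.emit("statusChange", "connecting");
agent.emit("connected");
agent.emit("message", { content: "Hello!" });
agent.emit("action", { actionType: "click" });

console.log(log);
// → ["status: connecting", "connected", "message: Hello!", "action: click"]
```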
## UI Action Types
When UI mode is enabled, the agent can perform these actions:
| Action | Description | Example Code |
| ---------- | -------------------------- | ----------------------------------------------------- |
| click | Click elements | document.getElementById('btn').click() |
| type | Enter text | document.getElementById('input').value = 'Hello' |
| scroll | Scroll page/elements | window.scrollTo(0, 500) |
| focus | Focus form elements | document.getElementById('field').focus() |
| select | Select dropdown options | document.getElementById('select').value = 'option1' |
| hover | Hover over elements | - |
| navigate | Navigate pages | window.location.href = '/page' |
| read | Read information (no code) | - |
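With `doNotExecuteActions: true`, you decide which of these action types actually run. A sketch of a validating dispatcher, assuming the action object carries the `actionType` and `outputCode` fields shown earlier; the allowlist here is hypothetical, and in a real handler you would call `agent.executeAction(action)` for the allowed cases:

```js
// Hypothetical allowlist: only these action types may execute.
const ALLOWED = new Set(["click", "type", "focus", "select", "read"]);

function handleAction(action) {
    if (!ALLOWED.has(action.actionType)) {
        return { executed: false, reason: `blocked: ${action.actionType}` };
    }
    // "read" actions carry no code - they only describe information.
    if (action.actionType === "read") {
        return { executed: false, reason: "read-only" };
    }
    // In a real app: await agent.executeAction(action) here.
    return { executed: true, code: action.outputCode };
}

console.log(handleAction({ actionType: "navigate", outputCode: "window.location.href = '/page'" }));
// → { executed: false, reason: "blocked: navigate" }
console.log(handleAction({ actionType: "click", outputCode: "document.getElementById('btn').click()" }));
// → { executed: true, code: "document.getElementById('btn').click()" }
```

Blocking `navigate` by default is a sensible precaution, since a misheard command could otherwise send the user to another page mid-task.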
## API

### Methods

- `connect()` - Connect to the voice agent
- `disconnect()` - Disconnect from the voice agent
- `toggle()` - Toggle connection state
- `interrupt()` - Interrupt the AI while speaking
- `sendMessage(text)` - Send a text message
- `updateDOM(element | html)` - Manually update DOM context
- `refreshDOM()` - Refresh DOM from root element
- `startAutoUpdate()` - Start auto-updating DOM
- `stopAutoUpdate()` - Stop auto-updating DOM
- `executeAction(action)` - Manually execute a UI action
- `getState()` - Get complete state object
- `destroy()` - Clean up resources
### Properties

- `status` - Connection status (`idle` | `connecting` | `connected` | `error`)
- `conversationState` - Conversation state (`idle` | `ai_speaking` | `user_turn` | `user_speaking`)
- `error` - Error message (if any)
- `history` - Conversation history
- `isConnected` - Whether the agent is connected
- `isAiSpeaking` - Whether the AI is speaking
- `isUserSpeaking` - Whether the user is speaking
- `language` - Configured language
- `currentDOM` - Current DOM context
- `isUIEnabled` - Whether UI mode is enabled
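To make the `toggle()` and `status` semantics concrete, here is a tiny mock with the same surface. This is a sketch of the documented behavior only, not the real implementation - the actual `connect()` is asynchronous and negotiates a session with OpenAI's Realtime API, while the mock flips state synchronously:

```js
// Minimal mock mirroring the documented surface: status, isConnected, toggle().
class MockAgent {
    constructor() {
        this.status = "idle";
    }
    get isConnected() {
        return this.status === "connected";
    }
    connect() {
        // Real library: async Realtime API handshake happens here.
        this.status = "connected";
    }
    disconnect() {
        this.status = "idle";
    }
    toggle() {
        // Documented behavior: toggle the connection state.
        if (this.isConnected) this.disconnect();
        else this.connect();
    }
}

const agent = new MockAgent();
agent.toggle();
console.log(agent.status); // → "connected"
agent.toggle();
console.log(agent.status); // → "idle"
```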
## Getting a Client Secret
The client secret (ephemeral key) must be obtained from OpenAI's API. You can get it directly via curl or from your backend.
### Option 1: Direct curl (for testing)
```bash
curl -X POST "https://api.openai.com/v1/realtime/sessions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-realtime-preview-2024-12-17",
    "voice": "shimmer"
  }'
```

Response:
```json
{
    "id": "sess_xxx",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-12-17",
    "client_secret": {
        "value": "ek_xxx...",
        "expires_at": 1234567890
    }
}
```

Copy the `client_secret.value` and use it with Atticus.
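However you mint the session, you read `client_secret.value` and should treat `client_secret.expires_at` (a Unix timestamp in seconds) as a hard deadline. A sketch of extracting both, using the field names from the response above; the sample object and its fresh expiry are fabricated for illustration:

```js
// Example response shaped like the one above; expires_at is set one
// minute into the future so the sample key is still valid.
const session = {
    id: "sess_xxx",
    object: "realtime.session",
    client_secret: {
        value: "ek_xxx...",
        expires_at: Math.floor(Date.now() / 1000) + 60,
    },
};

function extractClientSecret(session) {
    const { value, expires_at } = session.client_secret;
    // Refuse stale keys instead of handing Atticus a dead credential.
    if (expires_at * 1000 <= Date.now()) {
        throw new Error("client secret already expired - mint a new session");
    }
    return value;
}

console.log(extractClientSecret(session)); // → "ek_xxx..."
```

Ephemeral keys are short-lived by design, so fetch a fresh one right before calling `agent.connect()` rather than caching it.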
### Option 2: Backend (Node.js/Express) - Recommended for Production
```js
app.post("/api/session", async (req, res) => {
    const response = await fetch("https://api.openai.com/v1/realtime/sessions", {
        method: "POST",
        headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            model: "gpt-4o-realtime-preview-2024-12-17",
            voice: "shimmer",
        }),
    });

    const data = await response.json();
    res.json({ clientSecret: data.client_secret.value });
});
```

### Frontend
```js
async function fetchClientSecret() {
    const response = await fetch("/api/session", { method: "POST" });
    const data = await response.json();
    return data.clientSecret;
}

const clientSecret = await fetchClientSecret();
const agent = new Atticus({ clientSecret, ... });
```

## Running the Demo
```bash
# Clone the repo
git clone https://github.com/rajnandan1/atticus.git
cd atticus

# Install dependencies
npm install

# Start dev server (builds + serves demo)
npm run dev

# Open http://localhost:3000/demo/
```