keyquill v3.6.0
Bring Your Own Key to any web app — without trusting their server. Browser extension SDK for secure LLM API key management.
Keyquill
Bring Your Own Key to any web app — without trusting their server.
A browser extension + SDK that lets users securely use their own LLM API keys from any web application. Keys never leave the browser extension — no server relay needed.
The Problem
Web apps that use LLM APIs face a dilemma:
- Server-side proxy: The app server sees the user's API key (security risk)
- Direct browser calls: Blocked by CORS (LLM providers don't allow browser-origin requests)
- Store key in localStorage: Vulnerable to XSS attacks
The Solution
Your Web App Keyquill Extension LLM Provider
┌──────────┐ ┌─────────────────────┐ ┌──────────┐
│ │ chrome.runtime │ │ fetch() │ │
│ SDK │───────────────────>│ Service Worker │────────────>│ OpenAI │
│ │ (messages only) │ + Key Storage │ (no CORS) │ Anthropic│
│ │<───────────────────│ │<────────────│ Gemini │
│ │ Port (streaming) │ chrome.storage │ SSE │ etc. │
└──────────┘ │ .session │ └──────────┘
└─────────────────────┘
Keys stay HERE. Always.

Security properties:
- API keys stored in chrome.storage.session — inaccessible to web-page JavaScript
- Keys cleared automatically when the browser closes
- Extension service worker makes CORS-free calls to LLM providers
- Per-origin consent (MetaMask-style): first use from each origin requires explicit user approval via a consent popup. Grants are stored in chrome.storage.local and can be revoked from the extension popup.
- Key management (registerKey/deleteKey) is restricted to the extension popup — web pages cannot register or delete keys
- Web app never sees the key — only sends messages and receives streamed text
Quick Start
1. Install the SDK
npm install keyquill

2. Use in your app
keyquill@3 (current major) uses a capability-first API — the app declares what it needs, the user's policy picks the actual model. Three ergonomic tiers:
import { Keyquill } from "keyquill";
const quill = new Keyquill();
if (await quill.isAvailable()) {
if (!(await quill.isConnected())) {
try {
await quill.connect();
} catch (err) {
// USER_DENIED / TIMEOUT — handle gracefully
return;
}
}
// ── Tier 1: zero-config ──────────────────────────────
// Uses the key's default model. Simplest possible chat.
const { completion } = await quill.chat({
messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.content);
// ── Tier 2: capability-declared (recommended) ────────
// The broker picks the best model in the user's allowlist that
// satisfies every capability. `tone` abstracts over temperature.
for await (const event of quill.chatStream({
messages: [{ role: "user", content: "Debug this code..." }],
requires: ["reasoning", "long_context"],
tone: "precise",
maxOutput: 2048,
})) {
if (event.type === "delta") process.stdout.write(event.text);
}
// Tool calling — `tool_use` is implied by passing `tools`.
const { completion: res } = await quill.chat({
messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
tools: [
{
type: "function",
function: {
name: "get_weather",
description: "Get current weather",
parameters: {
type: "object",
properties: { location: { type: "string" } },
required: ["location"],
},
},
},
],
});
if (res.tool_calls) {
console.log(res.tool_calls[0].function.name); // "get_weather"
}
// ── Tier 3: full control ─────────────────────────────
// Pin the exact model + parameters.
const { completion: pro } = await quill.chat({
messages: [{ role: "user", content: "Prove the central limit theorem." }],
prefer: {
model: "gpt-5.4-pro",
reasoningEffort: "high",
temperature: 1, // reasoning models require 1
},
});
// Vision (multimodal) — `vision` is implied by an image ContentPart.
for await (const event of quill.chatStream({
messages: [
{
role: "user",
content: [
{ type: "text", text: "What is in this image?" },
{ type: "image_url", image_url: { url: "data:image/png;base64,..." } },
],
},
],
requires: ["vision"],
})) {
if (event.type === "delta") process.stdout.write(event.text);
}
// ── Preview: dry-run before committing ───
// Tells you which model would run, the estimated cost, and whether a
// consent prompt would fire — without actually sending the request.
const plan = await quill.preview({
messages: [{ role: "user", content: "Long analysis prompt…" }],
requires: ["reasoning"],
tone: "precise",
});
if (plan.kind === "ready") {
console.log(`~$${plan.estimatedCostUSD.toFixed(4)} with ${plan.model.displayName}`);
}
}

3. Install the extension
Load the extension from packages/keyquill-extension/dist/ in Chrome:
- Go to chrome://extensions/
- Enable "Developer mode"
- Click "Load unpacked" and select the dist folder
API Reference
new Keyquill(options?)
| Option | Type | Default | Description |
| ------------- | -------- | ----------- | ----------------------------------------- |
| extensionId | string | auto-detect | Chrome extension ID |
| timeout | number | 5000 | Timeout for non-streaming operations (ms) |
quill.isAvailable(): Promise<boolean>
Check if the extension is installed and responsive. Result is cached for 30 seconds.
quill.isConnected(): Promise<boolean>
Check whether the current origin has an active consent grant. Call this before deciding whether to show a "Connect" button.
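Putting isAvailable, isConnected, and connect together, a minimal connection gate might look like this sketch. The SDK surface is restated as a local interface so the example is self-contained; in a real app you would pass a Keyquill instance from the Quick Start, and the helper name ensureConnected is illustrative:

```typescript
// Minimal slice of the documented SDK surface, restated locally so the
// sketch is self-contained; in a real app use the Keyquill class itself.
interface KeyquillLike {
  isAvailable(): Promise<boolean>;
  isConnected(): Promise<boolean>;
  connect(timeoutMs?: number): Promise<void>;
}

type GateResult = "ready" | "no-extension" | "denied";

// Decide what the UI should show before making any chat calls.
async function ensureConnected(quill: KeyquillLike): Promise<GateResult> {
  if (!(await quill.isAvailable())) return "no-extension"; // show an install link
  if (await quill.isConnected()) return "ready";           // grant already present
  try {
    await quill.connect();                                 // opens the consent popup
    return "ready";
  } catch {
    return "denied";                                       // USER_DENIED or TIMEOUT
  }
}
```

Keeping the three checks in one helper means the consent popup only ever fires after the cheaper availability and grant checks have passed.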
quill.connect(timeoutMs?: number): Promise<void>
Request permission for the current origin. Opens a consent popup the first time
(60 second default timeout for user interaction). Resolves when the user approves,
throws with USER_DENIED or TIMEOUT otherwise. Subsequent calls return
instantly if the grant is already present.
try {
await quill.connect();
} catch (err) {
// User denied or timed out — show a "Try again" prompt
}

quill.disconnect(): Promise<void>
Revoke the current origin's grant. The user will be prompted again on the next
connect() call.
quill.listProviders(): Promise<ProviderSummary[]>
List registered providers. No key material is returned — only hints like sk-t...st12.
Requires an active connection (call connect() first).
quill.registerKey(provider, params): Promise<void>
Popup-only. Calling this from a web page throws BLOCKED. API key material must not travel through the web-page channel. Users register keys via the extension popup (toolbar icon).
quill.deleteKey(provider): Promise<void>
Popup-only. Same rationale as registerKey: a compromised origin should not be able to wipe user keys. Users delete keys via the extension popup.
quill.testKey(provider): Promise<{ reachable: boolean }>
Test connectivity to a provider.
quill.chat(params): Promise<{ completion; keyId }>
Non-streaming chat completion. Returns the full response plus the keyId that serviced it.
quill.chatStream(params): AsyncGenerator<StreamEvent>
Stream a chat completion. First event is always { type: "start", keyId, provider, label } so callers can tell which key serviced the request.
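A consumer that handles every event variant might look like this sketch. The StreamEvent union is restated locally so the example is self-contained, and the helper name collectStream is illustrative:

```typescript
// StreamEvent union as documented in the API reference, restated locally
// so this sketch compiles on its own.
type StreamEvent =
  | { type: "start"; keyId: string; provider: string; label: string }
  | { type: "delta"; text: string }
  | { type: "tool_call_delta"; tool_calls: unknown[] }
  | { type: "done"; finish_reason?: string; usage?: { promptTokens: number; completionTokens: number } }
  | { type: "error"; code: string; message: string };

// Accumulate deltas into a single string; surface errors as exceptions.
async function collectStream(events: AsyncIterable<StreamEvent>): Promise<string> {
  let text = "";
  for await (const event of events) {
    switch (event.type) {
      case "start":
        console.log(`served by ${event.provider} (${event.label})`);
        break;
      case "delta":
        text += event.text; // append streamed tokens
        break;
      case "done":
        if (event.usage) console.log(`${event.usage.completionTokens} output tokens`);
        break;
      case "error":
        throw new Error(`${event.code}: ${event.message}`);
    }
  }
  return text;
}
```

In a real app you would pass quill.chatStream({...}) directly as the events argument.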
type StreamEvent =
| { type: "start"; keyId: string; provider: string; label: string }
| { type: "delta"; text: string }
| { type: "tool_call_delta"; tool_calls: ToolCallDelta[] }
| {
type: "done";
finish_reason?: string;
usage?: { promptTokens: number; completionTokens: number };
}
  | { type: "error"; code: string; message: string };

quill.preview(params): Promise<PlanPreview>
Dry-run the resolver. Returns the model that would service the request, its estimated cost, and whether a consent prompt or policy rejection would fire — without issuing a provider fetch or opening any popup. Useful for pre-flight UX: cost previews, "this will need approval" hints, or capability-fallback logic.
const plan = await quill.preview({
messages: [{ role: "user", content: "Prove Fermat's Last Theorem." }],
requires: ["reasoning", "long_context"],
tone: "precise",
});
if (plan.kind === "ready") {
console.log(
`Would use ${plan.model.displayName} (${plan.model.releaseStage}); ` +
`~$${plan.estimatedCostUSD.toFixed(4)}, ` +
`${plan.estimatedTokens.output} output tokens.`,
);
} else if (plan.kind === "consent-required") {
console.log(`Heads up — ${plan.message}`);
} else {
console.error(`Would be rejected: ${plan.message}`);
}

type PlanPreview =
| {
kind: "ready";
keyId: string;
provider: string;
model: {
id: string;
displayName: string;
capabilities: readonly Capability[];
releaseStage: "stable" | "preview" | "deprecated";
};
estimatedCostUSD: number;
estimatedTokens: { input: number; output: number };
selectionReason:
| "default"
| "explicit"
| "capability-match"
| "preferred-per-capability";
}
| {
kind: "consent-required";
reason:
| "model-outside-allowlist"
| "model-in-denylist"
| "high-cost"
| "capability-missing";
message: string;
proposedModel?: { id: string; displayName: string };
}
  | { kind: "rejected"; reason: string; message: string };

Preview requires an active connection — a compromised origin cannot probe the user's policy without prior consent.
ChatParams (shared by chat and chatStream)
interface ChatParams {
messages: ChatMessage[]; // conversation (text / vision / tool results)
tools?: Tool[]; // function calling
toolChoice?: ToolChoice; // "none" | "auto" | "required" | specific
responseFormat?: ResponseFormat; // "text" | "json_object" | "json_schema"
// Capability-first fields
requires?: Capability[]; // capabilities the broker must satisfy
tone?: "precise" | "balanced" | "creative";
maxOutput?: number; // max output tokens (clamped by policy)
prefer?: {
model?: string; // Tier 3 explicit model pin
provider?: string; // narrow to a specific provider
temperature?: number;
topP?: number;
reasoningEffort?: "minimal" | "low" | "medium" | "high";
};
keyId?: string; // explicit key selection
}
type Capability =
| "tool_use" | "structured_output" | "vision" | "audio"
| "reasoning" | "long_context" | "streaming" | "cache"
  | "fast" | "cheap" | "multilingual" | "code";

Migration history
From the legacy keyquill@0.3.x (snake_case) API
Pin it if you're not ready to migrate — it still works against the
installed extension via an internal wire translator. When you migrate
to @3, rewrite the top-level fields per this table:
| @0.3.x (legacy) | @3 (current) |
| --- | --- |
| model: "gpt-4o" | prefer: { model: "gpt-4o" } |
| temperature: 0.7 | prefer: { temperature: 0.7 } — or tone: "balanced" |
| top_p: 0.9 | prefer: { topP: 0.9 } |
| max_tokens: 2048 | maxOutput: 2048 |
| max_completion_tokens: 2048 | maxOutput: 2048 |
| reasoning_effort: "high" | prefer: { reasoningEffort: "high" } |
| tool_choice: "required" | toolChoice: "required" |
| response_format: { type: "json_object" } | responseFormat: { type: "json_object" } |
| provider: "openai" | prefer: { provider: "openai" } |
| stop: [...] | (removed — not commonly used, re-request if needed) |
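Applied to a typical call, the table above translates like this (the model name and parameter values are illustrative):

```typescript
// @0.3.x call shape (legacy, snake_case) — still accepted via the wire translator:
const legacyParams = {
  model: "gpt-4o",          // illustrative model name
  temperature: 0.7,
  max_tokens: 2048,
  tool_choice: "required",
};

// The same request in the @3 shape:
const currentParams = {
  prefer: { model: "gpt-4o", temperature: 0.7 },
  maxOutput: 2048,
  toolChoice: "required",
};
```

Note that model and temperature both move under prefer, while the token-limit and tool-choice fields are simply renamed at the top level.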
Between majors since @1
If you're coming from @1.x or @2.x, the ChatParams shape is
identical — only a few type names on the SDK surface changed:
| Deprecated name | Replacement | Removed in |
| --- | --- | --- |
| VaultRequest | KeyquillRequest | @2 |
| VaultResponse | KeyquillResponse | @2 |
| KeySummary.defaultModel | KeySummary.effectiveDefaultModel | @3 |
| KeySummary.defaults | KeySummary.policy | @3 |
| KeySummary.isActive | (removed — per-origin bindings drive routing) | @3 |
# Install
npm install keyquill@3 # current
npm install keyquill@0.3   # legacy snake_case API (frozen)

Supported Providers
The wire protocol is OpenAI Chat Completions format. Any OpenAI-compatible provider works out of the box.
| Provider | Base URL | Type |
| ----------- | --------------------------------------------------------- | ---------- |
| OpenAI | https://api.openai.com/v1 | Native |
| Anthropic | https://api.anthropic.com/v1 | Translated |
| Gemini | https://generativelanguage.googleapis.com/v1beta/openai | Compatible |
| Groq | https://api.groq.com/openai/v1 | Compatible |
| Mistral | https://api.mistral.ai/v1 | Compatible |
| DeepSeek | https://api.deepseek.com/v1 | Compatible |
| Together AI | https://api.together.xyz/v1 | Compatible |
| Fireworks | https://api.fireworks.ai/inference/v1 | Compatible |
| xAI (Grok) | https://api.x.ai/v1 | Compatible |
| Ollama | http://localhost:11434/v1 | Compatible |
Anthropic is the only provider requiring translation (OpenAI format → Messages API). All others receive requests as-is.
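A specific provider from the table can be targeted through the same prefer block used for Tier 3. For example, a locally served Ollama model (the model name "llama3.1" is illustrative, as is the prompt):

```typescript
// Request params pinning a local Ollama model via the prefer block.
// Any model served by the local Ollama instance would work here.
const params = {
  messages: [{ role: "user", content: "Summarize this file." }],
  prefer: {
    provider: "ollama",  // the extension routes this to http://localhost:11434/v1
    model: "llama3.1",
  },
};
// Then: const { completion } = await quill.chat(params);
```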
Comparison with Alternatives
| Approach       | Key Location      | XSS Safe | Server Trust | CORS     |
| -------------- | ----------------- | -------- | ------------ | -------- |
| Server proxy   | Server memory     | Yes      | Required     | N/A      |
| localStorage   | Browser JS        | No       | N/A          | Blocked  |
| sessionStorage | Browser JS        | No       | N/A          | Blocked  |
| Keyquill       | Extension storage | Yes      | Not needed   | Bypassed |
Framework Support
Zero dependencies. Works with any framework:
- React / Next.js
- Vue / Nuxt
- Svelte / SvelteKit
- Preact
- Vanilla JavaScript/TypeScript
License
MIT
