aether-slm-framework
v1.1.2
Published
Zero-Cost, privacy-first, modular SLM & RAG framework for the browser. Run AI entirely on-device with no API keys and no data leaving the browser sandbox.
Aether-SLM Framework
Browser-native AI with local inference, local RAG, SharedWorker VRAM sharing, and zero API-key cost.
Aether-SLM lets web apps run private AI directly in the browser. It combines a local small language model, a local RAG database, hardware-aware backend selection, OPFS model caching, and a production Hub for model delivery.
import { Aether } from 'aether-slm-framework';
const ai = await Aether.init();
const answer = await ai.say('Explain local-first AI in one sentence.');
console.log(answer);
No API keys. No inference server. No prompt or document upload. The first model download is cached locally for future sessions.
Contents
- Why Aether
- Install
- Quick Start
- Create Aether App
- Copy-Paste Examples
- Framework Guides
- Configuration
- Runtime Modes
- Models
- Local RAG
- Aether Hub
- Feature Gallery App
- Browser Headers
- Troubleshooting
- Architecture
- Development
Why Aether
Aether is built for apps that need useful AI without sending user data to a remote model API.
- Local inference: generation runs in the browser through ONNX and transformers.js.
- Local RAG: documents are embedded, indexed, searched, and retrieved on device.
- Smart runtime: uses SharedWorker when cross-origin isolation is available, and Lite Worker fallback when it is not.
- Shared VRAM: same-origin tabs share one model host instead of loading a copy per tab.
- Smart defaults: Aether.init() chooses the production Hub, runtime mode, and model tier automatically.
- Hub delivery: model assets are served from https://vibercoderofek.uk.
- Hardware dispatch: picks WebNN, WebGPU, or WASM based on browser capability.
- OPFS cache: large model files are cached in the browser's Origin Private File System.
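The hardware dispatch order named in the list above (WebNN, then WebGPU, then WASM) can be sketched as plain feature detection. This is a minimal illustration, not the framework's dispatcher: `pickBackend` and `NavigatorLike` are hypothetical names, and the only browser facts assumed are that WebNN is exposed as `navigator.ml` and WebGPU as `navigator.gpu`.

```typescript
// Hypothetical helper; not part of aether-slm-framework.
type Backend = 'webnn' | 'webgpu' | 'wasm';

// Minimal shape of the properties probed on `navigator`.
interface NavigatorLike {
  ml?: unknown;  // WebNN entry point (navigator.ml)
  gpu?: unknown; // WebGPU entry point (navigator.gpu)
}

// Prefer WebNN, then WebGPU, and fall back to WebAssembly.
function pickBackend(nav: NavigatorLike): Backend {
  if (nav.ml !== undefined) return 'webnn';
  if (nav.gpu !== undefined) return 'webgpu';
  return 'wasm';
}

console.log(pickBackend({ gpu: {} })); // "webgpu"
```

In a page you would pass the real `navigator` object. Note that the presence of `navigator.gpu` does not guarantee a usable adapter, so a fuller check would also await `navigator.gpu.requestAdapter()` and fall through to WASM when it resolves to `null`.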
Install
npm install aether-slm-framework
Aether ships its browser workers in the package and loads them automatically.
Model assets are delivered by the production Hub at https://vibercoderofek.uk.
Create Aether App
For a new project, start with the generated Vite template:
npm exec --package aether-slm-framework create-aether-app -- my-local-ai
cd my-local-ai
npm install
npm run dev
The template includes Aether.init({ model: 'fast' }), createAetherStatusPanel(...), the production Hub URL https://vibercoderofek.uk, and COOP/COEP headers for SharedWorker mode. If you remove the headers, Aether automatically falls back to Lite mode instead of crashing.
Aether ships ready-to-run browser workers and loads the Transformers runtime on demand. You do not need to install model-runtime packages for the quick start.
Quick Start
Two-line chatbot
import { Aether } from 'aether-slm-framework';
const ai = await Aether.init();
await ai.say('Hello from my local AI app.');
Need a quicker first success on slower devices?
const ai = await Aether.init({ model: 'fast' });
await ai.ready({ capability: 'DRAFT', timeoutMs: 180_000 });
Stream tokens into the page
import { Aether } from 'aether-slm-framework';
const ai = await Aether.init();
const output = document.querySelector('#output')!;
for await (const { chunk, mode } of ai.stream('Write a haiku about WebGPU.', 120)) {
  output.textContent += chunk;
  output.setAttribute('data-model-mode', mode);
}
Copy-Paste Examples
1. Minimal HTML chat
<main>
  <input id="prompt" placeholder="Ask Aether..." />
  <button id="send">Send</button>
  <pre id="output"></pre>
</main>
<script type="module">
  // The bare package specifier needs a bundler (such as Vite) or an import map to resolve.
  import { Aether } from 'aether-slm-framework';
  const ai = await Aether.init({ debug: true });
  const prompt = document.querySelector('#prompt');
  const output = document.querySelector('#output');
  document.querySelector('#send').addEventListener('click', async () => {
    // Clear the previous answer, then append tokens as they stream in.
    output.textContent = '';
    for await (const { chunk } of ai.stream(prompt.value, 160)) {
      output.textContent += chunk;
    }
  });
</script>
2. Streaming with model download status
import { Aether, createAetherStatusPanel } from 'aether-slm-framework';
import type { DownloadProgressResponse, SystemStateResponse } from 'aether-slm-framework';
const ai = await Aether.init({
  model: 'fast',
  hubUrl: 'https://vibercoderofek.uk',
  debug: true,
});
const client = ai.client;
createAetherStatusPanel(client, '#aether-status', { showDetails: true });
client.onStateChange = (msg) => {
  if (msg.type === 'SYSTEM_STATE') {
    const state = (msg as SystemStateResponse).state;
    console.log('Runtime state:', state);
  }
  if (msg.type === 'DOWNLOAD_PROGRESS') {
    const progress = msg as DownloadProgressResponse;
    console.log(
      `${progress.modelRole}: ${progress.status} ${progress.progress}%`,
      progress.file ?? '',
    );
  }
};
let answer = '';
for await (const { chunk, mode } of ai.stream('What is OPFS?', 140)) {
  answer += chunk;
  console.log(mode, chunk);
}
Framework Guides
3. Local RAG search
import { AetherRAGClient } from 'aether-slm-framework';
const rag = new AetherRAGClient(
  { hubUrl: 'https://vibercoderofek.uk' },
  {
    onStatus: console.log,
    onProgress: ({ indexed, total, filename }) =>
      console.log(`Indexed ${indexed}/${total}: ${filename}`),
  },
);
await rag.indexText(
  'product-docs',
  'Aether stores embeddings locally and never uploads user documents.',
  { namespace: 'docs', persist: true },
);
const results = await rag.query('Where are embeddings stored?', {
  namespace: 'docs',
  topK: 3,
});
console.log(results.map((r) => r.text));
4. Grounded local answer with RAG
import { Aether, AetherRAGClient } from 'aether-slm-framework';
const ai = await Aether.init();
const rag = new AetherRAGClient({
  mode: 'BOTH',
  hubUrl: 'https://vibercoderofek.uk',
});
await rag.indexText(
  'app-facts',
  'The VibeCoder framework is powered by local RAG and browser-side inference.',
  { namespace: 'facts' },
);
const question = 'What powers the framework?';
const hits = await rag.query(question, { namespace: 'facts', topK: 4 });
const context = hits.map((hit) => hit.text).join('\n---\n');
let answer = '';
for await (const { chunk } of ai.stream(`Context:\n${context}\n\nQuestion: ${question}`, 160)) {
  answer += chunk;
}
console.log(answer);
5. React-style hook
import { useEffect, useRef, useState } from 'react';
import { Aether } from 'aether-slm-framework';
import type { AetherSession } from 'aether-slm-framework';
export function LocalChat() {
  const ai = useRef<AetherSession | null>(null);
  const [prompt, setPrompt] = useState('');
  const [answer, setAnswer] = useState('');
  const [ready, setReady] = useState(false);
  useEffect(() => {
    Aether.init().then((session) => {
      ai.current = session;
      setReady(true);
    });
  }, []);
  async function send() {
    if (!ai.current) return;
    setAnswer('');
    for await (const { chunk } of ai.current.stream(prompt, 160)) {
      setAnswer((value) => value + chunk);
    }
  }
  return (
    <section>
      <textarea value={prompt} onChange={(event) => setPrompt(event.target.value)} />
      <button disabled={!ready} onClick={send}>Send</button>
      <pre>{answer}</pre>
    </section>
  );
}
Configuration
Aether.init() accepts the same config as AetherClient.
import { Aether } from 'aether-slm-framework';
const ai = await Aether.init({
  mode: 'BOTH',
  hubUrl: 'https://vibercoderofek.uk',
  model: 'auto',
  runtimeMode: 'auto',
  deviceTier: 'auto',
  debug: true,
  maxContextTokens: 2048,
  vramHardLimitMB: 4096,
  downloadConcurrency: 6,
});
| Option | Default | Description |
| --- | --- | --- |
| mode | 'SLM' for AetherClient, 'BOTH' for Aether.init() | 'SLM' loads inference only, 'DB' loads RAG only, 'BOTH' loads both. |
| hubUrl | https://vibercoderofek.uk | Production model Hub origin. Do not change unless you operate your own Hub. |
| model | 'auto' | Friendly model preset or custom model object. See Models. |
| runtimeMode | 'auto' | 'auto', 'shared', or 'lite'. Auto uses SharedWorker when possible and Lite Worker otherwise. |
| deviceTier | 'auto' | 'auto', 'mobile', or 'pc'. Controls default model choice. |
| ragEmbeddingMode | 'lite' | 'lite', 'semantic', or 'auto'. Lite gives instant local RAG; semantic loads a local embedding model. |
| debug | false | Emit detailed runtime logs. |
| allowConcurrent | false | Allow more than one generate() call on the same client instance. |
| maxContextTokens | 2048 | Approximate prompt budget before middle truncation. |
| vramHardLimitMB | 4096 | Safety ceiling used by the runtime. |
| downloadConcurrency | 6 | Parallel range-request count, clamped from 1 to 16. |
| bypassCache | false | Force fresh model downloads instead of OPFS cache hits. |
| queuingStrategy | 'ROUND_ROBIN' | SharedWorker scheduling: 'ROUND_ROBIN' or 'FIFO'. |
| thermalThrottleMs | 1500 | Adds a small delay after long inference work. |
| sharedWorkerUrl | auto | Advanced worker URL override. |
| liteWorkerUrl | auto | Advanced Lite Worker URL override. |
| ragWorkerUrl | auto | Advanced RAG Worker URL override. |
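Two rows of the table describe behaviour that is easier to see in code: downloadConcurrency is "clamped from 1 to 16", and maxContextTokens triggers "middle truncation". The sketch below illustrates both ideas under stated assumptions. `clampConcurrency` and `truncateMiddle` are hypothetical helpers, and the token count is approximated as one whitespace-separated word per token, which is not how the real tokenizer counts.

```typescript
// Hypothetical helpers that mirror two rows of the options table; not framework code.

// downloadConcurrency: keep the value inside the documented 1..16 range.
function clampConcurrency(requested: number): number {
  if (!Number.isFinite(requested)) return 6; // fall back to the documented default
  return Math.min(16, Math.max(1, Math.floor(requested)));
}

// maxContextTokens: keep the start and end of an over-long prompt and drop the middle.
// Assumption: one whitespace-separated word counts as one token.
function truncateMiddle(prompt: string, maxTokens: number): string {
  const words = prompt.split(/\s+/).filter((word) => word.length > 0);
  if (words.length <= maxTokens) return prompt;
  const head = Math.ceil(maxTokens / 2);
  const tail = maxTokens - head;
  return [...words.slice(0, head), ...words.slice(words.length - tail)].join(' ');
}

console.log(clampConcurrency(64));             // 16
console.log(truncateMiddle('a b c d e f', 4)); // "a b e f"
```

Dropping the middle rather than the end keeps both the instructions at the top of a prompt and the question at the bottom, which is usually where the most important text sits.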
Runtime Modes
SharedWorker mode
Used when the page is cross-origin isolated. This is the best runtime for production apps.
- One model host per same-origin app.
- Multiple tabs share the same model instance.
- Best for VRAM deduplication and long sessions.
- Requires COOP/COEP headers.
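When several tabs share one model host, requests from those tabs have to be ordered. The queuingStrategy option in the Configuration table names two policies, 'ROUND_ROBIN' and 'FIFO'. The sketch below models the difference with plain arrays. It is an illustration only: `Job`, `fifoOrder`, and `roundRobinOrder` are hypothetical names, and the framework's actual scheduler may differ.

```typescript
// Hypothetical model of the two queuingStrategy values; not the framework's scheduler.
interface Job {
  tab: string;    // which tab submitted the request
  prompt: string; // what it asked for
}

// FIFO: serve jobs strictly in arrival order.
function fifoOrder(jobs: Job[]): Job[] {
  return jobs.slice();
}

// ROUND_ROBIN: take one job per tab in turn so a busy tab cannot starve the others.
function roundRobinOrder(jobs: Job[]): Job[] {
  const queues = new Map<string, Job[]>();
  for (const job of jobs) {
    const queue = queues.get(job.tab) ?? [];
    queue.push(job);
    queues.set(job.tab, queue);
  }
  const order: Job[] = [];
  while (order.length < jobs.length) {
    // Visit tabs in first-seen order and pull at most one job from each.
    for (const queue of Array.from(queues.values())) {
      const next = queue.shift();
      if (next !== undefined) order.push(next);
    }
  }
  return order;
}
```

With three queued prompts from tab A followed by one from tab B, FIFO makes tab B wait for all of tab A's work, while round robin serves tab B second.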
Lite mode
Used when headers are missing or SharedWorker is unavailable.
- Runs in a dedicated Worker.
- Works with zero server configuration.
- Does not provide cross-tab VRAM sharing.
- The big target model may be On Demand on constrained devices.
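The choice between the two modes can be sketched as a small pure function. This is a simplified reading of the rules above, not the framework's implementation: `resolveRuntimeMode` and `RuntimeEnv` are hypothetical names, and treating a forced 'shared' request as falling back to Lite when isolation is missing is an assumption based on the "falls back instead of crashing" behaviour described for missing headers.

```typescript
// Hypothetical sketch of runtime selection; not part of aether-slm-framework.
type RuntimeMode = 'shared' | 'lite';
type RuntimeRequest = 'auto' | RuntimeMode;

interface RuntimeEnv {
  crossOriginIsolated: boolean; // value of self.crossOriginIsolated in the page
  hasSharedWorker: boolean;     // typeof SharedWorker !== 'undefined'
}

function resolveRuntimeMode(requested: RuntimeRequest, env: RuntimeEnv): RuntimeMode {
  if (requested === 'lite') return 'lite';
  // 'auto' and 'shared' both need COOP/COEP isolation plus SharedWorker support.
  // Assumption: a forced 'shared' request degrades to Lite rather than throwing.
  const sharedPossible = env.crossOriginIsolated && env.hasSharedWorker;
  return sharedPossible ? 'shared' : 'lite';
}

console.log(resolveRuntimeMode('auto', { crossOriginIsolated: false, hasSharedWorker: true })); // "lite"
```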
You can force a runtime:
await Aether.init({ runtimeMode: 'lite' });
await Aether.init({ runtimeMode: 'shared' });
Models
Aether chooses a model plan automatically, but you can override it with a preset name or a custom model object.
await Aether.init(); // default: model: 'auto'
await Aether.init({ model: 'fast' });
await Aether.init({ model: 'llama-3.2-1b' });
await Aether.init({ model: 'qwen2.5-1.5b' });
await Aether.init({ model: 'qwen2.5-coder-3b' });
await Aether.init({ model: 'llama-3.1-8b', deviceTier: 'pc' });
await Aether.init({
  model: {
    id: 'your-org/your-onnx-model',
  },
});
await Aether.init({
  model: {
    draft: 'your-org/small-draft-onnx',
    target: 'your-org/big-target-onnx',
  },
});
console.table(Aether.models());
console.table(Aether.models({ hubOnly: true }));
console.table(await Aether.modelsWithHubStatus());
Aether.models({ hubOnly: true }) returns the presets already known to be Hub-backed. Aether.modelsWithHubStatus() also asks the live Hub for a model manifest when the Hub exposes one, then merges that with the built-in catalog. Custom Hugging Face/ONNX model IDs remain allowed through { id } or { draft, target }.
| Preset | Model plan | Best fit | Hub status |
| --- | --- | --- | --- |
| auto | Mobile: Llama 3.2 1B. PC: Llama 3.2 1B draft + Llama 3.1 8B target. | Default smart plan | Available |
| fast | Llama 3.2 1B only | First-run demos, mobile | Available |
| llama-3.2-1b | onnx-community/Llama-3.2-1B-Instruct-ONNX | Mobile/local chat | Available |
| llama-3.1-8b | 1B draft + llmware/llama-3.1-instruct-onnx target | PC high-reasoning | Available |
| qwen2.5-0.5b | onnx-community/Qwen2.5-0.5B-Instruct | Tiny multilingual demos | Candidate |
| qwen2.5-1.5b | onnx-community/Qwen2.5-1.5B-Instruct | Balanced small chat | Candidate |
| qwen2.5-coder-3b | 1.5B draft + onnx-community/Qwen2.5-Coder-3B-Instruct target | Coding/repo assistants | Candidate |
| phi-3.5-mini | 1B draft + onnx-community/Phi-3.5-mini-instruct-onnx-web target | Compact reasoning | Candidate |
| smollm2-1.7b | HuggingFaceTB/SmolLM2-1.7B-Instruct | Lightweight general chat | Candidate |
| gemma-3-1b | onnx-community/gemma-3-1b-it-ONNX | Gemma-family mobile flows | Candidate |
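As a rough illustration of the auto row, the plan can be thought of as a function of the device tier: a single small model on mobile, and a draft plus target pair on PC. The sketch below is hypothetical. `autoPlan` and `ModelPlan` are invented names, and it assumes the auto preset resolves to the same model IDs listed in the llama-3.2-1b and llama-3.1-8b rows.

```typescript
// Hypothetical sketch of the 'auto' preset; the real resolution logic may differ.
type DeviceTier = 'mobile' | 'pc';

interface ModelPlan {
  draft: string;   // small model, loaded first
  target?: string; // optional large model
}

// Model IDs copied from the preset table above.
const LLAMA_1B = 'onnx-community/Llama-3.2-1B-Instruct-ONNX';
const LLAMA_8B = 'llmware/llama-3.1-instruct-onnx';

function autoPlan(tier: DeviceTier): ModelPlan {
  return tier === 'pc'
    ? { draft: LLAMA_1B, target: LLAMA_8B } // PC: 1B draft + 8B target
    : { draft: LLAMA_1B };                  // Mobile: 1B only
}

console.log(autoPlan('mobile')); // { draft: 'onnx-community/Llama-3.2-1B-Instruct-ONNX' }
```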
Model status events include:
- queued: waiting for the small model to finish.
- checking: resolving model files and cache state.
- downloading: actively fetching model assets.
- cached: loading from OPFS.
- ready: loaded in the active worker.
- on-demand: not preloaded; loaded during generation in serial-swap mode.
- error: load failed.
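A UI usually needs to turn these status values into text and decide when to show a spinner. The sketch below shows one way to do that with a string union. The status names come from the list above; `ModelStatus`, `isLoading`, and `statusLine` are hypothetical names, and grouping cached with the in-progress states is an assumption based on its description as "loading from OPFS".

```typescript
// Status names taken from the list above; the helpers are hypothetical.
type ModelStatus =
  | 'queued' | 'checking' | 'downloading' | 'cached'
  | 'ready' | 'on-demand' | 'error';

const STATUS_TEXT: Record<ModelStatus, string> = {
  queued: 'Waiting for the small model to finish',
  checking: 'Resolving model files and cache state',
  downloading: 'Fetching model assets',
  cached: 'Loading from OPFS',
  ready: 'Loaded in the active worker',
  'on-demand': 'Will load during generation',
  error: 'Load failed',
};

// True while a spinner makes sense: the model is still on its way to ready.
function isLoading(status: ModelStatus): boolean {
  return status === 'queued' || status === 'checking'
    || status === 'downloading' || status === 'cached';
}

// Build a one-line label, adding a percentage only while downloading.
function statusLine(role: string, status: ModelStatus, progress?: number): string {
  const percent = status === 'downloading' && progress !== undefined
    ? ` (${Math.round(progress)}%)`
    : '';
  return `${role}: ${STATUS_TEXT[status]}${percent}`;
}

console.log(statusLine('target', 'downloading', 41.6)); // "target: Fetching model assets (42%)"
```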
Local RAG
The RAG pipeline runs in a Worker and uses Orama plus local embeddings. By default it uses ragEmbeddingMode: 'lite', so indexing and querying work immediately without downloading an embedding model.
import { AetherRAGClient } from 'aether-slm-framework';
const rag = new AetherRAGClient();
await rag.indexText(
  'facts',
  'Aether answers from private local context in the browser.',
  { namespace: 'demo' },
);
const hits = await rag.query('Where does Aether get context?', {
  namespace: 'demo',
  topK: 3,
});
Available methods:
| Method | Use |
| --- | --- |
| indexText(source, text, options) | Index one text blob. |
| indexFiles(files, options) | Index browser File[] values. |
| indexEntries(entries, options) | Batch-index structured records. |
| query(text, options) | Hybrid BM25/vector search. |
| upsert(id, text, meta, options) | Replace a stable record. |
| delete(id, options) | Delete one record. |
| clear(options) | Clear a namespace or all local RAG data. |
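To make the idea of model-free, instant embeddings more concrete, here is a minimal sketch of a hashed bag-of-words vector with cosine ranking. It is an assumption about what a lite embedding could look like, not Aether's implementation, and it covers only the vector half of a hybrid search (no BM25). `embed`, `cosine`, `topK`, and the 64-dimension size are all illustrative choices.

```typescript
// Illustrative only: a model-free "hashed bag of words" embedding with cosine ranking.
// Assumption: this is NOT Aether's lite embedding, just one way such a scheme can work.
const DIMS = 64;

// Map each word to one of DIMS buckets with a simple string hash and count occurrences.
function embed(text: string): number[] {
  const vector = new Array<number>(DIMS).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
    }
    const bucket = hash % DIMS;
    vector[bucket] = (vector[bucket] ?? 0) + 1;
  }
  return vector;
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank documents against the query and return the k best texts.
function topK(query: string, docs: string[], k: number): string[] {
  const queryVector = embed(query);
  return docs
    .map((text) => ({ text, score: cosine(queryVector, embed(text)) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((hit) => hit.text);
}

console.log(topK('Where are embeddings stored?', [
  'Aether stores embeddings locally.',
  'The gallery runs on port 5200.',
], 1));
```

A scheme like this needs no download and runs in microseconds, but it only matches shared words, and unrelated words can collide in the same bucket. That trade-off is the usual reason to offer a model-backed semantic mode alongside it.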
RAG options:
{
  namespace: 'support-docs',
  persist: true,
  embeddingMode: 'lite'
}
- namespace isolates data domains.
- persist: true stores raw entries in IndexedDB and rehydrates them on reload.
- embeddingMode: 'lite' is instant and the default. Use 'semantic' when you want model-backed local embeddings.
Aether Hub
The production Hub is:
https://vibercoderofek.uk
Aether uses it for model asset delivery and cross-origin coordination. In application code, use:
const ai = await Aether.init({
  hubUrl: 'https://vibercoderofek.uk',
});
The Hub handshake is automatic in AetherClient, Aether.init(), and AetherRAGClient when hubUrl is set.
Feature Gallery App
This repository includes a full UI examples app.
npm install
npm run gallery
Open:
http://127.0.0.1:5200/
The gallery includes:
- Repo Roaster: GitHub URL ingestion plus local RAG.
- Knowledge Nexus: local vector database visualization.
- Sovereign Chat: token streaming and draft/speculative visibility.
- Stress Test: multi-tab SharedWorker and VRAM deduplication test.
The sidebar shows:
- Active backend: WebNN, WebGPU, or WASM.
- Shared VRAM usage.
- SharedWorker connection count.
- Small model download/readiness.
- Big model queued/downloading/on-demand/readiness.
- System state.
Browser Headers
For the best runtime, serve these headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Resource-Policy: cross-origin
Permissions-Policy: cross-origin-isolated=(self)
Vite example:
// vite.config.ts
import { defineConfig } from 'vite';
export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Resource-Policy': 'cross-origin',
      'Permissions-Policy': 'cross-origin-isolated=(self)',
    },
  },
});
If these headers are missing, Aether falls back to Lite mode instead of crashing.
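The Vite config above only covers the dev server, so a production host needs to send the same headers. The sketch below keeps them in one table and applies them to any response object that has a `setHeader(name, value)` method, which includes Node's `http.ServerResponse`. `ISOLATION_HEADERS` and `applyIsolationHeaders` are hypothetical names, not framework exports.

```typescript
// Hypothetical helper for production servers; header values copied from the list above.
const ISOLATION_HEADERS: Record<string, string> = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Resource-Policy': 'cross-origin',
  'Permissions-Policy': 'cross-origin-isolated=(self)',
};

// Minimal response shape: anything with setHeader(name, value) will do.
interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

// Copy every isolation header onto the response.
function applyIsolationHeaders(res: HeaderSink): void {
  for (const name of Object.keys(ISOLATION_HEADERS)) {
    const value = ISOLATION_HEADERS[name];
    if (value !== undefined) res.setHeader(name, value);
  }
}
```

In a Node or Express handler you would call `applyIsolationHeaders(res)` before sending the body. After deploying, `self.crossOriginIsolated` in the browser console should report `true`.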
Troubleshooting
| Symptom | Meaning | Fix |
| --- | --- | --- |
| Big model stays Queued | Small model is still downloading. | Wait for the small model to become Ready. |
| Big model becomes On Demand | Runtime cannot safely preload both models. | This is expected in Lite mode, WASM fallback, or constrained WebGPU. |
| self.crossOriginIsolated is false | COOP/COEP headers are missing. | Add the headers above or use Lite mode. |
| App logs Falling back to LITE mode | SharedWorker/SAB path is unavailable. | Add headers for SharedWorker mode, or accept Lite mode. |
| No available adapters | WebGPU adapter is unavailable in that browser/session. | Use Chrome/Edge with WebGPU enabled, update GPU drivers, or use WASM fallback. |
| Model download is slow | First load fetches large ONNX files. | Leave the tab open; OPFS cache makes later loads faster. |
| RAG query works but generation is slow | RAG assets are much smaller than the SLM weights, so retrieval is ready before the language model has finished loading. | Wait for model readiness, reduce maxTokens, or set deviceTier: 'mobile'. |
| Hub request fails | Origin is blocked or offline. | Confirm https://vibercoderofek.uk/hub.html returns 200 and CORS is allowed. |
| Storage quota error | Browser storage is full. | Clear site data for the app origin and reload. |
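For the storage quota row, it can help to check free space before a large model download starts. Browsers expose `navigator.storage.estimate()`, which resolves to an object with optional `quota` and `usage` byte counts. The sketch below wraps that result in two pure helpers; `freeBytes` and `canStore` are hypothetical names and the framework may perform its own checks.

```typescript
// Hypothetical helpers around the result of navigator.storage.estimate().
interface StorageEstimateLike {
  quota?: number; // bytes the origin may use
  usage?: number; // bytes the origin already uses
}

// Remaining bytes, never negative; missing fields are treated as 0.
function freeBytes(estimate: StorageEstimateLike): number {
  const quota = estimate.quota ?? 0;
  const usage = estimate.usage ?? 0;
  return Math.max(0, quota - usage);
}

// True when the origin still has room for a download of the given size.
function canStore(estimate: StorageEstimateLike, neededBytes: number): boolean {
  return freeBytes(estimate) >= neededBytes;
}

// Browser usage (not executed here):
//   const estimate = await navigator.storage.estimate();
//   if (!canStore(estimate, 1_500_000_000)) {
//     console.warn('Not enough browser storage for the model cache.');
//   }
console.log(freeBytes({ quota: 1000, usage: 250 })); // 750
```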
Architecture
Browser origin
  Tabs
    AetherClient
      SharedWorker or Lite Worker
        Multiplexer
        ONNXEngine
        UMADispatcher: WebNN -> WebGPU -> WASM
        OPFS model cache
        Hub fetch interceptor
    AetherRAGClient
      RAG Worker
        Lite embeddings or optional gte-small semantic embeddings
        Orama BM25/vector index
        optional IndexedDB persistence
Data flow:
- Your app calls Aether.init() or creates an AetherClient or AetherRAGClient.
- Aether connects to the production Hub at https://vibercoderofek.uk.
- Model assets download once and are cached in OPFS.
- Prompts and documents remain in the browser.
- The runtime streams tokens or RAG results back to your UI.
Development
npm install
npm run typecheck
npm run build
npm run test
npm run gallery
Common scripts:
| Script | Description |
| --- | --- |
| npm run dev | Run the root Vite demo. |
| npm run gallery | Run the examples app on port 5200. |
| npm run build | Build library and workers. |
| npm run typecheck | Run TypeScript checks. |
| npm run test | Run unit tests. |
| npm run test:e2e | Run Playwright tests. |
License
ISC. See LICENSE.
Project Promise
Aether-SLM is for local-first AI applications: chat, RAG, document search, private copilots, offline tools, and multi-tab browser apps that should not require a model API bill.
The framework downloads model files, but user prompts, indexed documents, embeddings, and retrieval context stay local to the browser runtime.
