react-brai
v2.0.9
The WebGPU runtime for React. Run Llama-3, Phi-3, and Gemma directly in the browser with no server round-trips and privacy-first local inference.
REACT-BRAI
OVERVIEW
react-brai is the easiest way to integrate local, private Large Language Models (LLMs) into your React applications. It powers in-browser AI inference using WebGPU, allowing you to run models like Llama-3 and Qwen directly on the client: no API keys, no server costs, and complete privacy.
FEATURES
- 100% Client-Side: Runs entirely in the browser using WebGPU.
- Zero Server Costs: No cloud API bills.
- Privacy First: User data never leaves their device.
- Streaming Support: Real-time token generation.
- Progress Tracking: Built-in hooks for download status.
- Model Agnostic: Supports MLC-compiled models (Llama 3, Qwen, etc.).
INSTALLATION
npm install react-brai
or
yarn add react-brai
Note: Requires a browser with WebGPU support (Chrome 113+, Edge, Brave).
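If you want to detect support at runtime before loading a model, you can probe the standard navigator.gpu API. The helper below is a minimal sketch and is not part of react-brai; the function name is illustrative.

// Optional feature check before calling loadModel().
// navigator.gpu is only defined in WebGPU-capable browsers;
// requestAdapter() may still return null on unsupported hardware.
export async function isWebGPUSupported() {
  if (!("gpu" in navigator)) return false;
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}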
QUICK START
import { useEffect, useState } from "react";
import { useLocalAI } from "react-brai";

export default function ChatComponent() {
  // Destructure the hook state and methods
  const {
    loadModel,
    isReady,
    chat,
    response,
    isLoading,
    progress
  } = useLocalAI();

  const [input, setInput] = useState("");

  // 1. Load the model on mount
  useEffect(() => {
    loadModel("Llama-3.2-3B-Instruct-q4f16_1-MLC", {
      contextWindow: 4096
    });
  }, []);

  // 2. Handle sending messages
  const handleSend = () => {
    chat([
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: input }
    ]);
  };

  return (
    <div>
      {/* Loading State */}
      {!isReady && (
        <div>
          Loading Model: {Math.round((progress?.progress || 0) * 100)}%
          <p>{progress?.text}</p>
        </div>
      )}

      {/* Chat Interface */}
      {isReady && (
        <>
          <div>{response || "AI is thinking..."}</div>
          <input
            value={input}
            onChange={(e) => setInput(e.target.value)}
            disabled={isLoading}
          />
          <button onClick={handleSend} disabled={isLoading}>
            Send
          </button>
        </>
      )}
    </div>
  );
}

API REFERENCE: useLocalAI()
The core hook that manages the Web Worker and WebGPU engine.
RETURN VALUES:
isReady (boolean)
- True when the model is fully loaded and ready for inference.
isLoading (boolean)
- True while the model is currently generating a response.
response (string)
- The real-time streaming text output from the model.
progress (object)
- Download status. Contains { progress: number, text: string }.
loadModel(modelId, config)
- Initializes the engine.
- modelId (string): The MLC model ID (e.g., "Llama-3.2-3B-Instruct-q4f16_1-MLC").
- config (object):
- contextWindow (number): Maximum context window size in tokens (default: 2048).
chat(messages)
- Sends a prompt to the model.
- messages (array): List of message objects [{ role: "user", content: "..." }].
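Because chat(messages) receives the full message list each time, multi-turn conversations are built by resending the prior turns. The component below is a minimal sketch: it assumes the role/content message shape documented above, and it assumes that isLoading flipping back to false marks the end of a generation so the finished response can be stored as an assistant turn (that completion signal is an assumption, not a documented guarantee).

import { useEffect, useState } from "react";
import { useLocalAI } from "react-brai";

export function MultiTurnChat() {
  const { loadModel, isReady, chat, response, isLoading } = useLocalAI();

  // Running conversation history; the whole list is resent on every turn.
  const [history, setHistory] = useState([
    { role: "system", content: "You are a helpful assistant." }
  ]);
  const [input, setInput] = useState("");

  useEffect(() => {
    loadModel("Llama-3.2-3B-Instruct-q4f16_1-MLC", { contextWindow: 4096 });
  }, []);

  const sendTurn = () => {
    const next = [...history, { role: "user", content: input }];
    setHistory(next);
    setInput("");
    chat(next); // the model sees the full conversation so far
  };

  // Assumption: when isLoading returns to false, the streamed response is
  // complete, so it is appended as an assistant turn for the next call.
  useEffect(() => {
    if (!isLoading && response) {
      setHistory((prev) => [...prev, { role: "assistant", content: response }]);
    }
  }, [isLoading, response]);

  return (
    <div>
      <div>{response}</div>
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button onClick={sendTurn} disabled={!isReady || isLoading}>
        Send
      </button>
    </div>
  );
}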
REQUIREMENTS & SERVER CONFIG
HTTPS: WebGPU requires a secure context (HTTPS) or localhost.
Headers: If the hook does not work in your app, configure your development server (Vite/Next.js) to serve files with the following cross-origin isolation headers, which enable multi-threading (SharedArrayBuffer) support.
// vite.config.js or next.config.js
headers: {
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Opener-Policy': 'same-origin',
}
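As a complete example, a Vite setup might look like the sketch below; in Vite these headers go under server.headers, while in Next.js they are returned from the async headers() function in next.config.js. The @vitejs/plugin-react usage here is just a typical setup, not a requirement of react-brai.

// vite.config.js -- enables cross-origin isolation so SharedArrayBuffer
// (and therefore multi-threaded workers) is available to the engine.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    headers: {
      "Cross-Origin-Embedder-Policy": "require-corp",
      "Cross-Origin-Opener-Policy": "same-origin",
    },
  },
});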
