react-brai
v2.0.9
The WebGPU runtime for React. Run Llama-3, Phi-3, and Gemma directly in the browser with no server round-trips and privacy-first local inference.
REACT-BRAI
OVERVIEW
react-brai is the easiest way to integrate local, private Large Language Models (LLMs) into your React applications. It powers in-browser AI inference using WebGPU, allowing you to run models like Llama-3 and Qwen directly on the client: no API keys, no server costs, and complete privacy.
FEATURES
- 100% Client-Side: Runs entirely in the browser using WebGPU.
- Zero Server Costs: No cloud API bills.
- Privacy First: User data never leaves their device.
- Streaming Support: Real-time token generation.
- Progress Tracking: Built-in hooks for download status.
- Model Agnostic: Supports MLC-compiled models (Llama 3, Qwen, etc.).
INSTALLATION
npm install react-brai
or
yarn add react-brai
Note: Requires a browser with WebGPU support (Chrome 113+, Edge, Brave).
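If you want to detect support at runtime before loading a model, you can probe the standard navigator.gpu API. The helper below is a minimal sketch and is not part of react-brai; the function name is illustrative.

// Optional feature check before calling loadModel().
// navigator.gpu is only defined in WebGPU-capable browsers;
// requestAdapter() may still return null on unsupported hardware.
export async function isWebGPUSupported() {
  if (!("gpu" in navigator)) return false;
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}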
QUICK START
import { useEffect, useState } from "react";
import { useLocalAI } from "react-brai";

export default function ChatComponent() {
  // Destructure the hook state and methods
  const {
    loadModel,
    isReady,
    chat,
    response,
    isLoading,
    progress
  } = useLocalAI();

  const [input, setInput] = useState("");

  // 1. Load the model on mount
  useEffect(() => {
    loadModel("Llama-3.2-3B-Instruct-q4f16_1-MLC", {
      contextWindow: 4096
    });
  }, []);

  // 2. Handle sending messages
  const handleSend = () => {
    chat([
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: input }
    ]);
  };

  return (
    <div>
      {/* Loading State */}
      {!isReady && (
        <div>
          Loading Model: {Math.round((progress?.progress || 0) * 100)}%
          <p>{progress?.text}</p>
        </div>
      )}

      {/* Chat Interface */}
      {isReady && (
        <>
          <div>{response || "AI is thinking..."}</div>
          <input
            value={input}
            onChange={(e) => setInput(e.target.value)}
            disabled={isLoading}
          />
          <button onClick={handleSend} disabled={isLoading}>
            Send
          </button>
        </>
      )}
    </div>
  );
}

API REFERENCE: useLocalAI()
The core hook that manages the Web Worker and WebGPU engine.
RETURN VALUES:
isReady (boolean)
- True when the model is fully loaded and ready for inference.
isLoading (boolean)
- True while the model is currently generating a response.
response (string)
- The real-time streaming text output from the model.
progress (object)
- Download status. Contains { progress: number, text: string }.
loadModel(modelId, config)
- Initializes the engine.
- modelId (string): The MLC model ID (e.g., "Llama-3.2-3B-Instruct-q4f16_1-MLC").
- config (object):
- contextWindow (number): Maximum context window size in tokens (default: 2048).
chat(messages)
- Sends a prompt to the model.
- messages (array): List of message objects [{ role: "user", content: "..." }].
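Because chat(messages) receives the full message list each time, multi-turn conversations are built by resending the prior turns. The component below is a minimal sketch: it assumes the role/content message shape documented above, and it assumes that isLoading flipping back to false marks the end of a generation so the finished response can be stored as an assistant turn (that completion signal is an assumption, not a documented guarantee).

import { useEffect, useState } from "react";
import { useLocalAI } from "react-brai";

export function MultiTurnChat() {
  const { loadModel, isReady, chat, response, isLoading } = useLocalAI();

  // Running conversation history; the whole list is resent on every turn.
  const [history, setHistory] = useState([
    { role: "system", content: "You are a helpful assistant." }
  ]);
  const [input, setInput] = useState("");

  useEffect(() => {
    loadModel("Llama-3.2-3B-Instruct-q4f16_1-MLC", { contextWindow: 4096 });
  }, []);

  const sendTurn = () => {
    const next = [...history, { role: "user", content: input }];
    setHistory(next);
    setInput("");
    chat(next); // the model sees the full conversation so far
  };

  // Assumption: when isLoading returns to false, the streamed response is
  // complete, so it is appended as an assistant turn for the next call.
  useEffect(() => {
    if (!isLoading && response) {
      setHistory((prev) => [...prev, { role: "assistant", content: response }]);
    }
  }, [isLoading, response]);

  return (
    <div>
      <div>{response}</div>
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button onClick={sendTurn} disabled={!isReady || isLoading}>
        Send
      </button>
    </div>
  );
}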
REQUIREMENTS & SERVER CONFIG
HTTPS: WebGPU requires a secure context (HTTPS) or localhost.
Headers: If the hook does not work in your app, configure your development server (Vite/Next.js) to serve files with the following cross-origin isolation headers, which enable multi-threading (SharedArrayBuffer) support.
// vite.config.js or next.config.js
headers: {
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Opener-Policy': 'same-origin',
}
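As a complete example, a Vite setup might look like the sketch below; in Vite these headers go under server.headers, while in Next.js they are returned from the async headers() function in next.config.js. The @vitejs/plugin-react usage here is just a typical setup, not a requirement of react-brai.

// vite.config.js -- enables cross-origin isolation so SharedArrayBuffer
// (and therefore multi-threaded workers) is available to the engine.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    headers: {
      "Cross-Origin-Embedder-Policy": "require-corp",
      "Cross-Origin-Opener-Policy": "same-origin",
    },
  },
});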
