# edge-llm - On-Device AI for React Native
edge-llm is an Expo native module that brings powerful on-device AI capabilities to React Native applications using Google's MediaPipe framework. Run LLMs, embeddings, and RAG (Retrieval-Augmented Generation) entirely offline on mobile devices.
## 🚀 Features
- ✅ On-Device LLM Inference - Run Gemma 2B models directly on mobile
- ✅ Text Embeddings - Generate 384-dimensional vectors using MediaPipe BERT
- ✅ RAG (Retrieval-Augmented Generation) - Built-in vector database with cosine similarity search
- ✅ 100% Offline - No internet connection required
- ✅ Expo Compatible - Works with Expo development builds
- ✅ TypeScript Support - Full type definitions included
## 📦 Installation

### 1. Install the Module
```bash
npm install edge-llm
# or
yarn add edge-llm
```

### 2. Add Model Files

You can use any MediaPipe-compatible model (`.task`, `.litert`, etc.); just place it in the `android/app/src/main/assets/models` folder.
```ts
await Mediapipe.init({
  embeddingModel: "models/universal_sentence_encoder.tflite",
  llmModel: "models/gemma3n_e2b_int4.task"
});
```

Download the required model files and place them in your project:
```
your-app/
├── android/app/src/main/assets/models/
│   ├── gemma3n_e2b_int4.task    # LLM model (required)
│   └── bert_embedder.tflite     # Embedding model (required)
```

Model Downloads:

- Gemma 2B INT4 - Download from MediaPipe
- BERT Embedder - Download from MediaPipe
### 3. Rebuild Your App

```bash
npx expo prebuild --clean
npx expo run:android
```

## 🎯 Quick Start
```tsx
import Mediapipe from 'edge-llm';
import { useEffect, useState } from 'react';
import { Text } from 'react-native';

export default function App() {
  const [response, setResponse] = useState('');

  useEffect(() => {
    const initializeAI = async () => {
      try {
        // Test the model
        const result = await Mediapipe.testModel();
        console.log('Model loaded:', result.status);

        // Generate text
        const output = await Mediapipe.generateText('Hello, how are you?');
        setResponse(output.response);
      } catch (error) {
        console.error('AI Error:', error);
      }
    };

    initializeAI();
  }, []);

  return <Text>{response}</Text>;
}
```

## 📚 API Reference
### Core Functions

#### mediapipeSmokeTest()
Test if the MediaPipe engine initializes correctly.
```ts
const result = await Mediapipe.mediapipeSmokeTest();
// Returns: { status: 'ok', message: 'MediaPipe engine initialized successfully' }
```

#### testModel()
Run a simple test prompt through the model.
```ts
const result = await Mediapipe.testModel();
// Returns: {
//   status: 'success',
//   testPrompt: 'Hello',
//   testResponse: '...',
//   responseLength: 50,
//   duration: 1234
// }
```

#### generateText(prompt: string)
Generate text from a prompt.
```ts
const result = await Mediapipe.generateText('Explain quantum computing');
// Returns: {
//   status: 'success',
//   response: '...',
//   duration: 2000,
//   promptLength: 25,
//   responseLength: 150
// }
```

### Embedding Functions
#### embedTest(text: string)
Generate embeddings for text (384-dimensional vector).
```ts
const result = await Mediapipe.embedTest('Hello world');
// Returns: {
//   length: 384,
//   sample: [18.3, -0.16, 14.69, -12.54, -23.61]
// }
```

### RAG (Retrieval-Augmented Generation) Functions
#### clearRagDatabase()
Clear all documents from the RAG database.
```ts
const result = await Mediapipe.clearRagDatabase();
// Returns: { status: 'success', message: 'RAG database cleared' }
```

#### getRagStats()
Get statistics about the RAG database.
```ts
const result = await Mediapipe.getRagStats();
// Returns: { status: 'success', documentCount: 4 }
```

#### addRagDocument(text: string)
Add a document to the RAG database.
```ts
const result = await Mediapipe.addRagDocument(`
  Project Name: NEBULA-7
  Lead Engineer: Tirth Parmar
  Status: Active
`);
// Returns: { status: 'stored', length: 78, totalDocuments: 1 }
```

Validation:

- Empty documents are rejected
- Error JSON objects are automatically rejected
- Returns current document count
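Since empty or malformed documents are rejected, it can help to guard and batch your inserts. A minimal sketch, assuming the documented `addRagDocument()` behavior (the `seedDocuments` helper is illustrative, not part of the API):

```ts
import Mediapipe from 'edge-llm';

// Hypothetical helper: skips blank entries and surfaces per-document
// failures instead of aborting the whole batch.
async function seedDocuments(docs: string[]): Promise<number> {
  let stored = 0;
  for (const doc of docs) {
    if (!doc.trim()) continue; // empty documents would be rejected anyway
    try {
      const result = await Mediapipe.addRagDocument(doc);
      stored = result.totalDocuments;
    } catch (error) {
      console.warn('Skipping document:', error);
    }
  }
  return stored; // count reported by the last successful insert
}
```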
#### generateWithRag(prompt: string)
Generate text using RAG (retrieves relevant context first).
```ts
const result = await Mediapipe.generateWithRag('Who is the lead engineer?');
// Returns: {
//   status: 'success',
//   response: 'Tirth Parmar',
//   contextUsed: '...',
//   promptLength: 152,
//   bestScore: 0.856
// }
```

How it works:

1. Embeds your query
2. Searches the database for the most similar document (cosine similarity)
3. Uses the top match as context
4. Generates a response with the LLM
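The retrieval step boils down to ranking stored embeddings by cosine similarity against the query embedding. The native module does this in Kotlin (see `RagMath` in the Architecture section); here is a simplified TypeScript sketch of the same ranking, where the winning score corresponds to `bestScore` in the result above:

```ts
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored documents against a query embedding, keep the best match.
function bestMatch(
  query: number[],
  docs: { text: string; embedding: number[] }[]
) {
  let best = { text: '', score: -Infinity };
  for (const doc of docs) {
    const score = cosineSimilarity(query, doc.embedding);
    if (score > best.score) best = { text: doc.text, score };
  }
  return best; // best.score is analogous to bestScore above
}
```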
## Memory System
EdgeLLM includes a powerful dual-memory system that gives you complete control over conversation context and knowledge storage.
### Architecture
```
┌─────────────────────────────────────────┐
│          SHORT-TERM MEMORY              │
│   (JavaScript Array - Session Only)     │
│  • Conversation history                 │
│  • Recent context (20 messages max)     │
│  • Auto token limiting (1000 tokens)    │
└─────────────────────────────────────────┘
                    ↕
            Memory System API
                    ↕
┌─────────────────────────────────────────┐
│           LONG-TERM MEMORY              │
│   (SQLite + Embeddings - Persistent)    │
│  • Knowledge base                       │
│  • Vector search retrieval              │
│  • Survives app restarts                │
└─────────────────────────────────────────┘
```

### Import Memory System
```ts
import {
  // Memory functions
  askWithMemory,
  askWithShortTermOnly,
  askWithLongTermOnly,

  // Short-term management
  addShortTermMemory,
  clearShortTermMemory,
  getShortTermContext,
  getAllShortTermMemory,
  getShortTermLength,

  // Long-term management
  storeLongTermMemory,
  clearLongTermMemory,
  getLongTermStats,

  // Utilities
  exportMemory,
  resetAllMemory
} from 'edge-llm/memory';
```

### Three Ways to Ask Questions
#### 1️⃣ askWithMemory() - Both Memories
Uses both conversation history AND knowledge base.
```ts
// Best for: General chat with full context
const answer = await askWithMemory('What did we discuss earlier?');
```

Features:
- ✅ Accesses conversation history
- ✅ Retrieves relevant knowledge from RAG
- ✅ Auto-appends to short-term memory
- ✅ Token-limited (prevents bloat)
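Because each turn is auto-appended to short-term memory, follow-up questions can lean on earlier ones. A small illustrative sketch (CloudAI Pro is just the sample product used elsewhere in this README):

```ts
import { askWithMemory } from 'edge-llm/memory';

async function demoFollowUp() {
  // The first turn is stored in short-term memory automatically.
  const first = await askWithMemory('What is CloudAI Pro?');

  // "it" can now be resolved from the conversation history.
  const second = await askWithMemory('How much does it cost per month?');

  console.log(first, second);
}
```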
#### 2️⃣ askWithShortTermOnly() - Chat History Only
Uses only conversation history (no RAG lookup).
```ts
// Best for: Quick replies, chat continuity
const answer = await askWithShortTermOnly('What was my last question?');
```

Features:
- ✅ Fast (no vector search)
- ✅ Access to conversation context
- ✅ Auto-appends to short-term memory
- ❌ No knowledge base access
#### 3️⃣ askWithLongTermOnly() - Knowledge Base Only
Uses only RAG knowledge base (ignores chat history).
```ts
// Best for: Knowledge queries, document Q&A
const answer = await askWithLongTermOnly('What is our refund policy?');
```

Features:
- ✅ Pure knowledge lookup
- ✅ Fresh context (no chat history bias)
- ❌ Doesn't remember conversation
- ❌ Doesn't auto-save to short-term
### Configuration Options

All `ask*()` functions accept optional configuration:
```ts
const answer = await askWithMemory('Hello!', {
  maxShortTermTokens: 1500,   // Default: 1000
  maxLongTermTokens: 500,     // Default: 300
  systemPrompt: 'You are a friendly assistant.'
});
```

### Manual Memory Management
#### Short-Term Memory (Session)
```ts
// Add messages manually
addShortTermMemory('user', 'What is AI?');
addShortTermMemory('assistant', 'AI is...');

// Get current context (token-limited)
const context = getShortTermContext(1000); // Max 1000 tokens
console.log(context);
// Output:
// USER: What is AI?
// ASSISTANT: AI is...

// Get all messages (for debugging)
const allMessages = getAllShortTermMemory();
console.log(allMessages);
// [{ role: 'user', content: 'What is AI?' }, ...]

// Get message count
const count = getShortTermLength(); // Returns: 2

// Clear session memory
clearShortTermMemory();
```

#### Long-Term Memory (Persistent)
```ts
// Store important information permanently
await storeLongTermMemory(`
  User Preference: Dark mode enabled
  Notification: 9 AM daily
  Language: English
`);

// Get statistics
const stats = await getLongTermStats();
console.log(`Stored: ${stats.documentCount} documents`);

// Clear all long-term memory
await clearLongTermMemory();
```

### Complete Example: Chatbot with Memory
```tsx
import { useState, useEffect } from 'react';
import { View, Text, TextInput, TouchableOpacity, FlatList } from 'react-native';
import {
  askWithMemory,
  storeLongTermMemory,
  getLongTermStats,
  getShortTermLength,
  resetAllMemory
} from 'edge-llm/memory';

export default function SmartChatbot() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [stats, setStats] = useState({ shortTerm: 0, longTerm: 0 });

  useEffect(() => {
    const initKnowledge = async () => {
      // Seed knowledge base
      await storeLongTermMemory(`
        Company: TechCorp
        Support Email: [email protected]
        Office Hours: 9 AM - 5 PM EST
      `);
      await storeLongTermMemory(`
        Product: CloudAI Pro
        Price: $99/month
        Features: Auto-scaling, 99.9% uptime, API access
      `);
      updateStats();
    };
    initKnowledge();
  }, []);

  const updateStats = async () => {
    const longTerm = await getLongTermStats();
    const shortTerm = getShortTermLength();
    setStats({ shortTerm, longTerm: longTerm.documentCount });
  };

  const sendMessage = async () => {
    if (!input.trim()) return;

    // Add user message to UI
    setMessages(prev => [...prev, { role: 'user', text: input }]);
    setInput('');

    try {
      // Ask with BOTH memories
      const response = await askWithMemory(input, {
        systemPrompt: 'You are TechCorp support assistant.',
        maxShortTermTokens: 1500,
        maxLongTermTokens: 400
      });

      // Add AI response to UI
      setMessages(prev => [...prev, { role: 'assistant', text: response }]);
      updateStats();
    } catch (error) {
      console.error('Chat error:', error);
    }
  };

  const handleReset = async () => {
    await resetAllMemory();
    setMessages([]);
    updateStats();
    alert('Memory reset!');
  };

  return (
    <View style={{ flex: 1 }}>
      {/* Stats Bar */}
      <View style={{ padding: 10, backgroundColor: '#f0f0f0' }}>
        <Text>📝 Chat: {stats.shortTerm} messages | 📚 Knowledge: {stats.longTerm} docs</Text>
      </View>

      {/* Messages */}
      <FlatList
        data={messages}
        keyExtractor={(_, i) => i.toString()}
        renderItem={({ item }) => (
          <View style={{
            padding: 12,
            margin: 8,
            backgroundColor: item.role === 'user' ? '#DCF8C6' : '#FFF',
            borderRadius: 8,
            alignSelf: item.role === 'user' ? 'flex-end' : 'flex-start'
          }}>
            <Text>{item.text}</Text>
          </View>
        )}
      />

      {/* Input */}
      <View style={{ flexDirection: 'row', padding: 12 }}>
        <TextInput
          value={input}
          onChangeText={setInput}
          placeholder="Ask anything..."
          style={{ flex: 1, padding: 12, backgroundColor: '#F0F0F0', borderRadius: 8 }}
        />
        <TouchableOpacity onPress={sendMessage} style={{ marginLeft: 8, padding: 12 }}>
          <Text>Send</Text>
        </TouchableOpacity>
        <TouchableOpacity onPress={handleReset} style={{ marginLeft: 8, padding: 12 }}>
          <Text>Reset</Text>
        </TouchableOpacity>
      </View>
    </View>
  );
}
```

### Advanced: Selective Memory Storage
Decide what gets stored in long-term memory:
```ts
const response = await askWithShortTermOnly(userQuestion);

// Only store important facts
if (userQuestion.includes('remember') || userQuestion.includes('save')) {
  await storeLongTermMemory(`User noted: ${userQuestion}`);
}
```

### Memory Export (Debugging)
```ts
import { exportMemory } from 'edge-llm/memory';

const memoryDump = await exportMemory();
console.log('Short-term:', memoryDump.shortTerm);
console.log('Tokens used:', memoryDump.shortTermTokenCount);
console.log('Long-term docs:', memoryDump.longTermDocCount);

// Output:
// Short-term: [
//   { role: 'user', content: 'Hello' },
//   { role: 'assistant', content: 'Hi there!' }
// ]
// Tokens used: 25
// Long-term docs: 5
```

### Token Management
The memory system automatically limits tokens to prevent prompt bloat:
| Memory Type | Default Limit | Purpose |
|-------------|---------------|---------|
| Short-term  | 1000 tokens   | Conversation context |
| Long-term   | 300 tokens    | RAG retrieval context |
Token estimation: ~4 characters ≈ 1 token
```ts
// Example: 1000 tokens ≈ 4000 characters ≈ ~10 messages
```
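Based on that 4-characters-per-token rule of thumb, you can pre-check or trim text before prompting. A minimal sketch (these helpers are illustrative, not exported by the library):

```ts
// Rough token estimate using the ~4 chars ≈ 1 token heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Trim a context string to an approximate token budget.
function fitToBudget(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```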
### Memory Strategies

#### Strategy 1: Chat-Heavy Application
```ts
// Use short-term memory for natural conversation
const response = await askWithShortTermOnly(userInput);

// Only store critical facts
if (isImportantFact(userInput)) {
  await storeLongTermMemory(userInput);
}
```

#### Strategy 2: Knowledge-Heavy Application
```ts
// Pre-load knowledge base
await storeLongTermMemory(companyPolicies);
await storeLongTermMemory(productDocs);

// Use long-term memory for queries
const response = await askWithLongTermOnly('What is the refund policy?');
```

#### Strategy 3: Hybrid (Recommended)
```ts
// Default: use both memories; override for pure knowledge queries
const response = isKnowledgeQuery(userInput)
  ? await askWithLongTermOnly(userInput)
  : await askWithMemory(userInput);
```

### Utility Functions
#### closeEngine()
Manually close the LLM engine (frees memory).
```ts
const result = await Mediapipe.closeEngine();
// Returns: 'Engine closed'
```

#### getEngineStatus()
Check if the engine is initialized.
```ts
const status = await Mediapipe.getEngineStatus();
// Returns: { initialized: true, ready: true }
```
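If you want to free native memory when a screen goes away, one option is to call `closeEngine()` from a `useEffect` cleanup. A minimal sketch (the hook name is made up; when and how to re-initialize afterwards is up to your app):

```ts
import { useEffect } from 'react';
import Mediapipe from 'edge-llm';

// Hypothetical hook: releases the LLM engine on unmount.
export function useEngineCleanup() {
  useEffect(() => {
    return () => {
      Mediapipe.closeEngine().catch((e) =>
        console.warn('closeEngine failed:', e)
      );
    };
  }, []);
}
```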
## 🏗️ Complete RAG Example

```tsx
import Mediapipe from 'edge-llm';
import { useEffect, useRef, useState } from 'react';
import { View, Text, TextInput, TouchableOpacity, FlatList } from 'react-native';

export default function RAGChatApp() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);
  const booted = useRef(false);

  useEffect(() => {
    if (booted.current) return;
    booted.current = true;

    const initializeRAG = async () => {
      try {
        // Clear old data
        await Mediapipe.clearRagDatabase();
        console.log('✅ Database cleared');

        // Add knowledge documents
        await Mediapipe.addRagDocument(`
          Company: TechCorp
          Founded: 2020
          CEO: Jane Smith
          Products: AI Solutions, Cloud Services
        `);
        await Mediapipe.addRagDocument(`
          Product: CloudAI Pro
          Price: $99/month
          Features: Auto-scaling, 99.9% uptime, 24/7 support
        `);

        // Verify
        const stats = await Mediapipe.getRagStats();
        console.log(`✅ RAG ready with ${stats.documentCount} documents`);
      } catch (error) {
        console.error('❌ RAG init failed:', error);
      }
    };

    initializeRAG();
  }, []);

  const sendMessage = async () => {
    if (!input.trim() || loading) return;

    const userMsg = { id: Date.now(), role: 'user', text: input };
    setMessages(prev => [...prev, userMsg]);
    setInput('');
    setLoading(true);

    try {
      const result = await Mediapipe.generateWithRag(input);
      if (result.status === 'success') {
        const aiMsg = {
          id: Date.now() + 1,
          role: 'assistant',
          text: result.response
        };
        setMessages(prev => [...prev, aiMsg]);
      }
    } catch (error) {
      console.error('❌ Generation failed:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <View style={{ flex: 1 }}>
      <FlatList
        data={messages}
        keyExtractor={item => item.id.toString()}
        renderItem={({ item }) => (
          <View style={{
            padding: 12,
            margin: 8,
            backgroundColor: item.role === 'user' ? '#DCF8C6' : '#FFF',
            borderRadius: 8,
            alignSelf: item.role === 'user' ? 'flex-end' : 'flex-start',
            maxWidth: '80%'
          }}>
            <Text>{item.text}</Text>
          </View>
        )}
      />
      <View style={{ flexDirection: 'row', padding: 12, borderTopWidth: 1, borderColor: '#DDD' }}>
        <TextInput
          value={input}
          onChangeText={setInput}
          placeholder="Ask anything..."
          style={{
            flex: 1,
            padding: 12,
            backgroundColor: '#F0F0F0',
            borderRadius: 8,
            marginRight: 8
          }}
        />
        <TouchableOpacity
          onPress={sendMessage}
          disabled={loading}
          style={{
            backgroundColor: loading ? '#CCC' : '#4CAF50',
            paddingHorizontal: 20,
            paddingVertical: 12,
            borderRadius: 8,
            justifyContent: 'center'
          }}
        >
          <Text style={{ color: '#FFF', fontWeight: '600' }}>
            {loading ? 'Thinking...' : 'Send'}
          </Text>
        </TouchableOpacity>
      </View>
    </View>
  );
}
```

## 🔧 Configuration
### Customize Token Limit
The default maximum token limit is 4096. To change it, modify `MediapipeModule.kt`:
```kotlin
val options = LlmInferenceOptions.builder()
    .setModelPath(modelFile.absolutePath)
    .setMaxTokens(8192) // Increase token limit
    .build()
```

### Using Different Models
Replace the model files in `android/app/src/main/assets/` and update the model paths in your code. Any MediaPipe-compatible model should work.
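For example, point `Mediapipe.init()` at the new files (the filenames below are placeholders for whatever models you actually bundle):

```ts
import Mediapipe from 'edge-llm';

// Hypothetical filenames: substitute the models you dropped into assets.
await Mediapipe.init({
  embeddingModel: 'models/my_embedder.tflite',
  llmModel: 'models/my_llm_model.task'
});
```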
## 🏗️ Architecture
```
┌─────────────────────────────────────────────┐
│            React Native Layer               │
│         (TypeScript/JavaScript)             │
└──────────────────┬──────────────────────────┘
                   │ Expo Module Bridge
┌──────────────────▼──────────────────────────┐
│         MediapipeModule (Kotlin)            │
│  - LLM Inference Engine                     │
│  - Text Embedder                            │
│  - RAG Manager                              │
└──────────────────┬──────────────────────────┘
                   │
      ┌────────────┼────────────┐
      │            │            │
┌─────▼────┐  ┌────▼─────┐  ┌───▼──────┐
│  RagDB   │  │ RagMath  │  │ MediaPipe│
│ (SQLite) │  │ (Cosine) │  │ Framework│
└──────────┘  └──────────┘  └──────────┘
```

## 🤝 Contributing
Found a bug or want to contribute? Here's how:
- Report Issues: Check similarity scores in logs and share full error traces
- Test Changes: Always rebuild with `npx expo prebuild --clean`
- Document Changes: Update this README with any API changes
## 📄 License
MIT License - See LICENSE file for details
## 🙏 Acknowledgments
- MediaPipe Team - For the incredible on-device ML framework
- Google - For Gemma models
- Anthropic - For Claude AI assistance in development
## 📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
## 🗺️ Roadmap
- [ ] iOS Support
- [ ] Streaming responses
- [ ] Memory management improvements
- [ ] Conversation history management
- [ ] Voice input/output
- [ ] Model quantization utilities
Built with ❤️ by Tirth Parmar
Last Updated: January 24, 2026
