@glogwa/llama-roblox
v1.0.4
Published
LLaMA model inference implementation for Roblox using llama.cpp architecture
Maintainers
Readme
@glogwa/llama-roblox
Complete LLaMA model inference for Roblox using llama.cpp architecture
A production-ready implementation of llama.cpp for Roblox, enabling on-device LLM inference with GGUF model support.
✨ Features
- 🚀 Full GGUF v3 Support - Load quantized models directly
- 🎯 10 Quantization Formats - Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1, Q2_K-Q6_K, F16, F32, BF16
- 🧠 Complete Transformer - Multi-head attention, RoPE, feed-forward networks
- 💬 Chat Templates - ChatML, Llama 2, Alpaca, Vicuna
- 🎲 7 Sampling Strategies - Temperature, Top-K, Top-P, Min-P, Mirostat, and more
- ⚡ Optimized Performance - Cache-blocked matrix multiplication, KV cache
- 📦 Zero Dependencies - Pure TypeScript implementation
📥 Installation
npm install @glogwa/llama-roblox🚀 Quick Start
import { quickSetup } from "@glogwa/llama-roblox";
// Load your GGUF model (e.g., Qwen 3 0.6B Q4_K_M)
const modelBuffer = loadModelFromStorage();
// Quick setup with sensible defaults
const llm = quickSetup(modelBuffer, {
n_ctx: 2048,
temperature: 0.7,
});
// Generate text
const response = llm.generate("Hello, world!", 100);
print(response);
// Clean up
llm.free();💬 Chat Example
import { createLLM, ChatTemplateType } from "@glogwa/llama-roblox";
const llm = createLLM();
// Load and configure
llm.loadModel(modelBuffer);
llm.createContext({ n_ctx: 2048 });
llm.setupSampler({ temperature: 0.8 });
// Setup chat
llm.setupConversation(ChatTemplateType.CHATML);
llm.setSystemPrompt("You are a helpful AI assistant.");
// Multi-turn conversation
const response1 = llm.chat("What is TypeScript?", 100);
print(response1);
const response2 = llm.chat("How is it different from JavaScript?", 100);
print(response2);
llm.free();🎯 Supported Models
Works with any GGUF model, including:
- ✅ Qwen 3 (0.6B, 1.5B, 3B, 7B)
- ✅ LLaMA 2/3 (7B, 13B, 70B)
- ✅ Mistral (7B)
- ✅ Phi-2/3 (2.7B, 3.8B)
- ✅ TinyLlama (1.1B)
- ✅ And many more!
📊 Quantization Support
| Format | Bits | Description | Size Reduction | |--------|------|-------------|----------------| | F32 | 32 | Full precision | 1x (baseline) | | F16 | 16 | Half precision | 2x | | Q8_0 | 8 | 8-bit quantization | 4x | | Q6_K | 6 | 6-bit K-quants | 5.3x | | Q5_0/Q5_1 | 5 | 5-bit quantization | 6.4x | | Q4_0/Q4_1 | 4 | 4-bit quantization | 8x | | Q4_K_M | 4 | 4-bit K-quants (medium) | 8x | | Q3_K | 3 | 3-bit K-quants | 10.7x | | Q2_K | 2 | 2-bit K-quants | 16x |
🎲 Sampling Strategies
// Greedy (deterministic)
llm.setupSampler({ temperature: 0.0 });
// Balanced
llm.setupSampler({
temperature: 0.7,
top_k: 40,
top_p: 0.95,
});
// Creative
llm.setupSampler({
temperature: 1.0,
top_p: 0.98,
repeat_penalty: 1.1,
});
// Mirostat (perplexity control)
llm.setupSampler({
mirostat: 2,
mirostat_tau: 5.0,
mirostat_eta: 0.1,
});Building from Source
To build the project from scratch, use:
npm install
npm run buildOr with Rojo:
rojo build -o "LLM-on-roblox.rbxlx"For development with live sync:
rojo serveFor more help, check out the Rojo documentation.
Documentation
See the full documentation for detailed usage, API reference, and examples.
License
ISC License
Credits
Based on llama.cpp by Georgi Gerganov
