@glogwa/llama-roblox

v1.0.4

Published

5 months ago

LLaMA model inference implementation for Roblox using llama.cpp architecture

0High
0Medium
0Low

glogwa

roblox llama llm ai gguf inference roblox-ts

@glogwa/llama-roblox

Complete LLaMA model inference for Roblox using llama.cpp architecture

A production-ready implementation of llama.cpp for Roblox, enabling on-device LLM inference with GGUF model support.

✨ Features

🚀 Full GGUF v3 Support - Load quantized models directly
🎯 10 Quantization Formats - Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1, Q2_K-Q6_K, F16, F32, BF16
🧠 Complete Transformer - Multi-head attention, RoPE, feed-forward networks
💬 Chat Templates - ChatML, Llama 2, Alpaca, Vicuna
🎲 7 Sampling Strategies - Temperature, Top-K, Top-P, Min-P, Mirostat, and more
⚡ Optimized Performance - Cache-blocked matrix multiplication, KV cache
📦 Zero Dependencies - Pure TypeScript implementation

📥 Installation

npm install @glogwa/llama-roblox

🚀 Quick Start

import { quickSetup } from "@glogwa/llama-roblox";

// Load your GGUF model (e.g., Qwen 3 0.6B Q4_K_M)
const modelBuffer = loadModelFromStorage();

// Quick setup with sensible defaults
const llm = quickSetup(modelBuffer, {
    n_ctx: 2048,
    temperature: 0.7,
});

// Generate text
const response = llm.generate("Hello, world!", 100);
print(response);

// Clean up
llm.free();

💬 Chat Example

import { createLLM, ChatTemplateType } from "@glogwa/llama-roblox";

const llm = createLLM();

// Load and configure
llm.loadModel(modelBuffer);
llm.createContext({ n_ctx: 2048 });
llm.setupSampler({ temperature: 0.8 });

// Setup chat
llm.setupConversation(ChatTemplateType.CHATML);
llm.setSystemPrompt("You are a helpful AI assistant.");

// Multi-turn conversation
const response1 = llm.chat("What is TypeScript?", 100);
print(response1);

const response2 = llm.chat("How is it different from JavaScript?", 100);
print(response2);

llm.free();

🎯 Supported Models

Works with any GGUF model, including:

✅ Qwen 3 (0.6B, 1.5B, 3B, 7B)
✅ LLaMA 2/3 (7B, 13B, 70B)
✅ Mistral (7B)
✅ Phi-2/3 (2.7B, 3.8B)
✅ TinyLlama (1.1B)
✅ And many more!

📊 Quantization Support

| Format | Bits | Description | Size Reduction | |--------|------|-------------|----------------| | F32 | 32 | Full precision | 1x (baseline) | | F16 | 16 | Half precision | 2x | | Q8_0 | 8 | 8-bit quantization | 4x | | Q6_K | 6 | 6-bit K-quants | 5.3x | | Q5_0/Q5_1 | 5 | 5-bit quantization | 6.4x | | Q4_0/Q4_1 | 4 | 4-bit quantization | 8x | | Q4_K_M | 4 | 4-bit K-quants (medium) | 8x | | Q3_K | 3 | 3-bit K-quants | 10.7x | | Q2_K | 2 | 2-bit K-quants | 16x |

🎲 Sampling Strategies

// Greedy (deterministic)
llm.setupSampler({ temperature: 0.0 });

// Balanced
llm.setupSampler({
    temperature: 0.7,
    top_k: 40,
    top_p: 0.95,
});

// Creative
llm.setupSampler({
    temperature: 1.0,
    top_p: 0.98,
    repeat_penalty: 1.1,
});

// Mirostat (perplexity control)
llm.setupSampler({
    mirostat: 2,
    mirostat_tau: 5.0,
    mirostat_eta: 0.1,
});

Building from Source

To build the project from scratch, use:

npm install
npm run build

Or with Rojo:

rojo build -o "LLM-on-roblox.rbxlx"

For development with live sync:

rojo serve

For more help, check out the Rojo documentation.

Documentation

See the full documentation for detailed usage, API reference, and examples.

License

ISC License

Credits

Based on llama.cpp by Georgi Gerganov

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@glogwa/llama-roblox

✨ Features

📥 Installation

🚀 Quick Start

💬 Chat Example

🎯 Supported Models

📊 Quantization Support

🎲 Sampling Strategies

Building from Source

Documentation

License

Credits