# GenAI NanoGPT
A browser-native implementation of GPT language models built on TensorFlow.js, developed as part of the Finnish Generation AI research project. The library enables training, fine-tuning, and inference of transformer-based language models entirely in the browser, with support for explainable AI (XAI) features. It is intended primarily as an educational tool for learning about the model training process, since it targets mostly tiny models; in principle, it could be adapted to load other pre-trained models from Hugging Face.
A live version is available at https://lm.gen-ai.fi
## Overview
GenAI NanoGPT is inspired by Andrej Karpathy's NanoGPT but reimagined for the browser using TensorFlow.js. It provides a complete pipeline for:
- Training language models from scratch in the browser
- Loading pre-trained models from various sources (Hugging Face, local files)
- Generating text efficiently on a wide range of devices
- Analyzing model behavior through attention visualization and embeddings
- Optimizing performance across CPU, WebGL, and WebGPU backends
## Key Features
- 🚀 Browser-Native: No server required - train and run models entirely client-side
- 📱 Works on Small Devices: Train models on iPads, phones, and Chromebooks - no powerful hardware needed
- 🎯 Multiple Backends: Automatic backend selection (CPU, WebGL, WebGPU) for optimal performance
- 🔧 Flexible Tokenization: Support for both character-level and BPE tokenizers
- 📊 XAI Support: Attention score visualization, gradient analysis, and embedding extraction
- 💾 Model Persistence: Save and load models in SafeTensors format
- ⚡ Performance Optimizations: Custom WebGPU kernels, gradient checkpointing, and mixed precision training
- 🎨 Real-time Training: Live training metrics and generation during training
## Installation

```bash
npm install @genai-fi/nanogpt
```

## Quick Start
### Creating and Training a Model

```js
import { TeachableLLM, selectBackend } from '@genai-fi/nanogpt';

// Select the best available backend
await selectBackend('webgpu'); // or 'webgl', 'cpu'

// Create a new model
const model = TeachableLLM.create('char', {
    vocabSize: 200,
    blockSize: 128, // Context window size
    nLayer: 4, // Number of transformer layers
    nHead: 4, // Number of attention heads
    nEmbed: 192, // Embedding dimension
    dropout: 0.1,
    useRope: true, // Use Rotary Position Embeddings
});

// Training data
const trainingText = [
    'The quick brown fox jumps over the lazy dog.',
    'A journey of a thousand miles begins with a single step.',
    // ... more text
];

// Train the model
await model.train(trainingText, {
    batchSize: 16,
    learningRate: 3e-4,
    maxSteps: 1000,
    logInterval: 10,
    validationSplit: 0.1,
});

// Generate text
const output = await model.generateText('Once upon a time', {
    maxLength: 100,
    temperature: 0.8,
    topP: 0.9,
});

console.log(output);
```
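As a rough sanity check on model size (standard transformer parameter arithmetic, not a library API), each layer contributes about 12·nEmbed² weights (4·nEmbed² for attention, 8·nEmbed² for the MLP), and the token embedding adds vocabSize·nEmbed more:

```js
// Hypothetical helper, illustrative only: rough GPT parameter count.
// Assumes the standard decoder block layout (4*d^2 attention weights,
// 8*d^2 MLP weights per layer) plus a vocabSize*d embedding table.
function approxParamCount({ vocabSize, nLayer, nEmbed }) {
    return 12 * nLayer * nEmbed * nEmbed + vocabSize * nEmbed;
}

console.log(approxParamCount({ vocabSize: 200, nLayer: 4, nEmbed: 192 }));
// ~1.8M parameters for the configuration above
```

The configuration above therefore lands at roughly 1.8M parameters, small enough to train in a browser tab.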
### Loading a Pre-trained Model

```js
import { TeachableLLM, waitForModel } from '@genai-fi/nanogpt';

// Load from Hugging Face
const model = TeachableLLM.loadModel('username/model-name');

// Or load from a file
const fileInput = document.getElementById('fileInput');
fileInput.addEventListener('change', async (event) => {
    const file = event.target.files[0];
    const model = TeachableLLM.loadModel(file);
    await waitForModel(model);
    const text = await model.generateText('Hello');
    console.log(text);
});
```

## Event Handlers and Real-time Updates
### Monitoring Training Progress
Track training metrics in real-time with event handlers:
```js
const model = TeachableLLM.create('char', config);

// Listen for training step updates
model.on('trainStep', (step, progress) => {
    console.log(`Step ${step.step}/${progress.totalSteps}`);
    console.log(`Loss: ${step.loss.toFixed(4)}`);
    console.log(`Validation Loss: ${step.valLoss?.toFixed(4) || 'N/A'}`);
    console.log(`Progress: ${(progress.progress * 100).toFixed(1)}%`);
    console.log(`Time Remaining: ${progress.timeRemaining}s`);

    // Update UI progress bar
    updateProgressBar(progress.progress);
    updateLossChart(step.loss, step.valLoss);
});

await model.train(trainingText, options);
```
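The `updateProgressBar` and `updateLossChart` calls above are placeholders for your own UI code; a minimal sketch using plain DOM APIs (the helper names are hypothetical, not part of the library) might look like:

```js
// Hypothetical UI helpers referenced above; not part of @genai-fi/nanogpt.
const bar = document.getElementById('progress'); // e.g. <progress max="1">
const losses = [];

function updateProgressBar(progress) {
    bar.value = progress; // progress is a 0..1 fraction
}

function updateLossChart(loss, valLoss) {
    losses.push({ loss, valLoss });
    // Re-render with your charting library of choice here.
}
```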
### Real-time Token Generation

Stream generated tokens as they're produced:
```js
const generator = model.generator();

// Listen for generated tokens
generator.on('tokens', (tokens) => {
    // tokens is an array of new token IDs
    const text = model.tokeniser.decode(tokens);
    console.log('New tokens:', text);

    // Update UI incrementally
    appendToOutput(text);
});

// Generation lifecycle events
generator.on('start', () => {
    console.log('Generation started');
    showSpinner();
});

generator.on('stop', () => {
    console.log('Generation complete');
    hideSpinner();
});

generator.on('error', (error) => {
    console.error('Generation error:', error);
});

// Start generation
await generator.generate('Once upon a time', {
    maxLength: 200,
    temperature: 0.8,
});
```

## Training on Small Devices
GenAI NanoGPT is designed to work efficiently on resource-constrained devices like iPads, phones, and Chromebooks:
### Recommended Settings for Small Devices
```js
// Smaller model configuration for mobile devices
const mobileModel = TeachableLLM.create('char', {
    vocabSize: 200,
    blockSize: 128, // Smaller context window
    nLayer: 4, // Fewer layers
    nHead: 3, // Fewer attention heads
    nEmbed: 192, // Smaller embeddings
});

// Training options optimized for limited memory
await mobileModel.train(trainingText, {
    batchSize: 8, // Smaller batch size
    learningRate: 3e-4,
    maxSteps: 500,
    validationSplit: 0.1,
    logInterval: 50,
    gradientCheckpointing: true,
    mixedPrecision: true,
});
```

### Tips for Training on Mobile Devices
- Start Small: Use smaller models (4 layers) and shorter context windows (128 tokens)
- Reduce Batch Size: Use batch sizes of 8-16 depending on available memory
- Use Character Tokenization: Character-level tokenizers use less memory than BPE
- Optimize Training Data: Use smaller datasets, or train in stages as sketched below
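A minimal sketch of staged training, assuming `train` can be called repeatedly on the same model to continue from the current weights (an assumption about the API, not documented behaviour); the dataset chunk names are hypothetical:

```js
// Assumption: calling train() again continues from the current weights.
const stages = [earlyText, midText, lateText]; // hypothetical dataset chunks

for (const stage of stages) {
    await mobileModel.train(stage, {
        batchSize: 8,
        learningRate: 3e-4,
        maxSteps: 200, // short stages keep memory pressure low
        logInterval: 50,
    });
}
```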
## Advanced Usage

### Attention Visualization
```js
const generator = model.generator();
const text = await generator.generate('Prompt', {
    attentionScores: true,
    maxLength: 50,
});

// Get attention data for visualization
const attentionData = generator.getAttentionData();
// Shape: [num_tokens][num_layers][num_heads][seq_len][seq_len]

const probabilities = generator.getProbabilitiesData();
// Shape: [num_tokens][seq_len][vocab_size]
```
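For example, to collapse the attention data to a single per-layer map by averaging over heads (plain array math over the documented shape, assuming it is returned as nested arrays):

```js
// Average attention across heads for one generated token and layer.
// attentionData[token][layer][head] is a [seq_len][seq_len] matrix.
function meanAttention(attentionData, token, layer) {
    const heads = attentionData[token][layer];
    const seqLen = heads[0].length;
    const mean = heads[0].map((row) => row.slice()); // copy first head
    for (let h = 1; h < heads.length; h++) {
        for (let i = 0; i < seqLen; i++) {
            for (let j = 0; j < seqLen; j++) {
                mean[i][j] += heads[h][i][j];
            }
        }
    }
    return mean.map((row) => row.map((v) => v / heads.length));
}
```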
### Streaming Generation

```js
const generator = model.generator();

generator.on('tokens', (tokens) => {
    // Update UI with new tokens in real-time
    updateDisplay(tokens);
});

generator.on('start', () => console.log('Generation started'));
generator.on('stop', () => console.log('Generation complete'));

await generator.generate('Once upon a time', {
    maxLength: 200,
});
```

### Memory Management
```js
// Enable profiling
model.enableProfiler = true;

// After training/generation
const profiler = model.getProfiler();
if (profiler) {
    console.log('Memory stats:', profiler.getStats());
}

// Clean up
model.dispose();
```
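Since the library runs on TensorFlow.js, you can also watch the global tensor count to confirm that `dispose()` released everything; a sketch using the standard `tf.memory()` API from TensorFlow.js:

```js
import * as tf from '@tensorflow/tfjs';

// Snapshot tensor counts around cleanup to spot leaks.
console.log('Before dispose:', tf.memory().numTensors);
model.dispose();
console.log('After dispose:', tf.memory().numTensors);
```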
## Examples

See the `browser-tests` directory for complete examples:

- `generate.html`: Text generation with UI
- `rope-train.html`: Training a model with RoPE
- `hf.html`: Loading from Hugging Face
- `loader.html`: Loading different file formats
- `perf.html`: Performance testing
## Development

### Setup
```bash
git clone https://github.com/knicos/genai-nanogpt.git
cd genai-nanogpt
npm install
```

### Building
```bash
npm run build   # Build for production
npm run dev     # Development mode with watch
```

### Testing
```bash
npm test        # Run all tests
```

### Browser Tests
```bash
npm run test:gl # Start dev server
```

### Project Structure
```
lib/
├── models/          # Model architectures (NanoGPT)
├── layers/          # Transformer layers (attention, MLP, etc.)
├── ops/             # Custom TensorFlow.js operations
│   ├── cpu/         # CPU kernels
│   ├── webgl/       # WebGL kernels
│   └── webgpu/      # WebGPU kernels
├── training/        # Training utilities and optimizers
├── tokeniser/       # Tokenization implementations
├── loader/          # Model loading/saving
├── utilities/       # Helper functions
└── TeachableLLM.ts  # Main API
```

## Custom Operations
This library implements several custom TensorFlow.js operations optimized for transformer models:
- RoPE: Rotary Position Embeddings
- Attention Mask: Causal attention masking
- RMS Norm: Root Mean Square normalization
- Adam Optimizer: Extended Adam with weight decay
- 16-bit Operators: To enable mixed-precision training
See `lib/ops` for the implementations.
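To illustrate what the RoPE op computes (textbook rotary-embedding math, not the library's kernel code): each pair of embedding dimensions is rotated by a position-dependent angle, so relative positions fall out of the dot products between queries and keys.

```js
// Illustrative RoPE math, not the library's kernel: rotate each pair of
// dimensions (x[2i], x[2i+1]) by an angle that depends on the position.
function applyRope(x, pos) {
    const d = x.length;
    const out = new Array(d);
    for (let i = 0; i < d; i += 2) {
        const theta = pos / Math.pow(10000, i / d);
        const cos = Math.cos(theta);
        const sin = Math.sin(theta);
        out[i] = x[i] * cos - x[i + 1] * sin;
        out[i + 1] = x[i] * sin + x[i + 1] * cos;
    }
    return out;
}
```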
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit your changes: `git commit -m 'Add amazing feature'`
4. Push to the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request
### Code Style

This project uses ESLint and Prettier for code formatting:

```bash
npm run lint # Check code style
```

## Performance Tips
- Use WebGPU: Provides the best performance for training and inference (see the backend fallback sketch after this list)
- Batch Size: Larger batches improve GPU utilization but require more memory
- Mixed Precision: Enable for faster training on supported hardware (coming soon)
- Gradient Checkpointing: Reduces memory usage during training at the cost of some speed
- Use RoPE: More efficient than absolute position embeddings
- Start Small on Mobile: Use 2-4 layers and batch size 2-8 on phones/tablets
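A minimal backend-selection sketch, assuming `selectBackend` throws (or rejects) when the requested backend is unavailable; that failure behaviour is an assumption, not documented:

```js
import { selectBackend } from '@genai-fi/nanogpt';

// Try the fastest backend first and fall back gracefully.
// Assumes selectBackend() throws when a backend is unavailable.
async function pickBestBackend() {
    for (const backend of ['webgpu', 'webgl', 'cpu']) {
        try {
            await selectBackend(backend);
            return backend;
        } catch {
            // Backend unavailable on this device; try the next one.
        }
    }
    throw new Error('No supported backend found');
}
```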
## Acknowledgments
- Inspired by Andrej Karpathy's NanoGPT
- Built with TensorFlow.js
- Developed as part of the Finnish Generation AI research project
## Citation

If you use this library in your research, please cite:
```bibtex
@inproceedings{10.1145/3769994.3770061,
    author = {Pope, Nicolas and Tedre, Matti},
    title = {A Teachable Machine for Transformers},
    year = {2025},
    publisher = {Association for Computing Machinery},
    doi = {10.1145/3769994.3770061},
    booktitle = {Proceedings of the 25th Koli Calling International Conference on Computing Education Research},
}
```