quadtrix
v1.0.2
Published
CLI for running Quadtrix.cpp chat and local training.
Maintainers
Readme
Quadtrix.cpp
Quadtrix.cpp is a local language model project with several execution paths:
- A dependency-free C++17 transformer implementation with manual forward and backward passes.
- A PyTorch training and inference path for faster experimentation on CPU, CUDA, or supported accelerator backends.
- A FastAPI middleware layer for chat sessions, health checks, backend selection, and feedback.
- A React + TypeScript frontend for local chat, settings, session history, and model status.
- Optional package/CLI support through
bin/quadtrix.js.
[!IMPORTANT] Please be aware that several commands listed in this documentation—specifically those involving file paths and directory navigation—should not be directly copied and pasted into your terminal. Because file structures and path syntax (such as / vs ) vary significantly across operating systems like Windows, macOS, and Linux, you must manually adjust these arguments to match your local environment. Ensure you verify your current working directory and replace any placeholder paths with the absolute or relative path specific to your machine to avoid execution errors.
The project is designed as a technical learning implementation. The C++ path exposes the transformer internals directly: tensor operations, attention, layer normalization, cross-entropy, analytical gradients, AdamW, checkpointing, and autoregressive generation.
Contents
- System Architecture
- Repository Structure
- Requirements
- Setup
- Run the Project
- C++ Backend
- PyTorch Backend
- FastAPI Backend
- Frontend
- NPM CLI
- Configuration
- API Reference
- Training Details
- Reports and Results
- Troubleshooting
- License
System Architecture
Quadtrix has four main layers:
User Interface
React + Vite frontend
PWA assets and service worker
API Layer
FastAPI app in backend/
Session storage, request validation, CORS, health checks
Model Backends
C++ executable: main.cpp + include/
PyTorch checkpoint: engine/inference.py + engine/best_model.pt or engine/best_model .pt
Training and Data
C++ character-level training from data/input.txt
PyTorch GPT-2 BPE training from engine/input.txt or QUADTRIX_TRAIN_DATABackend Modes
| Backend | Path | Purpose |
|---|---|---|
| C++ CPU | main.cpp, include/, config/ | Manual transformer training, generation, terminal chat |
| PyTorch | engine/main.py, engine/inference.py | Faster training/inference with torch and tiktoken |
| iGPU | iGPU/ | Experimental integrated-GPU inference/training scripts |
| API | backend/main.py | FastAPI middleware used by the frontend |
| Web UI | frontend/ | Local chat interface and PWA shell |
Repository Structure
This is the current project layout, excluding generated caches such as .git/, .venv/, .npm-cache/, and build outputs.
Quadtrix.cpp/
|-- .github/
| |-- ISSUE_TEMPLATE/
| |-- workflows/
| | |-- ci.yml
| | `-- release.yml
| |-- dependabot.yml
| `-- pull_request_template.md
|-- backend/
| |-- middleware/
| | |-- error_handler.py
| | |-- logging.py
| | `-- __init__.py
| |-- router/
| | |-- chat.py
| | |-- feedback.py
| | |-- health.py
| | |-- sessions.py
| | `-- __init__.py
| |-- README.md
| |-- config.py
| |-- inference.py
| |-- main.py
| |-- models.py
| |-- requirements.txt
| |-- server.py
| `-- session_store.py
|-- bin/
| `-- quadtrix.js
|-- config/
| `-- config.h
|-- data/
| |-- data_set.py
| `-- input.txt
|-- engine/
| |-- data/
| | |-- data_set.py
| | `-- input.txt
| |-- fine-tune/
| | |-- chat.py
| | |-- data-set.py
| | `-- main.py
| |-- logs/
| |-- engine.c
| |-- export_weights.py
| |-- fineweb_30mb.txt
| |-- fineweb_dataset.py
| |-- inference.py
| `-- main.py
|-- frontend/
| |-- public/
| | |-- icon.svg
| | |-- manifest.webmanifest
| | `-- sw.js
| |-- src/
| | |-- api/
| | |-- components/
| | |-- hooks/
| | |-- store/
| | |-- types/
| | |-- utils/
| | |-- App.tsx
| | |-- index.css
| | |-- main.tsx
| | `-- registerServiceWorker.ts
| |-- index.html
| |-- manifest.webmanifest
| |-- package.json
| |-- tailwind.config.ts
| |-- tsconfig.json
| |-- vite.config.ts
| `-- sw.js
|-- gpu/
| |-- dataloader.h
| `-- model.h
|-- iGPU/
| |-- inference.py
| `-- main.py
|-- include/
| |-- attention.h
| |-- backward.h
| |-- block.h
| |-- dataloader.h
| |-- embedding.h
| |-- feedforward.h
| |-- gpt.h
| |-- layernorm.h
| |-- linear.h
| |-- tensor.h
| `-- torch_bridge.h
|-- model/
| |-- Cmakelists.txt
| `-- export_tokenizer.py
|-- scripts/
| `-- build_torch.ps1
|-- src/
| |-- torch_example.cpp
| `-- torch_main.cpp
|-- contributing.md
|-- LICENSE
|-- main.cpp
|-- package.json
|-- quadtrix.exe
|-- quadtrix_training_report.png
|-- README.md
|-- run.md
`-- SECURITY.mdImportant Files
| File | Role |
|---|---|
| main.cpp | C++ entry point for training, generation, and terminal chat |
| config/config.h | C++ hyperparameters, data path defaults, checkpoint path defaults |
| include/tensor.h | Custom tensor operations used by the C++ model |
| include/gpt.h | GPT language model implementation and generation path |
| include/backward.h | Analytical backpropagation and AdamW optimizer state |
| data/input.txt | Default C++ training corpus |
| engine/main.py | PyTorch training script |
| engine/inference.py | PyTorch checkpoint loading and generation |
| backend/main.py | FastAPI application entry point |
| backend/inference.py | Backend adapter for PyTorch and C++ model services |
| frontend/src/ | React chat application |
| bin/quadtrix.js | Node CLI wrapper for setup, chat, and training |
Requirements
Core Requirements
| Tool | Version | Used For |
|---|---:|---|
| Python | 3.10+ recommended | Backend, PyTorch training, PyTorch inference |
| Node.js | 18+ | Frontend and CLI |
| npm | bundled with Node.js | Frontend dependencies |
| C++ compiler | C++17 support | Building main.cpp |
| Git | any recent version | Cloning and source control |
Python Dependencies
The backend installs:
fastapi
uvicorn[standard]
pydantic
pydantic-settings
httpx
redis
torch
tiktokenThese are declared in backend/requirements.txt.
C++ Dependencies
The native C++ path has no third-party runtime dependency. It builds from:
main.cpp
config/config.h
include/*.hSetup
The commands below use PowerShell from the repository root:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp1. Create a Python Virtual Environment
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip2. Install Backend and PyTorch Dependencies
cd backend
..\.venv\Scripts\python.exe -m pip install -r requirements.txt
cd ..3. Install Frontend Dependencies
cd frontend
npm.cmd install
cd ..Use npm.cmd on Windows PowerShell if direct npm execution is blocked by PowerShell execution policy.
4. Build the Frontend
cd frontend
npm.cmd run build
cd ..5. Build the C++ Executable
If quadtrix.exe already exists, this step is optional. To rebuild:
g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix.exe main.cppFor extra CPU optimization on GCC/Clang:
g++ -std=c++17 -O3 -march=native -I. -Iinclude -o quadtrix.exe main.cppRun the Project
Option A: Run PyTorch Chat in the Web UI
This is the simplest web path when a PyTorch checkpoint is available.
Terminal 1:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\backend
..\.venv\Scripts\python.exe -m uvicorn main:app --host 127.0.0.1 --port 3001Terminal 2:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\frontend
npm.cmd run devOpen:
http://localhost:5173Select the .pt or PyTorch backend in the UI. The FastAPI app loads the checkpoint through engine/inference.py.
Option B: Run C++ Terminal Chat
The C++ executable supports terminal chat directly:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp
.\quadtrix.exe data\input.txt --chatSet the number of generated tokens per answer:
.\quadtrix.exe data\input.txt --chat --chat-tokens 300The C++ chat path requires best_model.bin unless GPT_MODEL_PATH points to another checkpoint.
Option C: Run C++ Generation
.\quadtrix.exe data\input.txt --generateGeneration streams tokens until interrupted with Ctrl+C.
Option D: Train the C++ Model
.\quadtrix.exe data\input.txtTraining writes the best checkpoint to:
best_model.binOption E: Train the PyTorch Model
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp
.\.venv\Scripts\python.exe engine\main.pyThe PyTorch script looks for engine/input.txt by default. If that file is not present, point the script at an existing corpus with QUADTRIX_TRAIN_DATA:
$env:QUADTRIX_TRAIN_DATA="C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\data\input.txt"
.\.venv\Scripts\python.exe engine\main.pyThe PyTorch training script saves its checkpoint as:
best_model.ptC++ Backend
The C++ implementation is the educational core of the project.
C++ Features
- Character-level tokenizer built from the input corpus.
- Train/validation split through
DataLoader. - Decoder-only transformer architecture.
- Token and positional embeddings.
- Multi-head causal self-attention.
- Feed-forward MLP blocks.
- Pre-layer normalization and residual connections.
- Cross-entropy loss.
- Manual analytical backward pass.
- AdamW optimization.
- Checkpoint saving/loading.
- Autoregressive generation.
- Terminal chat mode.
C++ Runtime Arguments
quadtrix.exe [data_path] [--generate] [--chat] [--chat-tokens N]| Argument | Description |
|---|---|
| data_path | Plain-text corpus used to build the tokenizer and train/validation split |
| --generate | Load weights and continuously generate text |
| --chat | Load weights and start interactive terminal chat |
| --chat-tokens N | Set maximum generated tokens per chat response |
C++ Environment Variables
| Variable | Default | Description |
|---|---|---|
| GPT_DATA_PATH | data/input.txt | Overrides the default C++ data file |
| GPT_MODEL_PATH | best_model.bin | Overrides the model checkpoint path |
Example:
$env:GPT_MODEL_PATH="C:\models\quadtrix-best.bin"
.\quadtrix.exe data\input.txt --chatPyTorch Backend
The PyTorch path mirrors the transformer idea with torch, torch.nn, and tiktoken.
Training
.\.venv\Scripts\python.exe engine\main.pyImportant defaults in engine/main.py:
| Parameter | Value |
|---|---:|
| batch_size | 16 |
| block_size | 32 |
| max_iters | 10000 |
| eval_interval | 10 |
| learning_rate | 1e-3 |
| n_embd | 64 |
| n_head | 4 |
| n_layer | 4 |
| dropout | 0.1 |
| Tokenizer | GPT-2 BPE through tiktoken |
CLI Inference
Interactive chat:
.\.venv\Scripts\python.exe engine\inference.py --checkpoint "engine\best_model.pt"Single prompt:
.\.venv\Scripts\python.exe engine\inference.py --checkpoint "engine\best_model.pt" --prompt "Once upon a time" --max-new-tokens 100 --temperature 1.0Available inference flags:
| Flag | Description |
|---|---|
| --checkpoint | Path to .pt checkpoint |
| --prompt | Generate once instead of starting interactive chat |
| --max-new-tokens | Maximum generated tokens |
| --temperature | Sampling temperature |
| --top-k | Optional top-k sampling cutoff |
FastAPI Backend
The production-style API lives in backend/main.py.
Start it with:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\backend
..\.venv\Scripts\python.exe -m uvicorn main:app --host 127.0.0.1 --port 3001The FastAPI backend provides:
POST /api/chatGET /api/healthGET /api/stats- session creation/listing/deletion
- message persistence through in-memory or Redis-backed session storage
- feedback capture
- CORS configuration for the frontend
- PyTorch checkpoint loading through
backend/inference.py
Backend Environment
Create backend/.env if you want to override defaults:
API_PORT=3001
CORS_ORIGINS=http://localhost:5173
REDIS_URL=
LOG_LEVEL=INFO
MAX_SESSIONS=1000
SESSION_TTL_HOURS=24
CPP_SERVER_URL=http://localhost:8080
TORCH_CHECKPOINT_PATH=../engine/best_model.pt
REQUEST_TIMEOUT_SECONDS=60The checked-in backend default is ../engine/best_model .pt with a space before .pt. If your checkpoint is named best_model.pt, set TORCH_CHECKPOINT_PATH=../engine/best_model.pt in backend/.env.
Note: the FastAPI C++ adapter expects a C++-compatible HTTP service at CPP_SERVER_URL with /health and /generate. The current main.cpp source provides terminal training/chat/generation. Use the PyTorch backend for the web UI unless you have a compatible C++ HTTP service running.
Frontend
The frontend is a React + TypeScript + Vite app.
Development Server
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\frontend
npm.cmd run devOpen:
http://localhost:5173Production Build
npm.cmd run buildPreview Production Build
npm.cmd run previewOpen the preview URL, usually:
http://localhost:4173PWA Files
The frontend includes installable web app files:
frontend/manifest.webmanifest
frontend/sw.js
frontend/public/manifest.webmanifest
frontend/public/sw.js
frontend/src/registerServiceWorker.tsInstall from Chrome or Edge after running the production preview. The installed app still needs the FastAPI backend running at the configured API URL.
NPM CLI
The root package exposes a quadtrix CLI through bin/quadtrix.js.
quadtrix chat [--api-port 3001] [--web-port 5173] [--no-open]
quadtrix train --backend cpp [--data data/input.txt]
quadtrix train --backend python
quadtrix setupFrom the repository root:
npm.cmd install
npm.cmd run build:frontend
node bin\quadtrix.js setupRun the packaged chat wrapper:
node bin\quadtrix.js chat --api-port 3001 --web-port 5173Train through the wrapper:
node bin\quadtrix.js train --backend cpp --data data\input.txt
node bin\quadtrix.js train --backend python --data data\input.txtConfiguration
C++ Configuration
Edit config/config.h and rebuild the C++ executable.
static const int BATCH_SIZE = 4;
static const int BLOCK_SIZE = 64;
static const int MAX_ITERS = 3000;
static const int EVAL_INTERVAL = 200;
static const float LEARNING_RATE = 3e-4f;
static const int EVAL_ITERS = 10;
static const int N_EMBD = 128;
static const int N_HEAD = 4;
static const int N_LAYER = 4;
static const float DROPOUT = 0.2f;C++ Model Shape
| Parameter | Meaning |
|---|---|
| BATCH_SIZE | Number of sequences per gradient step |
| BLOCK_SIZE | Context length in tokens |
| N_EMBD | Embedding width |
| N_HEAD | Number of attention heads |
| N_LAYER | Number of transformer blocks |
| DROPOUT | Dropout probability during training |
| LEARNING_RATE | AdamW learning rate |
| MAX_ITERS | Total training iterations |
| EVAL_INTERVAL | Evaluation/checkpoint interval |
Scaling Guide
| Goal | Change |
|---|---|
| Better local coherence | Increase BLOCK_SIZE |
| Higher model capacity | Increase N_EMBD and N_LAYER |
| Faster CPU runs | Use fewer layers or lower embedding width |
| Faster optimized build | Compile with -O3 -march=native |
| More stable loss estimates | Increase EVAL_ITERS |
API Reference
Base URL:
http://localhost:3001Health
GET /api/healthExample:
Invoke-RestMethod http://localhost:3001/api/healthStats
GET /api/statsExample:
Invoke-RestMethod http://localhost:3001/api/statsChat
POST /api/chatPyTorch backend example:
Invoke-RestMethod `
-Uri http://localhost:3001/api/chat `
-Method Post `
-ContentType "application/json" `
-Body '{
"session_id": null,
"prompt": "Once upon a time",
"max_tokens": 100,
"temperature": 1.0,
"stream": false,
"model_backend": "torch"
}'C++ backend example, when a compatible C++ HTTP service is available:
Invoke-RestMethod `
-Uri http://localhost:3001/api/chat `
-Method Post `
-ContentType "application/json" `
-Body '{
"session_id": null,
"prompt": "Once upon a time",
"max_tokens": 100,
"temperature": 1.0,
"stream": false,
"model_backend": "cpp"
}'Sessions
GET /api/sessions
POST /api/sessions
DELETE /api/sessions/{id}
GET /api/sessions/{id}/messagesFeedback
POST /api/feedbackTraining Details
Quadtrix uses a decoder-only transformer in the GPT family.
Input token IDs [B, T]
-> token embedding
-> positional embedding
-> transformer block x N
-> layer norm
-> masked multi-head self-attention
-> residual add
-> layer norm
-> feed-forward MLP
-> residual add
-> final layer norm
-> language-model head
-> logits [B, T, vocab_size]
-> cross-entropy lossC++ Manual Backpropagation
The C++ path implements gradients explicitly through:
- Cross-entropy and softmax.
- Final projection layer.
- Final layer normalization.
- Feed-forward MLP layers.
- ReLU activation.
- Attention projection.
- Scaled dot-product attention.
- Causal mask.
- Query, key, and value projections.
- Token and positional embeddings.
Gradient flow:
dLoss/dLogits
-> lm_head
-> final layer norm
-> each transformer block in reverse
-> feed-forward residual branch
-> layer norm 2
-> MLP fc2, activation, fc1
-> attention residual branch
-> layer norm 1
-> output projection
-> each attention head
-> attention weights @ value
-> softmax
-> scaled QK^T
-> query/key/value projections
-> token embedding gradients
-> position embedding gradientsCausal Masking
Future tokens are masked before softmax so each position can only attend to current and previous positions. Masked entries receive zero useful probability mass and therefore do not contribute meaningful gradient during the attention backward pass.
Optimizer
The C++ path uses AdamW-style updates with:
- First moment estimate.
- Second moment estimate.
- Weight decay.
- Learning-rate-controlled parameter updates.
Reports and Results
Leaderboard
| s.no | Time | Val BPB / Loss | Core | Description | Date | Contributors | |---:|---:|---:|---:|---|---|---| | 0 | 39.4 min | 1.3145 | 0.82M | Quadtrix CPU baseline, small data, fragmented output | 2026 | @Eamon2009 | | 1 | 61.3 min | 0.7176 | 10.82M | Quadtrix Colab large-scale run, coherent paragraphs, strong convergence | 2026 | @Eamon2009 | | 2 | 6.1 min | 0.9250 | 1.99M | Quadtrix T4 optimized run, fast training, stable learning, basic coherence | 2026 | @Eamon2009 | | 3 | 76.2 min | 1.6371 | ~0.82M | Quadtrix.cpp extended CPU training, 3000 iterations | 2026 | @Eamon2009 |
Hardware Execution Backends
| Device | Execution Path | |---|---| | CPU | Native C++ implementation and PyTorch CPU fallback | | CUDA | PyTorch CUDA acceleration when available | | iGPU | Experimental scripts for integrated-GPU style execution paths |
Training Metrics
Final C++ run metrics documented in the project:
| Metric | Value | |---|---:| | Validation loss | 1.6371 | | Parameters | 826,985 | | Vocabulary | 105 characters | | Training tokens | ~28.3M | | Validation tokens | ~3.1M | | Training time | ~76.2 min | | Architecture | 4 layers, 4 heads, 128 embedding width |
Python (PyTorch)

C++
Training Comparison
| Metric | Character-Level | Small Scale | Large Scale | |---|---:|---:|---:| | Parameters | 0.83M | 2.00M | 19.17M | | Layers | 4 | 4 | 4 | | Embedding Dim | 128 | 200 | 200 | | Attention Heads | 4 | 4 | 4 | | Context Length | 64 | 200 | 200 | | Corpus | TinyStories | TinyStories | Children's Stories | | Vocab Size | 105 char | 110 char | ~50K BPE | | Total Iterations | 3,000 | 5,000 | 5,000 | | Hardware | CPU/CUDA | CUDA T4 | Unknown | | Final Train Loss | 1.5632 | 0.9045 | Unknown | | Final Val Loss | 1.6371 | 0.9301 | Unknown | | Generalization Gap | 0.0739 | 0.0256 | Unknown |
Comparison to Related Projects
| Project | Focus | Language | Autograd | |---|---|---|---| | nanoGPT | Minimal GPT training | Python | PyTorch | | minGPT | Educational GPT | Python | PyTorch | | llama2.c | Inference-oriented C implementation | C | None | | Quadtrix.cpp | Training, inference, web UI, multi-backend experiments | C++/Python/TypeScript | Manual C++ + PyTorch |
Troubleshooting
PowerShell Blocks npm
Use:
npm.cmd install
npm.cmd run dev
npm.cmd run buildPython Backend Cannot Import Dependencies
Install dependencies into the repository virtual environment:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\backend
..\.venv\Scripts\python.exe -m pip install -r requirements.txtPyTorch Checkpoint Is Missing
Check the configured checkpoint path:
Test-Path "C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\engine\best_model.pt"If it is missing, train first:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp
.\.venv\Scripts\python.exe engine\main.pyC++ Chat Cannot Find Weights
The C++ chat and generation modes need best_model.bin.
Train first:
.\quadtrix.exe data\input.txtOr point to an existing checkpoint:
$env:GPT_MODEL_PATH="C:\path\to\best_model.bin"
.\quadtrix.exe data\input.txt --chatFrontend Cannot Reach Backend
Start the FastAPI backend:
cd C:\Users\Admin\Documents\GitHub\Quadtrix.cpp\backend
..\.venv\Scripts\python.exe -m uvicorn main:app --host 127.0.0.1 --port 3001Then check:
Invoke-RestMethod http://localhost:3001/api/healthPort Already in Use
Check common ports:
Get-NetTCPConnection -LocalPort 3001
Get-NetTCPConnection -LocalPort 5173
Get-NetTCPConnection -LocalPort 4173
Get-NetTCPConnection -LocalPort 8080Rebuild C++ After Changing Hyperparameters
Any change in config/config.h requires recompilation:
g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix.exe main.cppReferences
- Vaswani et al., "Attention Is All You Need", 2017.
- Radford et al., GPT-2 technical work, 2019.
- nanoGPT and minGPT as educational reference points.
License
MIT
