autograd-ts
v0.1.5
A tiny reverse-mode automatic differentiation engine in TypeScript
A tiny TypeScript library implementing reverse-mode automatic differentiation (autograd) and neural network training from scratch.
The goal of the project is to explore the fundamental mechanism that allows neural networks to learn.
At its core, a neural network is a system of interconnected computations.
Producing an output is straightforward. However, learning requires determining how each value influenced that output and how those values should be adjusted when the prediction is wrong.
This library explores the mechanism that makes that possible.
Installation
npm install autograd-ts
Import
import { Value, val, MLP, mse, sgd } from 'autograd-ts';
Understanding Learning
A neural network can make a prediction by evaluating a series of mathematical operations.
For example, consider the simplified model:
z = x * y + x
Given the inputs:
x = 2
y = 3
the model produces:
z = 8
Suppose the expected value was:
target = 10
Now we need to answer:
1. Which values contributed to the error?
2. How much did they contribute?
3. How should they change?
For a simple expression this may seem manageable. For a neural network containing thousands or millions of parameters, it becomes a much more difficult problem.
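One naive way to answer the second question is to nudge each input slightly, re-run the model, and see how far the output moves (a central finite difference). The sketch below does this for the example above; `model`, `residual`, and the step size `h` are illustrative choices, not part of autograd-ts, and this is not how the library computes gradients. It costs two extra model evaluations per input, which is why the approach does not scale to networks with millions of parameters.

```typescript
// The example model from the text: z = x * y + x.
function model(x: number, y: number): number {
  return x * y + x;
}

const xIn = 2;
const yIn = 3;
const target = 10;

const z = model(xIn, yIn);   // 8
const residual = z - target; // -2: the prediction is too low

// Brute-force sensitivity (central finite difference): nudge one input,
// re-run the whole model, and measure how much the output moved.
const h = 1e-6;
const dzdx = (model(xIn + h, yIn) - model(xIn - h, yIn)) / (2 * h);
const dzdy = (model(xIn, yIn + h) - model(xIn, yIn - h)) / (2 * h);

console.log(z, residual);                      // 8 -2
console.log(dzdx.toFixed(4), dzdy.toFixed(4)); // 4.0000 2.0000
```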
This leads to the central question behind the project:
When a neural network makes a prediction, how does it know which weights to adjust?
A model must be able to trace a prediction back through the computations that produced it, measure how each value contributed to the final result, and determine how those values should change when the prediction is wrong.
Modern machine learning systems solve this problem through a sequence of concepts that build on one another:
Computation Graph
↓
Automatic Differentiation
↓
Backpropagation
↓
Gradient Descent
↓
Learning
Layer 1: Computation Graphs
Consider:
z = x * y + x
Expanded:
a = x * y
z = a + x
This expression can be represented as a directed acyclic graph (DAG):
x ---------\
            * ---> a ---\
y ---------/             \
                          + ---> z
x ------------------------/
Each node represents a computed value.
| Node | Meaning | Value |
| ---- | ------- | ----- |
| x    | Input   | 2     |
| y    | Input   | 3     |
| a    | x * y   | 6     |
| z    | a + x   | 8     |
Each edge represents a dependency. Notice that x appears twice in the graph, feeding into both the * and the + operations. This means changes to x affect the output through two separate paths, and both contributions must be accounted for when computing its gradient.
The graph gives us a structured record of how the computation unfolded. Without that record, there is no way to reason systematically about how each value influenced the output. This is what makes backward traversal possible.
Importantly, this graph is built automatically. When you write arithmetic using Value objects, each operation records its inputs as it runs. You never construct the graph manually.
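To make "each operation records its inputs" concrete, here is a minimal sketch of a graph node that stores its operands at the moment it is created. `GraphNode`, `leaf`, `mul`, `add`, and `countPaths` are hypothetical names for illustration, not part of autograd-ts. Counting the paths from `z` back to each input shows why `x` receives two gradient contributions.

```typescript
// Hypothetical minimal graph node; not the library's Value class.
interface GraphNode {
  data: number;
  op: string;           // operation that produced this node ('' for inputs)
  parents: GraphNode[]; // the nodes this value was computed from
}

function leaf(data: number): GraphNode {
  return { data, op: '', parents: [] };
}

// Each operation computes its result AND records its operands as edges.
function mul(a: GraphNode, b: GraphNode): GraphNode {
  return { data: a.data * b.data, op: '*', parents: [a, b] };
}

function add(a: GraphNode, b: GraphNode): GraphNode {
  return { data: a.data + b.data, op: '+', parents: [a, b] };
}

// Count distinct paths from `from` back to `to` by following parent edges.
function countPaths(from: GraphNode, to: GraphNode): number {
  if (from === to) return 1;
  return from.parents.reduce((sum, p) => sum + countPaths(p, to), 0);
}

const x = leaf(2);
const y = leaf(3);
const a = mul(x, y); // a = x * y = 6
const z = add(a, x); // z = a + x = 8

console.log(z.data);           // 8
console.log(countPaths(z, x)); // 2: through a, and directly into +
console.log(countPaths(z, y)); // 1: through a only
```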
Layer 2: Automatic Differentiation
Every operation in this library doesn't just compute a result; it also records how to differentiate through itself. Each Value node stores a _backward closure that knows how to propagate gradients to its inputs using the chain rule.
// When you write:
const z = x.mul(y).add(x);
// mul (a = x * y) records: x.grad += y.data * a.grad
//                          y.grad += x.data * a.grad
// add (z = a + x) records: a.grad += z.grad
//                          x.grad += z.grad
Notice that gradients use += rather than =. Because x feeds into two operations, its gradient arrives from two separate paths and must be summed. Overwriting it would discard one of those contributions.
Calling z.backward() topologically sorts the computation graph and sets z.grad = 1, which seeds the process: the derivative of the output with respect to itself is always 1. The topological order guarantees that by the time a node propagates gradients to its inputs, all nodes downstream of it have already done so, so its own gradient is complete. The traversal then runs in reverse topological order, with each node calling its _backward closure to accumulate gradients into its inputs, moving back toward the leaves of the graph.
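A minimal sketch of this closure-per-node design is below, assuming a micrograd-style structure. It is a hypothetical re-implementation for illustration, not the autograd-ts source, and it only accepts Value operands (the library's methods also accept plain numbers).

```typescript
// Hypothetical sketch of a scalar autograd node; not the library's code.
class Value {
  data: number;
  grad = 0;
  private _prev: Value[];                  // inputs this node was computed from
  private _backward: () => void = () => {}; // pushes out.grad to the inputs

  constructor(data: number, prev: Value[] = []) {
    this.data = data;
    this._prev = prev;
  }

  add(other: Value): Value {
    const out = new Value(this.data + other.data, [this, other]);
    out._backward = () => {
      this.grad += out.grad;  // d(a + b)/da = 1
      other.grad += out.grad; // d(a + b)/db = 1
    };
    return out;
  }

  mul(other: Value): Value {
    const out = new Value(this.data * other.data, [this, other]);
    out._backward = () => {
      this.grad += other.data * out.grad; // d(a * b)/da = b
      other.grad += this.data * out.grad; // d(a * b)/db = a
    };
    return out;
  }

  backward(): void {
    // Topological sort: every node is pushed after all of its inputs.
    const topo: Value[] = [];
    const visited = new Set<Value>();
    const build = (v: Value): void => {
      if (visited.has(v)) return;
      visited.add(v);
      for (const p of v._prev) build(p);
      topo.push(v);
    };
    build(this);

    this.grad = 1; // seed: dz/dz = 1
    // Walk from the output back toward the inputs.
    for (let i = topo.length - 1; i >= 0; i--) topo[i]._backward();
  }
}

const x = new Value(2);
const y = new Value(3);
const z = x.mul(y).add(x);
z.backward();
console.log(z.data, x.grad, y.grad); // 8 4 2
```

Because the same `x` instance is passed to both `mul` and `add`, the sort visits it once, but its gradient receives two `+=` contributions: 3 from the multiplication path and 1 from the addition path.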
For z = x * y + x with x = 2, y = 3:
x.grad = 4 (∂z/∂x)
y.grad = 2 (∂z/∂y)
These gradients measure how sensitive the output is to each input. x.grad = 4 means that increasing x by a small amount ε increases z by approximately 4ε.
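The two contributions to x.grad can be written out with the multivariable chain rule, using the intermediate a = x * y from the expanded form above; the second term is the direct edge from x into the addition:

```latex
a = x\,y, \qquad z = a + x

\frac{\partial z}{\partial x}
  = \underbrace{\frac{\partial z}{\partial a}\,\frac{\partial a}{\partial x}}_{\text{path through } a}
  + \underbrace{\frac{\partial (a + x)}{\partial x}\Big|_{a\ \text{fixed}}}_{\text{direct path}}
  = 1 \cdot y + 1 = 3 + 1 = 4

\frac{\partial z}{\partial y}
  = \frac{\partial z}{\partial a}\,\frac{\partial a}{\partial y}
  = 1 \cdot x = 2
```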
This process is known as reverse-mode automatic differentiation, the same technique used by modern machine learning frameworks like PyTorch and TensorFlow.
Layer 3: Neural Networks
A single neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function:
output = activation(w1*x1 + w2*x2 + b)
which is simply a larger computation graph:
x1 --\
      * ---\
w1 --/      \
x2 --\       \
      * ----- + ---> activation ---> output
w2 --/       /
b ----------/
The weights (w1, w2) control how much each input contributes to the output. The bias (b) shifts the result, allowing the neuron to activate even when all inputs are zero. All three are Value nodes, so the autograd engine computes their gradients automatically. Nothing special is added to support neural networks. The same graph and differentiation system powers everything.
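To see that a neuron is just a small graph, here is one neuron evaluated and differentiated by hand with tanh as the activation. The inputs, weights, and bias are arbitrary numbers chosen for illustration. The gradients below are the chain-rule products the autograd engine is meant to produce automatically from the recorded graph; this sketch does not use autograd-ts.

```typescript
// Arbitrary illustrative numbers.
const x1 = 2, x2 = 1;
const w1 = 0.5, w2 = -1, b = 0.1;

// Forward pass: weighted sum plus bias, then the tanh activation.
const pre = w1 * x1 + w2 * x2 + b; // 1 - 1 + 0.1 = 0.1
const out = Math.tanh(pre);

// Backward pass by the chain rule: d tanh(u)/du = 1 - tanh(u)^2,
// and d pre/d w1 = x1, d pre/d w2 = x2, d pre/d b = 1.
const dpre = 1 - out * out;
const grads = { w1: dpre * x1, w2: dpre * x2, b: dpre };

console.log(pre.toFixed(2), out.toFixed(4), grads);
```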
The activation function is what makes neural networks capable of learning complex patterns. Without it, stacking multiple layers of neurons would be equivalent to a single linear transformation, regardless of network depth, and the model could only represent straight-line relationships. Activations like tanh and relu introduce the non-linearity needed to model more complex functions.
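The collapse of stacked linear layers can be checked directly in the scalar case. The weights and biases below are arbitrary illustrative values: two affine layers with no activation reduce to a single affine layer with w = w2 * w1 and b = w2 * b1 + b2, while inserting tanh between them breaks that equivalence.

```typescript
// Two stacked "layers" with no activation (arbitrary illustrative weights).
const layer1 = (x: number): number => 3 * x + 1;  // w1 = 3,  b1 = 1
const layer2 = (h: number): number => -2 * h + 5; // w2 = -2, b2 = 5

// Their composition is one linear layer with
// w = w2 * w1 = -6 and b = w2 * b1 + b2 = 3.
const collapsed = (x: number): number => -6 * x + 3;

for (const x of [-1, 0, 0.5, 2]) {
  console.log(layer2(layer1(x)), collapsed(x)); // identical pairs
}

// With tanh between the layers the result is no longer affine:
// an affine f satisfies f(0) + f(2) === 2 * f(1), this one does not.
const withTanh = (x: number): number => layer2(Math.tanh(layer1(x)));
console.log(withTanh(0) + withTanh(2), 2 * withTanh(1));
```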
Stacking neurons into layers, and layers into a network, scales this same idea. Each layer transforms its inputs, building increasingly abstract representations as the signal moves forward through the network.
Layer 4: Learning
The loss is a single scalar that measures how wrong the model's prediction is. The goal of training is to minimize it. What makes this possible is that the loss is the output of the same computation graph, so calling backward() on it distributes gradients all the way back to every weight in the network.
The gradient of the loss with respect to a weight tells you which direction increases the loss. Moving in the opposite direction reduces it. That is gradient descent. The learning rate controls how large each step is: too large and the updates overshoot, too small and training is slow.
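The update rule and the effect of the learning rate can be seen on a one-parameter toy loss. The loss L(w) = (w - 3)^2 and the three learning rates are assumptions for illustration, not library defaults. For this loss each step multiplies the distance from the minimum by (1 - 2 * learningRate), so 0.1 converges, 0.001 crawls, and 1.1 overshoots further on every step.

```typescript
// Hypothetical one-parameter loss L(w) = (w - 3)^2, minimized at w = 3.
const loss = (w: number): number => (w - 3) ** 2;
const dLossDw = (w: number): number => 2 * (w - 3);

// Plain gradient descent: repeatedly step opposite the gradient.
function descend(w0: number, learningRate: number, steps: number): number {
  let w = w0;
  for (let i = 0; i < steps; i++) {
    w -= learningRate * dLossDw(w);
  }
  return w;
}

const good = descend(0, 0.1, 50);   // error shrinks by 0.8x per step
const slow = descend(0, 0.001, 50); // error shrinks by only 0.998x per step
const wild = descend(0, 1.1, 50);   // error is multiplied by -1.2 per step
console.log(loss(good), loss(slow), loss(wild));
```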
Each training step follows the same pattern:
// Assumes model, inputs, targets, epochs, and learningRate are already defined.
for (let epoch = 0; epoch < epochs; epoch++) {
  const prediction = model.forward(inputs); // build the computation graph
  const loss = mse(prediction, targets);    // measure how wrong the prediction is
  model.zeroGrad();                         // clear gradients from the previous step
  loss.backward();                          // propagate gradients back through the graph
  sgd(model.parameters(), learningRate);    // nudge each weight to reduce loss
}
Important:
zeroGrad() must be called before each backward(). Gradients accumulate with +=, so skipping this causes gradients from previous steps to corrupt the current update.
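The accumulation problem can be reproduced with a single parameter. `Param` and `backwardStep` are hypothetical stand-ins for a Value and its backward pass (here for a loss of w * x, whose gradient with respect to w is x); they are not library APIs.

```typescript
// Hypothetical parameter record: the value and its accumulated gradient.
interface Param { data: number; grad: number }

// Backward for loss = w * x contributes x to w.grad (note the +=).
function backwardStep(w: Param, x: number): void {
  w.grad += x;
}

const w: Param = { data: 1, grad: 0 };

backwardStep(w, 2);         // step 1: grad = 2 (correct)
backwardStep(w, 2);         // step 2 without zeroing: stale 2 + new 2
const staleGrad = w.grad;   // 4, twice the true gradient

w.grad = 0;                 // what zeroGrad() does for every parameter
backwardStep(w, 2);
console.log(staleGrad, w.grad); // 4 2
```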
Over many epochs, the network adjusts its weights to reduce prediction error. This feedback loop is the mechanism that allows neural networks to learn.
API
Value
The core scalar node. Every computation builds on Value objects.
| Operation | Method | Notes |
| -------------- | -------------- | ------------------------------ |
| Addition | a.add(b) | b can be Value or number |
| Subtraction | a.sub(b) | b can be Value or number |
| Multiplication | a.mul(b) | b can be Value or number |
| Division | a.div(b) | b can be Value or number |
| Power | a.pow(n) | n is a plain number |
| Tanh | a.tanh() | Activation function |
| ReLU | a.relu() | Activation function |
| Backprop | a.backward() | Computes all gradients |
val(n) is a shorthand for new Value(n).
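Each operation in the table needs only a local derivative rule. The sketch below is a hypothetical re-implementation, not the package's code: it assumes sub and div are built from add, mul, and pow (a common design in small autograd engines), treats the ReLU derivative at exactly 0 as 0, and wraps plain numbers so either a Value or a number is accepted, as the table's notes describe.

```typescript
type Operand = Value | number;

// Hypothetical sketch; the real autograd-ts internals may differ.
class Value {
  data: number;
  grad = 0;
  private prev: Value[];
  private backwardFn: () => void = () => {};

  constructor(data: number, prev: Value[] = []) {
    this.data = data;
    this.prev = prev;
  }

  // Wrap plain numbers so both Value and number operands are accepted.
  private static lift(v: Operand): Value {
    return v instanceof Value ? v : new Value(v);
  }

  add(o: Operand): Value {
    const other = Value.lift(o);
    const out = new Value(this.data + other.data, [this, other]);
    out.backwardFn = () => { this.grad += out.grad; other.grad += out.grad; };
    return out;
  }

  mul(o: Operand): Value {
    const other = Value.lift(o);
    const out = new Value(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  pow(n: number): Value {
    const out = new Value(this.data ** n, [this]);
    // d(a^n)/da = n * a^(n - 1)
    out.backwardFn = () => { this.grad += n * this.data ** (n - 1) * out.grad; };
    return out;
  }

  // Derived operations, expressed through the primitives above.
  neg(): Value { return this.mul(-1); }
  sub(o: Operand): Value { return this.add(Value.lift(o).neg()); }
  div(o: Operand): Value { return this.mul(Value.lift(o).pow(-1)); }

  tanh(): Value {
    const t = Math.tanh(this.data);
    const out = new Value(t, [this]);
    out.backwardFn = () => { this.grad += (1 - t * t) * out.grad; };
    return out;
  }

  relu(): Value {
    const out = new Value(Math.max(0, this.data), [this]);
    out.backwardFn = () => { this.grad += (this.data > 0 ? 1 : 0) * out.grad; };
    return out;
  }

  backward(): void {
    const topo: Value[] = [];
    const seen = new Set<Value>();
    const visit = (v: Value): void => {
      if (seen.has(v)) return;
      seen.add(v);
      for (const p of v.prev) visit(p);
      topo.push(v);
    };
    visit(this);
    this.grad = 1;
    for (const v of topo.reverse()) v.backwardFn();
  }
}

const val = (n: number): Value => new Value(n);

const a = val(6);
const b = val(3);
const q = a.div(b); // 6 / 3
q.backward();
console.log(q.data, a.grad, b.grad); // about 2, 1/3, -2/3

const h = val(0.5);
const t = h.tanh();
t.backward();
console.log(h.grad); // 1 - tanh(0.5)^2
```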
MLP
A Multi-Layer Perceptron, a fully connected neural network composed of stacked layers of neurons.
new MLP(inputSize, layerSizes, options?)
| Parameter | Description |
| -------------------------- | --------------------------------------------------------------------------------------------------------- |
| inputSize | Number of input features |
| layerSizes | Output size of each layer, e.g. [4, 1] creates a hidden layer of 4 neurons and an output layer of 1 |
| options.hiddenActivation | 'tanh' (default), 'relu', or 'none' |
| options.outputActivation | 'none' (default), 'tanh', or 'relu' |
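The layerSizes convention determines how many trainable values model.parameters() should return. This sketch assumes, per the neuron description above, that every neuron in a fully connected layer owns one weight per input plus one bias; `countParameters` is a hypothetical helper and does not use the MLP class.

```typescript
// Hypothetical helper: count weights and biases in a fully connected MLP.
function countParameters(inputSize: number, layerSizes: number[]): number {
  let total = 0;
  let fanIn = inputSize;
  for (const size of layerSizes) {
    total += size * (fanIn + 1); // each neuron: fanIn weights + 1 bias
    fanIn = size;                // this layer's outputs feed the next layer
  }
  return total;
}

console.log(countParameters(2, [4, 1])); // 4*(2+1) + 1*(4+1) = 17
```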
model.forward(inputs) // returns Value[], runs the forward pass
model.parameters() // returns all trainable Value nodes
model.zeroGrad() // resets all parameter gradients to zero
mse(predictions, targets)
Computes mean squared error. Returns a Value that is part of the computation graph and supports backward().
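Assuming the mean is taken over all prediction/target pairs, the loss and its gradient with respect to each prediction are easy to compute by hand. This plain-number sketch uses hypothetical helpers `mseNumbers` and `mseGrad`; the library's mse builds the same expression out of Value nodes so that backward() can derive these gradients itself.

```typescript
// Plain-number mean squared error: mean of (p_i - t_i)^2.
function mseNumbers(predictions: number[], targets: number[]): number {
  if (predictions.length !== targets.length) throw new Error('length mismatch');
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    const diff = predictions[i] - targets[i];
    sum += diff * diff;
  }
  return sum / predictions.length;
}

// Gradient with respect to each prediction: dL/dp_i = 2 * (p_i - t_i) / n.
function mseGrad(predictions: number[], targets: number[]): number[] {
  const n = predictions.length;
  return predictions.map((p, i) => (2 * (p - targets[i])) / n);
}

console.log(mseNumbers([1, 2], [0, 4]));          // (1 + 4) / 2 = 2.5
console.log(mseGrad([1, 2], [0, 4]).join(', ')); // 1, -2
```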
sgd(parameters, learningRate)
Updates each parameter in-place: param.data -= learningRate * param.grad.
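The documented update rule amounts to one loop over the parameters. In this sketch `Param` and `sgdStep` are hypothetical stand-ins: they mirror the data and grad fields used throughout this README, but they are not the library's exported types.

```typescript
// Hypothetical stand-in for a trainable parameter.
interface Param { data: number; grad: number }

// In-place update: param.data -= learningRate * param.grad.
function sgdStep(parameters: Param[], learningRate: number): void {
  for (const p of parameters) {
    p.data -= learningRate * p.grad; // move against the gradient
  }
}

const params: Param[] = [
  { data: 1.0, grad: 0.5 },  // positive gradient: value decreases
  { data: -2.0, grad: -1.0 }, // negative gradient: value increases
];
sgdStep(params, 0.1);
console.log(params.map((p) => p.data)); // close to 0.95 and -1.9
```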
Examples
examples/
├── basic.ts computation graph, forward pass, and gradients
└── xor.ts training a network to learn XOR
Basic
z: 8
dz/dx: 4
dz/dy: 2
XOR training
epoch=0 loss=1.824531
epoch=100 loss=0.712943
epoch=200 loss=0.403817
epoch=300 loss=0.182691
epoch=400 loss=0.087432
epoch=500 loss=0.041827
epoch=600 loss=0.021554
epoch=700 loss=0.012115
epoch=800 loss=0.007083
epoch=900 loss=0.004291
[0, 0] => 0.0213
[0, 1] => 0.9834
[1, 0] => 0.9772
[1, 1] => 0.0189
Project Scope
A minimal foundation for training neural networks from scratch.
Core autograd engine: Value with add, sub, mul, div, pow, tanh, relu, and backward
Neural network components: Neuron, Layer, MLP
Training utilities: MSE loss, SGD optimizer
