@dniskav/neuron

v0.2.3

Published

25 days ago

Minimal neural network from scratch — neuron, layer, network, backpropagation. No dependencies.

0High
0Medium
0Low

neural-network machine-learning backpropagation typescript

A minimal, dependency-free neural network library built from scratch in TypeScript. Designed for learning and experimentation — every line of math is readable.

What's inside

| Export | Description | |--------|-------------| | Neuron | Single-input neuron. The simplest possible unit: one weight, one bias. | | NeuronN | N-input neuron with Xavier initialization and configurable activation. | | Layer | A group of NeuronN neurons that share the same inputs. | | Network | Two-layer network (hidden + output) with backpropagation. | | NetworkN | Deep network of arbitrary depth. Define your architecture as [inputs, ...hidden, outputs]. | | LSTMLayer | Recurrent layer with persistent hidden and cell state. Learns sequences via BPTT. | | NetworkLSTM | Wraps an LSTMLayer + dense layers. Maintains memory across steps within an episode. | | NetworkTransformer | Full token-classification Transformer: embeddings → N blocks → per-token logits. | | NetworkTransformerRL | Transformer for RL agents: continuous input projection → causal attention → Q-values. Remembers the last N steps. | | TransformerBlock | One Transformer block: multi-head attention + FFN + LayerNorm × 2 with residuals. | | MultiHeadAttention | N parallel attention heads concatenated and projected to d_model. | | AttentionHead | Single scaled dot-product self-attention head (Q / K / V projections + backprop). | | LayerNorm | Layer normalization with learnable γ / β per feature. | | WeightMatrix | 2D weight matrix with per-scalar Adam optimizers. Optional per-element gradient clipping via update(dW, lr, clipValue). | | EmbeddingMatrix | Lookup-table embedding matrix with SGD updates. | | sigmoid relu tanh linear | Built-in activation functions. | | SGD Momentum Adam | Optimizers. Each instance tracks its own state per weight. | | mse crossEntropy | Loss functions for evaluation and logging. | | mseDelta crossEntropyDelta | Output-layer delta functions for use with trainWithDeltas. |

Install

npm install @dniskav/neuron

Usage

Single neuron — learn a threshold

import { Neuron } from "@dniskav/neuron";

const neuron = new Neuron();

// Train: output 1 if input >= 18, else 0
for (let epoch = 0; epoch < 1000; epoch++) {
  neuron.train(20, 1, 0.1); // adult
  neuron.train(15, 0, 0.1); // minor
}

console.log(neuron.predict(17)); // ~0.1 (minor)
console.log(neuron.predict(25)); // ~0.9 (adult)

N-input neuron — multi-feature classification

import { NeuronN } from "@dniskav/neuron";

const neuron = new NeuronN(3); // 3 inputs: R, G, B

// Teach it to detect bright colors (luminance > 0.65)
neuron.train([1, 1, 1], 1, 0.05); // white → bright
neuron.train([0, 0, 0], 0, 0.05); // black → dark

console.log(neuron.predict([0.9, 0.9, 0.9])); // close to 1

Network — non-linear classification

import { Network } from "@dniskav/neuron";

// 2 inputs → 8 hidden neurons → 1 output
const net = new Network(2, 8, 1);

// Train on XOR (not linearly separable — needs hidden layer)
const data = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]];

for (let epoch = 0; epoch < 5000; epoch++) {
  for (const [x, y, t] of data) {
    net.train([x, y], t, 0.3);
  }
}

console.log(net.predict([0, 1])); // ~0.97
console.log(net.predict([1, 1])); // ~0.03

NetworkN — deep network with custom architecture

import { NetworkN } from "@dniskav/neuron";

// 3 inputs → 24 hidden → 16 hidden → 2 outputs
const net = new NetworkN([3, 24, 16, 2]);

// Train with multiple targets
net.train([0.5, 0.3, 0.8], [1, 0], 0.05);

// Predict returns an array — one value per output neuron
const [out1, out2] = net.predict([0.5, 0.3, 0.8]);

Activations — ReLU, tanh, and more

Pass an activation per layer. The last layer typically uses sigmoid for binary output or linear for regression.

import { NetworkN, relu, sigmoid } from "@dniskav/neuron";

const net = new NetworkN([3, 64, 32, 1], {
  activations: [relu, relu, sigmoid],
});

Available: sigmoid, relu, tanh, linear.

Optimizers — Adam, Momentum, SGD

Pass an optimizer factory. Each weight gets its own instance with independent state.

import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";

const net = new NetworkN([2, 64, 1], {
  activations: [relu, sigmoid],
  optimizer: () => new Adam(),          // default: beta1=0.9, beta2=0.999
});

// Momentum example
import { Momentum } from "@dniskav/neuron";
const net2 = new NetworkN([2, 32, 1], {
  optimizer: () => new Momentum(0.9),
});

Optimizers also work in NetworkLSTM (applied to the dense layers):

import { NetworkLSTM, relu, Adam } from "@dniskav/neuron";

const net = new NetworkLSTM(1, 8, [4, 1], {
  denseActivation: relu,
  optimizer: () => new Adam(0.001),
});

Loss utilities

import { mse, crossEntropy } from "@dniskav/neuron";

const predicted = net.predict([0.5, 0.3]);
console.log(mse(predicted, [1, 0]));
console.log(crossEntropy(predicted, [1, 0]));

trainWithDeltas — custom loss / physics-based gradients

NetworkN also exposes trainWithDeltas for when you compute your own output-layer deltas (e.g., from a physics simulation or a custom loss function):

import { NetworkN, mseDelta } from "@dniskav/neuron";

const net = new NetworkN([3, 16, 2]);
const pred = net.predict(inputs);

// Compute deltas manually using a helper, or from any external signal
const deltas = pred.map((p, i) => mseDelta(p, targets[i]));
net.trainWithDeltas(inputs, deltas, 0.01);

NetworkLSTM — recurrent network with memory

NetworkLSTM adds within-episode memory: the network can remember what happened in previous steps of the same sequence.

import { NetworkLSTM } from "@dniskav/neuron";

// 1 input → LSTM(8 hidden) → Dense(4) → 1 output
const net = new NetworkLSTM(1, 8, [4, 1]);

// Task: predict 1 if we're past step 3 in the episode, else 0
// A feedforward net can't do this — it has no memory of step count.

for (let epoch = 0; epoch < 300; epoch++) {
  net.resetState();             // clear memory at episode start

  const targets: number[][] = [];
  for (let step = 0; step < 6; step++) {
    net.predict([1]);           // same input every step
    targets.push([step >= 3 ? 1 : 0]);
  }

  net.train(targets, 0.05);    // BPTT across the full episode
}

// Run a fresh episode and check predictions
net.resetState();
for (let step = 0; step < 6; step++) {
  const [out] = net.predict([1]);
  console.log(`step ${step}: ${out.toFixed(2)}  (expected: ${step >= 3 ? 1 : 0})`);
}
// step 0: 0.07  (expected: 0)
// step 1: 0.11  (expected: 0)
// step 2: 0.18  (expected: 0)
// step 3: 0.81  (expected: 1)
// step 4: 0.89  (expected: 1)
// step 5: 0.93  (expected: 1)

The network learns to count steps using its hidden state — no external counter needed.

How it works

Each class applies an activation function to the weighted sum of inputs and uses gradient descent to update weights:

weight += lr × delta × input
bias   += lr × delta

NetworkN implements full backpropagation across all layers, propagating deltas from the output back to the first layer using the chain rule. The derivative of the chosen activation is applied at each layer.

NeuronN uses simplified Xavier initialization — weights start in [-√(1/n), +√(1/n)] — so gradients flow well from the start of training.

When an optimizer is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state (velocity, moments).

Build

npm run build   # outputs CJS + ESM + type declarations to dist/
npm run dev     # watch mode

For AI agents

If you are an AI agent or LLM working with this codebase, read AGENTS.md first. It contains the full class hierarchy, design constraints, and what this library does not do.

NetworkTransformer — self-attention over sequences

import { NetworkTransformer } from "@dniskav/neuron";

// Sudoku solver: 81 cells (tokens), values 0–9, predict digit 1–9 per cell
const net = new NetworkTransformer(81, {
  vocabSize: 10,   // digits 0–9
  d_model:   64,   // embedding / hidden dimension
  nHeads:    4,    // attention heads (d_k = d_model / nHeads = 16)
  d_ff:      128,  // FFN hidden size
  nBlocks:   4,    // number of transformer blocks
  nClasses:  9,    // output classes per token (digits 1–9)
});

// tokens: 81 cell values (0 = empty)
const puzzle   = [5,3,0, 0,7,0, 0,0,0, ...];
const targets  = [...];   // 81*9 one-hot values
const mask     = puzzle.map(v => v === 0);   // only train on empty cells

const loss = net.train(puzzle, targets, 0.001, mask);
// loss is cross-entropy (not MSE) — decreases from ~2.2 toward 0 as training progresses
const logits = net.predict(puzzle);   // 729 logits (81 × 9)

// Attention weights from all blocks for visualization
const weights = net.getAttentionWeights();
// weights[blockIdx][headIdx]  → seqLen × seqLen matrix

Each head in each block learns a different type of relationship (row, column, 3×3 box). The network figures this out by itself through training.

NetworkTransformerRL — Transformer for reinforcement learning

NetworkTransformerRL uses causal self-attention over a sliding window of past states to output Q-values. Unlike NetworkLSTM, the agent attends to specific past moments rather than compressing them into a single hidden vector.

import { NetworkTransformerRL } from "@dniskav/neuron";

// Agent sees the last 8 steps, each step is a 7-value sensor vector → 4 actions
const net = new NetworkTransformerRL(8, 7, {
  d_model:  32,
  nHeads:   2,
  d_ff:     64,
  nBlocks:  2,
  nActions: 4,
});

// Each step: feed the last N states as a sequence
const sequence = getLastNStates();      // number[][] — shape: [8, 7]
const qValues  = net.predict(sequence); // number[4]

// Q-learning update: train toward Bellman target
const action  = argmax(qValues);
const reward  = env.step(action);
const targets = qValues.slice();
targets[action] = reward + 0.99 * Math.max(...net.predict(nextSequence));

const loss = net.train(sequence, targets, 0.001);

The last step in the sequence gets 2× pooling weight — the most recent state contributes more to the decision.

// Inspect what the agent is attending to
const attnWeights = net.getAttentionWeights();
// attnWeights[blockIdx][headIdx] → seqLen × seqLen matrix

Possible improvements

Support for batches in training to improve efficiency and gradient stability.
Global gradient norm clipping — WeightMatrix.update supports per-element clipping; a utility to clip across all matrices by total norm would be more principled.
Learning rate warmup — standard practice for Transformers; ramp LR from 0 to target over the first N steps.
Pre-norm architecture — LayerNorm before the residual add (instead of after) is more stable for deep stacks.

License

MIT