npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@dniskav/neuron

v0.2.3

Published

Minimal neural network from scratch — neuron, layer, network, backpropagation. No dependencies.

Readme

npm license

A minimal, dependency-free neural network library built from scratch in TypeScript. Designed for learning and experimentation — every line of math is readable.

What's inside

| Export | Description | |--------|-------------| | Neuron | Single-input neuron. The simplest possible unit: one weight, one bias. | | NeuronN | N-input neuron with Xavier initialization and configurable activation. | | Layer | A group of NeuronN neurons that share the same inputs. | | Network | Two-layer network (hidden + output) with backpropagation. | | NetworkN | Deep network of arbitrary depth. Define your architecture as [inputs, ...hidden, outputs]. | | LSTMLayer | Recurrent layer with persistent hidden and cell state. Learns sequences via BPTT. | | NetworkLSTM | Wraps an LSTMLayer + dense layers. Maintains memory across steps within an episode. | | NetworkTransformer | Full token-classification Transformer: embeddings → N blocks → per-token logits. | | NetworkTransformerRL | Transformer for RL agents: continuous input projection → causal attention → Q-values. Remembers the last N steps. | | TransformerBlock | One Transformer block: multi-head attention + FFN + LayerNorm × 2 with residuals. | | MultiHeadAttention | N parallel attention heads concatenated and projected to d_model. | | AttentionHead | Single scaled dot-product self-attention head (Q / K / V projections + backprop). | | LayerNorm | Layer normalization with learnable γ / β per feature. | | WeightMatrix | 2D weight matrix with per-scalar Adam optimizers. Optional per-element gradient clipping via update(dW, lr, clipValue). | | EmbeddingMatrix | Lookup-table embedding matrix with SGD updates. | | sigmoid relu tanh linear | Built-in activation functions. | | SGD Momentum Adam | Optimizers. Each instance tracks its own state per weight. | | mse crossEntropy | Loss functions for evaluation and logging. | | mseDelta crossEntropyDelta | Output-layer delta functions for use with trainWithDeltas. |

Install

npm install @dniskav/neuron

Usage

Single neuron — learn a threshold

import { Neuron } from "@dniskav/neuron";

const neuron = new Neuron();

// Train: output 1 if input >= 18, else 0
for (let epoch = 0; epoch < 1000; epoch++) {
  neuron.train(20, 1, 0.1); // adult
  neuron.train(15, 0, 0.1); // minor
}

console.log(neuron.predict(17)); // ~0.1 (minor)
console.log(neuron.predict(25)); // ~0.9 (adult)

N-input neuron — multi-feature classification

import { NeuronN } from "@dniskav/neuron";

const neuron = new NeuronN(3); // 3 inputs: R, G, B

// Teach it to detect bright colors (luminance > 0.65)
neuron.train([1, 1, 1], 1, 0.05); // white → bright
neuron.train([0, 0, 0], 0, 0.05); // black → dark

console.log(neuron.predict([0.9, 0.9, 0.9])); // close to 1

Network — non-linear classification

import { Network } from "@dniskav/neuron";

// 2 inputs → 8 hidden neurons → 1 output
const net = new Network(2, 8, 1);

// Train on XOR (not linearly separable — needs hidden layer)
const data = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]];

for (let epoch = 0; epoch < 5000; epoch++) {
  for (const [x, y, t] of data) {
    net.train([x, y], t, 0.3);
  }
}

console.log(net.predict([0, 1])); // ~0.97
console.log(net.predict([1, 1])); // ~0.03

NetworkN — deep network with custom architecture

import { NetworkN } from "@dniskav/neuron";

// 3 inputs → 24 hidden → 16 hidden → 2 outputs
const net = new NetworkN([3, 24, 16, 2]);

// Train with multiple targets
net.train([0.5, 0.3, 0.8], [1, 0], 0.05);

// Predict returns an array — one value per output neuron
const [out1, out2] = net.predict([0.5, 0.3, 0.8]);

Activations — ReLU, tanh, and more

Pass an activation per layer. The last layer typically uses sigmoid for binary output or linear for regression.

import { NetworkN, relu, sigmoid } from "@dniskav/neuron";

const net = new NetworkN([3, 64, 32, 1], {
  activations: [relu, relu, sigmoid],
});

Available: sigmoid, relu, tanh, linear.

Optimizers — Adam, Momentum, SGD

Pass an optimizer factory. Each weight gets its own instance with independent state.

import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";

const net = new NetworkN([2, 64, 1], {
  activations: [relu, sigmoid],
  optimizer: () => new Adam(),          // default: beta1=0.9, beta2=0.999
});

// Momentum example
import { Momentum } from "@dniskav/neuron";
const net2 = new NetworkN([2, 32, 1], {
  optimizer: () => new Momentum(0.9),
});

Optimizers also work in NetworkLSTM (applied to the dense layers):

import { NetworkLSTM, relu, Adam } from "@dniskav/neuron";

const net = new NetworkLSTM(1, 8, [4, 1], {
  denseActivation: relu,
  optimizer: () => new Adam(0.001),
});

Loss utilities

import { mse, crossEntropy } from "@dniskav/neuron";

const predicted = net.predict([0.5, 0.3]);
console.log(mse(predicted, [1, 0]));
console.log(crossEntropy(predicted, [1, 0]));

trainWithDeltas — custom loss / physics-based gradients

NetworkN also exposes trainWithDeltas for when you compute your own output-layer deltas (e.g., from a physics simulation or a custom loss function):

import { NetworkN, mseDelta } from "@dniskav/neuron";

const net = new NetworkN([3, 16, 2]);
const pred = net.predict(inputs);

// Compute deltas manually using a helper, or from any external signal
const deltas = pred.map((p, i) => mseDelta(p, targets[i]));
net.trainWithDeltas(inputs, deltas, 0.01);

NetworkLSTM — recurrent network with memory

NetworkLSTM adds within-episode memory: the network can remember what happened in previous steps of the same sequence.

import { NetworkLSTM } from "@dniskav/neuron";

// 1 input → LSTM(8 hidden) → Dense(4) → 1 output
const net = new NetworkLSTM(1, 8, [4, 1]);

// Task: predict 1 if we're past step 3 in the episode, else 0
// A feedforward net can't do this — it has no memory of step count.

for (let epoch = 0; epoch < 300; epoch++) {
  net.resetState();             // clear memory at episode start

  const targets: number[][] = [];
  for (let step = 0; step < 6; step++) {
    net.predict([1]);           // same input every step
    targets.push([step >= 3 ? 1 : 0]);
  }

  net.train(targets, 0.05);    // BPTT across the full episode
}

// Run a fresh episode and check predictions
net.resetState();
for (let step = 0; step < 6; step++) {
  const [out] = net.predict([1]);
  console.log(`step ${step}: ${out.toFixed(2)}  (expected: ${step >= 3 ? 1 : 0})`);
}
// step 0: 0.07  (expected: 0)
// step 1: 0.11  (expected: 0)
// step 2: 0.18  (expected: 0)
// step 3: 0.81  (expected: 1)
// step 4: 0.89  (expected: 1)
// step 5: 0.93  (expected: 1)

The network learns to count steps using its hidden state — no external counter needed.

How it works

Each class applies an activation function to the weighted sum of inputs and uses gradient descent to update weights:

weight += lr × delta × input
bias   += lr × delta

NetworkN implements full backpropagation across all layers, propagating deltas from the output back to the first layer using the chain rule. The derivative of the chosen activation is applied at each layer.

NeuronN uses simplified Xavier initialization — weights start in [-√(1/n), +√(1/n)] — so gradients flow well from the start of training.

When an optimizer is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state (velocity, moments).

Build

npm run build   # outputs CJS + ESM + type declarations to dist/
npm run dev     # watch mode

For AI agents

If you are an AI agent or LLM working with this codebase, read AGENTS.md first. It contains the full class hierarchy, design constraints, and what this library does not do.

NetworkTransformer — self-attention over sequences

import { NetworkTransformer } from "@dniskav/neuron";

// Sudoku solver: 81 cells (tokens), values 0–9, predict digit 1–9 per cell
const net = new NetworkTransformer(81, {
  vocabSize: 10,   // digits 0–9
  d_model:   64,   // embedding / hidden dimension
  nHeads:    4,    // attention heads (d_k = d_model / nHeads = 16)
  d_ff:      128,  // FFN hidden size
  nBlocks:   4,    // number of transformer blocks
  nClasses:  9,    // output classes per token (digits 1–9)
});

// tokens: 81 cell values (0 = empty)
const puzzle   = [5,3,0, 0,7,0, 0,0,0, ...];
const targets  = [...];   // 81*9 one-hot values
const mask     = puzzle.map(v => v === 0);   // only train on empty cells

const loss = net.train(puzzle, targets, 0.001, mask);
// loss is cross-entropy (not MSE) — decreases from ~2.2 toward 0 as training progresses
const logits = net.predict(puzzle);   // 729 logits (81 × 9)

// Attention weights from all blocks for visualization
const weights = net.getAttentionWeights();
// weights[blockIdx][headIdx]  → seqLen × seqLen matrix

Each head in each block learns a different type of relationship (row, column, 3×3 box). The network figures this out by itself through training.

NetworkTransformerRL — Transformer for reinforcement learning

NetworkTransformerRL uses causal self-attention over a sliding window of past states to output Q-values. Unlike NetworkLSTM, the agent attends to specific past moments rather than compressing them into a single hidden vector.

import { NetworkTransformerRL } from "@dniskav/neuron";

// Agent sees the last 8 steps, each step is a 7-value sensor vector → 4 actions
const net = new NetworkTransformerRL(8, 7, {
  d_model:  32,
  nHeads:   2,
  d_ff:     64,
  nBlocks:  2,
  nActions: 4,
});

// Each step: feed the last N states as a sequence
const sequence = getLastNStates();      // number[][] — shape: [8, 7]
const qValues  = net.predict(sequence); // number[4]

// Q-learning update: train toward Bellman target
const action  = argmax(qValues);
const reward  = env.step(action);
const targets = qValues.slice();
targets[action] = reward + 0.99 * Math.max(...net.predict(nextSequence));

const loss = net.train(sequence, targets, 0.001);

The last step in the sequence gets 2× pooling weight — the most recent state contributes more to the decision.

// Inspect what the agent is attending to
const attnWeights = net.getAttentionWeights();
// attnWeights[blockIdx][headIdx] → seqLen × seqLen matrix

Possible improvements

  1. Support for batches in training to improve efficiency and gradient stability.
  2. Global gradient norm clippingWeightMatrix.update supports per-element clipping; a utility to clip across all matrices by total norm would be more principled.
  3. Learning rate warmup — standard practice for Transformers; ramp LR from 0 to target over the first N steps.
  4. Pre-norm architecture — LayerNorm before the residual add (instead of after) is more stable for deep stacks.

License

MIT