npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

cactus-needlejs

v0.1.0

Published

Needle tool-calling model — runs locally via ONNX with bundled weights.

Readme

NeedleJS

Run the Cactus Compute Needle tool-calling model from JavaScript — no Python, no server, no model hosting. The ONNX weights and SentencePiece tokenizer ship inside this package, so npm install needlejs is everything you need.

Needle is a 26M-parameter encoder-decoder transformer that takes a natural-language query and a list of tool definitions and returns a JSON tool call. NeedleJS ports it to onnxruntime-web (which runs in both Node and the browser via WebAssembly).

How it works

User query + tools JSON
         │
         ▼
  NeedleTokenizer          (SentencePiece BPE, vocab 8192)
         │
         ▼
  needle_encoder.onnx      (12-layer encoder, d_model=512)
         │
         ▼
  needle_decoder.onnx      (8-layer decoder, autoregressive)
    + ConstrainedDecoder   (trie + JSON state machine keeps output valid)
         │
         ▼
  [{"name":"tool_name","arguments":{...}}]

Quickstart

import { Needle } from 'needlejs';

const needle = new Needle();
await needle.load();  // uses bundled model files

const tools = [{
  name: 'get_weather',
  description: 'Get current weather for a city.',
  parameters: {
    location: { type: 'string', description: 'City name.' },
  },
}];

const result = await needle.generate('What is the weather in San Francisco?', tools);
// '[{"name":"get_weather","arguments":{"location":"San Francisco"}}]'

load() resolves the bundled model files relative to the installed package and reads them from disk. Pass encoderPath, decoderPath, tokenizerPath, or vocabPath to override.

JavaScript API

Needle (high-level facade)

import { Needle } from 'needlejs';

const needle = new Needle();

// Load using bundled files (default), or override any subset:
await needle.load({
  // encoderPath, decoderPath, tokenizerPath, vocabPath
  // useWebGPU: true,  // try the WebGPU EP first, fall back to wasm
});

const json = await needle.generate(query, tools, {
  maxGenLen: 256,    // max tokens to generate
  maxEncLen: 1024,   // max encoder input length
  constrained: true, // enable JSON-constrained decoding
  onToken: (piece) => process.stdout.write(piece), // streaming callback
});

NeedleModel + NeedleTokenizer (low-level)

import { NeedleModel, NeedleTokenizer, generate, bundledModelPaths } from 'needlejs';

const { encoderPath, decoderPath, tokenizerPath, vocabPath } = bundledModelPaths();

const model = new NeedleModel();
await model.loadFromPaths(encoderPath, decoderPath);

const tokenizer = await NeedleTokenizer.fromPath(tokenizerPath, vocabPath);

const result = await generate(model, tokenizer, query, tools);

Regenerating the ONNX model files

The model files are pre-built and shipped with the package. To regenerate them (e.g. after a Needle upstream release):

# Install Python deps
pip install -r scripts/requirements_export.txt
pip install git+https://github.com/cactus-compute/needle

# Export (downloads checkpoint from HuggingFace, ~52 MB fp16 output)
python scripts/export_onnx.py --fp16 --validate --output-dir models/

This writes to:

models/
├── needle_encoder_fp16.onnx   (~28 MB)
├── needle_decoder_fp16.onnx   (~43 MB)
└── tokenizer/
    ├── needle.model
    └── needle.vocab

--validate runs onnxruntime against the PyTorch reference and asserts the max absolute difference is under 0.05 (fp16 tolerance). --fp16 halves model size with negligible accuracy loss.

Export options

| Flag | Default | Description | |------|---------|-------------| | --output-dir | ../models | Where to write the ONNX files | | --fp16 | off | Convert to float16 after export | | --validate | off | Validate output against PyTorch | | --checkpoint | (HuggingFace) | Path to a local checkpoint.pkl |

Development

npm install
npm test       # runs unit tests + end-to-end against the bundled models
npm run build  # build ESM + UMD library bundles to dist/

Project layout

needlejs/
├── scripts/
│   ├── export_onnx.py          # Python: JAX/Flax → PyTorch → ONNX
│   └── requirements_export.txt
├── src/
│   ├── tokenizer.js            # SentencePiece wrapper + tool name normalization
│   ├── constrained.js          # JSON-constrained decoding (Trie + state machine)
│   ├── model.js                # onnxruntime-web encoder/decoder sessions
│   ├── generator.js            # Autoregressive generation loop
│   └── index.js                # Public API + Needle facade
├── test/
│   ├── constrained.test.ts     # Trie, JsonStateMachine, logit masking
│   ├── tokenizer.test.ts       # snake_case conversion, tool normalization
│   └── needle.test.ts          # End-to-end against bundled models
└── models/                     # ONNX + tokenizer files shipped with the package

Technical notes

Why PyTorch re-implementation instead of jax2tf? Flax's nn.scan compiles to while_loop in XLA, which becomes an ONNX loop node. ONNX loop nodes perform poorly in onnxruntime-web's WASM backend. Re-implementing the model in PyTorch produces unrolled layers and a flat, fast ONNX graph.

Grouped-query attention Needle uses 8 query heads and 4 KV heads. The JS model wrapper repeats K/V tensors (repeat_interleave) to match the query head count, matching the Python implementation exactly.

Constrained decoding The decoder is constrained to only produce valid JSON matching the provided tool schema. A character-level trie over tool names and parameter keys, combined with a JSON state machine that tracks buffer context, masks invalid tokens to -Infinity before each argmax. This is a direct port of needle/model/constrained.py.

Vocab table source The WASM build of sentencepiece-js doesn't expose GetPieceSize / IdToPiece, so constrained decoding reads the companion needle.vocab file (shipped alongside needle.model) to build its token-string table.

License

MIT