vps-vector-node v0.4.1
VectorNode

One-command tool to turn any VPS into a production-ready, OpenAI-compatible embedding API server.

📖 Table of Contents

✨ Features

  • OpenAI-compatible API: Drop-in replacement for /v1/embeddings
  • CPU-only inference: Runs on a low-end VPS (4GB RAM, 2 vCPU)
  • Hugging Face integration: Download and cache models from HF Hub
  • API key authentication: Secure your API with local key management
  • Concurrency control: Automatic concurrency limits based on model size and hardware
  • Request queueing: Handle traffic spikes gracefully
  • Memory safety: Pre-flight checks prevent OOM crashes

🚀 Quick Start

Prerequisites

  • Node.js 18+ (npm or npx)
  • Hugging Face Account with an API token
  • 4GB+ RAM (for small models; 8GB+ recommended)

Installation

npm install -g vps-vector-node

Or run without installing:

npx vps-vector-node --help

Step 1: Get Your Hugging Face Token

VectorNode requires a Hugging Face token to download models.

Get a token:

  1. Go to https://huggingface.co/join (create account if needed)
  2. Navigate to https://huggingface.co/settings/tokens
  3. Click "New token" → Give it a name (e.g., "vectornode")
  4. Select "Read" permission
  5. Click "Generate token" and copy it immediately

Login to VectorNode:

npx vps-vector-node login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Or if installed globally:

vectornode login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The token is stored securely in ~/.vectornode/config.json.

Token troubleshooting:

  • Invalid token? Ensure you copied the entire token including hf_ prefix
  • Permission denied? Token must have "Read" permission
  • Lost token? Create a new one at https://huggingface.co/settings/tokens

Step 2: Download a Model

Download the model before starting the server (avoids delays on first request):

npx vps-vector-node models:download bge-small-en

Or if installed globally:

vectornode models:download bge-small-en

See Available Models for other options.

Step 3: Create an API Key

npx vps-vector-node key create --name dev

This outputs an API key like sk-abc123.... Store it securely—you'll need it for every API request.

Step 4: Start the Server

npx vps-vector-node serve --model bge-small-en --port 3000

Or if installed globally:

vectornode serve --model bge-small-en --port 3000

You should see:

[INFO] Starting VectorNode server { model: 'bge-small-en', port: '3000', host: '0.0.0.0' }
[INFO] Model loaded successfully { model: 'bge-small-en', dimensions: 384 }
[INFO] Server listening on 0.0.0.0:3000

Step 5: Test the API

Using curl:

curl -H "Authorization: Bearer sk-abc123..." \
  -H "Content-Type: application/json" \
  -X POST http://localhost:3000/v1/embeddings \
  -d '{
    "model": "bge-small-en",
    "input": "hello world"
  }'

Using Postman:

  • Method: POST
  • URL: http://localhost:3000/v1/embeddings
  • Headers:
    • Authorization: Bearer sk-abc123...
    • Content-Type: application/json
  • Body (raw JSON):
    {
      "model": "bge-small-en",
      "input": "hello world"
    }

Important Notes:

  • /v1/embeddings only accepts POST requests (GET will return 404)
  • Send the payload as raw JSON in the body, not as query parameters
  • VectorNode handles Unicode (spaces, tabs, emoji) automatically:
    {
      "model": "bge-small-en",
      "input": "hello world 🚀\tTabbed text"
    }
  • If the JSON is wrapped in an extra pair of quotes, remove them (i.e., delete the leading quote before { and the trailing quote after })
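The same request can also be issued from Node.js 18+ with the built-in fetch and no extra dependencies. This is a minimal client sketch, assuming a server on localhost:3000 and a key created in Step 3; buildBody and embed are illustrative names, not part of VectorNode:

```javascript
// Minimal Node.js 18+ client sketch for POST /v1/embeddings.
// BASE_URL and API_KEY are placeholders -- substitute your own values.
const BASE_URL = process.env.VECTORNODE_URL || "http://localhost:3000";
const API_KEY = process.env.VECTORNODE_KEY || "sk-your-key-here";

// Build the JSON request body; input may be a single string or an array
// of strings, matching the two request shapes in the API reference.
function buildBody(model, input) {
  return JSON.stringify({ model, input });
}

async function embed(input, model = "bge-small-en") {
  const res = await fetch(`${BASE_URL}/v1/embeddings`, {
    method: "POST", // the endpoint only accepts POST
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: buildBody(model, input), // raw JSON body, not query parameters
  });
  if (!res.ok) throw new Error(`Embedding request failed: HTTP ${res.status}`);
  const { data } = await res.json();
  return data.map((item) => item.embedding);
}

// Usage (with the server running):
// embed("hello world").then(([vec]) => console.log(vec.length));
```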

📊 Available Models

Model Comparison Table

| Model ID | Dimensions | Size (GB) | Parameters | Min RAM | Rec RAM | Best Use Case | Multilingual | Latency |
|----------|-----------|-----------|-----------|---------|---------|---------------|--------------|---------|
| bge-small-en | 384 | 0.5 | ~110M | 3GB | 6GB | RAG, production search, English-focused | No | ~12ms |
| bge-base-en | 768 | 1.5 | ~125M | 4GB | 8GB | Better accuracy than small, English RAG | No | ~30ms |
| bge-m3 | 1024 | 2.0 | ~335M | 6GB | 12GB | Mixed dense/sparse search (M3) | Yes | ~40ms |
| e5-small | 384 | 0.4 | ~33M | 2GB | 4GB | General embedding, retrieval | Yes | ~10ms |
| e5-base | 768 | 0.8 | ~82M | 4GB | 8GB | Higher quality semantic search | Yes | ~20ms |
| e5-large | 1024 | 1.5 | ~335M | 8GB | 16GB | High-quality embeddings, heavy workloads | Yes | ~40ms |
| multilingual-e5-small | 384 | 0.5 | ~33M | 2GB | 4GB | Lightweight multilingual search | Yes | ~12ms |
| multilingual-e5-base | 768 | 1.0 | ~82M | 4GB | 8GB | Multilingual RAG / retrieval | Yes | ~25ms |
| gte-small | 384 | 0.4 | ~33M | 2GB | 4GB | General-purpose English embeddings | No | ~10ms |
| gte-base | 768 | 0.8 | ~82M | 4GB | 8GB | Better English search / retrieval | No | ~20ms |
| gte-large | 1024 | 1.5 | ~335M | 8GB | 16GB | Max accuracy English search | No | ~40ms |
| gte-multilingual-base | 768 | 1.0 | ~82M | 4GB | 8GB | Multilingual embeddings | Yes | ~25ms |
| gte-multilingual-large | 1024 | 1.8 | ~335M | 8GB | 16GB | High-quality multilingual | Yes | ~45ms |
| sentence-t5-base | 768 | 0.9 | ~220M | 4GB | 8GB | General semantic similarity | Yes | ~35ms |
| sentence-t5-large | 1024 | 1.8 | ~330M | 8GB | 16GB | Higher quality semantic tasks | Yes | ~60ms |

Hugging Face Repositories

These are the upstream model repositories used by VectorNode (all supported by @xenova/transformers):

| Model ID | Hugging Face Repo |
|----------|-------------------|
| bge-small-en | Xenova/bge-small-en-v1.5 |
| bge-base-en | Xenova/bge-base-en-v1.5 |
| bge-m3 | Xenova/bge-m3 |
| e5-small | intfloat/e5-small |
| e5-base | intfloat/e5-base |
| e5-large | intfloat/e5-large |
| multilingual-e5-small | intfloat/multilingual-e5-small |
| multilingual-e5-base | intfloat/multilingual-e5-base |
| gte-small | Supabase/gte-small |
| gte-base | thenlper/gte-base |
| gte-large | thenlper/gte-large |
| gte-multilingual-base | Alibaba-NLP/gte-multilingual-base |
| gte-multilingual-large | Xenova/gte-multilingual-large |
| sentence-t5-base | sentence-transformers/gtr-t5-base |
| sentence-t5-large | sentence-transformers/sentence-t5-large |

Quick Decision Tree

Choose your model based on your needs:

  • Ultra-fast (<15ms), edge device?
    e5-small or gte-small

  • Production English RAG/search?
    bge-small-en (balanced) or bge-base-en (better quality) or gte-large (best quality)

  • Multilingual support needed?
    multilingual-e5-small (fast) or multilingual-e5-base (balanced) or gte-multilingual-base (quality)

  • General purpose, cost-conscious?
    e5-small or gte-small

  • Maximum accuracy, sufficient hardware?
    bge-m3 (multilingual, mixed search) or gte-large (English) or gte-multilingual-large (multilingual)

Recommended Defaults by Hardware

| Hardware Profile | Recommended Model | Why |
|------------------|------------------|-----|
| 2GB RAM, 1-2 vCPU | e5-small or gte-small | Minimal overhead, fast inference |
| 4GB RAM, 2-4 vCPU | bge-small-en or gte-small | Production-ready, good balance |
| 8GB RAM, 4-8 vCPU | bge-base-en or gte-base or multilingual-e5-base | Higher quality, still responsive |
| 16GB+ RAM, 8+ vCPU | bge-m3 or gte-large or gte-multilingual-large | Best quality, multilingual support |

List All Available Models

npx vps-vector-node models
# or if installed globally:
vectornode models

🔌 API Reference

POST /v1/embeddings

Generate embeddings for input text(s).

Request:

{
  "model": "bge-small-en",
  "input": "hello world"
}

For multiple inputs:

{
  "model": "bge-small-en",
  "input": ["text1", "text2", "text3"]
}

Query Parameters:

  • normalize (boolean, default: true) - Whether to L2-normalize embeddings
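As an illustration of what normalize=true means (a conceptual sketch, not VectorNode's internal code): each embedding is scaled to unit Euclidean length, so the dot product of two normalized vectors equals their cosine similarity.

```javascript
// Illustrative sketch of L2 normalization: scale a vector so its
// Euclidean length (square root of the sum of squares) is 1.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return vec.map((x) => x / norm);
}

const normalized = l2Normalize([3, 4]); // length 5 -> [0.6, 0.8]
const length = Math.sqrt(normalized.reduce((s, x) => s + x * x, 0)); // 1
```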

Response (200 OK):

{
  "object": "list",
  "data": [
    {
      "index": 0,
      "embedding": [0.0123, -0.0034, ...],
      "object": "embedding"
    }
  ],
  "model": "bge-small-en",
  "usage": {
    "text_length": [11],
    "tokens_estimated": [3]
  },
  "inference_time_ms": 12
}

Status Codes:

  • 200 - Success
  • 202 - Request queued (includes queue_position in response)
  • 400 - Bad request (missing/invalid fields)
  • 401 - Unauthorized (invalid/missing API key)
  • 429 - Queue full
  • 500 - Server error
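A common next step with the returned data[i].embedding vectors is similarity scoring. A small self-contained sketch (plain cosine similarity, nothing VectorNode-specific):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1, orthogonal vectors score 0:
cosineSimilarity([1, 0], [2, 0]); // 1
cosineSimilarity([1, 0], [0, 1]); // 0
```

Note that if the embeddings were requested with normalize=true (the default), they are already unit length, so the dot product alone gives the same score.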

GET /v1/models

List available models on the server.

Headers:

  • Authorization: Bearer sk-... (required)

Response (200 OK):

{
  "object": "list",
  "data": [
    {
      "id": "bge-small-en",
      "dimensions": 384,
      "size_gb": 0.5,
      "quantization": "Q4_K",
      "source": "huggingface",
      "loaded": true
    }
  ]
}

GET /health

Health check endpoint (no authentication required).

Response (200 OK):

{
  "status": "ok",
  "model_loaded": "bge-small-en",
  "uptime_seconds": 1023
}
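Because /health requires no API key, deploy scripts can poll it until the model is loaded. A sketch under that assumption; isHealthy, waitForHealthy, and the retry interval are illustrative choices, not part of VectorNode:

```javascript
// True when a /health response body reports the server is ready.
function isHealthy(body) {
  return Boolean(body) && body.status === "ok";
}

// Poll GET /health (no auth needed) until healthy or retries run out.
async function waitForHealthy(baseUrl, { retries = 30, delayMs = 2000 } = {}) {
  for (let i = 0; i < retries; i++) {
    try {
      const res = await fetch(`${baseUrl}/health`);
      if (res.ok && isHealthy(await res.json())) return true;
    } catch {
      // server not accepting connections yet; fall through and retry
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

// Usage: await waitForHealthy("http://localhost:3000");
```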

GET /metrics

Server performance metrics.

Headers:

  • Authorization: Bearer sk-... (required)

Response (200 OK):

{
  "active_requests": 3,
  "queue_depth": 2,
  "memory_free_mb": 2432,
  "cpu_load": 0.43,
  "requests_per_sec": 12.5,
  "avg_latency_ms": 45
}

💻 CLI Commands

Login to Hugging Face

npx vps-vector-node login hf --token <token>
# or if installed globally:
vectornode login hf --token <token>

Token is stored in ~/.vectornode/config.json.

Download Models

npx vps-vector-node models:download bge-small-en
# or if installed globally:
vectornode models:download bge-small-en

Serve

Start the embedding API server:

npx vps-vector-node serve --model <model-id> [options]

Common Options:

  • --port <port> - Port to listen on (default: 3000)
  • --host <host> - Host to bind to (default: 0.0.0.0)
  • --max-queue-size <size> - Maximum queue size (default: 100)
  • --batch-size <size> - Batch size for processing (default: 10)
  • --threads <threads> - Number of threads (auto-detected)
  • --verbose - Enable verbose logging
  • --trace - Enable trace logging

Key Management

Create a new API key:

npx vps-vector-node key create --name <name>

List all keys:

npx vps-vector-node key list

Revoke a key:

npx vps-vector-node key revoke --name <name>

⚙️ Configuration

Configuration is stored in ~/.vectornode/:

~/.vectornode/
├── config.json          # HF token and settings
├── keys.json            # API keys
├── models/              # Downloaded models cache
│   └── bge-small-en/
│   └── bge-base-en/
│   └── ...
└── logs/                # Trace logs (if --trace enabled)

System Requirements

Minimum (for small models):

  • RAM: 4GB
  • CPU: 2 vCPU
  • Storage: 5GB free

Recommended (for base models):

  • RAM: 8GB
  • CPU: 4 vCPU
  • Storage: 10GB free

The server checks available memory before loading a model and refuses to start if there is not enough.


🐳 Deployment

Docker

Dockerfile:

FROM node:18-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --production

COPY . .

EXPOSE 3000
CMD ["node", "cli/index.js", "serve", "--model", "bge-small-en"]

Build and run:

docker build -t vectornode .
docker run -p 3000:3000 vectornode

Systemd Service (Linux)

Create /etc/systemd/system/vectornode.service:

[Unit]
Description=VectorNode Embedding API
After=network.target

[Service]
Type=simple
User=vectornode
WorkingDirectory=/opt/vectornode
ExecStart=/usr/bin/node /opt/vectornode/cli/index.js serve --model bge-small-en --port 3000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable vectornode
sudo systemctl start vectornode

🔧 Troubleshooting

Model Download Fails

Invalid or missing token:

  • Check if token is set: cat ~/.vectornode/config.json
  • Re-login with correct token: npx vps-vector-node login hf --token <your-token>
  • Get a new token: https://huggingface.co/settings/tokens

Token permission issues:

  • Ensure your token has "Read" permission

Network/connectivity issues:

  • Check connection to huggingface.co: ping huggingface.co
  • Try again (may be temporary)

Not enough disk space:

  • Check available space: df -h
  • Clean up space or use a smaller model

Model Not Found

[ERROR] Model not found: bge-small-en. 
Run: vectornode models:download bge-small-en

Solution: Download the model first:

npx vps-vector-node models:download bge-small-en

See vectornode models for a list of all available models.

Out of Memory Errors

  • Use a smaller model (e.g., e5-small or gte-small instead of gte-large or bge-m3)
  • Lower the --max-queue-size value
  • Add swap space to the system
  • Increase available RAM

Slow Inference

  • Check CPU usage: top or htop
  • Try reducing --batch-size
  • Use a quantized model
  • Add more CPU resources

API Returns 401 Unauthorized

Missing or invalid API key:

  • List keys: npx vps-vector-node key list
  • Create new key: npx vps-vector-node key create --name dev
  • Verify header format: Authorization: Bearer sk-...

JSON Parse Errors

[ERROR] Unexpected token '"', ""{\\n  \\\"mo\"... is not valid JSON

Solution in Postman:

  • Use Body tab → select raw → pick JSON from dropdown
  • Place payload in body (not in query params)
  • Remove any extra quotes around the JSON
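The error above is the typical signature of a double-encoded body: the payload was stringified twice, so the server receives a JSON string rather than a JSON object. A quick demo of the difference:

```javascript
// Double-encoding demo: stringifying an already-stringified payload
// produces the extra surrounding quotes seen in the error above.
const payload = { model: "bge-small-en", input: "hello world" };

const correct = JSON.stringify(payload);  // starts with '{' -- valid request body
const doubled = JSON.stringify(correct);  // starts with '"' -- a JSON string

// Parsing the doubled form yields a string, not an object, which is
// why a server expecting an object rejects it:
typeof JSON.parse(correct);  // "object"
typeof JSON.parse(doubled);  // "string"
```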

👨‍💻 Development

Setup

git clone <repo>
cd vectornode
npm install

Run Tests

npm test

Local Development with Verbose Logging

node cli/index.js serve --model bge-small-en --verbose --port 3000

Enable Trace Logging

node cli/index.js serve --model bge-small-en --trace --port 3000

Logs are saved to ~/.vectornode/logs/.


📄 License

MIT

🤝 Contributing

Contributions welcome! Please:

  1. Open an issue to discuss your idea
  2. Fork the repository
  3. Submit a PR with a clear description