vps-vector-node v0.4.1
VectorNode
One-command tool to turn any VPS into a production-ready, OpenAI-compatible embedding API server.
📖 Table of Contents
- Features
- Quick Start
- Available Models
- API Reference
- CLI Commands
- Configuration
- Deployment
- Troubleshooting
- Development
✨ Features
- OpenAI-compatible API: Drop-in replacement for `/v1/embeddings`
- CPU-only inference: Runs on a low-end VPS (4GB RAM, 2 vCPU)
- Hugging Face integration: Download and cache models from HF Hub
- API key authentication: Secure your API with local key management
- Concurrency control: Automatic concurrency limits based on model size and hardware
- Request queueing: Handle traffic spikes gracefully
- Memory safety: Pre-flight checks prevent OOM crashes
🚀 Quick Start
Prerequisites
- Node.js 18+ (npm or npx)
- Hugging Face Account with an API token
- 4GB+ RAM (for small models; 8GB+ recommended)
Installation
```bash
npm install -g vps-vector-node
```
Or run without installing:
```bash
npx vps-vector-node --help
```
Step 1: Get Your Hugging Face Token
VectorNode requires a Hugging Face token to download models.
Get a token:
- Go to https://huggingface.co/join (create account if needed)
- Navigate to https://huggingface.co/settings/tokens
- Click "New token" → Give it a name (e.g., "vectornode")
- Select "Read" permission
- Click "Generate token" and copy it immediately
Login to VectorNode:
```bash
npx vps-vector-node login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Or if installed globally:
```bash
vectornode login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
The token is stored securely in `~/.vectornode/config.json`.
Token troubleshooting:
- Invalid token? Ensure you copied the entire token, including the `hf_` prefix
- Permission denied? The token must have "Read" permission
- Lost token? Create a new one at https://huggingface.co/settings/tokens
Step 2: Download a Model
Download the model before starting the server (avoids delays on first request):
```bash
npx vps-vector-node models:download bge-small-en
```
Or if installed globally:
```bash
vectornode models:download bge-small-en
```
See Available Models for other options.
Step 3: Create an API Key
```bash
npx vps-vector-node key create --name dev
```
This outputs an API key like `sk-abc123...`. Store it securely; you'll need it for every API request.
Step 4: Start the Server
```bash
npx vps-vector-node serve --model bge-small-en --port 3000
```
Or if installed globally:
```bash
vectornode serve --model bge-small-en --port 3000
```
You should see:
```
[INFO] Starting VectorNode server { model: 'bge-small-en', port: '3000', host: '0.0.0.0' }
[INFO] Model loaded successfully { model: 'bge-small-en', dimensions: 384 }
[INFO] Server listening on 0.0.0.0:3000
```
Step 5: Test the API
Using curl:
```bash
curl -H "Authorization: Bearer sk-abc123..." \
  -H "Content-Type: application/json" \
  -X POST http://localhost:3000/v1/embeddings \
  -d '{
    "model": "bge-small-en",
    "input": "hello world"
  }'
```
Using Postman:
- Method: `POST`
- URL: `http://localhost:3000/v1/embeddings`
- Headers: `Authorization: Bearer sk-abc123...` and `Content-Type: application/json`
- Body (raw JSON):
```json
{ "model": "bge-small-en", "input": "hello world" }
```
Important Notes:
- `/v1/embeddings` only accepts POST requests (GET will return 404)
- Send the payload as raw JSON in the body, not as query parameters
- VectorNode handles Unicode (spaces, tabs, emoji) automatically: `{ "model": "bge-small-en", "input": "hello world 🚀\tTabbed text" }`
- If your JSON has extra surrounding quotes, remove them (e.g., remove the `"` before `{` and after `}`)
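For application code, the same request can be made from Node.js 18+ (which ships a global `fetch`). This is an illustrative sketch, not part of VectorNode itself; the URL and key are placeholders for your own values:

```javascript
// Placeholder values -- substitute your own server URL and API key.
const API_URL = "http://localhost:3000/v1/embeddings";
const API_KEY = "sk-abc123...";

// Build the fetch options for a single input string or an array of inputs.
function buildEmbeddingRequest(model, input, apiKey) {
  return {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, input }),
  };
}

// Usage (requires a running server):
// const res = await fetch(API_URL, buildEmbeddingRequest("bge-small-en", "hello world", API_KEY));
// const { data } = await res.json();
// console.log(data[0].embedding.length); // 384 for bge-small-en
```

Note that the payload is JSON-encoded exactly once; double-encoding it is the most common cause of parse errors on the server side.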
📊 Available Models
Model Comparison Table
| Model ID | Dimensions | Size (GB) | Parameters | Min RAM | Rec RAM | Best Use Case | Multilingual | Latency |
|----------|-----------|-----------|-----------|---------|---------|---------------|--------------|---------|
| bge-small-en | 384 | 0.5 | ~110M | 3GB | 6GB | RAG, production search, English-focused | No | ~12ms |
| bge-base-en | 768 | 1.5 | ~125M | 4GB | 8GB | Better accuracy than small, English RAG | No | ~30ms |
| bge-m3 | 1024 | 2.0 | ~335M | 6GB | 12GB | Mixed dense/sparse search (M3) | Yes | ~40ms |
| e5-small | 384 | 0.4 | ~33M | 2GB | 4GB | General embedding, retrieval | Yes | ~10ms |
| e5-base | 768 | 0.8 | ~82M | 4GB | 8GB | Higher quality semantic search | Yes | ~20ms |
| e5-large | 1024 | 1.5 | ~335M | 8GB | 16GB | High-quality embeddings, heavy workloads | Yes | ~40ms |
| multilingual-e5-small | 384 | 0.5 | ~33M | 2GB | 4GB | Lightweight multilingual search | Yes | ~12ms |
| multilingual-e5-base | 768 | 1.0 | ~82M | 4GB | 8GB | Multilingual RAG / retrieval | Yes | ~25ms |
| gte-small | 384 | 0.4 | ~33M | 2GB | 4GB | General-purpose English embeddings | No | ~10ms |
| gte-base | 768 | 0.8 | ~82M | 4GB | 8GB | Better English search / retrieval | No | ~20ms |
| gte-large | 1024 | 1.5 | ~335M | 8GB | 16GB | Max accuracy English search | No | ~40ms |
| gte-multilingual-base | 768 | 1.0 | ~82M | 4GB | 8GB | Multilingual embeddings | Yes | ~25ms |
| gte-multilingual-large | 1024 | 1.8 | ~335M | 8GB | 16GB | High-quality multilingual | Yes | ~45ms |
| sentence-t5-base | 768 | 0.9 | ~220M | 4GB | 8GB | General semantic similarity | Yes | ~35ms |
| sentence-t5-large | 1024 | 1.8 | ~330M | 8GB | 16GB | Higher quality semantic tasks | Yes | ~60ms |
Hugging Face Repositories
These are the upstream model repositories used by VectorNode (all supported by @xenova/transformers):
| Model ID | Hugging Face Repo |
|----------|-------------------|
| bge-small-en | Xenova/bge-small-en-v1.5 |
| bge-base-en | Xenova/bge-base-en-v1.5 |
| bge-m3 | Xenova/bge-m3 |
| e5-small | intfloat/e5-small |
| e5-base | intfloat/e5-base |
| e5-large | intfloat/e5-large |
| multilingual-e5-small | intfloat/multilingual-e5-small |
| multilingual-e5-base | intfloat/multilingual-e5-base |
| gte-small | Supabase/gte-small |
| gte-base | thenlper/gte-base |
| gte-large | thenlper/gte-large |
| gte-multilingual-base | Alibaba-NLP/gte-multilingual-base |
| gte-multilingual-large | Xenova/gte-multilingual-large |
| sentence-t5-base | sentence-transformers/sentence-t5-base |
| sentence-t5-large | sentence-transformers/sentence-t5-large |
Quick Decision Tree
Choose your model based on your needs:
- Ultra-fast (<15ms), edge device? → `e5-small` or `gte-small`
- Production English RAG/search? → `bge-small-en` (balanced), `bge-base-en` (better quality), or `gte-large` (best quality)
- Multilingual support needed? → `multilingual-e5-small` (fast), `multilingual-e5-base` (balanced), or `gte-multilingual-base` (quality)
- General purpose, cost-conscious? → `e5-small` or `gte-small`
- Maximum accuracy, sufficient hardware? → `bge-m3` (multilingual, mixed search), `gte-large` (English), or `gte-multilingual-large` (multilingual)
Recommended Defaults by Hardware
| Hardware Profile | Recommended Model | Why |
|------------------|------------------|-----|
| 2GB RAM, 1-2 vCPU | e5-small or gte-small | Minimal overhead, fast inference |
| 4GB RAM, 2-4 vCPU | bge-small-en or gte-small | Production-ready, good balance |
| 8GB RAM, 4-8 vCPU | bge-base-en or gte-base or multilingual-e5-base | Higher quality, still responsive |
| 16GB+ RAM, 8+ vCPU | bge-m3 or gte-large or gte-multilingual-large | Best quality, multilingual support |
List All Available Models
```bash
npx vps-vector-node models
# or if installed globally:
vectornode models
```
🔌 API Reference
POST /v1/embeddings
Generate embeddings for input text(s).
Request:
```json
{
  "model": "bge-small-en",
  "input": "hello world"
}
```
For multiple inputs:
```json
{
  "model": "bge-small-en",
  "input": ["text1", "text2", "text3"]
}
```
Query Parameters:
- `normalize` (boolean, default: `true`) - Whether to L2-normalize embeddings
Response (200 OK):
```json
{
  "object": "list",
  "data": [
    {
      "index": 0,
      "embedding": [0.0123, -0.0034, ...],
      "object": "embedding"
    }
  ],
  "model": "bge-small-en",
  "usage": {
    "text_length": [11],
    "tokens_estimated": [3]
  },
  "inference_time_ms": 12
}
```
Status Codes:
- `200` - Success
- `202` - Request queued (includes `queue_position` in the response)
- `400` - Bad request (missing/invalid fields)
- `401` - Unauthorized (invalid/missing API key)
- `429` - Queue full
- `500` - Server error
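A common next step is comparing the returned embeddings. A minimal sketch (the vectors below are toy 2-d values, not real model output); with the default `normalize=true` the embeddings are unit length, so cosine similarity reduces to a plain dot product:

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Full cosine similarity; with L2-normalized embeddings the norms are 1,
// so this equals dot(a, b).
function cosineSimilarity(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Hypothetical response shape from POST /v1/embeddings with two inputs:
const response = {
  object: "list",
  data: [
    { index: 0, embedding: [0.6, 0.8], object: "embedding" },
    { index: 1, embedding: [0.8, 0.6], object: "embedding" },
  ],
};
const [a, b] = response.data.map((d) => d.embedding);
console.log(cosineSimilarity(a, b)); // ≈ 0.96 for these toy vectors
```

Values close to 1 mean the inputs are semantically similar; values near 0 mean they are unrelated.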
GET /v1/models
List available models on the server.
Headers:
- `Authorization: Bearer sk-...` (required)
Response (200 OK):
```json
{
  "object": "list",
  "data": [
    {
      "id": "bge-small-en",
      "dimensions": 384,
      "size_gb": 0.5,
      "quantization": "Q4_K",
      "source": "huggingface",
      "loaded": true
    }
  ]
}
```
GET /health
Health check endpoint (no authentication required).
Response (200 OK):
```json
{
  "status": "ok",
  "model_loaded": "bge-small-en",
  "uptime_seconds": 1023
}
```
GET /metrics
Server performance metrics.
Headers:
- `Authorization: Bearer sk-...` (required)
Response (200 OK):
```json
{
  "active_requests": 3,
  "queue_depth": 2,
  "memory_free_mb": 2432,
  "cpu_load": 0.43,
  "requests_per_sec": 12.5,
  "avg_latency_ms": 45
}
```
💻 CLI Commands
Login to Hugging Face
```bash
npx vps-vector-node login hf --token <token>
# or if installed globally:
vectornode login hf --token <token>
```
The token is stored in `~/.vectornode/config.json`.
Download Models
```bash
npx vps-vector-node models:download bge-small-en
# or if installed globally:
vectornode models:download bge-small-en
```
Serve
Start the embedding API server:
```bash
npx vps-vector-node serve --model <model-id> [options]
```
Common Options:
- `--port <port>` - Port to listen on (default: `3000`)
- `--host <host>` - Host to bind to (default: `0.0.0.0`)
- `--max-queue-size <size>` - Maximum queue size (default: `100`)
- `--batch-size <size>` - Batch size for processing (default: `10`)
- `--threads <threads>` - Number of threads (auto-detected)
- `--verbose` - Enable verbose logging
- `--trace` - Enable trace logging
Key Management
Create a new API key:
```bash
npx vps-vector-node key create --name <name>
```
List all keys:
```bash
npx vps-vector-node key list
```
Revoke a key:
```bash
npx vps-vector-node key revoke --name <name>
```
⚙️ Configuration
Configuration is stored in `~/.vectornode/`:
```
~/.vectornode/
├── config.json     # HF token and settings
├── keys.json       # API keys
├── models/         # Downloaded models cache
│   ├── bge-small-en/
│   ├── bge-base-en/
│   └── ...
└── logs/           # Trace logs (if --trace enabled)
```
System Requirements
Minimum (for small models):
- RAM: 4GB
- CPU: 2 vCPU
- Storage: 5GB free
Recommended (for base models):
- RAM: 8GB
- CPU: 4 vCPU
- Storage: 10GB free
The server checks available memory before loading models and will refuse to start if insufficient.
🐳 Deployment
Docker
Dockerfile:
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY . .
EXPOSE 3000
CMD ["node", "cli/index.js", "serve", "--model", "bge-small-en"]
```
Build and run:
```bash
docker build -t vectornode .
docker run -p 3000:3000 vectornode
```
Systemd Service (Linux)
Create `/etc/systemd/system/vectornode.service`:
```ini
[Unit]
Description=VectorNode Embedding API
After=network.target

[Service]
Type=simple
User=vectornode
WorkingDirectory=/opt/vectornode
ExecStart=/usr/bin/node /opt/vectornode/cli/index.js serve --model bge-small-en --port 3000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl enable vectornode
sudo systemctl start vectornode
```
🔧 Troubleshooting
Model Download Fails
Invalid or missing token:
- Check if the token is set: `cat ~/.vectornode/config.json`
- Re-login with the correct token: `npx vps-vector-node login hf --token <your-token>`
- Get a new token: https://huggingface.co/settings/tokens
Token permission issues:
- Ensure your token has "Read" permission
Network/connectivity issues:
- Check the connection to huggingface.co: `ping huggingface.co`
- Try again (the issue may be temporary)
Not enough disk space:
- Check available space: `df -h`
- Clean up space or use a smaller model
Model Not Found
```
[ERROR] Model not found: bge-small-en.
Run: vectornode models:download bge-small-en
```
Solution: Download the model first:
```bash
npx vps-vector-node models:download bge-small-en
```
See `vectornode models` for a list of all available models.
Out of Memory Errors
- Use a smaller model (e.g., `e5-small` or `gte-small` instead of `gte-large` or `bge-m3`)
- Reduce the `--max-queue-size` flag
- Add swap space to the system
- Increase available RAM
Slow Inference
- Check CPU usage: `top` or `htop`
- Try reducing `--batch-size`
- Use a quantized model
- Add more CPU resources
API Returns 401 Unauthorized
Missing or invalid API key:
- List keys: `npx vps-vector-node key list`
- Create a new key: `npx vps-vector-node key create --name dev`
- Verify the header format: `Authorization: Bearer sk-...`
JSON Parse Errors
```
[ERROR] Unexpected token '"', ""{\n  \"mo"... is not valid JSON
```
Solution in Postman:
- Use the Body tab → select `raw` → pick `JSON` from the dropdown
- Place the payload in the body (not in query params)
- Remove any extra quotes around the JSON
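The error above is typical of a double-encoded payload: the body was JSON-stringified twice, so the server receives a string that merely looks like JSON. A small illustration:

```javascript
const body = { model: "bge-small-en", input: "hello world" };

const correct = JSON.stringify(body);          // send this
const doubleEncoded = JSON.stringify(correct); // what broken clients send

console.log(correct);       // {"model":"bge-small-en","input":"hello world"}
console.log(doubleEncoded); // note the outer quotes and escaped inner quotes

// Parsing the double-encoded payload yields a string, not an object:
console.assert(typeof JSON.parse(correct) === "object");
console.assert(typeof JSON.parse(doubleEncoded) === "string");
```

If your HTTP client already serializes the body for you, pass it the plain object rather than a pre-stringified value.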
👨‍💻 Development
Setup
```bash
git clone <repo>
cd vectornode
npm install
```
Run Tests
```bash
npm test
```
Local Development with Verbose Logging
```bash
node cli/index.js serve --model bge-small-en --verbose --port 3000
```
Enable Trace Logging
```bash
node cli/index.js serve --model bge-small-en --trace --port 3000
```
Logs are saved to `~/.vectornode/logs/`.
📄 License
MIT
🤝 Contributing
Contributions welcome! Please:
- Open an issue to discuss your idea
- Fork the repository
- Submit a PR with a clear description
