n8n-nodes-ollama-embeddings
Community nodes for generating text embeddings via Ollama in your n8n workflows. Supports multiple embedding models, including Qwen, EmbeddingGemma, Nomic, and more, for vector stores, similarity search, and AI applications.
🌟 Features
- Two Specialized Nodes:
  - Ollama Embeddings: For vector store integration (Supabase, Qdrant, PGVector, etc.)
  - Ollama Embeddings Tool: For direct embedding generation in workflows
- Multi-Model Support: Works with Qwen, EmbeddingGemma, Nomic, Snowflake and more
- LangChain Compatible: Seamlessly integrates with n8n's AI ecosystem
- Flexible Dimensions: Auto-detection and validation based on model capabilities
- Instruction-Aware: Optimized embeddings for queries vs documents
- Batch Processing: Efficient bulk embedding generation
- Ollama Integration: Direct connection to Ollama for embedding generation
- No Middleware Required: Works directly with Ollama's API endpoints
- Compact Format Option: Single-line embeddings for easy copy/paste
📦 Installation
In n8n
- Go to Settings > Community Nodes
- Search for `n8n-nodes-qwen-embedding`
- Click Install
Manual Installation
```bash
npm install n8n-nodes-qwen-embedding
```
🚀 Prerequisites
You need to have Ollama installed and running with an embedding model:
- Install Ollama: visit https://ollama.com for installation instructions
- Pull an embedding model:
```bash
# Qwen3-Embedding models (specialized for embeddings):
ollama pull qwen3-embedding:0.6b         # 1024 dimensions, 32K context

# EmbeddingGemma models (Google's lightweight embeddings):
ollama pull embeddinggemma:300m          # 768 dimensions, 2K context
ollama pull embeddinggemma:300m-bf16     # Higher-precision variant

# Nomic Embed models (performant embeddings):
ollama pull nomic-embed-text             # 768 dimensions, 8K context

# Snowflake Arctic Embed:
ollama pull snowflake-arctic-embed:110m  # 1024 dimensions
```
- Verify Ollama is running:
```bash
ollama list  # Should show your pulled models
```
Ollama will be available at http://localhost:11434 by default.
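If you want to sanity-check the endpoint itself, here is a minimal TypeScript sketch, assuming the default URL and the `qwen3-embedding:0.6b` model pulled above:

```typescript
// Minimal smoke test for Ollama's /api/embed endpoint.
// URL and model name are assumptions — use whichever model you pulled.
async function testOllamaEmbed(): Promise<void> {
  const response = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3-embedding:0.6b",
      input: "hello world",
    }),
  });
  if (!response.ok) {
    throw new Error(`Ollama returned HTTP ${response.status}`);
  }
  // /api/embed responds with { embeddings: number[][] }
  const data = (await response.json()) as { embeddings: number[][] };
  console.log(`Received a ${data.embeddings[0].length}-dimensional embedding`);
}

testOllamaEmbed().catch(console.error);
```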
📊 Supported Models Comparison
| Model | Dimensions | Context | Size | Best For |
|-------|------------|---------|------|----------|
| qwen3-embedding:0.6b | 32-1024 (flexible) | 32K tokens | ~639MB | General purpose, multilingual |
| embeddinggemma:300m | 128-768 (MRL) | 2K tokens | ~338MB | Lightweight, fast inference |
| nomic-embed-text | 768 (fixed) | 8K tokens | ~274MB | Balanced performance |
| snowflake-arctic-embed | 1024 (fixed) | 512 tokens | ~332MB | Short text, high precision |
Choosing a Model:
- Qwen: Best overall flexibility with largest context window (32K)
- EmbeddingGemma: Best for resource-constrained environments
- Nomic: Good balance of performance and context size
- Snowflake: Optimized for short text with high dimensional output
🔧 Setup
⚠️ CRITICAL: Ollama URL Configuration
ALWAYS remove trailing slashes from your Ollama URL!
✅ CORRECT: http://localhost:11434
❌ WRONG: http://localhost:11434/

Why this matters: A trailing slash creates a double slash in the API path (http://host:11434//api/embed). Many servers answer that path with a redirect, and HTTP clients that follow the redirect commonly re-issue the request as GET, resulting in HTTP 405 "Method Not Allowed" errors.
This is the #1 cause of 405 errors with this node. Always verify your Ollama URL format first.
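If you build requests against Ollama yourself, one defensive habit is to normalize the base URL before appending the API path. A small sketch (the helper name is hypothetical; the node may do something equivalent internally):

```typescript
// Strip trailing slashes so the API path never contains a double slash.
function normalizeOllamaUrl(rawUrl: string): string {
  return rawUrl.replace(/\/+$/, "");
}

const baseUrl = normalizeOllamaUrl("http://localhost:11434/");
const embedEndpoint = `${baseUrl}/api/embed`; // "http://localhost:11434/api/embed"
```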
1. Configure Credentials (Optional)
For self-hosted Ollama without authentication: You can skip credential configuration. The node will connect directly to your Ollama instance.
For authenticated Ollama instances:
- In n8n, go to Credentials > New
- Select Qwen Embedding API (Ollama)
- Enter:
  - Ollama URL: `http://localhost:11434` (NO trailing slash!)
  - Model Name: `qwen3-embedding:0.6b` (or your chosen model)
  - API Key: your authentication token (if required)
- IMPORTANT: Verify your URL has NO trailing slash before saving
- Click Test Connection to verify
2. Using the Nodes
Qwen Embedding (Vector Store Integration)
Connect to any vector store node:
```
[Vector Store] ← [Qwen Embedding]
   (Stores)      (Provides embeddings)
```
Use Cases:
- RAG applications
- Semantic search
- Document indexing
- Knowledge bases
Qwen Embedding Tool (Direct Usage)
Use in any workflow for direct embedding generation:
```
[Trigger] → [Qwen Embedding Tool] → [Process/Store]
```
Use Cases:
- Similarity calculations
- Text clustering
- Anomaly detection
- Content deduplication
⚙️ Configuration Options
Performance Mode
Controls timeout and retry behavior based on your hardware:
Auto-Detect (default): Automatically detects GPU/CPU on first request
- GPU detected (<1s response): 10s timeout, 2 retries
- CPU detected (>5s response): 60s timeout, 3 retries
- Works great for dynamic environments
GPU Optimized: Manual setting for GPU hardware
- 10 second timeout
- 2 retry attempts
- Best for NVIDIA GPU setups
CPU Optimized: Manual setting for CPU hardware
- 60 second timeout
- 3 retry attempts
- Prevents timeout errors on CPU-only systems
Custom: User-defined timeout and retry settings
- Set your own timeout (in milliseconds)
- Configure max retry attempts (0-5)
How Auto-Detection Works:
The first request measures the actual response time:
- Response < 1s → GPU detected → timeout = 10s
- Response > 5s → CPU detected → timeout = 60s
- 1s ≤ response ≤ 5s → keep the default 30s timeout (see the sketch below)
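As a rough sketch, the heuristic can be expressed like this (the thresholds are the ones listed above; the function and type names are hypothetical):

```typescript
// Timeout/retry profile derived from the first request's response time.
interface PerformanceProfile {
  timeoutMs: number;
  maxRetries: number;
}

function profileFromResponseTime(firstResponseMs: number): PerformanceProfile {
  if (firstResponseMs < 1000) {
    return { timeoutMs: 10_000, maxRetries: 2 }; // GPU detected
  }
  if (firstResponseMs > 5000) {
    return { timeoutMs: 60_000, maxRetries: 3 }; // CPU detected
  }
  return { timeoutMs: 30_000, maxRetries: 2 }; // ambiguous: keep the default
}
```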
Dimensions
Adjust embedding vector size based on model capabilities:
- Auto-detect (0): Uses optimal dimensions for the selected model
- Manual adjustment: Set specific dimensions (validated per model)
- Qwen: 32-1024 (flexible via MRL)
- EmbeddingGemma: 128-768 (flexible via MRL)
- Nomic: 768 (fixed)
- Snowflake: 1024 (fixed)
Implementation: Models with MRL support (Qwen, EmbeddingGemma) can adjust dimensions without retraining. Fixed-dimension models will show a warning if you try to adjust.
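For intuition, MRL-style dimension reduction can be done by keeping the first k components of a vector and re-normalizing it to unit length. This is an illustrative sketch only, not necessarily how the node implements it:

```typescript
// Truncate an MRL embedding to targetDims components, then restore unit length.
function truncateMrlEmbedding(embedding: number[], targetDims: number): number[] {
  const truncated = embedding.slice(0, targetDims);
  const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
  return truncated.map((v) => v / norm);
}
```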
Instruction Type
Optimize embeddings for specific use cases:
- None (default): Standard embeddings without special instructions
- Query: Optimized for search queries
  - Prefix: `"Instruct: Retrieve semantically similar text.\nQuery: "`
  - Use for: user questions, search inputs
- Document: Optimized for document storage
  - Prefix: `"Instruct: Represent this document for retrieval.\nDocument: "`
  - Use for: indexing documents, knowledge base entries
Performance Impact: 1-5% better semantic matching when query/document types match their use case.
Context Prefix
Add custom context to all texts before embedding:
Example:
```
Context Prefix: "Medical context:"
Input text:     "patient symptoms include fever"
Embedded as:    "Medical context: patient symptoms include fever"
```
Use Cases:
- Domain-specific context (legal, medical, technical)
- Language hints
- Task-specific framing
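To make the combination concrete, here is a sketch of how the instruction prefix and context prefix might compose before the text is sent to Ollama (the prefix strings are the ones documented above; the helper name is hypothetical):

```typescript
type InstructionType = "none" | "query" | "document";

// Instruction prefixes as documented in this README.
const INSTRUCTION_PREFIXES: Record<InstructionType, string> = {
  none: "",
  query: "Instruct: Retrieve semantically similar text.\nQuery: ",
  document: "Instruct: Represent this document for retrieval.\nDocument: ",
};

function buildEmbeddingInput(
  text: string,
  instruction: InstructionType = "none",
  contextPrefix = "",
): string {
  const context = contextPrefix ? `${contextPrefix} ` : "";
  return `${INSTRUCTION_PREFIXES[instruction]}${context}${text}`;
}

// buildEmbeddingInput("patient symptoms include fever", "none", "Medical context:")
// -> "Medical context: patient symptoms include fever"
```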
Return Format (Tool Only)
Controls output structure:
Full (default):
```json
{
  "embedding": [0.123, 0.456, ...],
  "dimensions": 1024,
  "text": "original text",
  "model": "qwen3-embedding:0.6b",
  "metadata": { ... }  // if includeMetadata enabled
}
```
Simplified:
```json
{
  "text": "original text",
  "vector": [0.123, 0.456, ...],
  "dimensions": 1024
}
```
Embedding Only:
```json
{
  "embedding": [0.123, 0.456, ...]
}
```
Include Metadata (Tool Only)
When enabled, adds processing metadata to the output:
```json
{
  "metadata": {
    "prefix": "Medical context:",         // Context prefix used
    "instruction": "query",               // Instruction type applied
    "timestamp": "2025-10-07T19:44:23Z",  // Processing time
    "batchSize": 5                        // Number of texts (batch mode only)
  }
}
```
Use Cases:
- Debugging embedding configurations
- Tracking when embeddings were generated
- Auditing processing parameters
Custom Timeout & Max Retries (Custom Mode Only)
Fine-tune request behavior:
- Custom Timeout: Milliseconds to wait before timeout (default: 30000)
- Max Retries: Number of retry attempts on failure (0-5, default: 2)
Retry Logic:
- Exponential backoff: 1s → 2s → 4s → 5s (capped)
- Clear console logging for debugging
- Automatic retry on transient failures (see the sketch below)
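A minimal sketch of this retry pattern (the delays match the sequence above; the function name is hypothetical):

```typescript
// Retry an async operation with exponential backoff: 1s -> 2s -> 4s -> 5s (capped).
async function withRetries<T>(fn: () => Promise<T>, maxRetries = 2): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      const delayMs = Math.min(1000 * 2 ** attempt, 5000);
      console.warn(`Attempt ${attempt + 1} failed, retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```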
Operation (Tool Only)
- Generate Embedding: Single text embedding
- Generate Batch Embeddings: Multiple texts in one request
📝 Examples
Example 1: Vector Store with RAG
1. Add "Supabase Vector Store" node
2. Add "Qwen Embedding" node
3. Connect Qwen to Vector Store's embedding input
4. Configure your collection and start indexingExample 2: Semantic Search
1. Add "Manual Trigger" node
2. Add "Qwen Embedding Tool" node (set to "Generate Embedding")
3. Add another "Qwen Embedding Tool" for documents
4. Add "Code" node to calculate similaritiesExample 3: Batch Processing
Example 3: Batch Processing

```json
// Input: Array of texts
{
  "texts": [
    "First document",
    "Second document",
    "Third document"
  ]
}

// Qwen Embedding Tool (Batch mode) output:
{
  "embeddings": [[...], [...], [...]],
  "count": 3,
  "dimensions": 1024
}
```

🔬 Technical Details
Model Information
Qwen Models:
- `qwen3-embedding:0.6b` - 1024d, 32K context, multilingual (100+ languages)
- MRL support for flexible dimensions (32-1024)

EmbeddingGemma Models:
- `embeddinggemma:300m` - 768d, 2K context, lightweight
- MRL support for flexible dimensions (128-768)
- Variants: bf16, q8, q4 for different precision/size tradeoffs

Nomic Models:
- `nomic-embed-text` - 768d, 8K context, balanced performance
- Fixed dimensions (no MRL)

Snowflake Models:
- `snowflake-arctic-embed` - 1024d, 512-token context, high precision
- Fixed dimensions (no MRL)
Performance Characteristics
- GPU Mode: ~200-270ms per embedding (NVIDIA GPU)
- CPU Mode: 5-10 seconds per embedding
- Auto-Detection: Automatically adjusts timeouts based on hardware
- Query Optimization: 1-5% performance boost with instruction type
- Batch Processing: Processes texts sequentially (Ollama API limitation)
- Timeout: Adaptive (10s GPU, 60s CPU, or 30s default)
API Compatibility
- Ollama API: `/api/embed` endpoint (POST method)
- Request Format: `{model: string, input: string}`
- Response Format: `{embeddings: number[][]}`
- Authentication: Optional (credentials not required for self-hosted)
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Qwen Team for the amazing embedding models
- n8n Community for support and feedback
- LangChain for the embedding interface
🐛 Troubleshooting
HTTP 405 "Method Not Allowed" Errors
Most Common Cause (90% of cases): Trailing slash in Ollama URL configuration.
```
# Check your credential configuration:
✅ Correct: http://localhost:11434
❌ Wrong:   http://localhost:11434/
```
Quick Fix:
- Go to n8n Credentials
- Edit your Ollama credential
- Remove the trailing slash from the URL
- Save and test again
For other 405 error causes, see the comprehensive HTTP 405 Troubleshooting Guide which covers:
- URL formatting issues (trailing slashes)
- n8n authentication system issues
- Working patterns vs anti-patterns
- Step-by-step verification
Model Not Found Errors
```bash
# Pull the correct model first:
ollama pull qwen3-embedding:0.6b

# Verify it's available:
ollama list
```
Performance Issues
- CPU mode timing out: Use Performance Mode "CPU Optimized" or "Auto-Detect"
- GPU not detected: Ensure NVIDIA drivers and Docker GPU runtime are configured
- Slow responses: check that Ollama is running with `ollama list`
🔗 Links
- NPM Package
- GitHub Repository
- Troubleshooting Guide
- n8n Community Nodes
- Qwen3-Embedding Paper
- Ollama Documentation
📮 Support
For issues and questions, please open an issue on the GitHub repository.
Made with ❤️ for the n8n community
