# 🌾 LLaMA Farm CLI

Deploy AI models, agents, and databases into single deployable binaries - no cloud required.
## Installation

```bash
npm install -g @llamafarm/llamafarm
```

## Quick Start
```bash
# Deploy a model
llamafarm plant llama3-8b

# Deploy with optimization
llamafarm plant llama3-8b --optimize

# Deploy to a specific target
llamafarm plant mistral-7b --target raspberry-pi

# Development/testing (no model download)
llamafarm plant llama3-8b --mock
```

## Complete Workflow Example
```bash
# 1. Plant - configure your AI deployment
llamafarm plant llama3-8b \
  --device mac-arm \
  --agent chat-assistant \
  --rag \
  --database vector

# 2. Bale - compile to a single binary
llamafarm bale ./.llamafarm/llama3-8b \
  --device mac-arm \
  --optimize

# 3. Harvest - deploy anywhere
llamafarm harvest llama3-8b-mac-arm-v1.0.0.bin --run

# Or just copy and run directly (no dependencies needed!)
./llama3-8b-mac-arm-v1.0.0.bin
```

## Features
- 🎯 **One-Line Deployment** - Deploy complex AI models with a single command
- 📦 **Zero Dependencies** - Compiled binaries run anywhere
- 🔒 **100% Private** - Your data never leaves your device
- ⚡ **Lightning Fast** - 10x faster than traditional deployments
- 💾 **90% Smaller** - Optimized models use a fraction of their original size
## Commands

### plant

Deploy a model to create a standalone binary.

```bash
llamafarm plant <model> [options]
```

Options:

```
--target <platform>   Target platform (mac, linux, windows, raspberry-pi)
--optimize            Enable size optimization
--agent <name>        Include an agent
--rag                 Enable RAG pipeline
--database <type>     Include database (vector, sqlite)
```

Examples:
```bash
# Basic deployment
llamafarm plant llama3-8b

# Deploy with RAG and vector database
llamafarm plant mixtral-8x7b --rag --database vector

# Deploy optimized for Raspberry Pi
llamafarm plant llama3-8b --target raspberry-pi --optimize

# Deploy with a custom agent
llamafarm plant llama3-8b --agent customer-service
```

### bale
🎯 The Baler - compile your deployment into a single executable binary.

```bash
llamafarm bale <project-dir> [options]
```

Options:

```
--device <platform>   Target platform (mac, linux, windows, raspberry-pi)
--output <path>       Output binary path
--optimize <level>    Optimization level (none, standard, max)
--sign                Sign the binary for distribution
--compress            Extra compression (slower but smaller)
```

The Baler packages everything into a single binary:
- 🧠 Quantized model (GGUF format)
- 🤖 Agent configuration & code
- 🗄️ Embedded vector database
- 🌐 Web UI
- 🚀 Node.js runtime
- 🔧 Platform-specific optimizations
Supported platforms:

- `mac` / `mac-arm` / `mac-intel` - macOS with Metal acceleration
- `linux` / `linux-arm` - Linux with CUDA support
- `windows` - Windows with DirectML/CUDA
- `raspberry-pi` - Optimized for ARM devices
- `jetson` - NVIDIA Jetson edge devices
Typical Binary Sizes:
- 7B models: 4-8GB (depending on quantization)
- 13B models: 8-13GB
- Mixtral: 25-45GB
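These sizes follow roughly from parameter count times bytes per quantized weight. A back-of-envelope sketch (the fixed overhead for the bundled runtime, agent code, and web UI is my assumption, not a LLaMA Farm constant):

```python
def estimate_binary_gb(params_billion: float, bits_per_weight: int,
                       overhead_gb: float = 1.0) -> float:
    """Rough binary size: weights at the quantized precision plus an
    assumed fixed overhead for the embedded runtime and assets."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8-bit ~ 1 GB
    return weights_gb + overhead_gb

# A 7B model spans the quoted 4-8 GB range depending on quantization:
print(estimate_binary_gb(7, 4))  # 4.5 (4-bit)
print(estimate_binary_gb(7, 8))  # 8.0 (8-bit)
```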
Examples:

```bash
# Standard compilation
llamafarm bale ./.llamafarm/llama3-8b --device mac-arm

# Optimized for size
llamafarm bale ./.llamafarm/llama3-8b --device raspberry-pi --optimize max --compress

# Enterprise deployment with signing
llamafarm bale ./.llamafarm/mixtral --device linux --sign --output production.bin
```

### harvest
Deploy and run a compiled binary.

```bash
llamafarm harvest <binary-or-url> [options]
```

Options:

```
--run             Run immediately after deployment
--daemon          Run as background service
--port <number>   Override default port
--verify          Verify binary integrity
```

## Configuration
Create a `llamafarm.yaml` file for advanced configurations:

```yaml
name: my-assistant
base_model: llama3-8b
plugins:
  - vector_search
  - voice_recognition
data:
  - path: ./company-docs
    type: knowledge
optimization:
  quantization: int8
  target_size: 2GB
```

Then build:

```bash
llamafarm build
```

## Requirements
- Node.js 18+
- 8GB RAM (minimum)
- 10GB free disk space
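A `llamafarm.yaml` like the sample in Configuration can be sanity-checked before building. This sketch mirrors that schema as a plain dict; the required-key list and the accepted quantization values are assumptions for illustration, not the CLI's own validation rules:

```python
REQUIRED_KEYS = {"name", "base_model"}  # assumed minimum; see the docs for the full schema

def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a llamafarm.yaml-style config dict."""
    problems = [f"missing required key: {k}"
                for k in sorted(REQUIRED_KEYS - config.keys())]
    quant = config.get("optimization", {}).get("quantization")
    if quant is not None and quant not in {"int4", "int8", "fp16"}:  # assumed set
        problems.append(f"unknown quantization: {quant}")
    return problems

# Mirrors the sample llamafarm.yaml above:
config = {
    "name": "my-assistant",
    "base_model": "llama3-8b",
    "plugins": ["vector_search", "voice_recognition"],
    "data": [{"path": "./company-docs", "type": "knowledge"}],
    "optimization": {"quantization": "int8", "target_size": "2GB"},
}
print(validate_config(config))  # []
```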
## Documentation
For full documentation, visit https://docs.llamafarm.ai
## Support

- 📖 Documentation
- 💬 Discord Community
- 🐛 Issue Tracker
## Baler FAQ
**Q: Can I run the binary on a different OS than where I compiled it?**
A: No, you need to compile for each target platform. Use `--device` to specify the target.

**Q: How much disk space do I need?**
A: During compilation you need ~3x the final binary size. The final binary is typically 4-8GB for 7B models.

**Q: Can I update the model without recompiling?**
A: No, the model is embedded in the binary. This ensures zero dependencies but means updates require recompilation.

**Q: Does the binary need internet access?**
A: No! Everything runs completely offline once deployed.
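The ~3x compilation headroom from the disk-space answer above turns into a quick planning calculation (the factor comes from the FAQ; rounding up to whole gigabytes is my choice):

```python
import math

def build_workspace_gb(final_binary_gb: float, factor: float = 3.0) -> int:
    """Disk space to reserve while baling: ~3x the final binary size
    (per the FAQ), rounded up to a whole gigabyte."""
    return math.ceil(final_binary_gb * factor)

# A 7B model compiles to roughly 4-8 GB, so plan for:
print(build_workspace_gb(4))  # 12
print(build_workspace_gb(8))  # 24
```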
## License

MIT © LLaMA Farm Team
