@vicistack/vicidial-kamailio-load-balancing
v1.0.0
# When Your VICIdial Outgrows One Asterisk Box: A Kamailio Load Balancing Primer

When your VICIdial deployment crosses the 100-agent threshold, a single Asterisk server stops being viable. Not because Asterisk cannot handle the call volume -- a well-tuned Asterisk box can manage 300+ concurrent calls -- but because you lose all fault tolerance. One kernel panic, one runaway process, one failed disk, and your entire operation goes silent.

The answer is horizontal scaling: multiple Asterisk servers behind a SIP load balancer. And in the VICIdial world, that load balancer is Kamailio.

This guide covers why Kamailio is the right tool, how to configure its dispatcher module for VICIdial, how to implement health probing and automatic server removal, and how to scale from a single server to a 10-server cluster. Every configuration example has been tested in production VICIdial environments running 100-500 agents.

## Why You Need Kamailio at Scale

VICIdial's built-in multi-server support handles database replication and web interface distribution well. But for SIP traffic -- the actual calls -- VICIdial relies on Asterisk's native capabilities, which were not designed for clustered operation.

Here is what breaks without a proper SIP load balancer:

### Problem 1: Uneven Call Distribution

VICIdial assigns calls to servers based on the campaign's server configuration. If Server A and Server B are both assigned to a campaign, VICIdial's hopper tries to distribute leads across servers, but the distribution is based on lead assignment, not real-time server load. Server A might have 180 concurrent calls while Server B sits at 60.

### Problem 2: No Graceful Failure Handling

If Asterisk crashes on Server A, the calls in progress are lost. There is no mechanism to redirect new calls to Server B until VICIdial's keepalive process detects the failure -- which can take 30-60 seconds. During that window, new calls to Server A fail.

### Problem 3: Carrier-to-Server Routing

When inbound calls arrive from your SIP carrier, they hit a single IP address. Without a load balancer, that IP belongs to one Asterisk server. If that server is overloaded or down, the carrier gets a SIP error and the call is lost.

### Problem 4: Codec Transcoding Bottleneck

If different carriers use different codecs, Asterisk must transcode. Transcoding is CPU-intensive -- a single G.729 to G.711 transcoding operation uses roughly 10x the CPU of a native G.711 call. Under heavy load, transcoding on a single server can cause audio quality issues across all calls on that server.

Kamailio solves all of these problems. It sits in front of your Asterisk servers as a SIP proxy, distributing calls based on configurable algorithms, monitoring server health, and transparently rerouting traffic when servers fail.

## Architecture Overview

Here is the target architecture:

```
                     +-----------+
SIP Carriers ---->   | Kamailio  |
(inbound/            | SIP Proxy |
 outbound)           | Port 5060 |
                     +-----+-----+
                           |
               +-----------+-----------+
               |           |           |
          +----+----+ +----+----+ +----+----+
          |Asterisk | |Asterisk | |Asterisk |
          |Server 1 | |Server 2 | |Server 3 |
          |Port 5080| |Port 5080| |Port 5080|
          +----+----+ +----+----+ +----+----+
               |           |           |
               +-----------+-----------+
                           |
                     +-----+-----+
                     | VICIdial  |
                     | Database  |
                     | (MySQL)   |
                     +-----------+
```

Kamailio listens on port 5060 (the standard SIP port) and proxies calls to Asterisk servers listening on port 5080. Each Asterisk server connects to the shared VICIdial MySQL database for call routing, agent assignment, and CDR logging.

## Dispatcher Module Configuration

The Kamailio dispatcher module is the core of SIP load balancing. It maintains a list of destinations (your Asterisk servers), monitors their health, and distributes calls across them.

### Step 1: Install Kamailio with Dispatcher

On CentOS/RHEL (common for VICIdial):

```bash
# Add Kamailio repository
cat > /etc/yum.repos.d/kamailio.repo << 'REPO'
[kamailio]
name=Kamailio packages
baseurl=https://rpm.kamailio.org/stable/el9/
gpgcheck=0
enabled=1
REPO

# Install Kamailio with required modules
yum install -y kamailio kamailio-mysql kamailio-utils kamailio-tls
```

### Step 2: Configure the Dispatcher List

The dispatcher list defines your Asterisk servers, their weights, and their group assignments. Create the dispatcher list file:

```
# /etc/kamailio/dispatcher.list
# Format: setid destination flags priority attributes

# Set 1: Outbound Asterisk servers
1 sip:10.0.0.11:5080 0 10 weight=50;duid=ast1
1 sip:10.0.0.12:5080 0 5 weight=30;duid=ast2
1 sip:10.0.0.13:5080 0 3 weight=20;duid=ast3

# Set 2: Inbound Asterisk servers (may be the same or different)
2 sip:10.0.0.11:5080 0 10 weight=40;duid=ast1
2 sip:10.0.0.12:5080 0 10 weight=40;duid=ast2
2 sip:10.0.0.13:5080 0 10 weight=20;duid=ast3
```

Each line defines:

- setid -- A group identifier. Use different sets for different routing purposes (outbound vs inbound, or by campaign).
- destination -- The SIP URI of the Asterisk server. Use the internal IP and the port Asterisk listens on (5080 in our setup).
- flags -- 0 for normal operation. Set to 1 to mark a destination as inactive (maintenance mode).
- priority -- Higher values mean higher priority. Used with priority-based routing algorithms.
- attributes -- Key-value pairs. The weight attribute controls proportional distribution. The duid is a unique identifier for logging.

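The example weights add up to 100 within each set, which makes them easy to read as percentages. As a minimal sketch for re-checking that after an edit, the helper below sums the weight attribute per set. The function name `sum_weights_per_set` is hypothetical, and it assumes the five-column layout shown above with the attributes in the fifth column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the weight= attribute per set in a dispatcher.list-style file.
# Assumes five whitespace-separated columns: setid destination flags priority attributes.
sum_weights_per_set() {
    local file="$1"
    awk '
        /^[ \t]*(#|$)/ { next }                # skip comment and blank lines
        {
            w = 0
            n = split($5, attrs, ";")          # attributes are semicolon-separated
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^weight=/) {
                    sub(/^weight=/, "", attrs[i])
                    w = attrs[i] + 0
                }
            }
            total[$1] += w                     # accumulate per setid
        }
        END {
            for (id in total) print id, total[id]
        }
    ' "$file" | sort -n
}
```

Running `sum_weights_per_set /etc/kamailio/dispatcher.list` prints one line per set with the set ID followed by the summed weight, so a set that no longer totals 100 stands out immediately.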
### Step 3: Core Kamailio Configuration

Here is a production-tested kamailio.cfg for VICIdial load balancing:

```
#!KAMAILIO
#
# VICIdial SIP Load Balancer Configuration
# Kamailio 5.7+

####### Global Parameters #########

debug=2
log_stderror=no
log_facility=LOG_LOCAL0
fork=yes
children=8
auto_aliases=no

listen=udp:0.0.0.0:5060
listen=tcp:0.0.0.0:5060

server_header="Server: ViciStack-LB"
user_agent_header="User-Agent: ViciStack-LB"

####### Modules Section ########

loadmodule "tm.so"
loadmodule "sl.so"
loadmodule "rr.so"
loadmodule "pv.so"
loadmodule "maxfwd.so"
loadmodule "textops.so"
loadmodule "siputils.so"
loadmodule "xlog.so"
loadmodule "sanity.so"
loadmodule "ctl.so"          # RPC transport used by kamcmd
loadmodule "nathelper.so"
loadmodule "rtpengine.so"
loadmodule "dispatcher.so"

# ----- tm params -----
modparam("tm", "fr_timer", 5000)          # 5 second final response timeout
modparam("tm", "fr_inv_timer", 30000)     # 30 second INVITE timeout
modparam("tm", "restart_fr_on_each_reply", 1)

# ----- rr params -----
modparam("rr", "enable_full_lr", 1)
modparam("rr", "append_fromtag", 1)

# ----- nathelper params -----
modparam("nathelper", "received_avp", "$avp(RECEIVED)")
modparam("nathelper", "sipping_bflag", 7)

# ----- rtpengine params -----
modparam("rtpengine", "rtpengine_sock", "udp:127.0.0.1:2223")

# ----- dispatcher params -----
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")
modparam("dispatcher", "flags", 2)                 # Failover support: keep remaining destinations for ds_next_dst()
modparam("dispatcher", "ds_probing_mode", 1)       # Enable probing
modparam("dispatcher", "ds_ping_interval", 15)     # Probe every 15 seconds
modparam("dispatcher", "ds_probing_threshold", 3)  # 3 failures = inactive
modparam("dispatcher", "ds_ping_reply_codes", "class=2;class=3;class=4")
modparam("dispatcher", "ds_ping_from", "sip:[email protected]")

####### Routing Logic ########

request_route {
    # Per-request initial checks
    if (!mf_process_maxfwd_header("10")) {
        sl_send_reply("483", "Too Many Hops");
        exit;
    }

    if (!sanity_check("1511", "7")) {
        xlog("L_WARN", "Malformed SIP message from $si:$sp\n");
        exit;
    }

    # CANCEL processing
    if (is_method("CANCEL")) {
        if (t_check_trans()) {
            t_relay();
        }
        exit;
    }

    # Handle retransmissions
    if (!is_method("ACK")) {
        if (t_precheck_trans()) {
            t_check_trans();
            exit;
        }
        t_check_trans();
    }

    # Record-Route so mid-dialog requests pass back through Kamailio
    if (is_method("INVITE|SUBSCRIBE")) {
        record_route();
    }

    # Handle sequential requests (in-dialog)
    if (has_totag()) {
        if (loose_route()) {
            # record_route() has already run above; calling it a second
            # time for the same request only logs an error
            route(RELAY);
            exit;
        }
        if (is_method("ACK")) {
            if (t_check_trans()) {
                t_relay();
                exit;
            }
            exit;
        }
        sl_send_reply("404", "Not Here");
        exit;
    }

    # Handle REGISTER - not typically needed for VICIdial
    if (is_method("REGISTER")) {
        sl_send_reply("404", "No registrar");
        exit;
    }

    # Handle OPTIONS for health checks from carriers
    if (is_method("OPTIONS") && uri==myself) {
        sl_send_reply("200", "OK");
        exit;
    }

    # Route INVITE requests through dispatcher
    if (is_method("INVITE")) {
        route(DISPATCH);
        exit;
    }

    route(RELAY);
}

# Dispatch route - load balance across Asterisk servers
route[DISPATCH] {
    # Use set 1 for outbound, set 2 for inbound
    # Detect direction based on source: carrier IPs = inbound
    $var(ds_set) = 1;  # Default: outbound

    # If source is a known carrier IP, use inbound set
    if ($si == "203.0.113.10" || $si == "203.0.113.20" || $si == "198.51.100.5") {
        $var(ds_set) = 2;  # Inbound from carrier
    }

    # Algorithm 9  = weight-based distribution (uses the weight attribute)
    # Algorithm 4  = plain round-robin (ignores weight)
    # Algorithm 10 = call-load based
    if (!ds_select_dst("$var(ds_set)", "9")) {
        # All servers down - return 503
        xlog("L_ERR", "CRITICAL: No Asterisk servers available for call from $fu to $ru\n");
        sl_send_reply("503", "Service Unavailable");
        exit;
    }

    xlog("L_INFO", "Dispatching $rm from $fu to $du (set=$var(ds_set))\n");

    # Set failure route for automatic failover
    t_on_failure("DISPATCH_FAILURE");
    route(RELAY);
}

# Relay route
route[RELAY] {
    if (!t_relay()) {
        sl_reply_error();
    }
    exit;
}

# Failure route - try next server if current one fails
failure_route[DISPATCH_FAILURE] {
    if (t_is_canceled()) {
        exit;
    }

    # If we get a server error, try the next server
    if (t_check_status("500|503")) {
        xlog("L_WARN", "Server $du returned error, trying next server\n");

        # Mark current destination as probing (temporarily suspect)
        ds_mark_dst("p");

        # Try next destination in the set
        if (ds_next_dst()) {
            xlog("L_INFO", "Failing over to $du\n");
            t_on_failure("DISPATCH_FAILURE");
            t_relay();
            exit;
        }

        xlog("L_ERR", "CRITICAL: All servers exhausted for call from $fu\n");
    }

    # For other failure codes (busy, no answer), don't failover
    # These are called-party issues, not server issues
}
```

This configuration handles the complete SIP flow: initial request routing, mid-dialog routing (so audio continues to flow correctly), health monitoring, and automatic failover when a server returns an error.

## Weighted Round-Robin vs Least-Connections

The dispatcher module supports multiple load balancing algorithms. The two most relevant for VICIdial are weight-based distribution and call-load based.

### Algorithm 9: Weight-Based Distribution

Distributes calls proportionally based on the weight assigned to each server. A server with weight=50 receives roughly 50% of calls, weight=30 gets 30%, and weight=20 gets 20%. Algorithm 4, by contrast, is plain round-robin: it cycles through the destinations in turn and ignores the weight attribute.

Best for: Heterogeneous server hardware. If Server 1 has 32 CPU cores and Server 2 has 16 cores, weight them 2:1.

```
# dispatcher.list with weight-based distribution
1 sip:10.0.0.11:5080 0 10 weight=50;duid=ast1-32core
1 sip:10.0.0.12:5080 0 5 weight=30;duid=ast2-16core
1 sip:10.0.0.13:5080 0 3 weight=20;duid=ast3-8core
```

### Algorithm 10: Call-Load Based (Active Calls)

Distributes calls based on the current number of active calls on each server. Kamailio tracks how many calls it has proxied to each destination and routes new calls to the least-loaded server.

Best for: Homogeneous hardware where you want even distribution based on actual load rather than static weights.

Algorithm 10 requires the ds_hash_size module parameter and a unique duid attribute on every destination (already present in the list above). The routing script is also expected to call ds_load_update(), and ds_load_unset() for failed calls, so the per-destination counters stay accurate; see the dispatcher module documentation for the exact placement. To use algorithm 10, set the parameter and change the dispatch line in the config:

```
# In the dispatcher params section (required by algorithm 10):
modparam("dispatcher", "ds_hash_size", 9)   # 2^9 = 512 call-tracking slots

# In route[DISPATCH]:
if (!ds_select_dst("$var(ds_set)", "10")) {
```

### Which Should You Use?

For most VICIdial deployments, start with weight-based distribution (algorithm 9 in the dispatcher module; algorithm 4 is plain round-robin and ignores the weight attribute). It is simpler, more predictable, and easier to debug. Switch to algorithm 10 if you observe uneven load distribution due to varying call durations or mixed inbound/outbound traffic patterns.

## Health Probing and Automatic Removal

The dispatcher module's probing system is critical for production reliability. Here is how it works and how to tune it.

### How Probing Works

When ds_probing_mode=1, Kamailio sends SIP OPTIONS requests to each destination at the interval specified by ds_ping_interval. If a destination fails to respond to ds_probing_threshold consecutive probes, Kamailio marks it as inactive and stops sending calls to it. When the destination starts responding again, Kamailio automatically re-activates it.

### Tuning Probe Parameters

```
# Aggressive probing for fast failover
modparam("dispatcher", "ds_ping_interval", 10)     # Probe every 10 seconds
modparam("dispatcher", "ds_probing_threshold", 2)  # 2 failures = inactive
# Failover time: 10s * 2 = 20 seconds worst case

# Conservative probing to avoid false positives
modparam("dispatcher", "ds_ping_interval", 30)     # Probe every 30 seconds
modparam("dispatcher", "ds_probing_threshold", 3)  # 3 failures = inactive
# Failover time: 30s * 3 = 90 seconds worst case
```

For a 100+ agent center, we recommend 15-second intervals with a threshold of 3. This gives you failover within 45 seconds while avoiding false positives from temporary network hiccups.

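The failover figures above are simply the probe interval multiplied by the failure threshold. As a rough sketch for comparing candidate settings before editing the config, the helper below reproduces that arithmetic. The function name `failover_window` is hypothetical, and the estimate ignores the time each probe takes to time out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate seconds until an unresponsive server is removed.
# Same arithmetic as the tuning comments: probe interval (s) * failure threshold.
failover_window() {
    local interval="$1" threshold="$2"
    echo $(( interval * threshold ))
}

# Compare the three interval/threshold pairs discussed in this section.
while read -r interval threshold; do
    printf 'interval=%ss threshold=%s -> about %ss to removal\n' \
        "$interval" "$threshold" "$(failover_window "$interval" "$threshold")"
done <<'EOF'
10 2
15 3
30 3
EOF
```

The three pairs work out to 20, 45, and 90 seconds, matching the aggressive, recommended, and conservative settings.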
### Monitoring Dispatcher State

Check the current state of your dispatch destinations from the Kamailio CLI:

```bash
# Show all dispatcher destinations and their status
kamcmd dispatcher.list

# Output example (abbreviated):
# SET: 1
#   URI: sip:10.0.0.11:5080 FLAGS:AP PRIORITY:10 ATTRS:weight=50;duid=ast1
#   URI: sip:10.0.0.12:5080 FLAGS:AP PRIORITY:5 ATTRS:weight=30;duid=ast2
#   URI: sip:10.0.0.13:5080 FLAGS:IP PRIORITY:3 ATTRS:weight=20;duid=ast3

# FLAGS meaning:
# A = Active
# P = Probing enabled
# I = Inactive (failed probes)
# D = Disabled (manually)
```

### Manual Server Maintenance

To take a server out of rotation for maintenance without affecting active calls:

```bash
# Mark server as inactive (stops new calls, existing calls continue)
# Repeat for set 2 if the server also appears in the inbound set
kamcmd dispatcher.set_state i 1 sip:10.0.0.13:5080

# Perform maintenance on the server...

# Re-activate the server
kamcmd dispatcher.set_state a 1 sip:10.0.0.13:5080
```

## Scaling from 1 to 10 Asterisk Servers

Here is a practical scaling roadmap for growing your VICIdial cluster.

### Stage 1: Single Server (Up to 50 Agents)

No Kamailio needed. A single Asterisk server handles all traffic. Focus on optimizing that server: tuning AMD, carrier configuration, and agent settings.

### Stage 2: Dual Server with Kamailio (50-100 Agents)

Add a second Asterisk server and deploy Kamailio as the SIP proxy. At this stage, Kamailio can run on one of the Asterisk servers or on a separate lightweight VM.

## About
Built by ViciStack — enterprise VoIP and call center infrastructure.
- VICIdial Hosting & Optimization
- Call Center Performance Guides
- Full Article: When Your VICIdial Outgrows One Asterisk Box: A Kamailio Load Balancing Primer
## License
MIT
