Caddy: Load Balancing

Welcome back! Today we're looking at Caddy's load-balancing and health check features to improve the resiliency of your application.

What is load-balancing, again?

Load balancing distributes traffic (load) across a pool of resources to support an application. Let's say you run an application that receives lots of traffic. To handle the load and provide redundancy, you need to run more than one application server. Load balancing lets you route traffic to different servers based on various criteria.

My Blog

Let's pretend that I run a highly successful blog (ha-ha). The last time I released a post, the inrush of new visitors crashed my server. To solve this, I'm going to run two more servers and use Caddy to load-balance between them. Here's my infrastructure:

  • blog1
  • blog2
  • blog3
  • proxy (Caddy)

Here's the Caddyfile on the proxy server (this is the public endpoint):

/etc/caddy/Caddyfile
blog.alexjoelee.com {
    reverse_proxy blog1 blog2 blog3
}

Run Caddy and that's all! You're load-balancing traffic for blog.alexjoelee.com between three backends, blog1, blog2, and blog3. Maybe you prefer to use the IP instead and skip DNS resolution:

/etc/caddy/Caddyfile
blog.alexjoelee.com {
    reverse_proxy 10.0.0.2 10.0.0.3 10.0.0.4
}

Done! This configuration works great, but you might consider customizing your load balancer's behaviour.

Load-balancing methods

Out of the box, Caddy offers a ton of different load-balancing methods:

  • random (Default) - Totally random!
  • random_choose # - Selects # upstreams at random, then chooses the one with the least load.
  • first - First available, from the order defined in the configuration, allowing for failover.
  • round_robin - Iterates through each upstream in turn.
  • weighted_round_robin (weights) - Iterates through each upstream in turn, respecting the provided weight values.
  • least_conn - Selects the upstream with the fewest current requests.
  • ip_hash - Maps the remote IP to a sticky upstream.
  • uri_hash - Maps the URI (path and query) to a sticky upstream.
  • query (key) - Maps a request to a sticky upstream by hashing the value of the given query parameter.
  • header (field) - Maps a request to a sticky upstream by hashing the value of the given header field.
  • cookie (name (secret)) - Sticky upstreams via an HTTP cookie. I'm not writing the whole explanation here, see the docs.

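To illustrate the weighted_round_robin syntax, here's a sketch that favours blog1 with roughly two-thirds of the traffic. The weights (4 1 1) are example values of my own, matched positionally to the upstreams on the to line:

/etc/caddy/Caddyfile
blog.alexjoelee.com {
    reverse_proxy {
        to blog1 blog2 blog3
        lb_policy weighted_round_robin 4 1 1
    }
}
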
Depending on your goals, a few of these methods cover the most common use cases.

  1. For failover (think "hot spare"): use first - all traffic is sent to your main server unless it's down, in which case it'll be sent to the next available server.
  2. For dealing with lots of static requests: use round_robin or least_conn - traffic will be evenly distributed between all backend servers.
  3. For dealing with applications that require sticky load-balancing, try ip_hash first if you're not sure that the other options will work for you.
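
As a sketch of option 1, a hot-spare failover config (reusing the blog1/blog2/blog3 hosts from above) might look like this. The lb_try_duration and lb_try_interval options tell Caddy how long and how often to retry other upstreams if the selected one fails; the values here are just examples:

/etc/caddy/Caddyfile
blog.alexjoelee.com {
    reverse_proxy {
        to blog1 blog2 blog3
        lb_policy first
        lb_try_duration 5s # Keep trying other upstreams for up to 5s
        lb_try_interval 250ms # Wait 250ms between retries
    }
}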

Adding the least_conn method

Here's our config now:

/etc/caddy/Caddyfile
blog.alexjoelee.com {
    reverse_proxy {
        to blog1 blog2 blog3
        lb_policy least_conn
    }
}

The load balancer is now sending traffic to the blog# server with the least current traffic. This is great, but we need to make sure we don't send traffic to a blog# server that's gone offline.

Adding health checks

Active vs. Passive health checks

Caddy offers both active and passive health checks. Setting active checks tells Caddy to send requests to the upstream regularly to make sure it's still online. Passive checks keep track of failed or slow forwarded requests. I think it's best to use both!

Note: Only health_port (or health_uri) is required to enable active health checks, and only fail_duration is required to enable passive health checks. The example below sets a few more options:

/etc/caddy/Caddyfile
blog.alexjoelee.com {
    reverse_proxy {
        to blog1 blog2 blog3
        lb_policy least_conn
        # active health checks
        health_port 81 # Tells Caddy to send HTTP health checks to port :81
        health_interval 15s # Tells Caddy to send a health check every 15s
        health_timeout 10s # Tells Caddy that if a response takes longer than 10s, the backend is down
        health_status 202 # Tells Caddy to expect health responses to use HTTP status 202
        # passive health checks
        fail_duration 30s # Tells Caddy to remember failures for 30s before forgetting them
        max_fails 5 # Tells Caddy to mark the upstream as offline once it counts 5 failures within that window
        unhealthy_latency 150ms # Tells Caddy to mark the upstream as offline if responses take longer than 150ms
    }
}

Bonus: The backend server Caddyfile (an example)

/etc/caddy/Caddyfile
# Main blog site
blog.alexjoelee.com {
    root * /www/
    file_server
}
# Test blog. Not proxied.
blog-dev.alexjoelee.com {
    root * /wwdev/
    file_server
}
# Health check
:81 {
    respond 202
}

Wrapping up

Load-balancing is made so much easier with Caddy's built-in health checks and variety of configuration options. Next week, we'll be back to learn about Caddyfile imports!

Want to learn more about Caddy?

  • Visit their official website at caddyserver.com
  • Read through the docs at caddyserver.com/docs
  • If you use it, sponsor the project (we do!) at caddyserver.com/sponsor

This post is one of a series of tutorials about Caddy Server. You can view all relevant posts here.
