Two of the most critical requirements for any online service provider are availability and redundancy. The time a server takes to respond to a request depends on how much spare capacity it has at that moment. If even a single component fails or is overwhelmed by requests, responses slow down or stop, and both the customer and the business suffer.
Load balancing attempts to resolve this issue by sharing the workload across multiple components. An incoming request can be routed away from an overtaxed server to one that has more resources available. Load balancing has a variety of applications, from network switches to database servers.
Service providers typically build their networks by using Internet-facing frontend servers to shuttle information to and from backend servers. These frontend servers run load balancing software, which forwards each request to one of the backend servers based on resource availability. The software contains internal rules and logic that determine when and where to forward each request.
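To make the idea of forwarding "based on resource availability" concrete, here is a minimal sketch of one possible rule: send each request to the backend with the most free capacity. The `Backend` class, its `capacity` and `active` fields, and the `forward` function are hypothetical names chosen for illustration; real load balancers use a range of other rules as well.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A hypothetical backend server tracked by the load balancer."""
    name: str
    capacity: int    # maximum concurrent requests (illustrative unit)
    active: int = 0  # requests currently being served

    @property
    def free(self) -> int:
        return self.capacity - self.active


def pick_backend(backends: list[Backend]) -> Backend:
    """Return the backend with the most free capacity."""
    candidates = [b for b in backends if b.free > 0]
    if not candidates:
        raise RuntimeError("all backends are at capacity")
    # max() returns the first backend among those tied for most free slots.
    return max(candidates, key=lambda b: b.free)


def forward(backends: list[Backend]) -> str:
    """Assign one request to a backend and return that backend's name."""
    backend = pick_backend(backends)
    backend.active += 1
    return backend.name
```

With two backends of capacity 2 and 3, successive calls to `forward` alternate between them until both are full, at which point the sketch raises an error rather than overloading a server.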
Here’s a rundown of how load balancing works:
If all goes well, the user receives a timely response regardless of the state of the service provider’s network. As long as at least one frontend server and at least one backend server are available, the user’s request is handled properly.
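The availability condition can be sketched in a few lines. This is only an illustration under simple assumptions: each tier is a dictionary mapping a made-up server name to whether it is currently up, and `handle_request` is a hypothetical function, not part of any real product.

```python
def handle_request(frontends: dict[str, bool], backends: dict[str, bool]) -> str:
    """Return the path a request takes, or raise if no path exists."""
    # Pick the first server in each tier that is currently up.
    frontend = next((name for name, up in frontends.items() if up), None)
    backend = next((name for name, up in backends.items() if up), None)
    if frontend is None or backend is None:
        raise ConnectionError("no available path for the request")
    return f"{frontend} -> {backend}"
```

Calling it with one frontend down and two backends down still succeeds, because one server in each tier remains; the request fails only when an entire tier is unavailable.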
Google’s Compute Engine is built on the same load balancing techniques used by several Google products including Gmail, Search and Google Ads. Compute Engine periodically reviews the state of all backend servers and marks them as healthy or unhealthy based on their current load.
When a user connects to a Google service, Compute Engine forwards the request to a healthy server. The response is then forwarded from the healthy server through Compute Engine back to the user. Meanwhile, unhealthy servers are repaired, replaced or taken offline.
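The health-check cycle can be sketched as follows. This is not Compute Engine's API: the 0.9 load threshold, the server names, and the `check_health` and `route` functions are all assumptions made for illustration, and production health checks usually probe the server directly rather than reading a single load figure.

```python
import itertools

UNHEALTHY_LOAD = 0.9  # hypothetical threshold: fraction of capacity in use


def check_health(loads: dict[str, float], threshold: float = UNHEALTHY_LOAD) -> dict[str, bool]:
    """Mark each server healthy (True) if its load is below the threshold."""
    return {name: load < threshold for name, load in loads.items()}


def route(requests: int, health: dict[str, bool]) -> list[str]:
    """Distribute requests round-robin across healthy servers only."""
    healthy = [name for name, ok in health.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy backends")
    rotation = itertools.cycle(healthy)
    return [next(rotation) for _ in range(requests)]
```

Given loads of 0.35, 0.97, and 0.60 for three servers, the second is marked unhealthy and receives no traffic until a later check finds it below the threshold again.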
With load balancing, a server can be upgraded with no interruptions to the end user’s experience. Google and other service providers push application updates by upgrading their backend servers in waves. For instance, as a server is taken offline for upgrade, other servers take responsibility for the workload and are subsequently updated in turn.
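A wave-based upgrade can be sketched like this. The `rolling_upgrade` function, the version strings, and the wave size are hypothetical; the assignment inside the loop stands in for whatever real steps take a server offline, upgrade it, and bring it back.

```python
def rolling_upgrade(versions: dict[str, str], new_version: str, wave_size: int = 1) -> list[list[str]]:
    """Upgrade servers in waves and return the waves in the order performed."""
    if wave_size < 1 or wave_size >= len(versions):
        # Every wave must leave at least one server serving traffic.
        raise ValueError("wave_size must be at least 1 and smaller than the fleet")
    names = list(versions)
    waves = []
    for start in range(0, len(names), wave_size):
        wave = names[start:start + wave_size]
        # Servers outside this wave keep handling requests meanwhile.
        for name in wave:
            versions[name] = new_version  # stand-in for the real upgrade step
        waves.append(wave)
    return waves
```

For a fleet of five servers and a wave size of two, the sketch upgrades the servers in three waves of two, two, and one, so at least three servers are always in service.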
In Compute Engine, the ability to take a system offline for maintenance and upgrade is known as “lame-duck mode”. This is how Google’s web products can be seamlessly updated even between active sessions.
Many of us rely on web services being available 24/7. A 30-minute outage at Facebook could cost almost $600,000. For high-traffic web applications, load balancing is essential for maintaining the integrity and availability of a service.
From DNS requests to web servers, load balancing can mean the difference between costly downtime and a seamless end-user experience.