Load balancing is a crucial part of modern network infrastructure. Most websites and web applications cope with increased load by scaling out — that is, running multiple copies of a single application or service and dividing incoming traffic between them. Load balancers split incoming requests evenly so that no single server is overwhelmed, and they improve application reliability by distributing thousands of requests across multiple instances. Because load balancers provide redundancy, a failed request can be rerouted to a working server.
Organizations can scale an application using two approaches: horizontal scaling and vertical scaling. Vertical scaling means powering up an instance's specifications in terms of CPU, RAM, and storage. Alternatively, organizations achieve horizontal scaling by increasing the number of instances. In that scenario, the load balancer adds value by distributing requests across all the available instances.
Different types of load balancers work in different ways to address multiple levels of the OSI model. The OSI model divides networks into seven layers, ranging from physical hardware at layer 1 to end-user applications at layer 7. This article focuses on load balancing at layer 2 (L2) and layer 3 (L3).
L2 is where data transmission occurs between machines connected to the same physical or virtual network. Hardware and software operating at L2 handle intranet data flow and error control.
Communication between two different networks or subnets, such as sending network requests across the internet, happens at L3. Hardware and software operating at L3 find the best physical path for the message and route traffic accordingly. Protocols operating at L3, like the Internet Protocol (IP), break data into packets on the sending side of a request and reassemble them on the receiving side. (Transport protocols such as TCP and UDP sit one level up, at L4.) L3 also handles addressing and basic error reporting, via protocols like ICMP, in communication between networks.
Let’s dive into these two layers, their relative advantages, and how they can work together to provide optimal load balancing performance.
The primary difference between the layers is the way routing and switching works. L3 comes into the picture when routing takes place between multiple networks. L3 routing makes it possible for network requests to travel smoothly from one machine to another, even if the request must pass through multiple routers or computers to reach its destination.
Switching, however, occurs at L2 within a single network. L2 uses the hardware MAC address of each computer connected to a local network and does not use IP addresses or TCP or UDP ports. In contrast, modern L3 routing almost exclusively uses IP addresses — although less common technologies like IPX and AppleTalk can also be routed at L3.
In other words, L2 switching uses MAC addresses to forward frames from a source port to a destination port, maintaining a MAC table that maps addresses to ports. L3 switching, commonly known as routing, maintains an IP routing table to find the best path from a source to a destination.
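As a rough sketch of the difference, both structures can be modeled as simple lookup tables: a MAC table mapping hardware addresses to switch ports, and a routing table mapping destination networks to next hops. This is an illustration only, not a real switch or router implementation, and all addresses are hypothetical.

```python
import ipaddress

# L2: MAC table learned by a switch, mapping hardware addresses to ports.
MAC_TABLE = {
    "aa:bb:cc:00:00:01": 1,  # device learned on port 1
    "aa:bb:cc:00:00:02": 2,  # device learned on port 2
}

# L3: routing table mapping destination subnets to next-hop routers.
ROUTING_TABLE = {
    "10.0.0.0/24": "192.168.1.1",
    "10.0.1.0/24": "192.168.1.2",
}

def switch_frame(dst_mac):
    """L2: forward a frame out the port learned for this MAC (None if unknown)."""
    return MAC_TABLE.get(dst_mac)

def route_packet(dst_ip):
    """L3: pick a next hop by longest-prefix match (greatly simplified)."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [(ipaddress.ip_network(net), hop)
                  for net, hop in ROUTING_TABLE.items()
                  if addr in ipaddress.ip_network(net)]
    if not candidates:
        return None  # no route; a real router would drop or use a default route
    # Prefer the most specific (longest) matching prefix.
    return max(candidates, key=lambda c: c[0].prefixlen)[1]
```

Note how the L2 lookup never considers IP addresses at all, while the L3 lookup never considers MACs — each layer works only with its own addressing scheme.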
Now that we have a little context about L2 and L3 and how routing and switching works, it’s time to learn about L2 and L3 load balancing.
Sometimes, too many requests go to a single server, causing it to become overloaded and unresponsive. In cases like these, you might use several copies of the same application or service to help share the load, but you still must find a way to divide the traffic between them. That’s where a load balancer comes in. A load balancer splits the load between the available servers and devices to minimize overhead on a single server or device.
At layer 2, load balancing can distribute traffic among the machines on the same network. But, as we discussed, it only forwards traffic based on MAC addresses. It’s impossible to route traffic to a device on a different network using L2 load balancing. There are still scenarios where this kind of load balancing is helpful — for example, dividing bandwidth equally among all devices within a network.
In contrast, L3 load balancers operate at a higher level, making it possible to route traffic using IPv4 and IPv6 addresses. Organizations use them to distribute the load across different virtual machines. They ensure high availability and reliability by circulating requests among servers or VMs.
Many L3 load balancers also enable developers and DevOps engineers to choose from various algorithms to perform load balancing effectively. Some standard L3 load balancing algorithms include round robin, weighted round robin, fewest connections (least connection), IP hash, consistent hashing, least time, and equal-cost multi-path (ECMP). Let’s examine a few of them.
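Two of these algorithms are easy to sketch in a few lines. The following is an illustrative example only — the server IPs and weights are hypothetical, and a production balancer would also handle health and concurrency:

```python
import itertools

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round robin: cycle through the servers in a fixed order.
_rr = itertools.cycle(SERVERS)

def round_robin():
    return next(_rr)

# Weighted round robin: servers with more capacity receive proportionally
# more requests. Here 10.0.0.1 gets twice the traffic of the others.
WEIGHTS = {"10.0.0.1": 2, "10.0.0.2": 1, "10.0.0.3": 1}
_wrr = itertools.cycle([s for s, w in WEIGHTS.items() for _ in range(w)])

def weighted_round_robin():
    return next(_wrr)
```

Round robin ignores server capacity entirely; weighted round robin is the natural refinement when back-end machines are not identical.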
In contrast to L2 and L3 load balancers, L7 load balancers are software that operates at the application layer. Popular L7 load balancing tools include Nginx and HAProxy. Many of them are open source, making it easy to add SSL termination and implement new protocols like HTTP/3.
L7 load balancers can also use extra information available at the application layer to carry out complex and intelligent load balancing. They can read the content of incoming HTTP(S) requests and use that data to route each request directly to where it needs to go. Most L7 load balancers also let you define rules and targets to achieve the expected routing.
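Content-based routing of this kind boils down to matching request attributes against rules. Here is a minimal sketch of path-prefix routing; the rules, pool names, and hostnames are all hypothetical:

```python
# Route requests to a backend pool based on the HTTP request path.
# First matching prefix wins; unmatched paths fall through to the default pool.
ROUTING_RULES = [
    ("/api/", ["api-1.internal", "api-2.internal"]),
    ("/static/", ["cdn-1.internal"]),
]
DEFAULT_POOL = ["web-1.internal", "web-2.internal"]

def choose_pool(path):
    for prefix, pool in ROUTING_RULES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

Real L7 balancers such as Nginx and HAProxy express the same idea declaratively in their configuration files, and can also match on headers, cookies, or hostnames.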
The main disadvantage of L7 load balancers is that they usually act as proxies, so they must maintain two open connections: one to the machine that made the request and one to the target machine that is serving the request. L2 and L3 load balancers, in comparison, simply forward packets to their destination.
The most basic approach to load balancing is the round-robin DNS method. This method creates multiple DNS records for the same host, pointing to different servers. So, for instance, if we do nslookup on one of our domains, it returns three A records associated with it.
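The load-spreading effect comes from the DNS server rotating the order of those A records on successive queries, so clients that take the first record end up spread across servers. A simplified simulation of that behavior, with hypothetical addresses:

```python
# Simulate round-robin DNS: the same A records are returned in a rotated
# order on each query. Clients typically use the first record, so traffic
# spreads across the three servers over time.
A_RECORDS = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
_query_count = 0

def resolve(hostname):
    """Return all A records, rotated one position per query."""
    global _query_count
    offset = _query_count % len(A_RECORDS)
    _query_count += 1
    return A_RECORDS[offset:] + A_RECORDS[:offset]
```
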
Though this approach is easy to implement, it has notable disadvantages. First, it doesn’t check whether a configured server is actually running; requests will still go to a non-working server as long as its record remains in DNS. Another drawback is that changes are not instantaneous — the record’s time to live (TTL) determines how long clients cache it. A user may be mid-session on Server A, refresh the page, and have a fresh DNS lookup send them to a different server.
Server load balancing (SLB) is a more advanced approach to balancing requests across multiple servers. SLB has more features and benefits than the round-robin DNS method. Unlike round-robin DNS, SLB supports health checks, which means requests only go to healthy server instances.
It also supports connection draining, which ensures that in-flight requests complete before a server is removed from the load balancer. With connection draining enabled, deregistering an instance immediately stops new requests to it while allowing existing requests to finish. We explain various techniques and algorithms associated with SLB below.
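Health checks and connection draining can be sketched together as a filter over the server pool. This is an illustrative model only — `check_health` is stubbed, whereas a real balancer would probe each server (for example, with an HTTP request to a health endpoint and a short timeout):

```python
# Hypothetical pool: server address -> whether its health probe passes.
SERVER_STATUS = {"10.0.0.1": True, "10.0.0.2": False, "10.0.0.3": True}

# Servers that have been deregistered and are draining: no NEW requests,
# but existing connections are allowed to complete.
DRAINING = set()

def check_health(server):
    # Stub for illustration; in practice, probe the server over the network.
    return SERVER_STATUS[server]

def deregister(server):
    """Begin connection draining: stop sending new requests to this server."""
    DRAINING.add(server)

def eligible_servers():
    """Servers that may receive new requests: healthy and not draining."""
    return [s for s in SERVER_STATUS if check_health(s) and s not in DRAINING]
```
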
We briefly discussed the different techniques and algorithms that we can use with load balancers. However, load balancers do more than just distribute traffic in the server pool. The modern load balancer also manages the user’s active session. This functionality — known as session persistence or sticky sessions — means the load balancer can connect the same user to the same server over an extended period.
For example, say a user adds an item to a shopping cart and accidentally closes their browser. Load balancers can intelligently identify and redirect users to the same server to continue from where they left off. This can improve efficiency as it allows applications to store user session data in memory on a single server rather than keeping it in a database or key-value store like Redis.
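One simple way to implement session persistence is IP hashing: hashing the client’s IP address to a server index sends the same client to the same server on every request, as long as the pool is unchanged. A minimal sketch, with hypothetical server names:

```python
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]

def pick_server(client_ip):
    """Map a client IP to a server deterministically via a stable hash."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Note the trade-off: plain modulo hashing reshuffles most clients whenever a server is added or removed, which is why production balancers often use consistent hashing or cookie-based stickiness instead.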
Load balancers can also check the underlying servers’ status using regular health checks and alert the administrator if any server fails. When a server does fail, the load balancer stops directing traffic to it until its health check passes again. This feature helps reduce downtime and mask individual server failures.
Load balancers also add a protective layer to the application without changing the application’s underlying structure. Load balancers can improve security by protecting against distributed denial-of-service (DDoS) attacks. They also enable adding an optional authentication layer.
Load balancers make it easy to add SSL/TLS because you only have to add a certificate to the load balancer instead of adding it to every back-end server. They even make it easy to add support for new protocols like HTTP/3 and QUIC even if your back-end servers don’t yet support the new protocol.
Finally, many well-known load balancers are open-source and maintained by a large community. Modern load balancers support most of the features we’ve covered in this article. Organizations can even use multiple load balancers to get the most availability and scalability. It’s a common practice to mix and match L3 and L7 load balancing to achieve the best results.
In this article, we learned about the basic idea of the OSI model, examined how packets are switched and routed, explored L2 and L3, and discussed different load balancing algorithms along with some of the problems they solve.
Understanding load balancing is an integral part of understanding web application performance — whether you’re controlling load balancing directly or just using your cloud or edge provider’s load balancing behind the scenes.