vADC Docs

Techniques for Direct Server Return with Stingray Traffic Manager


What is Direct Server Return?

Layer 2/3 Direct Server Return (DSR), also referred to as ‘triangulation’, is a network routing technique used in some load balancing situations:

  • Incoming traffic from the client is received by the load balancer and forwarded to a back-end node
  • Outgoing (return) traffic from the back-end node is sent directly to the client and bypasses the load balancer completely

[Figure: dsr1.png]

Incoming traffic (blue) is routed through the load balancer, and return traffic (red) bypasses the load balancer

Direct Server Return is fundamentally different from the normal load balancing mode of operation, where the load balancer observes and manages both inbound and outbound traffic.

In contrast, there are two other common load balancing modes of operation:

  • NAT (Network Address Translation): layer 4 load balancers and simple layer 7 application delivery controllers use NAT to rewrite the destination address of individual network packets.  Network connections are load-balanced by the choice of destination address.

They often use a technique called ‘delayed binding’ to delay and inspect a new network connection before sending the packets to a back-end node; this allows them to perform content-based routing.  NAT-based load balancers can switch TCP streams, but have limited capabilities to inspect and rewrite network traffic.

  • Proxy: Modern general-purpose load balancers like Stingray Traffic Manager operate as full proxies.  The proxy mode of operation is the most compute-intensive, but current general purpose hardware is more than powerful enough to manage traffic at multi-gigabit speeds.

Whereas NAT-based load balancers manage traffic on a packet-by-packet basis, proxy-based load balancers can read entire requests and responses.  They can manage and manipulate the traffic based on a full understanding of the transaction between the client and the application server.

Note that some load balancers can operate in a dual-mode fashion - a service can be handled either in a NAT-like fashion or in a Proxy-like fashion.  This introduces a tradeoff between hardware performance and software sophistication - see SOL4707 - Choosing appropriate profiles for HTTP traffic for an example.  Stingray Traffic Manager can only function in a Proxy-like fashion.

This article describes how the benefits of direct server return can be applied to a layer 7 traffic management device such as Stingray Traffic Manager.

Why use Direct Server Return?

Layer 2/3 Direct Server Return was very popular from 1995 to about 2000 because the load balancers of the time were seriously limited in performance and compute power; DSR uses fewer compute resources than a full NAT or Proxy load balancer.  DSR is no longer necessary for high-performance services, as modern load balancers on modern hardware can easily handle many gigabits of traffic without requiring DSR.

DSR is still an appealing option for organizations who serve large media files, or who have very large volumes of traffic.

Stingray Traffic Manager does not support a traditional DSR mode of operation, but it is straightforward to manage traffic to obtain a similar layer 7 DSR effect.

Disadvantages of Layer 2/3 Direct Server Return

There are a number of distinct limitations and disadvantages with DSR:

1. The load balancer does not observe the response traffic

The load balancer has no way of knowing if a back-end server has responded correctly to the remote client.   The server may have failed, or it may have returned a server error message.  An external monitoring service is necessary to verify the health and correct operation of each back-end server.

2. Proper load balancing is not possible

The load balancer has no idea of service response times so it is difficult for it to perform effective, performance-sensitive load balancing.

3. Session persistence is severely limited

Because the load balancer only observes the initial ‘SYN’ packet before it makes a load balancing decision, it can only perform session persistence based on the source IP address and port of the packet, i.e. the IP address of the remote client.

The load balancer cannot perform cookie-based session persistence, SSL session ID persistence, or any of the many other session persistence methods offered by other load balancers.
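For comparison, a full proxy can key persistence on application data.  Here is a minimal TrafficScript sketch of cookie-based persistence, assuming the virtual server runs in normal proxy mode, a ‘Universal’ session persistence class is assigned to the pool, and the application issues a ‘JSESSIONID’ cookie (these names and settings are illustrative, not part of the original configuration):

# Persist on the application's session cookie (requires full proxy mode)
$session = http.getCookie( "JSESSIONID" );
if( $session != "" ) {
   connection.setPersistenceKey( $session );
}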

4. Content-based routing is not possible

Again, because the load balancer does not observe the initial request, it cannot perform content based routing.

5. Limited traffic management and reporting

The load balancer cannot manage or transform traffic with operations such as SSL decryption, content compression, security checking, SYN cookies, or bandwidth management.  It cannot retry failed requests or perform any traffic rewriting, and it cannot report on traffic statistics such as bandwidth sent.

6. DSR can only be used within a datacenter

There is no way to perform DSR between datacenters (other than proprietary tunnelling, which may be limited by ISP egress filtering).

In addition, many of the advanced capabilities of an application delivery controller that depend on inspection and modification (security, acceleration, caching, compression, scrubbing etc) cannot be deployed when a DSR mode is in use.

Performance of Direct Server Return

The performance benefits of DSR are often assumed to be greater than they really are.  Central to this doubt is the observation that client applications will send TCP ‘ACK’ packets via the load balancer in response to the data they receive from the server, and the volume of the ACK packets can overwhelm the load balancer.

Although ACK packets are small, in many cases the rated capacities of network hardware assume that all packets are the size of the maximum MTU (typically 1500 bytes).  A load balancer running on a 100 Mbit network could receive a little over 8,000 ACK packets per second.
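As a worked check of that figure (a sketch assuming every packet is counted as a full 1500-byte frame):

\[
\frac{100 \times 10^{6}\ \text{bit/s}}{1500\ \text{bytes} \times 8\ \text{bit/byte}} \approx 8{,}333\ \text{packets/s}
\]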

On a low-latency network, ACK packets are relatively infrequent (roughly 1 ACK packet for every 4 data packets), but for large downloads over a high-latency network (for example, 8 hops) the ratio of ACK packets to data packets approaches 1:1 as the server and client attempt to optimize the TCP session.  Therefore, over high-latency networks, a DSR-equipped load balancer will receive a similar volume of ACK packets to the volume of outgoing data packets (and the difference in size between the ACK and data packets makes little difference to packet-based load balancers).

Stingray alternatives to Layer 2/3 DSR

There are two alternatives to direct server return:

Use Stingray Traffic Manager in its usual full proxy mode

Stingray Traffic Manager can comfortably manage many gigabits of traffic in its normal ‘proxy’ mode on appropriate hardware, and can be scaled horizontally for increased capacity.  In benchmarks, modern Intel and AMD-based systems can achieve multiple tens of gigabits of fully load-balanced traffic, and up to twice as much when serving content from Stingray Traffic Manager’s content cache.

Redirect requests to the chosen origin server (a.k.a. Layer 7 DSR)

The most common protocols (HTTP and RTSP) can be handled in ‘proxy’ mode, with the client redirected to the chosen server node once the load balancing and session persistence decision has been made.  For a large file download, the client then communicates directly with the server node, bypassing Stingray Traffic Manager completely:

  1. The client issues an HTTP or RTSP request to Stingray Traffic Manager
  2. Stingray Traffic Manager issues a lightweight ‘probe’ request to a back-end server selected via the pool
  3. Stingray Traffic Manager verifies that the back-end server returns a correct response
  4. Stingray Traffic Manager sends a 302 redirect to the client, telling it to retry the request against the chosen back-end server

[Figure: dsr2.png]

Requests for small objects (blue) are proxied directly to the origin.  Requests for large objects (red) elicit a lightweight probe to locate the resource, and then the client is instructed (green) to retrieve the resource directly from the origin.

This technique would generally be used selectively.  Small file downloads (web pages, images, etc.) would be managed through Stingray Traffic Manager; only large files – embedded media, for example – would be handled in this redirect mode.  For this reason, the HTTP session itself always runs through Stingray Traffic Manager.

Layer 7 DSR with HTTP

Layer 7 DSR with HTTP is fairly straightforward.  In the following example, incoming requests whose path begins “/media/” will be converted into simple probe requests and sent to the ‘Media Servers’ pool.  Stingray Traffic Manager will determine which node was chosen, and send the client an explicit redirect to retrieve the requested content from the chosen node:

Request rule: Deploy the following TrafficScript request rule:


$path = http.getPath();

if( string.startsWith( $path, "/media/" ) ) {

   # Store the real path
   connection.data.set( "path", $path );

   # Convert the request to a lightweight HEAD for '/'
   http.setMethod( "HEAD" );
   http.setPath( "/" );

   pool.use( "Media Servers" );
}


Response rule: This rule reads the response from the server; load balancing and session persistence (if relevant) will ensure that we’ve connected to the optimal server node.  The rule only takes effect if the request rule rewrote the request: in that case, the $saved_path value will begin with ‘/media/’, so we can issue the redirect.



$saved_path = connection.data.get( "path" );

if( string.startsWith( $saved_path, "/media/" ) ) {

   $chosen_node = connection.getNode();
   http.redirect( "http://" . $chosen_node . $saved_path );
}
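Step 3 of the sequence above calls for verifying that the back-end server returned a correct response before the client is redirected.  A minimal sketch of such a check in the response rule, assuming a healthy server answers the HEAD probe with a 200 status (the status handling shown here is illustrative):

$saved_path = connection.data.get( "path" );

if( string.startsWith( $saved_path, "/media/" ) ) {

   # Redirect only if the probe succeeded
   if( http.getResponseCode() == 200 ) {
      http.redirect( "http://" . connection.getNode() . $saved_path );
   } else {
      # Otherwise log the failure and let the probe response pass through
      log.warn( "Media probe to " . connection.getNode() . " failed" );
   }
}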


Layer 7 DSR with RTSP

An RTSP connection is a persistent TCP connection.  The client and server communicate with HTTP-like requests and responses. 

In this example, Stingray Traffic Manager will receive initial RTSP connections from remote clients and load-balance them on to a pool of media servers.  In the RTSP protocol, a media download is always preceded by a ‘DESCRIBE’ request from the client; Stingray Traffic Manager will replace the ‘DESCRIBE’ response with a 302 Redirect response that tells the client to connect directly to the back-end media server.

This code example has been tested with the QuickTime, Real and Windows Media clients, and against pools of QuickTime, Helix (Real) and Windows Media servers.

The details

Create a virtual server listening on port 554 (standard port for RTSP traffic).  Set the protocol type to be “RTSP”.

In this example, we have three pools of media servers, and we’re going to select the pool based on the User-Agent field in the RTSP request.  The pools are named “Helix Servers”, “QuickTime Servers” and “Windows Media Servers”.

Request rule: Deploy the following TrafficScript request rule:


$client = rtsp.getRequestHeader( "User-Agent" );

# Choose the pool based on the User-Agent
if( string.contains( $client, "RealMedia" ) ) {
   pool.select( "Helix Servers" );
} else if( string.contains( $client, "QuickTime" ) ) {
   pool.select( "QuickTime Servers" );
} else if( string.contains( $client, "WMPlayer" ) ) {
   pool.select( "Windows Media Servers" );
}


This rule uses pool.select() to specify which pool to use when Stingray is ready to forward the request to a back-end server.  Unlike pool.use(), pool.select() does not terminate the rule immediately; rule processing continues, and the request is forwarded to the selected pool once the request rules have run.

Response rule: All of the work takes place in the response rule.  This rule reads the response from the server.  If the request was a ‘DESCRIBE’ method, the rule then replaces the response with a 302 redirect, telling the client to connect directly to the chosen back-end server. 

Add this rule as a response rule, setting it to run every time (not once).


# Wait for a DESCRIBE response since this contains the stream
$method = rtsp.getMethod();
if( $method != "DESCRIBE" ) break;

# Get the chosen node and the path that the client requested
$node = connection.getNode();
$path = rtsp.getPath();

# Instruct the client to retry directly against the chosen node
rtsp.redirect( "rtsp://" . $node . "/" . $path );


Appendix: How does DSR work?

It’s useful to have an appreciation of how DSR (and Delayed Binding) functions in order to understand some of its limitations (such as content inspection).

TCP overview

A simplified overview of a TCP connection is as follows:

Connection setup

  1. The client initiates a connection with a server by sending a ‘SYN’ packet.  The SYN packet contains a randomly generated client sequence number (along with other data).
  2. The server replies with a ‘SYN ACK’ packet, acknowledging the client’s SYN and sending its own randomly generated server sequence number.
  3. The client completes the TCP connection setup by sending an ACK packet to acknowledge the server’s SYN. 

The TCP connection setup is often referred to as a 3-way TCP handshake.  Think of it as the following conversation:

  1. Client: “Can you hear me?” (SYN)
  2. Server: “Yes.  Can you hear me?” (ACK, SYN)
  3. Client: “Yes” (ACK)

Data transfer

Once the connection has been established by the 3-way handshake, the client and server exchange data packets with each other.  Because packets may be dropped or re-ordered, each packet contains a sequence number; the sequence number advances as data is sent.

When a client receives intact data packets from the server, it sends back an ACK (acknowledgement) with the packet sequence number.  When a client acknowledges a sequence number, it is acknowledging that it has received everything up to that number, so ACKs may be sent less frequently than data packets.

The server may send several packets in sequence before it receives an ACK (the number is determined by the “window size”), and will resend packets if they are not ACK’d rapidly enough.

Simple NAT-based Load Balancing

There are many variants for IP and MAC rewriting used in simple NAT-based load balancing.  The simplest NAT-based load balancing technique uses Destination-NAT (DNAT) and works as follows:

  1. The client initiates a connection by sending a SYN packet to the Virtual IP (VIP) that the load balancer is listening on
  2. The load balancer makes a load balancing decision and forwards the SYN packet to the chosen node.  It rewrites the destination IP address in the packet to the IP address of the node.  The load-balancer also remembers the load-balancing decision it made.
  3. The node replies with a SYN/ACK.  The load-balancer rewrites the source IP address to be the VIP and forwards the packet on to the remote client.
  4. As more packets flow between the client and the server, the load balancer checks its internal NAT table to determine how the IP addresses should be rewritten.

This technique is very amenable to a hardware (ASIC) implementation.  The TCP connection is load-balanced on the first SYN packet; one of the implications is that the load balancer cannot inspect the content in the TCP connection before making the routing decision.

Delayed Binding

Delayed binding is a variant of the DNAT load balancing method.  It allows the load balancer to inspect a limited amount of the content before making the load balancing decision.

  1. When the load balancer receives the initial SYN, it chooses a server sequence number and returns a SYN/ACK response
  2. The load balancer completes the TCP handshake with the remote client and reads the initial few data packets in the client’s request.
  3. The load balancer reassembles the request, inspects it and makes the load-balancing decision.  It then makes a TCP connection to the chosen server, using DNAT and presenting the client’s source IP address, and writes the request to the server.
  4. Once the request has been written, the load balancer must splice the client-side and server-side connection together.  It does this by using DNAT to forward packets between the two endpoints, and by rewriting the sequence numbers chosen by the server so that they match the initial sequence numbers that the load balancer used.

This technique is still amenable to hardware (ASIC) implementation.  However, layer 4-7 tasks such as detailed content inspection and content rewriting are beyond specialized hardware alone and are often implemented using software approaches (such as F5's FastHTTP profile), albeit with significant functional limitations.

Direct Server Return

Direct Server Return is most commonly implemented using MAC address translation (layer 2).

A MAC (Media Access Control) address is a unique, unchanging hardware address that is bound to a network card.  Network devices will read all network packets destined for their MAC address.

Network devices use ARP (address resolution protocol) to announce the MAC address that is hosting a particular IP address.  In a Direct Server Return configuration, the load balancer and the server nodes will all listen on the same VIP.  However, only the load balancer makes ARP broadcasts to tell the upstream router that the VIP maps to its MAC address.

  1. When a packet destined for the VIP arrives at the router, the router places it on the local network, addressed to the load balancer’s MAC address.  The load balancer picks that packet up.
  2. The load balancer then makes a load balancing decision, choosing which node to send it to.  The load balancer rewrites the MAC address in the packet and puts it back on the wire.
  3. The chosen node picks the packet up just as if it were addressed directly to it.
  4. When the node replies, it sends its packets directly to the remote client.  They are immediately picked up by the upstream router and forwarded on.

In this way, reply packets completely bypass the load balancer machine.

Why content inspection is not possible

Content inspection (delayed binding) is not possible because it requires that the load balancer first completes the three-way handshake with the remote client, and possibly ACKs some of the data packets.

When the load balancer then sends the first SYN to the chosen node, the node will respond with a SYN/ACK packet directly back to the remote client.  The load balancer is out of line and cannot suppress this SYN/ACK.  Additionally, the sequence number that the node selects cannot be translated to the one that the remote client is expecting.  There is no way to persuade the node to pick up the TCP connection from where the load balancer left off.

For similar reasons, SYN cookies cannot be used by the load balancer to offload SYN floods from the server nodes.

Alternative Implementations of Direct Server Return

There are two alternative implementations of DSR (see the 2002 paper entitled 'The State of the Art'):

  • TCP Tunnelling: IP tunnelling (aka IP encapsulation) can be used to tunnel the client IP packets from the load balancer to the server.  All client IP packets are encapsulated within IP datagrams, and the server runs a tunnel device (an OS driver and configuration) to strip off the datagram header before sending the client IP packet up the network stack.

This configuration does not support delayed binding, or any equivalent means of inspecting content before making the load balancing decision.

  • TCP Connection Hopping: Resonate have implemented a proprietary protocol (Resonate Exchange Protocol, RXP) which interfaces deeply with the server node’s TCP stack.  Once a TCP connection has been established with the Resonate Central Dispatch load balancer and the initial data has been read, the load balancer can hand the response side of the connection off to the selected server node using RXP.  The RXP driver on the server suppresses the initial TCP handshake packets, and forces the use of the correct TCP sequence number. 


This uniquely allows for content-based routing and direct server return in one solution.

Neither of these methods is in wide use today.