vADC Docs

What happens when Stingray Traffic Manager receives more HTTP connections than the servers can handle?


This is a complex topic that touches on many of the techniques Stingray uses to accelerate services, offload work from the servers so they run more efficiently, and rate-limit excess transactions to maintain acceptable levels of performance.


Question 1. Are client connections terminated on Stingray itself?

Yes. Stingray handles slow WAN-side connections very efficiently, and terminates them completely.

A separate TCP connection is established with the server, with TCP options chosen to suit the local link between Stingray and the server, which (generally) has minimal packet loss, latency and jitter.

This places the server in an optimal, benchmark-like environment.
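
For illustration only, here is a minimal Python sketch of that "terminate and re-originate" idea (this is not Stingray code, and the option choice is just an example): the client's connection ends at the traffic manager, and a completely separate, locally tuned connection carries the request to the server.

```python
import socket

def open_server_side_connection(server_addr: tuple[str, int]) -> socket.socket:
    # The client's TCP connection has already been terminated on the proxy;
    # this is a separate connection made purely for the local, low-latency link.
    sock = socket.create_connection(server_addr)
    # Example option tuned for the LAN hop: disable Nagle so small writes are
    # not delayed (illustrative; the real option set is chosen by Stingray).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```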

Question 2. Is there any type of request multiplexing done at all to the pool servicing a given virtual server? Or are client connections simply passed through?

Yes. For HTTP, Stingray carefully manages a pool of connections to each node in a pool. When a request completes, provided the server does not close the connection, we keep the connection open in an idle state. For subsequent requests, Stingray prefers to reuse an idle connection rather than create a new one.
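
The reuse logic can be pictured with a small Python sketch (a simplified model, not Stingray's implementation): each node has a set of idle keep-alive connections, and a new request takes one of those before it ever opens a fresh connection.

```python
import socket
from collections import deque

# Per-node collections of idle, already-established keep-alive connections.
idle_connections: dict[str, deque[socket.socket]] = {}

def acquire_connection(node: str, port: int = 80) -> socket.socket:
    """Prefer an idle connection to the node; open a new one only if none exists."""
    pool = idle_connections.setdefault(node, deque())
    if pool:
        return pool.popleft()                        # reuse an idle connection
    return socket.create_connection((node, port))    # otherwise open a new one

def release_connection(node: str, sock: socket.socket, server_kept_alive: bool) -> None:
    """After the response has been read, park the connection as idle for reuse,
    unless the server closed (or asked to close) it."""
    if server_kept_alive:
        idle_connections.setdefault(node, deque()).append(sock)
    else:
        sock.close()
```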

Stingray holds at most max_idle_connections_pernode connections to each node in the idle state (so that we don't tie up too many resources, such as threads or processes, on the server), up to a limit of max_idle_connections in total (so that we don't use too many resources on the traffic manager). We will only ever open a total of max_connections_pernode connections to a node simultaneously (default: no limit), so that if the server has a concurrency limit (e.g. Apache HTTP Server's MaxClients setting, described under mpm_common) we won't overload it.

If the incoming request rate cannot be serviced within the max_connections_pernode limit, requests are queued internally in the traffic manager and released when a concurrency slot becomes available.
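
A simple way to picture the max_connections_pernode limit and the internal queue is a per-node counting semaphore (again, an illustrative Python sketch rather than the real implementation, and the limit value here is arbitrary): a request must hold a slot before it can use a server-side connection, and requests that arrive while all slots are busy wait until one is released.

```python
import threading

# Illustrative stand-in for the pool's max_connections_pernode setting.
MAX_CONNECTIONS_PERNODE = 100

# One semaphore per node; each token represents one concurrent connection slot.
node_slots: dict[str, threading.BoundedSemaphore] = {}

def send_to_node(node: str, request: bytes) -> bytes:
    slots = node_slots.setdefault(
        node, threading.BoundedSemaphore(MAX_CONNECTIONS_PERNODE))
    with slots:          # blocks (queues the request) if the node is at its limit
        return exchange(node, request)

def exchange(node: str, request: bytes) -> bytes:
    # Placeholder for the real work: acquire or open a connection, write the
    # request, read the response, then release the connection.
    return b"HTTP/1.1 200 OK\r\n\r\n"
```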

Question 3. Is any request buffering done, in both directions?

Yes: full buffering in both directions, up to the memory limits defined in max_client_buffer and max_server_buffer. We override these limits if you read a request or response using a TrafficScript rule.

The implication is that if the client is slow, then we:

  1. Accept the client connection
  2. Read the entire request (slowly, over the slow, lossy WAN)
  3. Process it internally
  4. Choose a pool and node
  5. Select an idle connection to the node, or open a new connection
  6. Write the request to the node (fast, over the local LAN)
  7. Node processes request and generates response
  8. Read the response from the node (fast, over the local LAN)
  9. Release the connection to the node (either close it or hold it as idle)
  10. Process the response internally
  11. Write the response back to the client (slow, over WAN)
  12. Either close the client connection, or (more typically) keep it open as a KeepAlive connection

The connection to the node is only held for steps 5-9, i.e. it is very quick. This lets the nodes process connections as quickly as possible, offloading the slow WAN-side TCP connection from them; this is one aspect of the acceleration we deliver (putting the node in an optimal environment so that you can get benchmark-level performance from it).
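
Put together, the whole flow looks roughly like this Python sketch (a deliberately simplified model with hypothetical helper names, not Stingray's code): the slow WAN-side work happens before and after the short server-side exchange, so the node is only involved for steps 5-9.

```python
import socket

def handle_transaction(client_sock: socket.socket, node_addr: tuple[str, int]) -> None:
    # Steps 1-4: read and buffer the entire request over the slow WAN link,
    # then choose a pool and node (node_addr stands in for that choice here).
    request = recv_until(client_sock, b"\r\n\r\n")

    # Steps 5-6: only now take a server-side connection and write the request
    # over the fast local LAN.
    server_sock = socket.create_connection(node_addr)
    server_sock.sendall(request)

    # Steps 7-9: the node processes the request; read the whole response
    # quickly, then release the connection (here it is simply closed).
    response = b""
    while chunk := server_sock.recv(65536):
        response += chunk
    server_sock.close()

    # Steps 10-12: write the buffered response back over the slow WAN link.
    client_sock.sendall(response)

def recv_until(sock: socket.socket, marker: bytes) -> bytes:
    """Read until the marker is seen (a headers-only simplification)."""
    data = b""
    while marker not in data:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return data
```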

If the response exceeds max_server_buffer and cannot be read in its entirety, we read as much as we can, write it to the client, and refill the buffer as quickly as possible.
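
In Python terms the refill-and-drain behaviour might look like this (an illustrative loop; MAX_SERVER_BUFFER is just a stand-in for the real max_server_buffer setting):

```python
import socket

MAX_SERVER_BUFFER = 64 * 1024    # stand-in for the max_server_buffer setting

def relay_large_response(server_sock: socket.socket, client_sock: socket.socket) -> None:
    """Read up to the buffer limit, drain it to the client, and refill,
    repeating until the server has sent the whole response."""
    while True:
        buffered = server_sock.recv(MAX_SERVER_BUFFER)   # fill up to the limit
        if not buffered:
            break                                        # response complete
        client_sock.sendall(buffered)                    # drain to the client
```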

Finally, don't forget the potential to use caching on the Load Balancer / Traffic Manager to reduce the number of transactions the servers must handle.
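
As a final, much-simplified illustration of why caching helps (hypothetical names, not Stingray's cache implementation): a cache hit is answered by the traffic manager itself, so only misses generate a transaction on a back-end node.

```python
from typing import Callable

# In-memory cache keyed by URL; a real cache also honours TTLs and Cache-Control.
cache: dict[str, bytes] = {}

def fetch(url: str, fetch_from_pool: Callable[[str], bytes]) -> bytes:
    if url in cache:
        return cache[url]               # served directly by the traffic manager
    response = fetch_from_pool(url)     # only a cache miss reaches a server
    cache[url] = response
    return response
```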