vADC Docs

Dynamic rate shaping slow applications

Published 02-21-2013 03:58 AM, edited 06-11-2015 12:40 PM by PaulWallace

In a recent conversation, a user wished to use Stingray's rate shaping capability to throttle back requests to one part of his web site that was particularly sensitive to high traffic volumes (think a CGI, JSP servlet, or other type of dynamic application). This article describes how you might go about testing and implementing a suitable limit, using Service Level Monitoring, Rate Shaping and some TrafficScript magic.

 

The problem

 

Imagine that part of your website is particularly sensitive to traffic load and is prone to overloading when a crowd of visitors arrives. Connections queue up, response time becomes unacceptable and it looks like your site has failed.

 

If your website were a tourist attraction or a club, you’d employ a gatekeeper to manage entry rates. As the attraction began to fill up, you’d employ a queue to limit entry, and if the queue got too long, you’d want to encourage new arrivals to leave and return later rather than to join the queue.

 

This is more-or-less the solution we can implement for a web site. In this worked example, we're going to single out a particular application (named search.cgi) that we want to control the traffic to, and let all other traffic (typically for static content, etc) through without any shaping.

 

The approach

 

We'll first measure the maximum rate at which the application can process transactions, and use this value to determine the rate limit we want to impose when the application begins to run slowly.

 

Using Stingray's Service Level Monitoring classes, we'll monitor the performance (response time) of the search.cgi application. If the application begins to run slower than normal, we'll deploy a queuing policy that rate-limits new requests to the application. We'll monitor the queue and send a 'please try later' message when the rate limit is met, rather than admitting users to the queue and forcing them to wait.

 

Our goal is to maximize utilization (supporting as many transactions as possible) while minimizing response time, returning a 'please wait' message rather than queueing the user.

 

Measuring performance

 

We first use zeusbench to determine the optimal performance that the application can achieve. We perform several runs, increasing the concurrency until the performance (responses per second) stabilizes at a consistent level:

 

zeusbench -c 5 -t 20 http://host/search.cgi

zeusbench -c 10 -t 20 http://host/search.cgi

zeusbench -c 20 -t 20 http://host/search.cgi

 

... etc

 

Run:

 

zeusbench -c 20 -t 20 http://host/search.cgi

 

zb1.jpg

 

From this, we conclude that the maximum number of transactions-per-second that the application can comfortably sustain is 100.
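The sweep's stopping condition ('performance stabilizes') can be made concrete. Here is a minimal Python sketch of reading the plateau off the sweep results; the figures are hypothetical stand-ins for your own zeusbench output, and the 5% tolerance is an assumption:

```python
def find_plateau(results, tolerance=0.05):
    """Return the throughput at which raising concurrency stops helping.

    results: {concurrency: responses_per_second}, as measured by zeusbench.
    The plateau is the first rate that a higher-concurrency run fails to
    improve on by more than `tolerance`.
    """
    rates = [rate for _, rate in sorted(results.items())]
    for current, nxt in zip(rates, rates[1:]):
        if nxt <= current * (1 + tolerance):
            return current
    return rates[-1]

# Hypothetical sweep results -- substitute your own measurements.
sweep = {5: 62.0, 10: 91.0, 20: 100.0, 40: 101.0}
print(find_plateau(sweep))  # → 100.0
```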

 

We then use zeusbench to send transactions at that rate (100 / second) and verify that performance and response times are stable. Run:

 

zeusbench -r 100 -t 20 http://host/search.cgi

 

zb2.jpg

 

From this test, we can deduce a desired response time of approximately 20 ms.
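As a sanity check, Little's law (L = λ × W) ties the two measurements together: at 100 requests per second with a 20 ms response time, the application holds only about two requests in flight at any moment. A quick illustration (not Stingray code):

```python
# Little's law: in-flight requests L = arrival rate λ × response time W.
rate_per_second = 100      # sustainable rate measured above
response_time_s = 0.020    # ~20 ms response time at that rate

in_flight = rate_per_second * response_time_s
print(in_flight)  # → 2.0: only ~2 requests inside the server on average
```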

 

Now we perform the 'destructive' test, to elicit precisely the behaviour we want to avoid. Use zeusbench again to send requests to the application at higher than the sustainable transaction rate:

 

zeusbench -r 110 -t 20 http://host/search.cgi

 

zb3.jpg

 

Observe how the response time for the transactions steadily climbs as requests begin to queue, and the successful transaction rate falls steeply. Eventually, when the response time rises past acceptable limits, transactions time out and the service appears to have failed.

 

This illustrates how sensitive a typical application can be to floods of traffic that overwhelm it, even for just a few seconds. The effects of the flood can last for tens of seconds afterwards as the connections complete or time out.
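The arithmetic behind this collapse is worth making explicit. Below is a deliberately simplified, deterministic model (not Stingray code) of how a 10% overload grows the backlog, and with it the wait time, without bound:

```python
def overload_backlog(offered, capacity, seconds):
    """Queue length after each second of offered > capacity traffic."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += offered - capacity   # excess requests join the queue
        history.append(backlog)
    return history

# 110 req/s offered to a service that can only complete 100 req/s:
growth = overload_backlog(offered=110, capacity=100, seconds=20)
print(growth[-1])        # → 200 queued requests after 20 seconds
print(growth[-1] / 100)  # → the newest arrival now waits ~2.0 s, and rising
```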

 

Defining the policy

 

We wish to implement the following policy:

 

  • If all transactions complete within 50 ms, do not attempt to shape traffic.
  • If some transactions take more than 50 ms, assume that we are in danger of overload. Rate-limit traffic to 100 requests per second, and if requests exceed that rate limit, send back a '503 Too Busy' response rather than queuing them.
  • Once transaction time comes down to less than 50ms, remove the rate limit.
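Before wiring this into Stingray, the policy can be sketched in ordinary Python to check the logic. The TokenBucket class and handle() function here are illustrative stand-ins for the rate class and the SLM conformance check, not Stingray APIs:

```python
import time

class TokenBucket:
    """Minimal token bucket, standing in for the 'Search limit' rate class."""
    def __init__(self, rate, now=time.monotonic):
        self.rate = rate            # tokens (requests) replenished per second
        self.tokens = float(rate)   # start with a full bucket
        self.now = now
        self.last = now()

    def admit(self):
        t = self.now()
        self.tokens = min(self.rate, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def handle(conforming, bucket):
    """Apply the policy: `conforming` mirrors slm.conforming() == 100."""
    if conforming:
        return 200                  # responses within 50 ms: no shaping
    if bucket.admit():
        return 200                  # shaped, but within 100 req/s
    return 503                      # over the limit: fail fast, don't queue

# With a frozen clock, exactly 100 requests pass before the limiter trips:
frozen = lambda: 0.0
bucket = TokenBucket(100, now=frozen)
statuses = [handle(False, bucket) for _ in range(110)]
print(statuses.count(200), statuses.count(503))  # → 100 10
```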

 

Our goal is to repeat the previous zeusbench test, showing that the maximum transaction rate can be sustained within the desired response time, and any extra requests receive an error message quickly rather than being queued.

 

Implementing the policy

 

The Rate Class

 

Create a rate shaping class named Search limit with a limit of 100 requests per second.

 

zb-rateclass.jpg

 

The Service Level Monitoring class

 

Create a Service Level Monitoring class named Search timer with a target response time of 50 ms.

 

zb-slmclass.jpg

 

If desired, you can use the Activity monitor to chart the percentage of requests that conform, i.e. complete within 50 ms, while you conduct your zeusbench runs. You’ll notice a strong correlation between a drop in these figures and the rise in the response time figures reported by zeusbench.

 

zbslm.jpg

The TrafficScript rule

 

Now use these two classes with the following TrafficScript request rule:

 

# We're only concerned with requests for /search.cgi
$url = http.getPath();
if( $url != "/search.cgi" ) break;

# Time this request using the Service Level Monitoring class
connection.setServiceLevelClass( "Search timer" );

# Test if any of the recent requests fell outside the desired SLM threshold
if( slm.conforming( "Search timer" ) < 100 ) {
   if( rate.getBacklog( "Search limit" ) > 0 ) {
      # To minimize response time, always send a 503 Too Busy response if the
      # request exceeds the configured rate of 100/second.
      # You could also use http.redirect() to a more pleasant 'sorry' page, but
      # 503 errors are easier to monitor when testing with ZeusBench
      http.sendResponse( "503 Too Busy", "text/html",
         "<h1>We're too busy!!!</h1>",
         "Pragma: no-cache" );
   } else {
      # Shape the traffic to 100/second
      rate.use( "Search limit" );
   }
}

 

Testing the policy

 

Rerun the 'destructive' zeusbench test that previously produced the undesired behaviour:

zeusbench -r 110 -t 20 http://host/search.cgi

 

zb4.jpg

 

Observe that:

 

  • Stingray processes all of the requests without excessive queuing; the response time stays within desired limits.
  • Stingray typically processes 110 requests per second. There are approximately 10 'Bad' responses per second (these are the 503 Too Busy responses generated by the rule), so we can deduce that the remaining 100 (approx) requests were served correctly.
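The numbers in the second observation follow directly from the configuration: offering 110 requests per second against a 100-per-second limit should reject roughly 10 per second, about 9% of the traffic:

```python
offered, limit = 110, 100
rejected = offered - limit
print(rejected)                          # → 10 'Bad' (503) responses per second
print(round(rejected / offered * 100))   # → 9 (% of requests turned away)
```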

 

These tests were conducted in a controlled environment, on an otherwise-idle machine that was not processing any other traffic. You could reasonably expect much more variation in performance in a real-world situation, and would be advised to set the rate class to a lower value than the experimentally-proven maximum.

 

In a real-world situation, you would probably choose to redirect a user to a 'sorry' page rather than returning a '503 Too Busy' error. However, because ZeusBench counts 4xx and 5xx responses as 'Bad', it is easy to determine how many requests complete successfully, and how many return the 'sorry' response.

 

For more information on using ZeusBench, take a look at the Introducing Zeusbench article.
