vADC Docs

Hiding Application Errors

Posted on 02-21-2013 06:30 AM; edited on 06-11-2015 03:18 PM by PaulWallace

What can you do if an isolated problem causes one or more of your application servers to fail? How can you prevent visitors to your website from seeing the error, and send them a valid response instead?

 

This article shows how to use TrafficScript to inspect responses from your application servers and retry the requests against several different machines if a failure is detected.

 

The Scenario

 

Consider the following scenario. You're running a web-based service on a cluster of four application servers, using .NET, Java, PHP, or some other application environment. An intermittent fault on one of the machines means that one particular application occasionally fails on that machine. The fault might be caused by a runaway process, a race condition when you update configuration, or failing system memory.

 

With Stingray, you can inspect the responses coming back from your application servers. For example, application errors may be signalled by a '500 Internal Server Error' or '502 Bad Gateway' status code (refer to the HTTP specification for the full list of status codes).

 

You can then write a Response rule that retries the request a limited number of times against different servers, in the hope of getting a better response before sending one back to the remote user.

 

$code = http.getResponseCode();  
if( $code >= 500 && $code != 503 ) {  
   # Not retrying 503s here, because they get retried  
   # automatically before response rules are run  
   if( request.getRetries() < 3 ) {  
      # Avoid the current node when we retry,  
      # if possible  
      request.avoidNode( connection.getNode() );  
      log.warn( "Request " . http.getPath() .  
                " to site " . http.getHostHeader() .  
                " from " . request.getRemoteAddr() .  
                " caused error " . http.getResponseCode() .  
                " on node " . connection.getNode() );  
      request.retry();  
   }  
}  

 

How does the rule work?

 

The rule does a few checks before telling Stingray to retry the request:

 

1. Did an error occur?

 

First, the rule checks whether the response code indicates that an error occurred:

 

if( $code >= 500 && $code != 503 ) {  
   ...  
}  

 

If your service is prone to other types of error (for example, Java stack traces appearing in the middle of a response page), you could write a TrafficScript test for those errors instead.
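As a sketch of such a test, the Python model below combines the rule's status check with a body check for Java stack traces. It is not TrafficScript: the response body is passed in as a plain string, and the marker patterns (`Exception in thread`, or an indented `at package.Class.method(` frame) are illustrative assumptions that you would adapt to whatever your application actually emits.

```python
import re

# Illustrative markers of a Java stack trace in a page body (an assumption;
# adjust to match what your application actually emits).
JAVA_TRACE = re.compile(r"Exception in thread|\n\s+at [\w$.]+\(")


def is_error_response(code: int, body: str) -> bool:
    """Return True if a response should be treated as a failure."""
    if code == 503:
        # Already retried by the traffic manager before response rules run.
        return False
    if code >= 500:
        # Same status test as the rule: any other 5xx is an error.
        return True
    # Additional test: an otherwise normal-looking page containing a trace.
    return JAVA_TRACE.search(body) is not None


trace = "java.lang.NullPointerException\n\tat com.example.Cart.add(Cart.java:42)"
print(is_error_response(502, ""))                          # True
print(is_error_response(200, "<html>Your basket</html>"))  # False
print(is_error_response(200, trace))                       # True
```

Keeping the status test first means the cheap check runs before any scan of the body.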

 

2. Have we retried this request before?

 

Some requests may always generate an error response. We don't want to keep retrying such a request indefinitely; we have to stop at some point:

 

if( request.getRetries() < 3 ) {  
   ...  
} 

 

request.getRetries() returns the number of times that this request has been resent to a back-end node. It's initially 0; each time you call request.retry(), it is incremented.

 

This code retries a request at most 3 times in addition to the original attempt, so a request that always fails is processed up to 4 times in total.

 

3. Don't use the same node again!

 

When you retry a request, the load-balancing decision is recalculated to select the target node. However, you will probably want to avoid the node that generated the error, because it is likely to generate the same error again.

 

request.avoidNode( connection.getNode() );

 

connection.getNode() returns the name of the node that most recently processed the request. request.avoidNode() gives the load-balancing algorithm a hint that it should avoid that node. The hint is only advisory: if there are no other available nodes in the pool, that node will be used anyway.
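The advisory behaviour of the hint can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Stingray's implementation. The function name `choose_node` is hypothetical. A random choice stands in for the pool's real load-balancing algorithm. Keeping the avoided nodes in a set is also a simplification of the model.

```python
import random
from typing import Optional, Sequence, Set


def choose_node(pool: Sequence[str], avoid: Set[str],
                rng: random.Random) -> Optional[str]:
    """Pick a node, treating `avoid` as an advisory hint, not a hard filter."""
    if not pool:
        return None
    preferred = [node for node in pool if node not in avoid]
    # Advisory: if every node is being avoided, fall back to the whole pool
    # rather than failing the request.
    candidates = preferred if preferred else list(pool)
    # Random choice stands in for the pool's real load-balancing algorithm.
    return rng.choice(candidates)


rng = random.Random(42)
print(choose_node(["app1", "app2"], {"app1"}, rng))  # app2
print(choose_node(["app1"], {"app1"}, rng))          # app1 (no alternative)
```

The second call shows why the hint is safe to use even in a one-node pool: the request is still served, just by the same node.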

 

4. Log what we're about to do.

 

This rule conceals problems with the service so that the end user does not see them. If it works well, these problems may never be found!

 

log.warn( "Request " . http.getPath() .  
            " to site " . http.getHostHeader() .  
            " from " . request.getRemoteAddr() .  
            " caused error " . http.getResponseCode() .  
            " on node " . connection.getNode() );  

 

It's a sensible idea to log the fact that a request caused an unexpected error so that the problem can be investigated later.
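When investigating, a quick way to spot a faulty node or a problem URL is to tally these warnings. The Python sketch below assumes each log line contains the exact message text built by the log.warn() call in the rule. Any timestamp or other prefix on the line is ignored, because `re.search` scans the whole line. The function name `summarise` is hypothetical. The sample paths, addresses and node names (such as `app3:80`) are made-up illustrations.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches the message text built by the rule's log.warn() call. re.search is
# used, so any timestamp or other prefix on the line is ignored.
WARN_RE = re.compile(
    r"Request (?P<path>\S+) to site (?P<host>\S+) from (?P<client>\S+) "
    r"caused error (?P<code>\d+) on node (?P<node>\S+)"
)


def summarise(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count logged failures per node and per request path."""
    by_node: Counter = Counter()
    by_path: Counter = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            by_node[match["node"]] += 1
            by_path[match["path"]] += 1
    return by_node, by_path


sample = [
    "Request /cart to site www.example.com from 192.0.2.10 caused error 500 on node app3:80",
    "Request /cart to site www.example.com from 192.0.2.11 caused error 500 on node app3:80",
    "Request /login to site www.example.com from 192.0.2.12 caused error 502 on node app1:80",
]
by_node, by_path = summarise(sample)
print(by_node.most_common(1))  # [('app3:80', 2)]
```

A node that dominates `by_node` points to a machine-specific fault. A path that dominates `by_path` across many nodes points to an application bug.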

 

5. Retry the request

 

Finally, tell Stingray to resubmit the request, in the hope that this time we'll get a better response:

 

request.retry();

 

And that's it.
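To see the whole loop at once, the Python model below puts the five steps together for one client request. It is an illustrative sketch, not TrafficScript. The `backend` callable is a hypothetical stand-in for sending the request to a node and reading its status code. For determinism, "load balancing" is simplified to picking the first node that is not being avoided, falling back to the whole pool when every node is avoided.

```python
from typing import Callable, List, Set, Tuple


def handle_request(pool: List[str], backend: Callable[[str], int],
                   max_retries: int = 3) -> Tuple[int, List[str]]:
    """Model of the response rule: returns the final status and nodes tried."""
    avoided: Set[str] = set()
    tried: List[str] = []
    retries = 0  # like request.getRetries(): starts at 0
    while True:
        # Simplified load balancing with an advisory avoid list.
        candidates = [n for n in pool if n not in avoided] or pool
        node = candidates[0]
        tried.append(node)
        code = backend(node)
        # 1. error?  2. retries left?
        if code >= 500 and code != 503 and retries < max_retries:
            avoided.add(node)  # 3. like request.avoidNode(connection.getNode())
            # 4. the real rule logs a warning here (omitted in the model)
            retries += 1       # 5. request.retry() increments the counter
            continue
        return code, tried


nodes = ["app1", "app2", "app3", "app4"]
print(handle_request(nodes, lambda n: 500 if n == "app1" else 200))
# (200, ['app1', 'app2'])
print(handle_request(nodes, lambda n: 500))
# (500, ['app1', 'app2', 'app3', 'app4'])
```

The first call is the scenario from the start of the article: the user receives a 200 even though app1 failed. The second shows the cap: a request that always fails is tried 4 times and the error is then passed back.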

 

Notes

 

If a malicious user finds an HTTP request that always causes an error, perhaps because of an application bug, then this rule will replay the malicious request against 3 additional machines in your cluster. This makes it easier for the attacker to mount a DoS-style attack against your site, because they only need to send a quarter as many requests to generate the same load on your application servers.
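The 1/4 figure follows from the retry limit. Let R be the limit in the rule (3 here), N_client the number of failing requests the attacker sends, and N_backend the number of requests the application servers end up processing. This assumes every retry actually reaches a back end:

```latex
N_{\text{backend}} = (1 + R)\,N_{\text{client}}
\quad\Rightarrow\quad
N_{\text{client}} = \frac{N_{\text{backend}}}{1 + R} = \frac{N_{\text{backend}}}{4}
\qquad (R = 3)
```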

 

However, the rule explicitly logs that a failure occurred, recording both the request that caused it and the client that sent it. This information is vital for triage, that is, rapidly identifying and fixing the fault. Once you have noticed the problem, you can very quickly add a request rule to drop the bad request before it is ever processed:

 

if( http.getPath() == "/known/bad/request" ) { connection.discard(); }