
Load Testing Recommendations for Brocade Virtual Traffic Manager

Published 02-21-2013 - edited 12-16-2015 by PaulWallace

Load testing is a useful way to stress a deployment to find weaknesses or instabilities, and to compare alternative configurations to determine which is more efficient.  It should not be used for sizing calculations unless you take great care to ensure that the synthetic load generated by the test framework is an accurate representation of real-world traffic.

 

One useful application of load testing is to verify whether a configuration change makes a measurable difference to the performance of the system under test.  You can usually infer that a similar effect will apply to a production system.

 

Introducing zeusbench

 

The zeusbench load testing tool is the load generator that the Brocade vADC engineering team uses for its own internal performance testing.  zeusbench can be found in $ZEUSHOME/admin/bin.  Use the --help option to display comprehensive help documentation.
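For example, assuming a standard installation where $ZEUSHOME points at the traffic manager installation directory, you can display the built-in help directly:

# $ZEUSHOME/admin/bin/zeusbench --help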

 

Typical uses include:

 

  • Test the target using 100 users who each repeatedly request the named URL; each user will use a single dedicated keepalive connection.  Run for 30 seconds and report the result:

 

# zeusbench -t 30 -c 100 -k http://host:port/path

 

  • Test the target, starting at a rate of 200 requests per second and stepping up by 50 requests per second every 30 seconds, to a maximum of 10 steps up.  Run forever (until Ctrl-C), using keepalive connections; use each keepalive connection for at most 3 requests, then discard it.  Print verbose (per-second) progress reports:

 

# zeusbench -f -r 200,50,10,30 -k -K 3 -v http://host:port/path

 

For more information, please refer to Introducing Zeusbench.

 

Load testing checklist


If you conduct a load-testing exercise, bear the following points in mind:

 

Understand your tests

 

Ensure that you plan and understand your test fully, and use two or more independent methods to verify that it is behaving the way that you intend.  Common problems to watch out for include:

 

  • Servers returning error messages rather than the correct content; in that case the test only measures how quickly the server can generate error pages;
  • Incorrect keepalive behavior; verify that connections are kept alive and reused as you intended (a quick check is sketched after this list);
  • Connection rate limits and concurrency control settings, which will limit the rate at which the traffic manager forwards requests to the servers;
  • SSL handshakes; most simple load tests will perform a full SSL handshake for each request, and reusing SSL session data will significantly alter the result.
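A quick way to sanity-check the first two points is to request the test URL twice in a single invocation of a client such as curl and inspect the verbose output (the URL below is a placeholder for your test target):

# curl -v http://host:port/path http://host:port/path > /dev/null

The verbose output shows the status code of each response, so you can confirm that you are not simply benchmarking error pages, and it reports whether the second request re-used the first connection, which confirms that keepalives are being honored.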

 

Verify that you have disabled or de-configured features that could otherwise skew the test results.  You want to reduce the configuration to the simplest form possible so that you can focus on the specific configuration options you intend to test.  Candidates to simplify include:

 

  • Access and debug logging;
  • IP Transparency, and any other configuration that requires iptables and conntrack (a quick check is sketched after this list);
  • Optimization techniques like compression or other web content optimization;
  • Security policies such as service protection policies or application firewall rules;
  • Unnecessary request and response rules;
  • Advanced load balancing methods (for simplicity, use round robin or least connections).
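If you want to confirm that iptables and connection tracking really are out of the picture on the traffic manager host, a quick check on a typical Linux system (standard commands; adjust for your platform) is:

# lsmod | grep -i conntrack
# iptables -L -n

No conntrack modules listed, and an empty ruleset with default ACCEPT policies, indicate that the kernel is not tracking or filtering the test connections.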

 

It’s not strictly necessary to create a production-identical environment if the goal of your test is simply to compare various configuration alternatives – for example, which rule is quicker.  A simple environment, even if suboptimal, will give you more reliable test results.

 

Run a baseline test and find the bottleneck

 

Perform end-to-end tests directly from client to server to determine the maximum capacity of the system and where the bottleneck resides.  The bottleneck is commonly either CPU utilization on the server or client, or the capacity of the network between the two.
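For example, a baseline run might point zeusbench directly at a back-end server while you watch CPU utilization on both the client and the server (the hostname, port and path below are placeholders for your own back end):

# zeusbench -t 30 -c 100 -k http://backend:port/path
# vmstat 1

Whichever resource saturates first during this run is the bottleneck that the later traffic manager tests will be working within.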

 

Re-run the tests through the traffic manager, with a basic configuration, to determine where the bottleneck is now.  This will help you to interpret the results and focus your tuning efforts.  Measure your performance data using at least two independent methods – benchmark tool output, activity monitor, server logs, etc – to verify that your chosen measurement method is accurate and consistent.  Investigate any discrepancies and ensure that you understand their cause, and disable the additional instrumentation before you run the final tests.
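For example, one simple cross-check is to compare the number of requests reported by zeusbench with the number of requests the web server logged over the same interval (the access log path below is an assumption; substitute your server's actual log):

# wc -l /var/log/apache2/access.log     (record the count immediately before the run)
# wc -l /var/log/apache2/access.log     (record it again immediately afterwards)

The difference between the two counts should match the request total reported by the benchmark tool; remember to disable access logging again before the final test runs, as noted above.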

 

Important: Note that tests that do not overload the system can be heavily skewed by latency effects.  For example, a test that repeats the same fast request over a small number of concurrent connections will not overload the client, server or traffic manager, but introducing an additional hop (adding the traffic manager into the path, for example) may double the latency and halve the measured performance.  In production you would rarely see such an effect, because the small additional latency of the traffic manager hop is insignificant compared with the latency of real clients connecting over slower networks.
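As a rough illustration with made-up round numbers: with 10 concurrent connections and a 1 ms round trip per request, the test can complete at most about 10 / 0.001 = 10,000 requests per second.  If an extra hop adds another 1 ms, each request now takes 2 ms and the same 10 connections can only sustain about 5,000 requests per second, even though no component in the chain is anywhere near its capacity.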

 

Understand the difference between concurrency and rate tests

 

zeusbench and other load testing tools can often operate in two different modes – concurrent connections tests (-c) and connection rate tests (-r).

 

The charts below illustrate two zeusbench tests against the same service; one where the concurrency is varied, and one where the rate is varied:

 

[Figure: candr.jpg]

Measuring transactions-per-second (left-hand axis, blue) and response times (right-hand axis, red) in concurrency-based and rate-based tests

 

The concurrency-based tests apply load in a stable manner, so are effective at measuring the maximum achievable transactions-per-second. However, they can create a backlog of requests at high concurrencies, so the response time will grow accordingly.

 

The rate-based tests are less prone to creating a backlog of requests, so long as the request rate is lower than the maximum transactions-per-second. For lower request rates, they give a good estimate of the best achievable response time, but they quickly overload the service when the request rate nears or exceeds the maximum sustainable transaction rate.

 

Concurrency-based tests are often quicker to conduct (no binary-chop to find the optimal request rate) and give more stable results.  For example, if you want to determine if a configuration change affects the capacity of the system (by altering the CPU demands of the traffic manager or kernel), it’s generally sufficient to find a concurrency value that gives a good, near-maximum result and repeat the tests with the two configurations.
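For example, an A/B comparison of two configurations might look like the following, using the flags documented above and a fixed concurrency near the knee of the curve (the URL is a placeholder, and 100 concurrent connections is only an example value):

# zeusbench -t 30 -c 100 -k http://host:port/path     (run against configuration A)
    ... apply the configuration change ...
# zeusbench -t 30 -c 100 -k http://host:port/path     (run again against configuration B)

Compare the transactions-per-second reported by the two runs; repeating each run a few times helps to confirm that the difference is larger than the run-to-run noise.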

 

Always check dmesg and other OS logs

 

Resource starvation (file descriptors, sockets, internal kernel tables) will affect load-testing results and may not be immediately obvious.  Make a habit of checking dmesg and the system logs regularly during tests.
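For example, on a typical Linux system you might keep the following running on each machine during a test (standard Linux commands; log locations vary by distribution):

# tail -f /var/log/messages
# dmesg | tail -n 20
# cat /proc/sys/fs/file-nr

Messages about dropped connections, conntrack table overflows or exhausted file handles are a sign that the result is being limited by resource starvation rather than by the configuration under test.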

 

Remember to tune and monitor your clients and servers as well as the traffic manager; many of the kernel tunables described above are also relevant to the clients and servers.
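For example, the ephemeral port range, listen backlog and per-process file descriptor limit are tunables that commonly constrain load-generating clients and back-end servers; you can inspect the current values on a Linux machine with standard commands such as the following (shown for inspection only, not as recommended settings):

# sysctl net.ipv4.ip_local_port_range
# sysctl net.core.somaxconn
# ulimit -n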
