05-15-2013 05:41 AM
I've heard that Stingray uses a 'greedy' algorithm to schedule connection processing, and this makes it difficult to correlate CPU usage against connection load. Can you explain a little more?
05-15-2013 06:19 AM
Stingray does indeed use a 'greedy'-like algorithm to process connections. When you have a simple configuration, and most of your connection processing is simple layer 7 management with few TrafficScript rules, the behavior of this algorithm skews the CPU usage figures on your system. At lower traffic levels, Stingray may appear to be more heavily loaded than it really is; as traffic levels increase, Stingray operates in a more efficient (less greedy) fashion to accommodate the higher load.
Consider the Lightning Fast Pizza Company (LFPC), always striving to get your pizza delivered as quickly as is humanly possible.
LFPC operates a fleet of 8 delivery trucks (just as your Stingray server has 8 cores). Pizzas roll out of the conveyor oven as fast as customers order them, and when a delivery truck pulls into the bakery, it scoops up all the waiting pizzas and heads out to deliver them. Pizza deliveries take a matter of seconds: the trucks really are that fast!
When orders are slow, trucks often queue at the bakery, waiting for a pizza to pop out of the oven. As soon as one is cooked, it's collected and the truck departs. In this situation the trucks are used quite inefficiently (they spend time waiting, and each truck carries only one pizza at a time), but the pizzas are delivered piping hot!
When orders speed up, several pizzas may pop out at the same time. There may be no queue of trucks at all; a truck might arrive to find half a dozen pizzas waiting to be taken to customers. In this situation we still deliver pizzas as fast as possible with the resources (trucks) we have, but those resources are used more efficiently, because each trip carries several pizzas.
So just because all of the trucks are busy (each delivering a single pizza at a time), that does not mean we are anywhere near the peak capacity of our pizza delivery business.
The trucks are like Stingray processes (Stingray runs one process per CPU core). The pizzas are like network events: they arrive unpredictably and must be handled by a Stingray process.
Like the delivery trucks, each Stingray process runs in a loop: polling to see whether there is work to do, taking the work off the queue, and then processing it.
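The poll / take / process loop can be sketched with Python's standard selectors module. This is not Stingray's code; it only illustrates the general pattern. The names run_one_iteration and demo, and the fake request bytes, are hypothetical. One call to run_one_iteration handles every connection that is ready at that moment, which corresponds to the truck scooping up all the waiting pizzas.

```python
import selectors
import socket


def run_one_iteration(sel: selectors.BaseSelector, timeout: float = 0.0) -> int:
    """One pass of a poll loop: find ready sockets, then drain each one.

    Returns the number of ready connections handled in this iteration.
    """
    handled = 0
    # Poll: ask the OS which registered sockets have data waiting.
    for key, _mask in sel.select(timeout):
        conn = key.fileobj
        # Take the work off the queue: read whatever data is pending.
        data = conn.recv(65536)
        # Process it (a trivial stand-in for real connection handling).
        key.data["bytes"] += len(data)
        handled += 1
    return handled


def demo(ready: int = 2, idle: int = 1) -> tuple[int, int]:
    """Hypothetical demo: `ready` connections have data, `idle` ones do not."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(ready + idle)]
    try:
        for reader, _writer in pairs:
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ, data={"bytes": 0})
        # Only the first `ready` connections receive a (fake) request.
        for _reader, writer in pairs[:ready]:
            writer.sendall(b"GET / HTTP/1.0\r\n\r\n")
        handled = run_one_iteration(sel, timeout=0.1)
        total = sum(sel.get_key(reader).data["bytes"] for reader, _ in pairs)
        return handled, total
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()


if __name__ == "__main__":
    print(demo())
```

A real server would call run_one_iteration in an endless loop; the number of connections handled per pass is what grows as the load rises.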
When there are only a few network events to process, each iteration of the loop is very short, and Stingray can execute it quickly because there is little work to do. Because the loop then executes so frequently, it uses a relatively large amount of CPU time, but it ensures that each connection is serviced as quickly as possible. Stingray becomes more CPU-efficient as throughput increases. Each loop iteration can do more work: there are more active connections at each iteration, and more data is pending for each connection (because of the increased loop dwell time, more data accumulates between visits). This means Stingray can spend more of its time servicing requests in bigger chunks, rather than executing the loop itself.
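A toy cost model makes the amortization argument concrete. It assumes, hypothetically, that each loop iteration has a fixed overhead and that each event has its own processing cost, so the overhead is shared across however many events one iteration finds. The 20 and 5 microsecond figures are invented for illustration, not measured Stingray numbers, and the function names are hypothetical.

```python
def cpu_us_per_event(events_per_iteration: int,
                     loop_overhead_us: float = 20.0,
                     per_event_cost_us: float = 5.0) -> float:
    """CPU microseconds spent per event when one loop iteration handles a batch.

    loop_overhead_us and per_event_cost_us are hypothetical illustration values.
    """
    if events_per_iteration < 1:
        raise ValueError("an iteration must handle at least one event")
    # The fixed cost of polling and bookkeeping is shared by every event in the batch.
    return loop_overhead_us / events_per_iteration + per_event_cost_us


def max_events_per_core(events_per_iteration: int) -> float:
    """Events per second one fully busy core can handle at this batch size."""
    return 1_000_000 / cpu_us_per_event(events_per_iteration)


if __name__ == "__main__":
    for batch in (1, 4, 20):
        print(f"{batch:>3} events/iteration: "
              f"{cpu_us_per_event(batch):5.1f} us/event, "
              f"{max_events_per_core(batch):>9,.0f} events/s per core")
```

With these made-up numbers, a core that handles one event per iteration is fully busy at 40,000 events per second, yet the same core could handle 100,000 events per second at four events per iteration. A busy CPU is not necessarily a saturated one.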
Stingray greedily uses as much CPU as it can to process the current connections as quickly as possible. At low connection rates, CPU utilization is disproportionately high, and even when Stingray approaches 100% utilization, there is still plenty of spare capacity to process additional connections.
Sort of. This is broadly true for simple workloads, where Stingray does not need to run complex rules, Aptimizer optimizations or Application Firewall policies against traffic. In simple situations, you'll see the Stingray system spending a disproportionately large amount of its time in 'system' CPU time and less in 'user' CPU time (use vmstat to monitor this), and it's mostly the 'system' time that is elastic.
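On Linux, vmstat derives its us and sy columns from the cumulative CPU counters in /proc/stat. The sketch below computes a similar user/system split from two samples of the aggregate cpu line. It counts nice time as user time; vmstat's exact accounting (for example of interrupt time) may differ, so treat the result as an approximation. The function names are hypothetical.

```python
import os
import time

# Column order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def user_system_split(before: str, after: str) -> tuple[float, float]:
    """Percent of CPU time spent in user and system mode between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return 0.0, 0.0
    user_pct = 100.0 * (delta["user"] + delta["nice"]) / total
    system_pct = 100.0 * delta["system"] / total
    return user_pct, system_pct


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which holds the all-CPU totals."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1.0)
    user, system = user_system_split(first, read_cpu_line())
    print(f"user {user:.1f}%  system {system:.1f}%")
```

Sampling this way while load rises should show the pattern described above: on a simple configuration, the system share is high at low load and does not grow in proportion to traffic.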
When you have a complex configuration that performs a lot of layer 7 processing, the CPU profile is more linear as load increases. Adding more CPU cores is a cost-efficient way to increase layer 7 and other compute-intensive capacity (SSL, compression, TrafficScript, etc.).
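When per-request work dominates, CPU demand grows roughly in proportion to load, so core sizing becomes simple arithmetic. The sketch below assumes you have measured an average CPU cost per request (the 0.5 ms figure is hypothetical) and keeps utilization under a 70% headroom target, which is also an assumption rather than a Stingray recommendation. The function name cores_needed is hypothetical.

```python
import math


def cores_needed(requests_per_sec: float, cpu_ms_per_request: float,
                 target_utilization: float = 0.7) -> int:
    """Cores required when CPU cost grows linearly with load.

    cpu_ms_per_request should be measured on your own workload;
    target_utilization is a hypothetical headroom policy.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # CPU-seconds consumed per wall-clock second across all cores.
    cpu_seconds_per_second = requests_per_sec * cpu_ms_per_request / 1000.0
    return math.ceil(cpu_seconds_per_second / target_utilization)


if __name__ == "__main__":
    for rps in (10_000, 20_000):
        print(rps, "req/s ->", cores_needed(rps, 0.5), "cores")
```

With those numbers, 10,000 requests per second needs 8 cores and 20,000 needs 15: in the linear regime, doubling the load roughly doubles the cores required.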
Of course, there are other potential bottlenecks in your Stingray system. Just as a driveway too narrow for the delivery trucks would limit the bakery, your network card and the I/O capabilities of your host can become a bottleneck that limits the peak capacity of your system. Other services (the UI, REST API, SNMP and management daemons) also require CPU resources, although these will rarely dominate on a production system.