on 02-22-201310:20 AM - last edited on 10-28-201310:55 PM by bcm1
While considering what to write about for this blog, after my previous blog about a really cool NetIron 5.3 feature, I thought I’d stick with that trend for now and talk about another highly anticipated 5.3 feature. This one also happens to be MPLS-based and it’s often a required SP capability within an MPLS-based solution. It’s called Automatic Bandwidth Label-Switched Paths, or Auto-BW LSPs for short.
The Good News
As we know, RSVP-TE based networks are capable of considerable optimizations in terms of bandwidth reservations and traffic engineering. Operators can “plumb” their networks more intelligently, by reserving LSP bandwidth onto specific paths within their network for certain traffic types or overlay services. This makes their networks run more efficiently and with better performance. Operators and network managers like this, as they are getting the most out of the network. In other words, they are “getting their monies worth”.
The Not So Good News
While bandwidth reservations and traffic engineering provide great capabilities in MPLS networks, oftentimes the configured bandwidth reservations turn out to be less than optimal. In other words, it’s great for an operator to be able to say “for this LSP between these two endpoints I want to reserve 2.5 Gbps of bandwidth” and then make that happen in the network. The operator knows that the topology can support the 2.5 Gbps of capacity due to capacity planning exercises or from offline MPLS-based TE tools. Cool. (btw: It may be desirable to integrate sFlow data into the capacity planning capability or maybe even into an offline MPLS-based TE tool, but that’s a topic for a future blog.)
But what if there is a sustained increase in traffic, well above the reserved 2.5 Gbps, for that service? How are those surges handled by the LSP? Or what if the actual sustained traffic load is only 1.5 Gbps? In that case, no other LSP may be able to reserve the “extra” 1 Gbps of capacity since it is already reserved for that specific LSP. Now the operators’ network is plumbed in a less than optimal fashion. They are no longer “getting their monies worth” out of the network.
The (now) Gooder News
Here is where Auto-BW LSPs come onto the scene to save the day and make the operator a hero (again).
Auto-BW LSPs can solve both problems mentioned; handling a sustained surge in traffic, above what was previously planned for & being able to use “extra” capacity that is actually available but because it’s allocated to an LSP, it may not be available to be reserved by other LSPs.
Overview of Auto-BW
In its simplest definition: auto-bandwidth is an RSVP feature which allows an LSP to automatically and dynamically adjust its reserved bandwidth over time (ie: without operator intervention). The bandwidth adjustment uses the ‘make-before-break’ adaptive signaling method so that there is no interruption to traffic flow.
The new bandwidth reservation is determined by sampling the actual traffic flowing through the LSP. If the traffic flowing through the LSP is lower than the configured or current bandwidth of the LSP, the “extra” bandwidth is being reserved needlessly. Conversely, if the actual traffic flowing through the LSP is higher than the configured or current bandwidth of the LSP, it can potentially cause congestion or packet loss. With Auto-BW, the LSP bandwidth can be set to some arbitrary value (even zero) during initial setup time, and it will be periodically adjusted over time based on the actual bandwidth requirement. Sounds neat, huh? Here’s how it works…
First, determine what the desired sample-interval and adjustment-interval should be set at. The traffic rate is repeatedly sampled at each sample-interval. The default sampling interval is 5 minutes. The sampled traffic rates are accumulated over the adjustment-interval period, which has a default of 24 hours. The bandwidth of the LSP is then adjusted to the highest sampled traffic rate amongst the set of samples taken over the adjustment-interval. Note that the highest sampled traffic rate could be higher or lower than the current LSP bandwidth.
That’s basically it in a nutshell, but there are other knobs available to tweak for further control (as expected, operators want more knobs to tweak).
In order to reduce the number of readjustment events (ie: too many LSPs constantly re-sizing), we allow the operator to configure an adjustment-threshold. For example, if the adjustment-threshold is set to 25%, the bandwidth adjustment will only be triggered if the difference between the current bandwidth and the highest sampled bandwidth is more than 25% of the current bandwidth.
As mentioned, the adjustment-interval is typically set pretty high, at around 24 hours. But a high value can lead to a situation where the bandwidth requirement becomes suddenly high but the LSP waits for the remaining adjustment-interval period before increasing the bandwidth. In order to avoid this, we allow the operator to configure an overflow-limit. For example,if this value is set to 3, the LSP bandwidth readjustment will be triggered as soon as the adjustment-threshold is crossed in 3 consecutive samples.
The feature will also allow the operator to set a max-bandwidthandamin-bandwidthvalue to constrain the re-sizing of an LSP to within some reasonably determined bounds.
It is also possible to simply gather statistics based on the configured parameters, without actually adjusting the bandwidth of an LSP. This option involves setting the desired mode of operation to either monitor-only or monitor-and-signal.
The Auto-BW feature also provides a template-based configuration capability, where the operator can create a template of auto-bandwidth parameter values and apply the templates on whichever path of an LSP that needs the same configuration or across multiple LSPs.
This example below shows three adjustment-intervals on the horizontal axis and traffic load of the LSP on the vertical axis. After each adjustment-interval, the LSP bandwidth is automatically adjusted based upon the sampled traffic rate. The diagram also shows where the adjustment-threshold is set and exceeded by the actual traffic rate, which then results in the bandwidth adjustment. The red line is the bandwidth of the LSP, after being adjusted at each adjustment-interval.
In the example above, each adjustment-interval has three sample-intervals. The following graphic shows the relationship between the sample-interval and the adjustment-interval.
Auto-BW Solves Real Problems
Here is one simple but real-life scenario where Auto-BW can prevent packet loss. Consider the topology below.
In this topology there are two LSPs between PE1 and PE3; each with 400 Mbps of reserved bandwidth and each with actual traffic loads approaching the 400 Mbps reservations. The entire topology consists of 1 GbE links so both of these LSPs can share any of the links since their combined bandwidth reservation is 800 Mbps. Constrained Shortest-Path First (CSPF) calculations put both LSPs onto the PE1-PE2-PE3 path.
However, over time LSP2's actual traffic load grows in size to 650 Mbps. Now the combined traffic load of both LSPs exceeds the capacity of a 1 GbE link and packet loss is now happening on the PE1-PE2-PE3 path. RSVP, specifically the CSPF algorithm, cannot take the additional “actual” traffic load into account so both LSPs remain on the same path. This is not good. The reason for this is the Traffic-Engineering Database (TED) that CSPF uses to determine paths in the network is not updated by actual traffic loads on links or in LSPs. This is just how RSVP-TE works.
When Auto-BW is enabled, both LSPs are sampled to determine their actual traffic loads. After the adjustment-interval, LSP2 is re-sized to 650 Mbps. Now both LSPs can no longer share the same path as the CSPF algorithm will compute that the combined bandwidth of the LSPs now exceeds a single 1 GbE link. So, the result is that CSPF will look into the TED for a new path in the network from PE1 to PE3 that meets the bandwidth requirement of LSP2 and will traffic engineer that LSP onto the PE1-PE4-PE5-PE3 path.
The operator is now a hero (again) because the network is back to working at its maximum efficiency and performance levels.
As usual, any questions are comments are welcome! Also, if there are future topics related to MPLS that you would like to see a blog about, please post them in the comments and we will see what we can do.