I recently returned from IETF 86 and would like to update the folks in this community with a brief synopsis of the event. Overall, it was a very well attended, interactive and relevant event! But I think that’s pretty much the norm these days, particularly with all the interest in SDN related technologies and use cases. I will post a separate blog in our SDN community on the SDN related IETF activities, so please go there for that update. In this blog, I will focus on IETF activities related to service providers.
I’ll start off with the discussion around IPv6 in MPLS networks. While we all know that there has been some interest and IETF standards work in the area of MPLS/IPv6, it has yet to garner much real deployment interest. Techniques for providing IPv6 over IPv4-based MPLS networks, such as 6PE and 6VPE, have solved some of the issues with IPv6 and MPLS. However, it appears the IETF community is now getting behind full IPv6 support in MPLS. This would include native IPv6 LDP and RSVP-TE support. Some folks believe that although full IPv6 MPLS networks may not be needed for another 3-5 years, the IETF community should get on board now and start officially driving this. The MPLS WG will start formally tracking progress in this area, as it’s deemed important work.
Entropy labels to improve load balancing in MPLS networks was briefly discussed and this appears to be a done deal in terms of standards (RFC 6790) and having broad community support and consensus.
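To give a rough flavor of the idea behind entropy labels (the hash function and field encoding below are my own illustration, not anything mandated by RFC 6790), an ingress LER might derive the label from a flow's 5-tuple like this:

```python
import hashlib

def entropy_label(src_ip, dst_ip, proto, src_port, dst_port):
    """Derive a 20-bit entropy label from a flow's 5-tuple.

    The ingress LER computes this once per flow and pushes it onto
    the label stack; transit LSRs can then load-balance on the
    label stack alone, without parsing deeper into the payload.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    value = int.from_bytes(hashlib.sha1(key).digest()[:4], "big") & 0xFFFFF
    # Label values 0-15 are reserved, so remap them out of that range.
    return value + 16 if value < 16 else value
```

Because the label is a pure function of the flow, all packets of one flow carry the same label and stay on one path, while different flows spread across the available paths.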
TRILL over Pseudo-Wires was discussed in the PWE3 WG. This is cool stuff and appears to have some degree of consensus. This basically would allow a TRILL domain in one data center to have layer-2 connectivity to another TRILL domain in another data center.
A similar topic of VXLAN over L2 VPNs was discussed in the L2VPN WG. This would provide a layer-2 MPLS connection between VXLAN or NVGRE logical overlay networks. This is also a pretty cool use case and this appears to be a needed solution if VXLAN/NVGRE solutions become more widely deployed in data centers. A somewhat related topic was discussed on how Ethernet VPNs (E-VPNs) could be leveraged to provide a data center overlay solution. In this context, E-VPNs are based on MPLS technologies. While this solution revolves around Network Virtualization Overlays, it was discussed in the L2VPN WG due to it leveraging MPLS technologies. This Internet Draft was also discussed in the NVO3 WG.
Interesting work on MPLS forwarding compliance and performance requirements was discussed in the MPLS-related WGs. This work intends to document the MPLS forwarding paradigm from the perspective of the MPLS implementer, the MPLS developer, and the MPLS network operator. Very useful work!
In the L3VPN WG, there were quite a few IDs that overlap with the NVO3 WG and data center overlay technologies. The general support for MPLS-based solutions for data center overlay architectures appears to be gathering momentum. From a high level, this does make sense as MPLS VPN technologies provide a logical network overlay in the wide area of service provider networks. As data center overlay architectures evolve, why not leverage this work and experience? I will discuss more on this topic in my SDN community blog.
To wrap up the MPLS activities; there were a number of other MPLS-related developments and enhancements that I won’t go into detail about here. Areas such as P2MP LSPs, special purpose MPLS label allocations, OAM, and additional functionality for advertising MPLS labels into the IGP (like an enhanced “forwarding adjacency”) were all discussed and are progressing at various stages through the IETF standards process.
Another WG that generated a fair amount of activity and interest is PCE. This is also an area of IETF work that is somewhat related to the SDN solution space. This WG is focused on how to enhance traffic-engineering decisions in MPLS networks. PCE functionality would “recommend” traffic-engineered LSPs for the network but would not be responsible for the actual instantiation of those LSPs into the network. That would be done by another function and is deemed outside the scope of the PCE WG.
The WG agreed to make the PCE MIB “read-only”. This makes sense since the MIB is not a good place to implement PCE functionality. They also discussed P2MP LSPs, service-aware LSPs, and even support for wavelength switched optical networks. They also agreed that “stateful” PCE was indeed in scope and in the charter.
Overall, nothing really groundbreaking to report on in the area of routing activity at this IETF. One topic worth a mention is the north-bound distribution of link-state and TE routing information using BGP. This area is somewhat related to the SDN solution space, as it could provide upper-layer applications (such as ALTO or PCE) with knowledge of the network’s link-state topology. This would allow those applications to make more intelligent traffic-engineering decisions.
Another area of routing that is interesting to mention is the ability to make routing decisions based upon additional link-state metric information, such as latency, jitter, and loss. This seems like a very logical evolution of IP routing.
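To make that concrete, here is a sketch of what a composite link cost might look like. The weights and formula are purely illustrative; the IETF work defines how the extended metrics get advertised, not how operators must combine them:

```python
def composite_cost(latency_ms, jitter_ms, loss_pct,
                   w_latency=1.0, w_jitter=0.5, w_loss=50.0):
    """Combine extended link metrics into a single routing cost.

    The weights are illustrative knobs an operator might tune per
    traffic class: e.g. weight loss heavily for bulk TCP transfer,
    or jitter heavily for voice.
    """
    return (w_latency * latency_ms
            + w_jitter * jitter_ms
            + w_loss * loss_pct)

# A 10 ms / 2 ms jitter / 0.1% loss link under the default weights:
cost = composite_cost(10, 2, 0.1)  # -> 16.0
```

The SPF computation itself is unchanged; it simply runs over this richer cost instead of a static interface metric.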
And to wrap up the routing activity: as expected, the security of inter-domain routing continues to generate lots of interest. It was interesting that immediately after the IETF, a paper published by Boston University on the security implications of the Resource Public Key Infrastructure (RPKI) was being discussed on the SIDR mailing list. This paper seems to have reignited some of the controversy around secure routing.
I2RS was also very well represented and generated lots of interesting dialogue and debate.
This WG is fairly new. The primary goal of this WG is to provide a real-time interface into the IP routing system. This interface will not only provide a configuration and management capability into the routing system, but will also allow the retrieval of useful information from the routing system. Quite a bit of the discussion centered on what type of state information needs to be injected into the routing system, what type of state information should be extracted from it, and, interestingly enough, what specifically the “routing system” is. The routing system is generally understood to be the Routing Information Base (RIB) in IP routers, but there was a good amount of debate on exactly what constitutes a RIB, what information it holds, and what the interface to this RIB might look like and how it might behave. It appears this WG may have taken a step back to re-group and get more focused before moving on to solutions too rapidly.
There were five use case drafts that were presented and discussed. So, while this WG may have taken a step back to more clearly understand and define the problem space, they are also continuing to move forward with relevant use case definitions and then onward to solutions.
So, that wraps up this short update on the IETF 86 SP related activities. I should mention before closing that the I2RS WG intends to hold an interim meeting after the ONS event in April, so if you are attending the ONS event you may want to attend the I2RS interim meeting as well.
I’d like to follow Greg’s great blog from last week with a related topic. Like his blog, this blog will be focused on router hardware (unlike my previous blogs which were NetIron software related). The topic at hand is a brief discussion of the differences and the pros/cons of FPGA and ASIC technology. I’ll also briefly touch on the advantages of each of these technologies as they apply to high-end IP routers.
FPGAs (Field Programmable Gate Arrays) are specialized chips that are programmed to perform very specific functions in hardware. An FPGA is basically a piece of programmable logic. The first FPGA was invented in 1985, so this technology has been around for quite some time. Rather than executing a function in software, the same function can be implemented in an FPGA and executed in hardware. One can think of an FPGA as “soft-hardware”, since it can be reprogrammed after manufacturing. How many of you remember the bygone days of software-based IP routers? If you do, then you should also remember how poorly the Internet performed at that time! Performance was poor in software-based routers because a centralized CPU executed all functions, both the control/management plane functions and the data plane functions of the router. Today, all modern routers execute the data plane functions in hardware; and more frequently, some vendors are moving certain control plane functions into the router hardware as well. The Bidirectional Forwarding Detection (BFD) protocol is one example of this, where portions of the BFD keep-alive mechanisms are implemented in the line card of the router.
While FPGAs contain vast amounts of programmable logic and millions of gates, one thing to note is that some of the logic in an FPGA is not used for the “customer facing” or "mission specific" application or function. In other words, not all the logic in an FPGA is designed to be directly used by the application the FPGA is providing for the customer. Additional gates are needed to interconnect the internal logic and make it programmable, so an FPGA is not fully optimized in terms of “customer facing” logic.
Now, what I find interesting is that some people will still claim that FPGAs cannot scale to the speeds that are required in today’s Internet. However, Brocade has proven this claim to be quite false and has been shipping line-rate, high-end performance routers using FPGAs for over 10 years. As shown in the line card diagram in Greg’s blog, an FPGA in this context is really a programmable network processor.
One great advantage of an FPGA is its flexibility. By flexibility, I’m referring to the ability to rapidly implement or reprogram the logic in an FPGA for a specific feature or capability that an SP customer requires. When a networking vendor has a new feature that it wants to implement, the vendor may have the choice of deciding whether to put the feature in software or hardware. This is not always the case; for example, OSPF needs to run in the control plane of the router and cannot be implemented in hardware. The question of whether to implement something in software or hardware basically comes down to a decision of flexibility versus scalability (and cost is always part of that decision process, as one would expect). Implementing something in software usually results in a rapid implementation timeframe, but often to the detriment of performance. As usual, there is always a trade-off to be made. However, if the vendor supports programmable network processors, it can implement the feature in hardware with no detriment to performance. While it takes more time to get a feature into an FPGA than to implement it in software, the time-to-market is still considerably shorter than implementing a similar feature in an ASIC. The real advantage of this becomes evident with deployed systems in a production network. When a customer requires a feature that needs to be implemented in the forwarding plane of a router, the deployed systems in the field can be upgraded to use the new feature once the vendor has developed it. This requires only a software upgrade of the system; no new hardware or line cards are required. The router’s software image contains the code for the FPGAs, as well as the code for the control and management plane of the router.
Back to the performance question: the industry has shown that high-end FPGAs are growing in density while handling higher-speed applications and more complex designs. Furthermore, if you look at the evolution of FPGAs over the years, they have followed Moore's Law just as CPUs have, in terms of the amount of logic that can be implemented in them. Recent history has shown that FPGA density is on an exponential growth curve.
FPGAs can also be used for developing a “snapshot” version of a final ASIC design. In this way, FPGAs can be re-programmed as needed until the final specification is done. The ASIC can then be manufactured based on the FPGA design.
ASICs have very high density in terms of logic gates on the chip, and that higher scalability for the same power budget can give ASICs a competitive edge over an FPGA. One thing to note is that an ASIC is designed to be fully optimized in terms of gates and logic. All the internal structures are used for customer facing or mission specific applications or functions. So, while an ASIC may consume more power per unit die size than an FPGA, this power is amortized over a higher density solution and hence provides better power efficiency.
Compare/Contrast of FPGA-ASIC
So, FPGAs and ASICs are both specialized chips that perform complex calculations and functions at high levels of performance. FPGAs, however, can be re-programmed after fabrication, allowing the line card's feature set to be upgraded in the field after deployment. Being able to upgrade the data plane of a deployed router extends the useful lifespan of the system, which translates into extended investment protection. Since an ASIC is not re-programmable, an ASIC-based line card cannot be upgraded in the field. This is a huge differentiator between the two technologies.
One excellent real-world example of this is when Brocade introduced support for 64 ports in a single LAG. This is industry leading scale (64 10GbE ports in a single LAG!) and since this functionality is implemented in the forwarding plane of the line card, it required reprogramming the Brocade network processor. While this type of capability is in the hardware of the router, it was implemented with a system software upgrade and no hardware needed to be replaced.
There are network scenarios or use cases where it makes more sense to have an FPGA-based product and there are use cases when it makes more sense to have an ASIC-based product. For example, an SP may determine that a high density solution is more important than a solution that provides quicker feature velocity and, thus, may choose an ASIC-based product. ASIC-based line cards are often denser in terms of numbers of ports, and the cores of SP networks typically do not require high feature velocity. Most of the feature velocity in today’s SP networks is at the edge of the network (ie: at the PE router) or in the data center, where innovation is currently happening at a rapid pace. The general flexibility of an FPGA results in time-to-market advantages for feature implementation and soft-hardware bug fixes.
For smaller applications and/or lower production volumes, FPGAs may be more cost effective than an ASIC. The non-recurring engineering (NRE) cost of an ASIC can run into the millions of dollars. Conversely, in high volume applications the front-end R&D costs of an ASIC are offset by a lower cost to manufacture and produce. For example, in high-end IP core routers, ASIC-based line cards are more economical due to the lower manufacturing cost, combined with the higher port density of the line card that ASICs can provide.
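The trade-off reduces to simple break-even arithmetic. The dollar figures below are hypothetical, chosen only to illustrate the shape of the calculation:

```python
def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Unit volume at which total ASIC cost drops below FPGA cost.

    Total cost model: ASIC = NRE + unit_cost * volume,
                      FPGA = unit_cost * volume
    (FPGA NRE is treated as negligible for this comparison).
    """
    if fpga_unit_cost <= asic_unit_cost:
        return None  # ASIC never pays off on unit cost alone
    return asic_nre / (fpga_unit_cost - asic_unit_cost)

# Hypothetical figures: $3M NRE, $200/unit ASIC vs $500/unit FPGA.
volume = break_even_volume(3_000_000, 200, 500)  # -> 10000.0 units
```

Below that volume the FPGA line card is cheaper in total; above it, the ASIC's NRE has been amortized and its lower unit cost wins.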
As costs related to ASIC development increase, some recent trends suggest that FPGAs could be a better alternative even for high-volume applications that traditionally used ASICs. It is unclear whether this trend will be sustained or is a temporary aberration.
To summarize the primary differences between FPGA- and ASIC-based line cards: at the highest level it basically comes down to a scalability versus flexibility question (again, with cost a large contributing factor). ASICs are advantageous when it comes to high port density applications. FPGAs are advantageous when it comes to feature velocity with a shortened time-to-market requirement. In high-end core routers, ASIC-based line cards can provide higher density at a lower cost than FPGA-based line cards. So, the use case and network application determine which type of technology would be favored over the other.
As usual, any questions or comments are welcome!
It’s hard to believe that Ethernet is turning 40 this year, isn’t it? Since its conception by Bob Metcalfe and the team of engineers at XEROX PARC in the 1970s, Ethernet technology has continued to evolve to meet the increasing bandwidth, media diversity, cost, and reliability demands of today’s networks. The next Ethernet evolution has officially started, and I'm excited to follow the latest developments on this new technology that will enable networks to support even higher capacities.
“Here is more rough stuff on the ALTO ALOHA network.” Memo sent by Bob Metcalfe on May 22, 1973.
I wrote about 400 GbE in my blog recently as the next likely Ethernet speed, and now it’s official. Last week at the March 2013 IEEE 802 Plenary Session, 400 GbE became an official IEEE 802.3 Study Group that will start work on developing the new standard. Though 100 GbE is only a few years old, it’s important that we start working on the next speed now, so that we have the technology shipping when there is demand from network operators to deploy higher speed Ethernet.
The 400 Gb/s Ethernet Study Group is starting with strong industry consensus this time, which will enable the standard to be developed faster than before. The 400 GbE Call-For-Interest presentation was given last week to measure the interest in starting a 400 GbE Study Group in the IEEE. Based on the hard work of the IEEE 802.3 Ethernet Bandwidth Assessment (BWA) Ad Hoc and the IEEE 802.3 Higher Speed Ethernet (HSE) Consensus Ad Hoc, there was clear consensus on the direction the industry should take on the next Ethernet speed. The straw polls and official vote on the motion to authorize the Study Group formation were all in favor with a few abstentions, which showed a high degree of consensus from the individuals and companies represented. This was not so with the last Ethernet speed evolution, which was simply called the Higher Speed Study Group (HSSG) when it was formed. First, the HSSG had to analyze the market and come up with feasible higher speed solutions before even deciding on the speed. This made the standardization process much longer as the HSSG debated 40 GbE and 100 GbE, and eventually standardized both speeds for different applications. Since we are already starting the 400 Gb/s Ethernet Study Group with a clear speed objective in mind, the standardization process should be much faster. This means the Study Group could have the 400 GbE standard finished in 2016 with the first interfaces available on the market soon after.
Stay tuned for more updates as we follow the road to 400 GbE! If you happen to be in the Bay Area next week, check out the Ethernet 40th Anniversary Celebration at the Ethernet Technology Summit on Wednesday evening at 6 pm, April 3, 2013.
While considering what to write about for this blog, after my previous blog about a really cool NetIron 5.3 feature, I thought I’d stick with that trend for now and talk about another highly anticipated 5.3 feature. This one also happens to be MPLS-based and it’s often a required SP capability within an MPLS-based solution. It’s called Automatic Bandwidth Label-Switched Paths, or Auto-BW LSPs for short.
The Good News
As we know, RSVP-TE based networks are capable of considerable optimizations in terms of bandwidth reservations and traffic engineering. Operators can “plumb” their networks more intelligently, by reserving LSP bandwidth onto specific paths within their network for certain traffic types or overlay services. This makes their networks run more efficiently and with better performance. Operators and network managers like this, as they are getting the most out of the network. In other words, they are “getting their money’s worth”.
The Not So Good News
While bandwidth reservations and traffic engineering provide great capabilities in MPLS networks, oftentimes the configured bandwidth reservations turn out to be less than optimal. In other words, it’s great for an operator to be able to say “for this LSP between these two endpoints I want to reserve 2.5 Gbps of bandwidth” and then make that happen in the network. The operator knows that the topology can support the 2.5 Gbps of capacity due to capacity planning exercises or from offline MPLS-based TE tools. Cool. (btw: It may be desirable to integrate sFlow data into the capacity planning capability or maybe even into an offline MPLS-based TE tool, but that’s a topic for a future blog.)
But what if there is a sustained increase in traffic, well above the reserved 2.5 Gbps, for that service? How are those surges handled by the LSP? Or what if the actual sustained traffic load is only 1.5 Gbps? In that case, no other LSP may be able to reserve the “extra” 1 Gbps of capacity since it is already reserved for that specific LSP. Now the operator’s network is plumbed in a less than optimal fashion. They are no longer “getting their money’s worth” out of the network.
The (now) Gooder News
Here is where Auto-BW LSPs come onto the scene to save the day and make the operator a hero (again).
Auto-BW LSPs can solve both problems mentioned: handling a sustained surge in traffic above what was previously planned for, and reclaiming “extra” capacity that is actually available but, because it is allocated to one LSP, cannot be reserved by other LSPs.
Overview of Auto-BW
In its simplest definition: auto-bandwidth is an RSVP feature which allows an LSP to automatically and dynamically adjust its reserved bandwidth over time (ie: without operator intervention). The bandwidth adjustment uses the ‘make-before-break’ adaptive signaling method so that there is no interruption to traffic flow.
The new bandwidth reservation is determined by sampling the actual traffic flowing through the LSP. If the traffic flowing through the LSP is lower than the configured or current bandwidth of the LSP, the “extra” bandwidth is being reserved needlessly. Conversely, if the actual traffic flowing through the LSP is higher than the configured or current bandwidth of the LSP, it can potentially cause congestion or packet loss. With Auto-BW, the LSP bandwidth can be set to some arbitrary value (even zero) during initial setup time, and it will be periodically adjusted over time based on the actual bandwidth requirement. Sounds neat, huh? Here’s how it works…
First, determine what the desired sample-interval and adjustment-interval should be set to. The traffic rate is repeatedly sampled at each sample-interval; the default sampling interval is 5 minutes. The sampled traffic rates are accumulated over the adjustment-interval period, which has a default of 24 hours. The bandwidth of the LSP is then adjusted to the highest sampled traffic rate among the set of samples taken over the adjustment-interval. Note that the highest sampled traffic rate could be higher or lower than the current LSP bandwidth.
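The core of that adjustment cycle can be sketched in a few lines (a simplification of the behavior described above, not vendor code):

```python
# Defaults described above: sample every 5 minutes, adjust every
# 24 hours, i.e. 288 samples per adjustment-interval.
SAMPLE_INTERVAL_S = 5 * 60
ADJUSTMENT_INTERVAL_S = 24 * 60 * 60

def adjust_lsp_bandwidth(samples_bps, current_bw_bps):
    """Return the reservation to signal after one adjustment-interval.

    samples_bps holds the traffic rates observed at each
    sample-interval. The LSP is re-signaled (make-before-break)
    to the highest sample, which may be above or below the
    current reservation.
    """
    return max(samples_bps) if samples_bps else current_bw_bps
```

So an LSP reserved at 2.5 Gbps whose highest observed rate over the day was 1.5 Gbps shrinks to 1.5 Gbps, releasing the difference back to the TED for other LSPs.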
That’s basically it in a nutshell, but there are other knobs available to tweak for further control (as expected, operators want more knobs to tweak).
In order to reduce the number of readjustment events (ie: too many LSPs constantly re-sizing), we allow the operator to configure an adjustment-threshold. For example, if the adjustment-threshold is set to 25%, the bandwidth adjustment will only be triggered if the difference between the current bandwidth and the highest sampled bandwidth is more than 25% of the current bandwidth.
As mentioned, the adjustment-interval is typically set pretty high, at around 24 hours. But a high value can lead to a situation where the bandwidth requirement suddenly becomes high but the LSP waits out the remaining adjustment-interval period before increasing the bandwidth. In order to avoid this, we allow the operator to configure an overflow-limit. For example, if this value is set to 3, the LSP bandwidth readjustment will be triggered as soon as the adjustment-threshold is crossed in 3 consecutive samples.
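Putting the two knobs together, the trigger logic looks roughly like this (again a sketch of the described behavior, not the actual implementation):

```python
def threshold_exceeded(current_bw, sample, threshold_pct):
    """adjustment-threshold check: a resize is warranted only if the
    sample differs from the current bandwidth by more than
    threshold_pct percent of the current bandwidth."""
    return abs(sample - current_bw) > current_bw * threshold_pct / 100.0

def overflow_triggered(samples, current_bw, threshold_pct, overflow_limit):
    """overflow-limit check: trigger an early resize as soon as the
    threshold is crossed upward on overflow_limit consecutive
    samples, without waiting out the full adjustment-interval."""
    consecutive = 0
    for sample in samples:
        if sample > current_bw and threshold_exceeded(
                current_bw, sample, threshold_pct):
            consecutive += 1
            if consecutive >= overflow_limit:
                return True
        else:
            consecutive = 0  # a quiet sample resets the count
    return False
```

With a 400 Mbps LSP and a 25% threshold, a single 450 Mbps sample (only 12.5% over) changes nothing, but three consecutive samples above 500 Mbps trigger an immediate resize.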
The feature will also allow the operator to set a max-bandwidth and a min-bandwidth value to constrain the re-sizing of an LSP to within some reasonably determined bounds.
It is also possible to simply gather statistics based on the configured parameters, without actually adjusting the bandwidth of an LSP. This option involves setting the desired mode of operation to either monitor-only or monitor-and-signal.
The Auto-BW feature also provides a template-based configuration capability, where the operator can create a template of auto-bandwidth parameter values and apply it to whichever LSP paths need the same configuration, or across multiple LSPs.
This example below shows three adjustment-intervals on the horizontal axis and traffic load of the LSP on the vertical axis. After each adjustment-interval, the LSP bandwidth is automatically adjusted based upon the sampled traffic rate. The diagram also shows where the adjustment-threshold is set and exceeded by the actual traffic rate, which then results in the bandwidth adjustment. The red line is the bandwidth of the LSP, after being adjusted at each adjustment-interval.
In the example above, each adjustment-interval has three sample-intervals. The following graphic shows the relationship between the sample-interval and the adjustment-interval.
Auto-BW Solves Real Problems
Here is one simple but real-life scenario where Auto-BW can prevent packet loss. Consider the topology below.
In this topology there are two LSPs between PE1 and PE3; each with 400 Mbps of reserved bandwidth and each with actual traffic loads approaching the 400 Mbps reservations. The entire topology consists of 1 GbE links so both of these LSPs can share any of the links since their combined bandwidth reservation is 800 Mbps. Constrained Shortest-Path First (CSPF) calculations put both LSPs onto the PE1-PE2-PE3 path.
However, over time LSP2's actual traffic load grows in size to 650 Mbps. Now the combined traffic load of both LSPs exceeds the capacity of a 1 GbE link and packet loss is now happening on the PE1-PE2-PE3 path. RSVP, specifically the CSPF algorithm, cannot take the additional “actual” traffic load into account so both LSPs remain on the same path. This is not good. The reason for this is the Traffic-Engineering Database (TED) that CSPF uses to determine paths in the network is not updated by actual traffic loads on links or in LSPs. This is just how RSVP-TE works.
When Auto-BW is enabled, both LSPs are sampled to determine their actual traffic loads. After the adjustment-interval, LSP2 is re-sized to 650 Mbps. Now both LSPs can no longer share the same path as the CSPF algorithm will compute that the combined bandwidth of the LSPs now exceeds a single 1 GbE link. So, the result is that CSPF will look into the TED for a new path in the network from PE1 to PE3 that meets the bandwidth requirement of LSP2 and will traffic engineer that LSP onto the PE1-PE4-PE5-PE3 path.
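A toy version of the feasibility check CSPF performs against the TED makes this concrete. The link names and numbers follow the example; this is an illustration, not an RSVP-TE implementation:

```python
LINK_CAPACITY = 1_000_000_000  # every link in the example is 1 GbE

def path_fits(path_links, reservations, new_bw):
    """Return True if every link on the path can still admit new_bw
    on top of its existing reservations. Note this is the TED view:
    reserved bandwidth, not actual measured traffic."""
    return all(reservations.get(link, 0) + new_bw <= LINK_CAPACITY
               for link in path_links)

# LSP1 keeps its 400 Mbps reservation on the short path.
reservations = {("PE1", "PE2"): 400_000_000,
                ("PE2", "PE3"): 400_000_000}

# Re-signaling LSP2 at its new 650 Mbps no longer fits there...
short_path = [("PE1", "PE2"), ("PE2", "PE3")]
# ...so CSPF falls back to the longer, unreserved path.
long_path = [("PE1", "PE4"), ("PE4", "PE5"), ("PE5", "PE3")]
```

Without Auto-BW, LSP2's reservation stays at 400 Mbps, the check still passes for the short path, and the congestion persists; resizing the reservation is what lets CSPF see the problem.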
The operator is now a hero (again) because the network is back to working at its maximum efficiency and performance levels.
As usual, any questions or comments are welcome! Also, if there are future topics related to MPLS that you would like to see a blog about, please post them in the comments and we will see what we can do.
I’d like to continue a previous discussion about Brocade’s Multi-Chassis Trunking (MCT) technology. Please see the earlier blog: MCT with VPLS. The MCT w/VPLS capability was part of NetIron Software Release 5.3. In NetIron Software Release 5.4, we added a powerful enhancement to provide Multicast over MCT.
A diagram of this capability is shown below. In the diagram, there are two MLXe routers that are MCT peers. They have multicast receivers downstream and multicast sources upstream.
The diagram shows that the MCT Cluster Client Edge Ports (CCEPs) now have the ability to support the Internet Group Management Protocol (IGMP) and the Protocol Independent Multicast (PIM) protocol. As you recall, the CCEPs are the MCT customer facing edge ports. The diagram shows multiple host receivers behind two layer-2 switches, which are the MCT clients, and the host receivers are sending IGMP join requests toward the network. IGMP is used by hosts to establish and manage their multicast group membership. The MCT client layer-2 switches are directly connected to the CCEPs of the MCT cluster. Each layer-2 switch is doing standard link aggregation (LAG) to connect to both of the MLXe routers. As with all MCT configurations, the client layer-2 switches are unaware that they are connected to two MLXe routers; this is the active/active redundancy that MCT provides.
Both of the MCT peers will receive IGMP join requests and will subsequently send PIM joins toward the multicast Rendezvous Point (RP) or multicast source, depending on whether PIM-SM (*, G) or PIM-SSM (S, G) is being used. So, PIM runs on the network facing interfaces of the MCT peers, including the Inter-Chassis Link (ICL). The MCT ICL is also used to synchronize the IGMP membership state between the MCT peers. The result is that both of the MCT peers will install the correct multicast membership state. The diagram shows a few of the scenarios that are possible; where sources can be somewhere inside the IP network or directly attached to either MLXe router. However, the sources and receivers can actually be reversed such that sources are behind the MCT client layer-2 switches and the host receivers are either somewhere in the IP network or directly attached to either MLXe router. All variations are supported.
A hashing algorithm determines if the active multicast outgoing interface (OIF) is a local CCEP interface or the ICL. As shown in the diagram below, multicast traffic can arrive on both MLXe routers from the source but the MLXe with the local forwarding state is the only one that forwards traffic to the host receiver.
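The selection can be thought of as a deterministic hash over the multicast flow. The actual hash inputs and algorithm are platform internals, so the sketch below only illustrates why both peers agree on a single forwarder without extra signaling:

```python
import zlib

def forwarding_peer(group_addr, peers=("MLXe-1", "MLXe-2")):
    """Hash a multicast group to exactly one MCT peer.

    Because both peers run the same deterministic hash over the
    same synchronized state, they independently agree on which one
    forwards toward the shared CCEP, so the receiver never sees
    duplicate copies of the stream.
    """
    return peers[zlib.crc32(group_addr.encode()) % len(peers)]
```

Different groups hash to different peers, so over many flows the forwarding load spreads across both members of the cluster.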
Since both MCT peers are properly synchronized, forwarding is performed as expected on the multicast shortest-path tree.
Some of the benefits of this compelling enhancement are:
As an example of a possible failure scenario: if CCEP2 fails, the MCT peers will remain synchronized such that the redundant MCT peer immediately takes over as the active forwarding device.
So, you can see that this is a very powerful feature as it provides for an active/active redundancy capability while maintaining the optimal multicast forwarding tree under failure scenarios. This is no easy feat!
Continue to stay tuned to this page for additional NetIron enhancements, as Brocade continues to lead the industry in innovation!
This is a great opportunity for me to introduce a really cool and highly anticipated feature that is part of the Brocade NetIron 5.3 Software Release. The official release date for this software is sometime next week, but because you are part of this awesome SP community, you get a sneak peek!
While 5.3 contains many new innovative features that our SP customers have been clamoring for, I thought I’d pick one in particular and write a bit about it here. The feature is Multi-Chassis Trunking integration with Virtual Private LAN Service, or MCT w/VPLS for short.
First, a short background refresher on what problem MCT solves. (BTW: Brocade has been supporting MCT for well over a year now.)
Brocade developed MCT to provide a layer-2 “active/active” topology in the data center without the need to run a spanning-tree protocol (STP). STP has traditionally been used to prevent layer-2 forwarding loops when there are alternate paths in a layer-2 switched domain. However, STP has its issues in terms of convergence, robustness, scalability, etc. Orthogonal to STP, link aggregation (IEEE 802.3ad) is also often deployed to group or bundle multiple layer-2 links together. The advantages of link aggregation are:
So, MCT leverages standards-based link aggregation but is capable of providing this “across” two switch chassis instead of just one chassis. This is shown below.
As you can see, there are two chassis that act like a single logical switch. This is called an MCT pair or cluster. The devices on either side of the MCT logical switch believe they are connected to a single switch. Standard LAG is used between these devices and the MCT logical switch. The advantage of doing this is that now both switches in the MCT cluster are functioning at layer-2 in an “active/active” manner. Both can forward traffic and if one chassis has a failure, standard failover mechanisms for a LAG bundle take effect. In addition, there are no layer-2 loops formed by an MCT pair so no STP is needed!
Now, for a short background refresher on what VPLS provides. (BTW: Brocade has been supporting VPLS for many years now.)
VPLS provides a layer-2 service over an MPLS infrastructure. The VPLS domain emulates a layer-2 switched network by providing point-to-multipoint connectivity across the MPLS domain, allowing traffic to flow between remotely connected sites as if the sites were connected by one or more layer-2 switches. The Provider Edge (PE) devices connecting the customer sites provide functions similar to a switch, such as learning the MAC addresses of locally connected customer devices, and flooding broadcast and unknown unicast frames to other PE devices in the VPLS VPN.
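The switch-like behavior a VPLS PE emulates can be sketched as follows. This is my own illustration, not Brocade code: learn source MACs per VPLS instance, forward known unicast to the learned endpoint, and flood broadcast and unknown unicast to all other endpoints.

```python
# Illustrative model of VPLS PE forwarding: "endpoints" covers both local
# attachment ports and pseudo-wires to remote PEs. Names are hypothetical.

class VplsInstance:
    def __init__(self, endpoints):
        self.endpoints = set(endpoints)   # local ports + PWs to remote PEs
        self.mac_table = {}               # learned MAC -> endpoint

    def receive(self, frame_src, frame_dst, in_endpoint):
        self.mac_table[frame_src] = in_endpoint          # MAC learning
        out = self.mac_table.get(frame_dst)
        if out is not None and frame_dst != "ff:ff:ff:ff:ff:ff":
            return [out]                                  # known unicast
        # broadcast or unknown unicast: flood, excluding the arrival endpoint
        return sorted(self.endpoints - {in_endpoint})

v = VplsInstance(["port1", "pw-to-PE2", "pw-to-PE3"])
print(v.receive("aa:aa", "bb:bb", "port1"))      # unknown dst -> flooded to PWs
print(v.receive("bb:bb", "aa:aa", "pw-to-PE2"))  # dst now learned -> ['port1']
```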
MCT with VPLS
Very frequently, a customer network needs to provide layer-2 connectivity between multiple data centers, for example to enable VM mobility. The MCT w/VPLS feature I’m describing provides this type of connectivity in a redundant, highly available fashion. MCT provides the “active/active” layer-2 connectivity from the server farm or access layer to the core layer of the data center. The customer then leverages VPLS on the core layer data center routers to transport the layer-2 Ethernet frames between data centers. This is shown below.
In the diagram above, the CE switch uses a standard LAG to connect to the redundant MCT cluster. The same NetIron routers that form the MCT cluster are also configured to support VPLS to connect to the backbone network. So, the connection from the CE switch in one data center to a remote CE switch in another data center is a layer-2 service. VM mobility between the data centers is now provided in a redundant end-to-end fashion.
Fast failover in the VPLS network is provided by using redundant Pseudo-Wires (PWs), based on IETF Internet Draft <draft-ietf-pwe3-redundancy-03>. As shown below, each PE router signals its own PW to the remote PE. These local PE routers determine, based on configuration, which PE signals an active PW and which PE signals a standby PW. There is also a spoke PW signaled between the PE routers. In the case of the active PW failing, the primary PE router signals to the secondary PE router to bring up its standby PW. This failover is provided in a rapid manner.
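The failover behavior can be modeled simply. This is a deliberately simplified sketch of my own, not the state machine defined in the draft: each PE signals one PW, and when the active PW fails, the standby is activated so traffic resumes quickly.

```python
# Simplified active/standby pseudo-wire redundancy model. The PW names and
# states here are illustrative only.

class PwRedundancy:
    def __init__(self):
        self.state = {"PE1-PW": "active", "PE2-PW": "standby"}

    def fail(self, pw):
        if self.state.get(pw) == "active":
            self.state[pw] = "down"
            # the spoke PW between the PEs carries the signal that tells the
            # other PE to bring its standby PW up
            for other, s in self.state.items():
                if s == "standby":
                    self.state[other] = "active"

r = PwRedundancy()
r.fail("PE1-PW")
print(r.state)  # {'PE1-PW': 'down', 'PE2-PW': 'active'}
```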
The benefits of this solution are:
So, as you can see this is a really awesome capability for SPs who need to integrate their data center infrastructure with their MPLS/VPLS backbone network. We expect this solution to become a very common data center network architecture going forward for providing inter-data center layer-2 connectivity. I should also note that this solution works with Virtual Leased Line (VLL), in addition to VPLS. And, on top of that, it integrates with Ethernet Fabrics in the data center extremely well!
Stay tuned to this forum for more blogs like this.
Last year was another exciting year for 100 GbE as we saw several new technical developments and large deployments by service provider, data center, research and HPC network operators. Here's a quick recap of the highlights for 2012. AMS-IX, our biggest 100 GbE customer and one of the biggest 100 GbE networks in the world, upgraded their 10 GbE core to a 100 GbE core with over 90 x 100 GbE ports in their backbone alone for a capacity of over 7.8 Tbps. The IEEE 802.3ba standard for 40 GbE and 100 GbE, now over 2½ years old, was added to the latest IEEE 802.3-2012 "Standard for Ethernet". 2nd generation 100 GbE projects in the IEEE P802.3bj and P802.3bm Task Forces are in progress that will lower cost and increase density. We’re now well underway to the next evolution of 100 GbE technology and even to the next speed of Ethernet, 400 GbE.
One trend that I’ve noticed among service providers is that 100 GbE peering at IXPs (Internet Exchange Points) is on the rise. We saw a lot of 100 GbE deployments primarily in core networks over the past couple of years, and now 100 GbE peering is taking off too. Several IXPs around the world, most of whom are Brocade customers, have announced the availability of 100 GbE peering ports or the intent to offer them this year: AMS-IX (Amsterdam), DE-CIX (Frankfurt), JPIX (Tokyo), JPNAP (Tokyo), LINX (London), Netnod (Stockholm) and NIX.CZ (Prague). AMS-IX for example has deployed three 100 GbE customer ports already, and has six more on order that are expected to go live in the next several weeks. They will also have the first customer 2 x 100 GbE LAG, which will upgrade a 12 x 10 GbE LAG.
The motivation for 100 GbE peering is obvious: to reduce the number of 10 GbE LAGs that connect to an IXP for cheaper and simpler peering. 10 GbE LAG is a great solution but when you consider the port costs, cross connect costs, management and troubleshooting costs, etc. it does start to add up. Costs are different for every network operator as all networks are different, but in general 100 GbE starts to make sense when 10 GbE LAGs exceed six to seven links. Incidentally it also made sense to upgrade to a DS3 when a link exceeded six to seven inverse-multiplexed DS1s when I was a network engineer at MindSpring in the late 1990s, so there is some strange commonality in that number of links. AMS-IX’s 100 GbE port price for example is €9000/month, which is six times the 10 GbE port price of €1500/month.
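The break-even point is easy to check with back-of-the-envelope arithmetic using the AMS-IX prices quoted above. The per-link overhead figure below is an assumption of mine standing in for cross-connect and operational costs, which vary by operator.

```python
# Break-even check: one 100 GbE port at EUR 9000/month versus n x 10 GbE
# ports at EUR 1500/month. On port price alone, break-even is exactly 6
# links; any per-link overhead (cross-connects, management) only pulls the
# threshold lower, matching the "six to seven links" rule of thumb.

price_100g = 9000          # EUR/month, AMS-IX 100 GbE port
price_10g = 1500           # EUR/month, AMS-IX 10 GbE port
overhead_per_link = 250    # EUR/month, assumed cross-connect + ops cost

def lag_cost(links):
    return links * (price_10g + overhead_per_link)

breakeven = next(n for n in range(1, 11) if lag_cost(n) >= price_100g)
print(breakeven)  # -> 6 with these assumed overheads
```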
There is another, less obvious motivation for 100 GbE peering, and this demand comes from IXP resellers. IXP resellers are a relatively new development in the peering industry that enables service providers to peer remotely from anywhere in the world through a reseller port. Until recently, service providers were required to have a physical presence at an IXP in order to peer, because IXPs do not offer long haul transport services. Now IXP resellers, in partnership with an IXP, can resell peering ports remotely over their network to their customers. Remote peering capacity demand is what’s driving these 100 GbE ports. In order for a reseller to offer a high capacity service to their customer, say for example 20 Gbps or 40 Gbps, their own peering port to the IXP has to have the capacity available. Deploying a 100 GbE port to the IXP gives a reseller both the capacity and the flexibility to offer more capacity on demand, without having to constantly manage 10 GbE LAGs.
So, expect more announcements from IXPs about 100 GbE ports this year as 100 GbE peering goes mainstream in 2013.
Acknowledgements: I’d like to thank Henk Steenman, AMS-IX CTO, for his valuable insight and interesting data on 100 GbE peering.
There are some interesting developments in the Research & Education Networks (RENs) space. While there is continued interest and innovation in the REN space around OpenFlow and SDN, there are some related developments in terms of network architecture. After a brief overview of one such interesting development, I will then relate this development back to Brocade.
To start with, I’d like to describe an emerging network architecture called a “Science-DMZ”, which basically moves the high-performance computing (HPC) environment of a research & education campus network into its own DMZ. The reference architecture looks something like this:
As the diagram shows, the traditional “general purpose” campus network sits behind one or more security appliances, which are typically stateful firewalls. This DMZ, or perimeter network, protects the internal network, systems and hosts on the campus network from external security threats. The research and science HPC environment also traditionally connected into the same campus network, so it was also behind the DMZ firewall. This presented some challenges to the HPC environment in terms of data throughput (e.g., TCP performance), dynamic “ad hoc” network connectivity, and general network complexity.
The concept of a Science-DMZ emerged where the connectivity to the HPC environment is moved to its own DMZ; in other words, this environment is no longer connected behind the campus DMZ and firewalls. It now sits on a network that is purposely engineered to support the high performance HPC requirements. As the diagram shows, the science and research environment is now connected to a Science-DMZ switch, which in turn connects to a Science-DMZ border router. Access control lists (ACLs) in the border router are leveraged to maintain security of this HPC environment. In addition to simpler access control mechanisms, when a scientist or researcher needs to set up a logical connection to another scientist or researcher to share data, that connectivity can be provisioned directly in the border router. For network performance testing and measurement, the perfSONAR tool is included in the reference architecture.
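The key difference from a stateful firewall is that an ACL is just an ordered, stateless permit/deny match on each packet, which hardware can evaluate at line rate. Here is a toy illustration with hypothetical rules; no real campus policy is implied.

```python
# Toy model of stateless ACL matching as performed by a border router:
# first matching rule wins, with an implicit deny at the end. The networks
# below are documentation prefixes used purely for illustration.
import ipaddress

acl = [
    ("permit", ipaddress.ip_network("198.51.100.0/24")),  # partner HPC site
    ("deny",   ipaddress.ip_network("0.0.0.0/0")),        # deny everything else
]

def check(src):
    addr = ipaddress.ip_address(src)
    for action, net in acl:
        if addr in net:
            return action
    return "deny"

print(check("198.51.100.7"))  # permit
print(check("203.0.113.9"))   # deny
```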
The Science-DMZ concept emerged out of work from the engineers at the Energy Sciences Network (ESnet). Please take a look at their website for additional details on this architecture. As I have explained, the idea here is pretty simple: to allow the local HPC environment to have better connectivity to other research & education networks by putting it on its own DMZ. The external connectivity is often provided via the national Internet2 backbone, or it could be provided via a regional REN backbone. To deliver this type of high performance connectivity, there are some hard requirements in terms of scale, performance and feature set of the Science-DMZ border router. This is where Brocade enters the conversation.
The hard requirements for this border router are:
• Must be capable of line-rate 100 GbE, including support for very large, long-lived flows
• Must support pervasive OpenFlow & SDN, for ease of provisioning and innovative applications
• Must support deep packet buffers to handle short data bursts
• Must support line-rate ACLs to provide the security mechanisms needed, without impact to data throughput or performance
The Brocade MLXe high-performance router uniquely fits the bill for these requirements! As of software version 5.4, which started shipping in September of this year, the MLXe supports OpenFlow v1.0 in its GA release. The OpenFlow rules are pushed into hardware so the MLXe maintains its high forwarding performance, as it does with IPv4, IPv6 and MPLS forwarding. The largest MLXe chassis can scale to 32 ports of 100 GbE or 768 ports of 10 GbE, possesses deep packet buffers to handle bursty traffic and performs ACL functions in hardware.
In summary, the Science-DMZ architecture has emerged to solve some of the performance challenges for HPC environments and this reference architecture includes innovative features such as OpenFlow & SDN. The Brocade MLXe platform possesses the unique performance, functionality, and feature set that is required to perform the role of the Science-DMZ border router.
Stay tuned to this space for additional emerging developments in the research & education network arena.
I want to give you a quick overview and update on the industry’s progress toward 400 GbE as the next Ethernet speed. Though 100 GbE is only two years old, it’s important that we start working on the next speed now, so that we have the technology shipping when there is demand from network operators to deploy higher speed Ethernet. The Call for Interest (CFI) to start the 400 GbE Study Group that will work on defining a new Ethernet standard was just announced yesterday, and is scheduled to be held on March 18, 2013 at the next IEEE Plenary meeting.
Here’s a little history on how we chose 400 GbE as the next Ethernet speed. First, the IEEE 802.3 Ethernet Bandwidth Assessment (BWA) Ad Hoc was formed in 2011 to evaluate future Ethernet wireline bandwidth needs. The BWA gathered input from the industry so that we would have accurate bandwidth growth data and requirements for the next Ethernet speed. The full report was released in July, and found continuing growth of bandwidth demands in core and transport layers beyond 100 GbE. If you are just interested in a summary and overview of the findings, then have a look at Scott Kipp’s NANOG56 presentation from last month. Next, the IEEE 802.3 Higher Speed Ethernet Consensus (HSE) Ad Hoc first met in July to develop consensus on the next speed of Ethernet based on the BWA data. The November IEEE Plenary meeting was just held a couple of weeks ago, where the HSE Ad Hoc made progress on the draft 400 GbE CFI presentation.
Why Not TbE?
I’d really love for us to build TbE! But, in order to make TbE economically feasible the cost per bit needs to be at or below the cost of 100 GbE. This means it would make sense for us to reuse current 100 Gbps technology, which implies a TbE architecture using 40 x 25 Gbps signaling lanes. Unfortunately, reusing 25 Gbps signaling means the resulting size of the pluggable media module and the large number of interface signals would simply be impractical to develop. Several good presentations were given at the IEEE HSE Consensus Ad Hoc meeting in September about why we should work on 400 GbE now and defer TbE for a few years. There are a couple of alternatives to 25 Gbps signaling, such as using advanced multilevel or phase modulation signaling, but these are still immature technologies that need more development before we can get the performance and low-cost manufacturing needed for volume production. Higher signaling rates will make TbE more feasible, but this technology isn’t expected to be available for the next several years.
As the 400 GbE CFI is now scheduled for March 2013, it means we will have the 400 GbE standard in mid-2015 at the earliest. It’s likely that the first generation of 400 GbE will use 16 x 25 Gbps signaling and that the first interfaces will be available in the 2016 timeframe. The questions that still need to be answered are the physical layer specifications for reaches and media, and this is what the Study Group will start working to define first. As we had for 100 GbE, the interfaces for 400 GbE will use a pluggable media module which gives network operators the most flexibility and choice. It’s likely that the 400 GbE media module will be called CDFP, which is short for “CD (400) Form-factor Pluggable”. As 400 GbE evolves with faster signaling technology, the second generation is expected to use 8 x 50 Gbps signaling which the Optical Internetworking Forum is already beginning to define. The third generation of 400 GbE is expected to use 4 x 100 Gbps signaling which has more advanced electrical and optical signaling technology that is being worked on in labs today. These key 100 Gbps signaling technologies will also be the building blocks for TbE and aren’t expected until after 2020. Stay tuned for more updates as we follow the road to 400 GbE!
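The lane arithmetic behind these generations is worth laying out side by side. Each candidate interface is just lanes multiplied by the per-lane signaling rate:

```python
# Lane configurations discussed above: (lanes, per-lane Gbps). The 40-lane
# TbE option is the one deemed impractical to build with today's technology.
configs = {
    "TbE via 25G lanes":         (40, 25),
    "400 GbE gen 1 (25G lanes)": (16, 25),
    "400 GbE gen 2 (50G lanes)": (8, 50),
    "400 GbE gen 3 (100G lanes)": (4, 100),
}
for name, (lanes, gbps) in configs.items():
    print(f"{name}: {lanes} x {gbps} Gbps = {lanes * gbps} Gbps")
```

Note how each 400 GbE generation halves the lane count by doubling the per-lane rate, which is what shrinks the media module and interface pin count over time.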
There is much discussion these days in various industry forums and conferences about whether OpenFlow will replace MPLS. I guess this comes historically from the fact that ATM replaced Frame Relay (FR), and MPLS then replaced ATM. And although FR was primarily a WAN technology, ATM (who remembers LANE?) and MPLS (à la VPLS) have also been deployable in the LAN.
And although I’m specifically mentioning OpenFlow, I’m really thinking more broadly in terms of SDN. So, the question then becomes: Will SDN replace MPLS at some point in the future?
My personal perception is that early in the OpenFlow evolution there were some folks who thought OpenFlow would indeed replace MPLS at some point. MPLS was deemed “too complex”, amongst other arguments against MPLS. As I’ve witnessed the evolution of MPLS over the last decade, it has indeed become more complex. Becoming “too complex”, however, is an argument I don’t think I fully support. Here is my reasoning.
At one point in time, ATM solved many problems and had many advanced features. But as the technology matured, was more widely deployed and became more feature rich, it evolved into a very complex technology. I remember going to training course after training course to become fully proficient in ATM technologies. Then when I saw how ATM was being adapted into the LAN infrastructure with LANE (yuck!), I think that was my turning point. About that same timeframe, MPLS was being developed and talked about in various industry forums and conferences. It looked simple, it looked “kind of” like ATM in terms of virtual circuits (LSPs in MPLS speak), and it looked like it was starting to gain industry support. Fast forward a decade or so and MPLS is now very widely deployed but as it has matured in terms of features and functionality, it has become more complex. And, ATM is dead. So, some folks are now saying, “we need something else, something simpler than MPLS because it has become too complex”. Enter from stage left, OpenFlow! So here we are…
SDN solutions using OpenFlow, from a high level, can provide some of the basic machinery in terms of forwarding packets as MPLS does (or IP, for that matter). Distributed network routing and signaling protocols ultimately create state to populate the forwarding information base (FIB) of a router or switch, and a centralized SDN application using OpenFlow could also populate the FIB of a router or switch. This is why, I believe, that some folks think SDN with OpenFlow could indeed replace MPLS. And here is where I disagree somewhat. Sure, it appears possible and yes there is at least one widely publicized production WAN deployment that I know of (there could be more) where the network switching devices receive their FIB state from a centralized SDN application using OpenFlow. But one must realize that to fully replace all the required features and functionality of MPLS, OpenFlow and SDN will need to evolve to offer those same features and functions. Would adding those features and functions make OpenFlow and SDN as complex in the future as MPLS is today? If so, what will the industry have gained? So, would we be just moving the complexity problem somewhere else?
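The point about the FIB being the common end product can be made concrete with a small sketch. The names here are purely illustrative, not any real controller or router API: whether state arrives from a distributed protocol or a centralized OpenFlow controller, the switch ends up with the same kind of forwarding entries.

```python
# Conceptual model: a FIB is just prefix -> next-hop state; what differs
# between traditional and SDN approaches is the origin of that state.

fib = {}  # prefix -> {"next_hop": ..., "origin": ...}

def install_route(prefix, next_hop, origin):
    fib[prefix] = {"next_hop": next_hop, "origin": origin}

# Distributed control plane (e.g. an IGP) computing a route locally:
install_route("10.1.0.0/16", "192.0.2.1", origin="ospf")
# A centralized SDN application pushing equivalent state via OpenFlow:
install_route("10.2.0.0/16", "192.0.2.9", origin="openflow-controller")

print(fib["10.1.0.0/16"]["origin"])   # ospf
print(fib["10.2.0.0/16"]["origin"])   # openflow-controller
```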
I believe the industry is beginning to re-appreciate all that MPLS provides and is re-realizing how widely deployed it is. I think the industry is starting to come to terms that MPLS is here to stay, and is starting to develop ways to leverage the benefits of SDN and OpenFlow into existing MPLS networks. In other words, integrate the technologies instead of having them compete against each other. Perhaps leveraging the OpenFlow classification abilities at the edge of a network using a centralized application, while maintaining the MPLS-based distributed signaling and forwarding state in the core of the network. Or adding an SDN + OpenFlow logical network “overlay” or “slice” to an existing production network for research purposes. Or perhaps even to opportunistically override the normal forwarding decisions for specific packet flows in the network in order to “steer” those flows to some sort of analytics device or value-add services appliance. Those are a few examples, but there are many.
OpenFlow v1.1, standardized in February 2011, added MPLS label support. That was a great first step. The ONF is very active in the evolution of OpenFlow and SDN. While the ONF focuses more on OpenFlow and SDN, the IETF focuses on SDN and MPLS. I see more and more industry forums and conferences where MPLS and SDN are now being discussed side by side.
So, my personal belief is OpenFlow and SDN can clearly add value and additional services into existing networks, but there is no need today to rush to replace MPLS-based networks with OpenFlow and SDN. So, I guess my answer to my own question is I believe they are “better together” and are not mutually exclusive. In other words, a hybrid network approach seems to be the most feasible and promising option to me.
We announced the new 24-port 10 GbE module for the Brocade MLXe routers at our Analyst and Technology Day last month, and I’m really excited about the 3X increase in 10 GbE density that is now available to our customers. This module brings the total 10 GbE density for the MLXe-32 up to an impressive 768 ports in a single chassis, and it also supports advanced MPLS and OpenFlow features for large-scale service provider supercore and data center core networks. It’s our first ASIC-based module for the MLXe routers, and we’re using our 4th generation silicon innovation called the MaxScale-160 to pack all the features and density onto the module. I had the opportunity to sit down with John Terry, one of our principal ASIC engineers, to learn some of the more interesting technical details about the ASIC and to get a tour of their hardware lab (alas, one of the few rooms on campus my badge won’t let me enter).
The MaxScale-160 ASIC was designed and tailored specifically for high-capacity core networks, with a total of 160 Gbps bidirectional capacity that integrates eight 10 GbE ports. A total of three ASICs are used on the module to support 24 x 10 GbE SFP+ ports. As always in next-generation hardware evolution, the module's density, performance, and cost were key factors in designing the ASIC. We developed the MaxScale-160 processor in order to increase port density without giving up features and to reduce the number of components on the board, which also lowered the overall cost per port. The ASIC integrates a total of 33 active components that include various packet processing and statistics FPGAs, ingress and egress ACL TCAM, packet lookup SRAM, and eight 10 GbE PHYs. You can really see the difference between the 8-port 10 GbE module (2010 era) and the 24-port 10 GbE module (2012 era). Each module also has a daughter board, which has been removed for clarity in the picture below; the 8-port module is on the bottom and the 24-port module is on top. The 24-port module has triple the number of ports of the 8-port module, but far fewer active components, which results in an overall higher MTBF and makes it easier to manufacture.
Altogether the 400 MHz MaxScale-160 ASIC uses 1.38 billion transistors, and is built on 45 nm process geometry. “Process geometry” means the smallest dimension that can be drawn into the silicon to define a transistor, which is the building block used to make logic gates. A total of 352 Mbits of embedded memory plus 29.6 Mbits of TCAM made the ASIC a challenge for our foundry vendor to manufacture, because it has some of the highest integrated TCAM content. The next version of the ASIC, called the MaxScale-400, will deliver 400 Gbps of bidirectional capacity, will have over 2 billion transistors and will use 32 nm process geometry.
The 24-port module is the greenest, most efficient module available for the MLX Series, because the MaxScale-160 has an extremely low maximum power consumption of <45 W. We have been steadily lowering 10 GbE power consumption as we release higher density cards, and the 24-port SFP+ module uses an incredibly low 13.33 W/port. By comparison, the 8-port SFP+ module uses 30.75 W/port, the 4-port XFP module uses 56.25 W/port, and the first-generation 2-port XFP module that we released in 2007 uses 75 W/port. This steady gain in power efficiency enables network operators to save on operational expenses in cooling and power, while also consolidating the number of devices in the network.
In case you missed it, I wrote about some of the router architecture challenges we’re solving in my blogs earlier this year. The MaxScale ASICs and our future generation silicon builds on these technology advances, and I can’t wait to see what our ASIC engineers will be able to do next to deliver even higher 10 GbE and 100 GbE density.
For more information on Brocade’s high density 10 GbE and 100 GbE solutions, please visit the Brocade MLX Series product page
Back in January of this year I wrote about a really cool NetIron feature called “MCT with VPLS”. That feature was released in the NetIron 5.3 software. Well, in the NetIron 5.4 software release (which is GA this week!) we have a new innovative twist on how you can use VPLS. It’s called “Routing over VPLS” and this new capability allows a VPLS endpoint to provide simultaneous layer-3 routing on NetIron products.
So, layer-3 routing (& forwarding) is now supported over a VPLS endpoint! Previously, only layer-2 forwarding was supported for VPLS. Recall that VPLS provides a multi-point layer-2 Ethernet service, but with this new feature one can now combine layer-2 and layer-3 services on the same interface that maps to a VPLS instance.
It looks like this:
In the simple diagram above there are two data centers, each with two Brocade NetIron MLXe Series chassis (for high-availability purposes). Each data center also has a Brocade VCS Ethernet fabric. The inter-DC connectivity is provided by the VPLS layer-2 Ethernet service, which runs between the four MLXe chassis. In essence, all four MLXe chassis are layer-2 adjacent to each other over the VPLS service. Each MLXe is also running the Virtual Router Redundancy Protocol (VRRP-E), which is a layer-3 gateway redundancy protocol. Each data center has its own Internet connection for external connectivity.
The way it works is this: If a server or end host is forwarding packets to another server or end host at layer-2, the packets are forwarded by the VCS Ethernet fabric if the two servers are in the same data center (intra-DC), or over the VPLS service if the two servers are in different data centers (inter-DC). In either case, the traffic is forwarded at layer-2. In the intra-DC use case, no MLXe should be in the forwarding path; the VCS Ethernet fabric handles all the forwarding. That is pretty straightforward from a layer-2 service perspective.
But what if a server or end host needs to send packets to external destinations; in other words, at layer-3? Previously, this layer-3 traffic had to be forwarded over a different layer-3 enabled interface on the MLXe or, depending on the topology and design, by a different layer-3 gateway router. That was an inefficient, sub-optimal way of providing both layer-2 and layer-3 services.
With NetIron software release 5.4, both layer-2 and layer-3 services can be simultaneously provided over the same VPLS interface on the MLXe. This maximizes the customers’ investment in terms of ports while also simplifying the topology and network design.
Back to the diagram: Now when a server or end host sends layer-3 packets to external destinations, the traffic entering the MLXe on its VPLS interface can be routed directly to the Internet (assuming the Internet connection is on the same MLXe, as is shown here). The diagram shows that Data Center 1 has an MLXe acting as the VRRP-E master router and it is forwarding the packets directly to the Internet. This is what this new feature provides: enabling the VPLS endpoint to provide both layer-2 and layer-3 services. Another way of thinking of this feature is “VE over VPLS”. VE refers to a Virtual Ethernet interface, which is a virtual routing interface in NetIron terminology.
But what about Data Center 2 in the diagram? It shows an MLXe acting as a VRRP-E backup router, and this router is also forwarding packets directly to its Internet connection. How does that work, you might ask? Well, one thing I didn’t mention is that the “E” in the VRRP-E solution is an extended tweak that we’ve done for VRRP. This is not new in NetIron software; this has been in the code for many years. It’s when you combine this feature with the routing over VPLS feature that you gain additional benefits. Basically, an MLXe acting as a backup VRRP-E router can also forward packets to external destinations if it knows how to get there (i.e., it has an active routing path there). The backup router does not need to send all its packets to the router acting as the VRRP-E master. If it did, you will notice that the traffic would have to be forwarded back to Data Center 1 where the VRRP-E master router resides, which clearly results in a highly sub-optimal traffic pattern (often referred to as a “trombone traffic” pattern). While not the same scenario as we are describing, here is one explanation of traffic trombone.
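The backup router's decision can be summed up in a few lines. This is my own illustrative logic, not Brocade's implementation, and the route names are hypothetical:

```python
# Sketch of the VRRP-E forwarding decision: a backup router with an active
# route to the destination forwards traffic itself rather than tromboning
# it across the inter-DC link to the master.

def forward(dest, local_routes):
    if dest in local_routes:
        return local_routes[dest]        # forward directly, no trombone
    # otherwise hand traffic to the VRRP-E master, which in this topology
    # would mean crossing back to the other data center
    return "via-vrrp-e-master"

dc2_routes = {"internet": "dc2-internet-uplink"}   # backup's own uplink
print(forward("internet", dc2_routes))   # dc2-internet-uplink
print(forward("unknown-dest", {}))       # via-vrrp-e-master
```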
So there you have it! Routing over VPLS provides simultaneous layer-2 and layer-3 services on VPLS endpoints. And I should also mention that the configuration of this is really simple. Here is an example where VE 200 is enabled under the VPLS VEoVPLS instance 10:
vpls VEoVPLS 10
  router-interface ve 200
  tagged ethe 4/1
  tagged ethe 2/1
This is an innovative feature and the benefit to the customer is pretty clear in terms of maximizing their investment while also simplifying their design. As usual, comments or questions are always welcome!
My first experience with Application Delivery Controllers (ADCs) was way back when the market for this kind of purpose-built hardware was just developing. We had early access to Foundry’s software image that you could load on a standard NetIron router to turn it into a server load balancer, and some of the things it could do were pretty cool. We tested SMTP and NNTP load balancing, and it worked quite well for our needs at the time. The ServerIron then became an official product, and after a humble beginning the ADX Series is now an integral part of the Brocade product portfolio. I’m amazed at how sophisticated the feature set is on this product, and at how we continue to innovate and deliver exciting new features that enable our customers to offer new services. One example that I wrote about a while ago is the OpenScript Engine, which enables service providers to write custom Perl scripts that direct application traffic. There’s something else that we’ve developed that will change the way customers can use the ADX, and that’s what I want to tell you about today.
Virtualization has been available on mainframes and servers for a long time, but hasn’t been widely implemented on ADCs yet. At least not in the way server virtualization is implemented, with a hypervisor that virtualizes the system’s software and hardware resources. This is exactly what we’re bringing to the ADX Series, and it will enable service providers to deploy a truly isolated multitenancy solution that saves CapEx and OpEx. We redesigned the software architecture around a custom hypervisor that runs multiple, fully isolated ADX instances on a single physical system, each with its own dedicated software and hardware resources. The virtualization architecture is designed to maximize the consolidation benefits of deploying multitenancy solutions, while still maintaining the isolation and flexibility that using separate ADCs provides.
Provisioning and configuration are simple and flexible too. Since the software is virtualized for each tenant, features are configured independently and can even use overlapping VLANs or IP addresses. Full feature parity across tenants, based on the hypervisor architecture, provides the ability to mix and match tenants and to enable advanced features in any combination on the same module, with no extra hardware or licensing costs. This granular level of flexibility and tenant control gives service providers the most efficient allocation of hardware resources. If you are interested in more details on this feature, please grab the ADX multitenancy white paper.
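To make the overlapping-configuration point concrete, here’s a toy Python sketch. It is purely illustrative; the class and field names are my own invention, not the actual ADX hypervisor interface. The idea it shows: two tenants can reuse the same VLAN ID and virtual IP without conflict, because each tenant’s configuration lives inside its own isolated instance.

```python
# Illustrative sketch only -- not the ADX hypervisor API. Each tenant gets a
# fully isolated virtual instance with its own dedicated resources, so
# overlapping VLAN IDs and IP addresses across tenants are not a conflict.

class AdxInstance:
    """A hypothetical per-tenant virtual ADC instance with namespaced config."""

    def __init__(self, tenant, cpu_cores, mem_gb):
        self.tenant = tenant
        self.cpu_cores = cpu_cores   # dedicated hardware resources
        self.mem_gb = mem_gb
        self.vlans = set()           # per-instance namespace
        self.vips = set()            # per-instance namespace

    def add_vlan(self, vlan_id):
        self.vlans.add(vlan_id)

    def add_vip(self, ip):
        self.vips.add(ip)

# Two tenants reuse VLAN 100 and the same VIP without any clash, because
# each set of config lives inside its own isolated instance.
a = AdxInstance("tenant-a", cpu_cores=2, mem_gb=4)
b = AdxInstance("tenant-b", cpu_cores=2, mem_gb=4)
a.add_vlan(100)
a.add_vip("10.0.0.1")
b.add_vlan(100)
b.add_vip("10.0.0.1")
```

On the real product the hypervisor enforces this isolation in hardware and software; the sketch just shows why overlap is safe once per-tenant state is namespaced.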
Looking ahead to next week, we have our annual Analyst and Technology Day coming up on Wednesday the 12th. We will be sharing our strategic vision and latest innovations with the public, and will also have several technology demonstrations. The ADX demonstrations will show multitenancy and the VXLAN gateway (which we recently demonstrated at VMworld as well). Oh, I should also mention our interactive “Ask the Expert” session for Software-Defined Networking (SDN) that will be running during the Tech Day activities. You can attend remotely, so please make sure to register to get all the latest details on the technologies we are developing.
For more information on Brocade’s application delivery solutions, please visit the Brocade ADX product page.
In an effort to create innovative service offerings, we have found that service providers are exploring many alternatives, including new ways to expose their network assets for profit. In addition, once they do create a new service offering, they look for ways to increase service velocity to keep ahead of the competition.
Service providers are taking a hard look at their current infrastructure and operations environment and finding that their operations model and network architectures have some key limitations that prevent services from being deployed quickly:
- There is no standard way to change traffic flows to handle user mobility and “flip-the-switch” applications
- There are no software tools available that enable them to “dry run” new service options without impacting the production network
- The current back-office model does not provide the flexibility needed to make dynamic network changes and create new service offerings
- Any changes to the production network are difficult, slow, and risky
Although many strides have been taken in collapsing OSS (Operational Support Systems) in order to streamline service creation and deployment, it continues to be a complex and lengthy process. As you can see from the diagram below, there are multiple devices, multiple databases, and multiple protocols that have to be assimilated at the OSS layer to create an end-to-end service. In addition, with a Video on Demand service like the one in this example, there is also unnecessary use of expensive router ports and switches in the service insertion, all of which impacts the bottom line.
SDN solves both of these problems by providing the network abstraction and the operational simplification that will not only accelerate service delivery but do so more profitably. In some cases, the implementation of SDN with OpenFlow and RESTful APIs will result in 50% savings over present methods.
This new back-office model eliminates some of the complex processes through the use of RESTful APIs. These changes will help service providers increase the velocity of their new services. In addition, they will also realize CapEx savings, since OpenFlow can set up the paths for the flows, alleviating unnecessary routing and high-end router connections.
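As a rough illustration of what this back-office model looks like in practice, here is a hypothetical flow rule that a provisioning system might POST to an SDN controller’s RESTful API. The endpoint, field names, and values are assumptions for illustration, not any specific controller’s schema; the point is that the back office describes the service once as data, and OpenFlow programs the path, instead of device-by-device configuration.

```python
import json

# Hypothetical flow-rule body for a Video on Demand subscriber. Field names
# and values are made up for illustration -- not a real controller schema.
flow_rule = {
    "name": "vod-subscriber-42",
    "priority": 100,
    # Match the subscriber's address and the video control port (RTSP).
    "match": {"ipv4_dst": "203.0.113.42/32", "tcp_dst": 554},
    # Steer matching traffic directly to the port facing the VoD cache,
    # bypassing the expensive router hops used for service insertion today.
    "actions": [{"type": "OUTPUT", "port": 7}],
}

body = json.dumps(flow_rule)
# A provisioning system would POST `body` to the controller, e.g. with
# urllib.request; the HTTP call is omitted so the sketch stays self-contained.
```

The CapEx argument in the paragraph above follows from this model: once the controller can install the path directly, the service no longer consumes router ports purely for insertion.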
If you want to hear more about this cost analysis model and the resulting savings, be sure to join us on Tech Day to hear about our vision and innovation, as well as the “Ask the Expert” session on SDN here in the Brocade Communities on September 12th and 13th.
There are some interesting developments going on in service provider networks these days, to say the least. Besides all the OpenFlow and SDN hoopla, which is quite the hot topic right now, traditional SP backbone networks are undergoing their own natural evolution. A few of these trends could impact router architecture in terms of density, scalability and feature set.
Today, most SP networks rely heavily on MPLS technologies. One primary driver for using MPLS over the last decade or so has been network convergence. Most SP networks are multi-service networks, and MPLS provides this multi-service capability. In an MPLS-based, multi-service SP network there are two primary roles for routers: the Provider Edge (PE) or Label Edge Router (LER) at the edge of the network, and the Provider (P) or Label Switch Router (LSR) in the core of the network.
All IP/MPLS routers, both the PE/LERs and the P/LSRs, need to run many different protocols, each with its own unique design and deployment challenges. There is an IGP (OSPF or IS-IS), an EGP (typically MP-BGP) and an MPLS signaling protocol (RSVP or LDP). An IGP is needed in any IP network for internal reachability. In an MPLS network, the IGP also provides information that is required by RSVP and LDP. BGP is required in IP transit networks for external reachability and also provides routing and signaling information in MPLS networks, particularly in MPLS VPN networks. That’s a lot of protocol state required in all these IP/MPLS routers.
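A quick way to summarize the paragraph above is a per-role protocol map (a simplified sketch; real deployments vary, and the role names are the generic ones used in this post):

```python
# Simplified summary of the control-plane protocols described above.
# Today, every IP/MPLS router -- edge or core -- runs all three categories.
ROLE_PROTOCOLS = {
    "PE/LER": {"IGP": "OSPF or IS-IS", "EGP": "MP-BGP", "MPLS signaling": "LDP or RSVP-TE"},
    "P/LSR":  {"IGP": "OSPF or IS-IS", "EGP": "MP-BGP", "MPLS signaling": "LDP or RSVP-TE"},
}

# Note that the two rows are identical today -- the core carries just as much
# protocol state as the edge, which is exactly the burden questioned below.
assert ROLE_PROTOCOLS["PE/LER"] == ROLE_PROTOCOLS["P/LSR"]
```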
BTW: Does all this sound complicated yet? This is why highly skilled IP/MPLS network engineers and architects are in high demand!
From a forwarding plane performance perspective, it should surprise no-one that Internet traffic growth continues unabated. Over-The-Top (OTT) video and other services are fueling this growth. Core routers that scale to high density 10GbE & 100GbE interfaces, when combined with high performance switch fabric technologies, allow the forwarding capacity of SP networks to continue to scale to meet this demand.
SP WAN Super-Core Architecture
BGP provides external reachability in transit networks but do all IP/MPLS routers really require this external routing information? If all the transit routers forward packets based on an IP lookup, then the answer is clearly yes. But since the P/LSR routers in an MPLS network forward packets based on a label lookup, then why do they still need to run BGP and maintain all that external routing state?
The Internet IPv4 routing table is massive; as of this writing it’s somewhere around 430k BGP routes! The IPv6 routing table is relatively small in comparison at around 10k BGP routes, but it continues to grow as IPv6 becomes more widely deployed. It’s difficult to predict how many IPv4 or IPv6 routes there will be in the Internet in 2 years, let alone in 5 or more.
To address some of the challenges in density, performance and capacity, a simplified “super-core” network architecture has been developing over recent years. The definition of “super-core” may vary slightly depending on who you ask, but in general it reflects an “inner” core network that is architected differently than the rest of the network. Brocade’s general definition of a super-core architecture is based on a few factors: the high density and scalability requirements of the super-core router, and its reduced control-plane and route-table capacity requirements. A reduced route-table capacity requirement is achieved by removing the requirement to run BGP on these super-core routers. Since LSRs in the core network forward packets by examining only the MPLS label information, they do not look into the IP packet header. So, these routers do not need to know about all the external Internet routing information that is distributed by BGP; hence, they no longer need to run BGP. This can be a beautiful thing! One name for such a network is a “BGP-free core.” In this type of network, BGP runs only at the edges of the network, where external Internet routing information is required. Makes sense, right?
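A toy model helps show why the BGP-free core works: the LSR forwards purely on the label and never consults an IP routing table, so it carries none of the BGP state. The labels, prefixes, and interface names below are made up for illustration.

```python
# Toy model of a BGP-free core. The edge LER holds the full BGP table and
# maps each destination prefix to a label plus next hop; the core LSR holds
# only a small label table and never examines the IP header.

# LER: the ~430k Internet routes live only here, at the edge.
ler_fib = {
    "198.51.100.0/24": {"push_label": 3001, "next_hop": "core-1"},
    # ... hundreds of thousands more routes, all confined to the edge ...
}

# LSR: tiny Label FIB -- swap the incoming label and forward. No IP lookup.
lsr_lfib = {
    3001: {"swap_label": 3002, "out_interface": "100GbE-1/1"},
}

def lsr_forward(label):
    """Forward purely on the MPLS label; the IP header is never examined."""
    entry = lsr_lfib[label]
    return entry["swap_label"], entry["out_interface"]

print(lsr_forward(3001))  # (3002, '100GbE-1/1')
```

Because the LSR’s table is indexed by label rather than by prefix, its size is bounded by the number of LSPs through the core, not by the size of the Internet routing table, which is what frees vendors to optimize super-core routers for density and performance instead of forwarding-table capacity.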
In terms of router architecture, there are always trade-offs that must be made when designing a high-end core router. Please see additional detailed information on router technology trade-offs in previous blogs: here, here, and here. As discussed in those blogs, a router vendor can push technology only so far. Design decisions must be made on which capabilities are more important to SP customers. Requirements can vary rather dramatically depending on the role the router performs in the network. At the edge of the network, the LER needs high edge port density and a scalable BGP implementation to carry all those external Internet routes. The LSR in the core, however, has different requirements in terms of density, scalability and capacity. The core is the aggregation point for all those high-speed interfaces! Today’s core routers need to be focused on high-density interfaces and high forwarding performance.
So, to accommodate this super-core architecture trend that we are seeing in the largest SP networks, higher density 10GbE and 100GbE interfaces are required. This pushes today’s technology to its limits in terms of silicon density and performance. The interesting angle to this is that since the routing table capacity of the super-core LSR is reduced by not running BGP on these routers, vendors have the choice to optimize products for higher density and performance without having to also optimize for forwarding table size.
SP WAN Public IP Offload Architecture
Another trend we are seeing in the more progressive SPs is what we are calling an “IP Offload” architecture. This is basically where “commodity” Internet transit traffic is forwarded over a simplified, higher-capacity yet cost-efficient network that runs in parallel to the traditional multi-service MPLS backbone. What some providers are discovering is that while Internet transit traffic continues to grow exponentially, the revenues associated with this traffic are not keeping pace. In addition, enterprise VPN and other services, from which a large portion of SP revenue is derived, are not growing as rapidly. This presents a dilemma for the SP: continue to invest in and upgrade the capacity of the multi-service MPLS network (which carries both Internet and enterprise VPN traffic), or do something different.
One answer is to move the high-growth Internet traffic to a simpler, more cost-effective, higher-capacity core network while maintaining the enterprise VPN and other services on the traditional MPLS network. Since the traffic being moved to this IP Offload network is Internet transit traffic (i.e., public-facing), we can refer to this as a “Public IP Offload” strategy. The requirements for the higher-capacity offload network are focused on density and performance. Again, high-density 10GbE and 100GbE interfaces are required in this type of offload network, similar to the super-core network previously described. Seeing a trend yet?
SP WAN Private IP Offload Architecture
Somewhat similar to the public IP offload strategy, we are seeing an inter-data center offload strategy emerge. I refer to this as a “Private IP Offload” strategy because, within a large Internet-facing enterprise network, the inter-DC traffic is seeing unusually high growth rates. This is a fairly recent trend driven by cloud networking, big data, and active-active data center architectures. DC network operators that have traditionally optimized their networks for “north-south” traffic patterns are now seeing massive growth in “east-west” traffic patterns. As active-active data center architectures emerge, “east-west” often translates to “inter-DC”.
A dilemma similar to the one previously described exists here. In this case, the high-growth inter-DC traffic is often internal enterprise traffic, unrelated to the public, Internet-facing customers from whom the enterprise receives a majority of its revenue. Yet this “commodity” inter-DC traffic utilizes the same multi-service MPLS infrastructure as the Internet-facing customer traffic. A similar answer can be found by moving this high-growth traffic onto a simplified, higher-capacity and more cost-effective infrastructure. An inter-DC backbone network can carry this traffic while the Internet-facing traffic is maintained on the traditional MPLS backbone. And as one can imagine, the router requirements for this type of network are high-density 10GbE and 100GbE interfaces. One attribute of this private IP offload architecture that differs from the public version: since this traffic is internal to the enterprise, the backbone should not need to carry the large routing tables that a true transit network must carry.
Brocade MLXe IP/MPLS Router Platform
Brocade offers industry-leading core router performance and density, combined with the “right” feature set for IP and MPLS applications. The MLXe product line is our flagship carrier-class IP/MPLS router. It performs equally well in many roles within the SP network, be it LER, LER aggregation, LSR, or even DC applications. The software feature set is very comprehensive and scalable, which enables this platform to perform different roles within the SP network depending on how the MLXe is configured.
In addition to its advanced and flexible software feature set, the MLXe chassis supports exceptionally high-density 10GbE and 100GbE interfaces. The forwarding performance of the largest MLXe chassis, the MLXe-32, exceeds 15Tbps! The advanced hardware architecture includes a fully distributed, non-blocking Clos switch fabric design. One particularly interesting capability of this platform is its ability to support 64x10GbE interfaces in a single Link Aggregation (LAG) bundle. Do the math for the size of that LAG interface!
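Doing that math:

```python
# 64 member links at 10GbE each in a single LAG bundle.
members = 64
link_gbps = 10
lag_capacity_gbps = members * link_gbps  # 640 Gbps in one logical interface
print(lag_capacity_gbps)  # 640
```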
The MLXe product line meets the requirements of traditional multi-service MPLS backbones, as well as the emerging requirements of the SP super-core backbone and the public & private IP offload network that are described in this blog.
So, as I previously stated, these are interesting times in the service provider market. Internet traffic growth continues unabated, and network architecture solutions are emerging to meet this growth demand without exacerbating the revenue challenge that SPs are struggling with. There is a strong drive to simplify networks to reduce CapEx and OpEx. As cloud computing solutions become widely deployed, traffic patterns internal and external to the data center are shifting from north-south to east-west. As new network architecture solutions emerge to solve these challenges, Brocade will continue to innovate to lead the industry in performance, density and capacity for the largest SP networks. So, stay tuned to this space for some exciting developments in this area.
Throughout my past 5 blog posts, I have reported on enterprise demand for new infrastructure services. IPv6 translation, SAN extension, server load balancing-as-a-service, and hosted desktop services are all emerging IaaS services in which service providers can invest to generate new revenue streams. Now what happens when a service provider actually offers these new services? What about enterprise buying behaviors? What do enterprises expect? And more importantly, how should service providers sell to these businesses? Again based on research by Wavelength Market Analytics, today’s blog discusses these important subjects.
Well, to be quite frank, service providers should expect a great deal of scrutiny. Purchasing new infrastructure services is a tremendous, non-trivial purchase, and large enterprises find nearly all purchasing aspects equally important. When asked about the importance of each of the seven decision-making criteria in the chart below, only 3% separates the least important (service level agreements, at 79%) from the most important (larger discounts for larger deals, at 82%). Obviously, there is little difference among the purchase criteria in the eyes of large enterprises. Medium-sized enterprises, on the other hand, are not as demanding. At 54.9%, the range of service and support options, together with proactive alerts for system availability, production issues, scheduled downtime and pending updates, is most important to medium-sized enterprises.
Investing in a broad product portfolio is another important service provider requirement. This is because nearly 75% of large and more than half of medium-sized enterprises list outsourcing their IaaS services to as few vendors as possible as a top objective. The more services an enterprise has to choose from, the more likely they are to buy from a particular service provider to minimize the number of vendors.
Finally, service providers need to market to top business management as well as to IT management. Sixty-seven percent of large and over half of medium-sized enterprises report that top business management is very involved in the IaaS services decision, so the purchase of IaaS services is likely a strategic decision for many enterprises. As such, service providers should expect a long sales cycle, and marketing and sales need to reach many purchase influencers.
To close out this blog series, using a two-layer data center network for improved performance and supporting open standards for flexibility are increasingly important for service providers. Why? Because these are increasingly important to a service provider’s customers. As the graph below shows, nearly three-quarters of large enterprises prefer providers that use a two-tier network and support open standards. Nearly 41% of medium-sized enterprises prefer IaaS and other cloud providers that use a two-layer data center network, and 27% prefer providers that support open standards. Finally, messaging to enterprise customers should highlight a two-layer network and open standards support. Since there is high awareness of Brocade’s Virtual Cluster Switching (VCS™) as an Ethernet fabric-based data center solution, some service providers may find it beneficial to mention that their flatter data center network is based on Brocade solutions.