Data Center

VCS Ethernet Fabric and FCoE Traffic

by on 12-07-2010 04:16 PM

Although a lot of the buzz around our recent announcement of the VDX 6720 Data Center Switch product line has been about how VCS Technology with Ethernet Fabric, Distributed Intelligence and Logical Chassis functionality simplifies networking for IP traffic, there is also an interesting story about how VCS Technology supports storage traffic.

You may have missed it, but the VCS Ethernet Fabric was designed to handle both IP and Fibre Channel over Ethernet (FCoE) traffic at the same time.  For those familiar with the storage world, FCoE is an interesting hybrid of proven Fibre Channel for scalable storage networks and ubiquitous Ethernet.  The folks who develop Fibre Channel standards, ANSI T-11, worked on and published the Fibre Channel over Ethernet standard, which is defined in the FC-BB_E (Fibre Channel Backbone, Ethernet) section of FC-BB-5.

The hybrid specified in FC-BB-5 allows all the proven Fibre Channel technology to go forward, but encapsulates Fibre Channel traffic on top of Ethernet.  This requires a “shim” layer in the network stack that sits on top of the Ethernet header and underneath the Fibre Channel frame. Take a look at Figure 25 in the FC-BB-5 standard to see what I am talking about. One of the key requirements in FC-BB-5 is that the Ethernet used must not lose frames, also known as “lossless Ethernet”.  And since ANSI T-11 isn’t responsible for Ethernet standards, it is up to the responsible IEEE 802 subcommittee to define the needed standards for “lossless Ethernet”.
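To make the “shim” idea concrete, here is a minimal Python sketch of the encapsulation, not a complete FC-BB-5 implementation. It assumes the commonly documented layout: EtherType 0x8906, a 14-byte FCoE header that ends in a start-of-frame (SOF) byte, and a 4-byte trailer that begins with an end-of-frame (EOF) byte. The SOF/EOF code values and the function names are my own illustrative assumptions; the 802.1Q VLAN tag and the Ethernet FCS are left out to keep it short.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType used for FCoE frames
SOF_I3 = 0x2E            # assumed start-of-frame code (illustrative value)
EOF_T = 0x42             # assumed end-of-frame code (illustrative value)

ETH_HEADER_LEN = 14      # dst MAC + src MAC + EtherType
FCOE_HEADER_LEN = 14     # version/reserved bits (13 bytes) + SOF (1 byte)
FCOE_TRAILER_LEN = 4     # EOF (1 byte) + reserved (3 bytes)


def encapsulate_fcoe(dst_mac: bytes, src_mac: bytes, fc_frame: bytes) -> bytes:
    """Wrap a complete FC frame in an Ethernet frame (no VLAN tag, no FCS)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    # The shim: version 0 plus reserved bits, then the SOF delimiter.
    fcoe_header = bytes(FCOE_HEADER_LEN - 1) + bytes([SOF_I3])
    # The trailer: EOF delimiter followed by reserved padding.
    fcoe_trailer = bytes([EOF_T]) + bytes(FCOE_TRAILER_LEN - 1)
    return eth_header + fcoe_header + fc_frame + fcoe_trailer


def decapsulate_fcoe(frame: bytes) -> bytes:
    """Recover the original FC frame from an FCoE Ethernet frame."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    start = ETH_HEADER_LEN + FCOE_HEADER_LEN
    return frame[start:-FCOE_TRAILER_LEN]
```

The point to notice is that the Fibre Channel frame rides through untouched: whatever goes into `encapsulate_fcoe` comes back out of `decapsulate_fcoe` byte for byte.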

IEEE developed several extensions to provide lossless Ethernet.  This effort has had many names, but today it is commonly referred to as Data Center Bridging, or DCB.  In part, DCB includes 802.1Qaz (Enhanced Transmission Selection) and 802.1Qbb (Priority-based Flow Control), which together make Ethernet “lossless”.  There is a nice summary of all the standards surrounding FCoE, so I don’t have to repeat all that (ain’t the web cool!!).
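For the curious, the 802.1Qbb part (Priority-based Flow Control) works by letting a receiver pause individual priorities rather than the whole link. Below is a rough Python sketch of what such a pause frame looks like on the wire, assuming the usual MAC Control layout (EtherType 0x8808, opcode 0x0101, a priority-enable vector and eight 16-bit timers). The function name is hypothetical, and using priority 3 for FCoE in the usage lines is only a common convention, not something the standards discussed here require.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_LEN = 60  # minimum Ethernet frame size, not counting the FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame; pause_quanta maps priority (0-7) to a pause time."""
    if len(src_mac) != 6:
        raise ValueError("MAC address must be 6 bytes long")
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be between 0 and 7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority  # mark this priority's timer as valid
        timers[priority] = quanta
    body = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    header = PFC_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    return (header + body).ljust(MIN_FRAME_LEN, b"\x00")


# Usage: ask the link partner to pause priority 3 for the maximum time,
# then release it again with a zero timer.
pause = build_pfc_frame(bytes.fromhex("001122334455"), {3: 0xFFFF})
resume = build_pfc_frame(bytes.fromhex("001122334455"), {3: 0})
```

Because only the paused priority stops, IP traffic on the other priorities keeps flowing while the storage traffic waits instead of being dropped.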

And, if you would like to read more about FCoE and how it works, take a look at the Brocade FCoE Handbook, a nicely written guide to all things FCoE.

Okay, if ANSI T-11 has defined standards for FCoE, and IEEE has defined the Ethernet extensions for lossless Ethernet as part of the DCB effort, what’s left to do, if anything?

There are some things your network has to provide, besides lossless links and encapsulation/de-capsulation of Fibre Channel frames into and out of Ethernet frames, that aren’t called out in the standards: in particular, low latency, rapid network convergence, reliability plus resiliency, equal cost multi-path (ECMP) support, and easy configuration. That’s where Brocade's VCS Ethernet Fabric comes in; it provides these important capabilities. As the storage industry learned over the past 10+ years of successful Fibre Channel storage networking, network availability, resilience and reliability are critical if you are going to let Ethernet be the transport for storage traffic.  Why is that?

Well, storage traffic is fussy.  The world of storage stretches from the IO subsystem in the operating system kernel to the magnetic domains on a disk drive.  A storage IO has to complete very quickly compared to web page refreshes.  In many applications, the time to complete a transaction is critical to the business and it is highly influenced by the time to complete a storage IO.  So applications sending data to storage don’t handle “waiting around” well at all. Anything that holds up the storage IO can cause big problems. If a storage IO takes too long, operating systems start coughing up hair balls or even crash.

Why is that? Well, the OS may boot off of a drive in the storage network or have some application DLL or driver sitting on shared storage. So, the storage network has to abstract the physical path taken by the storage IO from the computer's PCI bus so the OS thinks all storage is locally connected to the PCI bus.  A good shared storage network design keeps the PCI bus oblivious to the fact that a storage IO is traversing a network of multiple switches (referred to as a multi-hop path in the storage world) as it moves to and from storage. For this to work, and to scale, we need very low latency in the Ethernet fabric when it’s carrying FCoE storage traffic and we need to avoid congestion along the IO path.

The other thing about storage traffic is it can demand lots of bandwidth, often in bursts.  That can cause congestion when that big bump in traffic is forwarded.  If there’s congestion, IO has to wait and that can cause trouble as we already saw.

The VDX 6720 family provides very low latency switching, while the VCS Ethernet Fabric supports very fast fabric convergence, avoids congestion and provides high resiliency. That keeps the operating system (and all the applications) happy, so there's no need to clean hair balls off your brand new shirt.

A key enhancement to Ethernet when carrying FCoE traffic is to eliminate the Spanning Tree Protocol (STP), which is used to prevent loops in a classic Ethernet network.  STP limits inter-switch link bandwidth because it blocks redundant links, and it takes far too long to converge when a topology change occurs.  So, STP is not well suited for storage traffic.

VCS Technology uses TRILL frames to provide low latency frame forwarding and, as TRILL does, uses link state routing at Layer 2 instead of STP to prevent loops. Today, VCS Technology uses Fabric Shortest Path First (FSPF) for the Layer 2 link state routing protocol.  That’s exactly the link state routing protocol used by all native Fibre Channel switches, and as you would expect, it works very well for FCoE traffic using Ethernet to transport Fibre Channel payloads across an Ethernet fabric.

ECMP is also important so traffic within the VCS Ethernet Fabric can use all available links while ensuring the paths used are always the lowest cost paths available.  Why is using the lowest cost path important?  Well, it’s all about keeping latency as low as possible and avoiding congestion. VCS Technology automatically identifies the lowest cost path across the entire fabric, not just between any two switches. Think about that. If all VCS Technology did was ensure the “next hop” calculation used the link with the lowest cost between two switches, you could have traffic forwarded to an upstream switch from which all traffic has a very high cost to the next upstream switch.  In that case, frames using that path have higher latency across the fabric, don't they?

To avoid that problem, VCS Technology uses ECMP to select the lowest cost path between all switches in the entire fabric, not just on the next hop to a neighbor switch. That ensures optimum performance for all traffic (FCoE and IP) and helps avoid congestion.  And, VCS Technology does it automatically.  It doesn’t get any simpler than that.
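Here is a small Python sketch of that difference. It is a generic link state shortest path calculation (Dijkstra) that remembers every first hop lying on a lowest cost end-to-end path. It is not Brocade's FSPF code, and the switch names and link costs are made up for illustration.

```python
import heapq
from collections import defaultdict


def ecmp_next_hops(links, source):
    """Return {switch: (lowest total cost, set of first hops that achieve it)}.

    links is an iterable of (switch_a, switch_b, cost) tuples describing
    bidirectional links; costs are assumed to be positive.
    """
    graph = defaultdict(list)
    for a, b, cost in links:
        graph[a].append((b, cost))
        graph[b].append((a, cost))

    dist = {source: 0}
    first_hops = {source: set()}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a cheaper path was already found
        for nbr, cost in graph[node]:
            new_dist = d + cost
            # Leaving the source, the neighbor itself is the first hop;
            # further out, we inherit the first hops of the current switch.
            hops = {nbr} if node == source else first_hops[node]
            if nbr not in dist or new_dist < dist[nbr]:
                dist[nbr] = new_dist
                first_hops[nbr] = set(hops)
                heapq.heappush(heap, (new_dist, nbr))
            elif new_dist == dist[nbr]:
                first_hops[nbr] |= hops  # another equal cost path
    return {n: (dist[n], first_hops[n]) for n in dist if n != source}


# Hypothetical fabric: A's cheapest single link goes to B, but B's link
# toward D is expensive. The end-to-end lowest cost paths use C and E.
LINKS = [
    ("A", "B", 1), ("B", "D", 10),
    ("A", "C", 2), ("C", "D", 2),
    ("A", "E", 2), ("E", "D", 2),
]
cost, hops = ecmp_next_hops(LINKS, "A")["D"]
print(cost, sorted(hops))  # 4 ['C', 'E']
```

A purely local “cheapest link” choice would send traffic for D to B and pay a total cost of 11. Looking at the whole fabric finds two equal cost paths at a cost of 4, and both can carry traffic.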

Now in a VCS Ethernet Fabric with more than one lowest cost path, traffic is load balanced so the flows can use all the lowest cost paths available.  Load balancing uses a hashing algorithm that considers up to 7 fields in the frame to spread flows evenly across multiple least cost paths and ensure optimum throughput.
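As a rough illustration of hash-based load balancing, the Python sketch below hashes seven header fields and uses the result to pick one of the equal cost paths. The particular seven fields, the CRC-32 hash and the names are my assumptions for the example; this post does not say which fields or which hash the switch hardware actually uses.

```python
import zlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Seven illustrative frame fields (an assumption, not the real list)."""
    src_mac: str
    dst_mac: str
    vlan: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def pick_path(flow: Flow, paths: Sequence[str]) -> str:
    """Map a flow onto one of the equal cost paths using a stable hash."""
    if not paths:
        raise ValueError("at least one path is required")
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]


# Usage: every frame of the same flow hashes to the same path, so frames
# within a flow stay in order while different flows spread across paths.
paths = ["via-C", "via-E"]
flow = Flow("00:05:1e:00:00:01", "00:05:1e:00:00:02", 100,
            "10.0.0.1", "10.0.0.2", 49152, 3260)
assert pick_path(flow, paths) == pick_path(flow, paths)
```

Hashing per flow rather than per frame is the usual trade-off: one very busy flow cannot be split across paths, but nothing arrives out of order.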

Finally, at the hardware level, when multiple links exist between two switches, a VCS Ethernet Fabric can automatically create a logical trunk and send frames at hardware speed over all of the links within the trunk.  This provides a lot of advantages, including link resiliency and very efficient bandwidth utilization, both critical for storage traffic.  In a trunk, link resiliency means any link can be removed for any reason, and the trunk automatically forwards traffic on the remaining links without interruption.  Frames in flight on the lost link may be lost (for example, if the link failed unexpectedly), but no other traffic is affected.

With the VDX 6720 switches, trunks are very efficient since frames can be striped across all links at line rate using hardware.  This means there are no “hot spots” and these trunks can achieve very high utilization levels, something FCoE traffic needs when those big bursts occur.
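If it helps to picture frame striping and link resiliency together, here is a toy Python model. It simply deals frames out to the trunk's member links in turn and keeps going on the remaining links when one is removed. The `Trunk` class and port names are hypothetical, and real hardware also has to keep frames in order across links, which this sketch does not model.

```python
from collections import Counter


class Trunk:
    """Toy model of a logical trunk that stripes frames across its links."""

    def __init__(self, links):
        self.links = list(links)
        self._next = 0  # index of the link that carries the next frame

    def remove_link(self, link):
        """Drop a failed or unplugged link; the others keep forwarding."""
        self.links.remove(link)
        self._next = 0

    def send(self, frame):
        """Return the link chosen to carry this frame (round-robin)."""
        if not self.links:
            raise RuntimeError("trunk has no active links")
        link = self.links[self._next % len(self.links)]
        self._next += 1
        return link


# Usage: eight frames spread evenly over four links, then one link goes
# away and the next six frames spread evenly over the remaining three.
trunk = Trunk(["port1", "port2", "port3", "port4"])
before = Counter(trunk.send(n) for n in range(8))
trunk.remove_link("port2")
after = Counter(trunk.send(n) for n in range(6))
print(sorted(before.items()))
print(sorted(after.items()))
```

Because frames are spread per frame rather than per flow, no single link turns into a hot spot even when one flow bursts.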

I hope this helps you understand why a VCS Ethernet Fabric provides a great solution for FCoE traffic, letting you extend that traffic past the top of rack switch.  As I think you can see, this same Ethernet fabric can efficiently handle IP and FCoE traffic at the same time. That means the VDX 6720 family of switches offers a single building block you can use for converged IP and storage (FCoE) traffic, just IP traffic, or just storage traffic. It’s your choice.

I plan to blog on ISL connections, trunking and ECMP soon so you can appreciate how those work in an Ethernet fabric.  Let me know of other topics you would like to learn more about.

Comments
by
on ‎05-01-2012 08:19 AM

Brook,

It has already been 18 months since you posted this article, and Network OS is already at version 2.1. Have there been any changes? Does Brocade use IS-IS instead of FSPF in the newest version of the firmware? What about the "TRILL vs SPB" debate? Where does Brocade stand now?

Thank you, Victor

by lcaywood
on ‎06-21-2012 11:30 AM

Victor, the current firmware continues to use FSPF. We have no plans to support SPB at this time. If you're interested in the details of the similarities and differences between the two protocols, I'd highly recommend Jon Hudson's blog on the subject: TRILL Q&A: Decisions, Decisions, Decisions | Ethernet Fabric