Latency Matters for Storage

by Scott Shimomura on 09-02-2013 09:07 PM

Fibre Channel remains the de facto standard for high-performance disk-based arrays as well as emerging flash-based arrays. Of the mainstream storage protocols, it offers the best combination of performance, scalability, and reliability. Now that Cisco MDS has finally joined the Gen 5 Fibre Channel party, customers have a choice.

 

Often overlooked in comparing Brocade and Cisco is the basic Fibre Channel switching architecture: Brocade uses cut-through switching and Cisco uses store-and-forward switching. The switching architecture has a profound impact on latency, which in turn impacts overall performance.

 

Cut-through switching technology is the lowest latency method for forwarding frames. It’s ideal for storage because SCSI is latency sensitive and latency directly affects IOPS performance. Fibre Channel frames are forwarded to the destination before the entire frame has been received. Corrupted frames are identified and marked in the switch and discarded at the destination device. Discarding corrupted frames at the destination device minimizes the time to recover bad frames: as soon as the destination device receives a frame whose EOF marker has been set to "invalid", recovery of the corrupted frame begins.

 

Store-and-forward switching technology is arguably the highest latency method for forwarding frames. Fibre Channel frames are forwarded to the destination only after the entire frame has been received and checked for errors. Corrupted frames are identified and discarded within the switch. However, because the destination never sees the discarded frame, recovery has to wait for a SCSI timeout and a SCSI retry, which can result in delays of tens of seconds.
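To put rough numbers on the difference, here is a back-of-the-envelope sketch in Python. It models only the time a switch spends waiting for bytes to arrive before it can start forwarding. Every constant is an assumption for illustration: a maximum-size frame of 2148 bytes, a nominal 16 Gbps (Gen 5) data rate of 1600 MB/s, and a cut-through switch that needs only the SOF and frame header (28 bytes) to make its decision. Real per-hop latency also includes lookup, arbitration, and ASIC pipeline time, none of which are modeled here.

```python
# Back-of-the-envelope buffering delay per switch hop.
# All constants are illustrative assumptions, not vendor measurements.

MAX_FRAME_BYTES = 2148           # SOF(4) + header(24) + payload(2112) + CRC(4) + EOF(4)
DECISION_BYTES = 28              # assumed: SOF + header received before forwarding starts
THROUGHPUT_BYTES_PER_S = 1.6e9   # assumed nominal 16 Gbps Fibre Channel data rate


def buffering_delay_ns(bytes_needed, rate=THROUGHPUT_BYTES_PER_S):
    """Time spent waiting for `bytes_needed` bytes to arrive, in nanoseconds."""
    return bytes_needed / rate * 1e9


def path_delay_ns(bytes_needed, hops):
    """Buffering delay accumulated across `hops` switches in the path."""
    return hops * buffering_delay_ns(bytes_needed)


# Store-and-forward waits for the whole frame; cut-through waits for the header.
store_and_forward_ns = buffering_delay_ns(MAX_FRAME_BYTES)
cut_through_ns = buffering_delay_ns(DECISION_BYTES)

if __name__ == "__main__":
    print(f"store-and-forward: {store_and_forward_ns:.1f} ns per hop")
    print(f"cut-through:       {cut_through_ns:.1f} ns per hop")
    print(f"3-hop difference:  "
          f"{path_delay_ns(MAX_FRAME_BYTES, 3) - path_delay_ns(DECISION_BYTES, 3):.1f} ns")
```

Under these assumptions the buffering component is a little over a microsecond per hop for store-and-forward and a few tens of nanoseconds for cut-through, and the gap grows with every hop in the path.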

 

Note that both methods of error handling are 100% reliable and are documented in the Fibre Channel standards. The main difference is the latency penalty associated with waiting for entire frames and the additional latency to recover corrupted frames. (All of the gory details regarding Fibre Channel standards around error handling can be found in T11’s Fibre Channel Framing and Signaling doc (FC-FS-4, Rev 0.20). The relevant content is 11.3.8.3 Invalid frame content.)
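The two error-handling paths can be sketched as a toy model in Python. This is not how a switch ASIC is implemented: the `Frame` class, the "valid"/"invalid" EOF labels, and the function names are all hypothetical, and `zlib.crc32` simply stands in for the frame's CRC check.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the end-of-frame marker in this toy model.
EOF_VALID = "valid"
EOF_INVALID = "invalid"


@dataclass
class Frame:
    payload: bytes
    crc: int
    eof: str = EOF_VALID


def make_frame(payload: bytes) -> Frame:
    """Build a frame whose CRC matches its payload."""
    return Frame(payload, zlib.crc32(payload))


def crc_ok(frame: Frame) -> bool:
    return zlib.crc32(frame.payload) == frame.crc


def cut_through_switch(frame: Frame) -> Frame:
    """The frame is already on its way out when the CRC is checked,
    so a bad frame is still delivered, with its EOF marked invalid."""
    if not crc_ok(frame):
        return Frame(frame.payload, frame.crc, EOF_INVALID)
    return frame


def store_and_forward_switch(frame: Frame) -> Optional[Frame]:
    """The whole frame is buffered and checked first; a bad frame is
    dropped in the switch and the destination never sees it."""
    return frame if crc_ok(frame) else None


def destination_accepts(frame: Optional[Frame]) -> bool:
    """The destination keeps only frames that arrive with a valid EOF."""
    return frame is not None and frame.eof == EOF_VALID


if __name__ == "__main__":
    good = make_frame(b"SCSI data")
    bad = Frame(b"SCSI dat\x00", good.crc)       # payload corrupted in flight
    print(cut_through_switch(bad).eof)           # destination sees the bad frame at once
    print(store_and_forward_switch(bad))         # destination sees nothing
```

In both cases the corrupted frame is rejected. The difference is what the destination learns: with cut-through it receives an explicit signal and can start recovery, while with store-and-forward it has to infer the loss from a timeout.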

 

Here is a simplistic analogy to illustrate the two technologies. Imagine you order a flat screen TV online and it ships from the manufacturer to a shipping hub.

 

In the cut-through world, as the TV is entering the shipping hub on a truck, the ship-to address is read and the TV is immediately put on another truck for delivery. As it is being transferred between trucks, the package is inspected and verified as it leaves.

 

In the store-and-forward world, the TV enters the shipping hub on a truck and is unloaded and moved to a receiving room. The ship-to address is read, the package is inspected and verified before it is put on another truck for delivery.

 

Now imagine there is damage to the TV packaging.

 

In the cut-through world, as the TV is leaving the shipping hub for delivery, the damage is identified and someone slaps a damage label on the packaging. When it is delivered, you see the label and refuse to accept the TV because the TV might be damaged as well. You go back online and request an immediate replacement TV.

 

In the store-and-forward world, the damage to the packaging is found during the inspection in the receiving room and the TV is promptly discarded into the trash. The TV does not get delivered, and neither you nor the manufacturer is notified that there was a problem. Eventually, after waiting in vain for the delivery, you go back online and request a replacement TV.

 

In both examples you get your TV; it just takes longer with store-and-forward.

 

Cut-through switching is the dominant switching architecture in Fibre Channel due to the low-latency performance required for storage traffic, making it the best architecture for flash-based storage. Brocade has implemented cut-through switching in its Fibre Channel ASICs since the 1990s, and it has been deployed in millions of directors and switches in production SANs worldwide. In an ironic twist, while Cisco’s MDS Fibre Channel product line uses store-and-forward switching, Cisco’s Nexus 5500 and UCS FCoE product lines use cut-through switching when switching between FCoE ports. I guess we know which switching architecture Cisco thinks performs the best.

 

I encourage you to learn more about SAN infrastructure latency from Tim Lustig at QLogic.

Comments
by hpanda on ‎12-05-2013 08:47 AM

That's really a great explanation of switching technologies and latency. Cisco gives a message to the world that their switching technology is the best in terms of error handling and Brocade's is the worst. Here the clarification is there and shows us why Brocade is the best in FC technology.
