When Milliseconds Matter - FCIP and Fibre Channel Performance Over Distance
Posted on 08-22-2012 05:49 AM; last edited on 10-28-2013 05:49 PM by bcm1
The other day, I was chatting with a lead test architect in the Brocade Strategic Solution Lab (SSL), which is where I work these days, about storage replication over distance. Brocade's customers include the biggest, baddest, most detail-oriented storage architects and designers in the world, and reliable, high-performance storage replication over distance is a critical requirement for their SAN architectures.
Some of these customers come to the SSL with their design options and ask us to "prove it." They want measured data to back up performance claims, so they stage performance bake-offs with vendor-supplied configurations. In response, we develop a proof-of-concept (PoC) design along with a test plan, and then run numerous tests while their lead architects and designers look over our shoulders and ask all the ugly questions that seasoned storage veterans are known for. (Our PoC win rate isn't as high as our Fibre Channel availability, six nines, but it's impressive, and it's one of the many reasons we continue to increase our market share over the competition.)
One topic that often comes up is the performance of native Fibre Channel ISLs (E_Ports) vs. Fibre Channel over IP (FCIP) for synchronous array-to-array replication at long distance. Milliseconds matter when transaction value is measured in hundreds of dollars a second, 24 hours a day, 365 days a year (yep, that's billions of dollars per year).
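As a quick sanity check on the "billions of dollars per year" figure, the arithmetic is just a per-second value multiplied by the number of seconds in a year. The rates below are illustrative values in the "hundreds of dollars a second" range, not figures from any customer.

```python
# Illustrative arithmetic only: the dollar rates are example values chosen to
# span "hundreds of dollars a second"; they are not customer data.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000 s (non-leap year)


def annual_value(dollars_per_second: float) -> float:
    """Value of a transaction stream running 24 hours a day, 365 days a year."""
    return dollars_per_second * SECONDS_PER_YEAR


if __name__ == "__main__":
    for rate in (100, 500, 900):
        print(f"${rate}/s -> ${annual_value(rate) / 1e9:.2f} billion per year")
```

Even at the low end of the range, $100 a second works out to roughly $3.15 billion a year.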
It turns out one of our customers (a household name you would recognize, but I can't share it with you) conducted their own testing to compare Fibre Channel ISL and FCIP latency using the Brocade DCX 8510 Backbone with the embedded Brocade FX8-24 Extension Blade. I wanted to share this with my friend because it's good stuff: real-world testing of application workloads over real WAN infrastructure is hard to replicate in a lab environment.
This customer's test used an HDS USPV array running TrueCopy replication software and Huawei DWDM optical transport over an 80 km distance, with a well-known database application. They wanted to understand in detail how FCIP with Brocade's Fastwrite and compression compared with FCIP without those features and with Fibre Channel ISL links.
These are the four test cases they conducted:
Fibre Channel ISL at 0 km (baseline)
Fibre Channel ISL over DWDM at 80 km
FCIP without Fastwrite and compression over DWDM at 80 km
FCIP with Fastwrite and compression over DWDM at 80 km
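To see why Fastwrite in particular helps at 80 km, here is a first-order latency model. It assumes the replication write behaves like a standard FCP (SCSI over Fibre Channel) write, which needs two round trips across the WAN (command and XFER_RDY, then data and FCP_RSP status), and that Fastwrite lets the local FCIP endpoint answer the XFER_RDY so only one WAN round trip remains. The ~5 µs/km fiber delay is a common rule of thumb; the 64 KiB write size, 1 Gbps link rate, and 2:1 compression ratio are hypothetical parameters. The model ignores switch, DWDM, TCP, and array processing time, so it is a sketch of the mechanism and will not reproduce the customer's measured numbers.

```python
# First-order model of one synchronous replication write over distance.
# Assumptions (hypothetical, for illustration): ~5 us/km one-way fiber delay,
# two WAN round trips for a standard FCP write, one with Fastwrite, and
# serialization of the data payload counted once. Processing delays ignored.

FIBER_DELAY_US_PER_KM = 5.0  # rule-of-thumb one-way propagation delay


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay for one round trip over the given fiber distance."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000.0


def serialization_ms(write_bytes: int, link_gbps: float,
                     compression_ratio: float = 1.0) -> float:
    """Time to clock the (optionally compressed) payload onto the link."""
    bits_on_wire = write_bytes * 8 / compression_ratio
    return bits_on_wire / (link_gbps * 1e9) * 1000.0


def write_latency_ms(distance_km: float, write_bytes: int, link_gbps: float,
                     wan_round_trips: int,
                     compression_ratio: float = 1.0) -> float:
    """Round-trip propagation plus payload serialization for one write."""
    return (wan_round_trips * round_trip_ms(distance_km)
            + serialization_ms(write_bytes, link_gbps, compression_ratio))


if __name__ == "__main__":
    DISTANCE_KM = 80
    WRITE_BYTES = 64 * 1024  # hypothetical write size
    LINK_GBPS = 1.0          # hypothetical WAN link rate
    scenarios = {
        "standard FCP write (2 round trips)": (2, 1.0),
        "Fastwrite + 2:1 compression (1 round trip)": (1, 2.0),
    }
    for name, (trips, ratio) in scenarios.items():
        ms = write_latency_ms(DISTANCE_KM, WRITE_BYTES, LINK_GBPS, trips, ratio)
        print(f"{name}: {ms:.3f} ms")
```

Under these assumptions the model gives about 2.12 ms per write without Fastwrite and about 1.06 ms with it. Roughly 0.8 ms of that saving is one 80 km round trip; the rest comes from compression halving the serialization time.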
The diagram below shows TrueCopy transaction time for each test case. With synchronous array-to-array replication, the application's write has to complete at the remote array before an acknowledgement for the IO is sent back to the application, allowing it to proceed with the next transaction. IOs waiting for an acknowledgement are held in a queue so that new write IOs can still be sent, but at high transaction rates the queue fills quickly and then IO has to halt. That's why IO latency over distance is critical for maximizing an application's transactions per second.
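The queueing behavior described above follows Little's law: with at most N writes outstanding and a per-write latency of R, sustained throughput can't exceed N / R. The sketch below applies that bound; the queue depth of 32 and the latency values are hypothetical examples, not measurements from this test.

```python
# Little's law bound on synchronous write throughput: X <= N / R, where
# N is the maximum number of outstanding (unacknowledged) writes and
# R is the per-write latency. Queue depth and latencies are hypothetical.


def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Highest sustainable write rate (IO/s) for a given queue depth and latency."""
    return queue_depth * 1000.0 / latency_ms


def queue_will_fill(arrival_iops: float, queue_depth: int,
                    latency_ms: float) -> bool:
    """True if writes arrive faster than the queue can drain them,
    so the queue eventually fills and new IO has to halt."""
    return arrival_iops > max_iops(queue_depth, latency_ms)


if __name__ == "__main__":
    QUEUE_DEPTH = 32  # hypothetical outstanding-write limit
    for latency_ms in (0.5, 1.0, 2.0):
        print(f"{latency_ms} ms per write -> at most "
              f"{max_iops(QUEUE_DEPTH, latency_ms):,.0f} writes/s")
```

Each doubling of per-write latency halves the ceiling (64,000, then 32,000, then 16,000 writes/s here), which is why shaving time off every write translates directly into transactions per second.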
In the diagram, "FCIP FW & Compression off 80 Km" (brown line) is FCIP without Fastwrite or compression, "Extended Fabric 80 Km" (purple line) is the Fibre Channel ISL link over DWDM, "FCIP FW & Compression 80 Km" (dark blue line) is FCIP with Fastwrite and compression, and "ISL over 0 distance" (light blue line) is the case with no synchronous replication to the remote storage array, so its IO latency doesn't include the round trip to the remote array.
TrueCopy Transaction Time for FCIP and Fibre Channel ISL over Distance
Notice the improvement with FCIP Fastwrite and compression turned on (dark blue) vs. turned off (brown): in some cases, response time drops by close to 80%.
Compared to "Extended Fabric 80 Km" (purple), "FCIP FW & Compression 80 Km" (dark blue) provided better transaction response time, with the added advantage of fewer latency spikes and therefore more consistent throughput.
These latency improvements are worth their weight in gold when you consider how much a millisecond is worth to a multi-billion-dollar business.
The SSL recently published a Design Guide for dual active/active data centers with vSphere SRM, which uses the DCX 8510 with the FX8-24 Extension Blade. We also just published the results of our own lab testing of FCIP and Fibre Channel ISL latency over distance. Here are the links.