Design & Build

Data Center Infrastructure-Validation Test: FCIP & Fibre Channel Extended Distance Latency

Published 08-22-2012; edited 08-06-2014

Synopsis: Provides latency comparison tests of Brocade Fibre Channel switches for various media and protocols when replicating storage over distances up to 100 Km.

 


Preface

 Overview

Fibre Channel SAN storage continues to be extensively deployed, with link rates up to 16 Gbps available from Brocade, making it a common choice for array-to-array storage replication over distance. High-performance storage replication is critical for disaster recovery and active/active data center designs.

The Brocade Data Center Infrastructure Base Reference Architecture includes Fibre Channel SAN building blocks. One of those blocks, the Core Integrated SAN Services Block shown below, includes Fibre Channel extension for array-to-array replication. Fabric-A uses the Brocade FX8-24 Extension Blade in a Backbone or Director switch while Fabric-B uses a separate Brocade 7800 Extension Switch.

 

DataCenter_BlockSAN_CoreIntegratedSvcs.JPG

   Core Integrated SAN Services Block

 

Brocade also provides a Design Guide for active/active data centers that includes the same building block. See the Related Documents section below.

 

These validation tests include several media choices (dark fiber, IP over a WAN, and DWDM) and Fibre Channel protocol choices (native Fibre Channel ISL (E_Port) and Fibre Channel over IP (FCIP)) that are commonly used for storage replication. The test plan includes the following test cases.

·        Test Case #1: Measured Latency vs. Distance for FCIP and Native Fibre Channel

·        Test Case #2: Link Failure Test

 

Purpose of This Document

Provide latency comparison tests for various media and protocols that can be used to replicate storage between data centers over distances up to 100 Km.

 

Audience

The document will be of interest to storage engineers who are responsible for design and operation of Fibre Channel storage and distance replication solutions.

 

Objectives

1.    Characterize the performance of common choices of media and protocol for array-to-array replication over distance.

2.    Demonstrate how Brocade FCIP configuration options, Fastwrite and compression, affect latency over distance compared to a Fibre Channel ISL.

3.    Validate the time for recovery from a link failure with Fibre Channel ISL and FCIP over long distance.

4.    Because an ISL failure causes frame loss in most cases, it is best practice to enable the lossless feature. Compare the amount of frame loss with lossless enabled or disabled, and with trunking on or off.

5.    Compare the ability of disaster recovery applications to recover after an ISL failure for synchronous (TruCopy) and asynchronous (HUR) replication.

 

Test Conclusions

1.   Brocade FCIP circuits provided lossless recovery from a link failure over WAN IP links and took approximately 0.5 seconds to resume IO after a link was lost.

2.   Fibre Channel ISL trunks configured for lossless recovery took approximately 18 seconds to resume IO after a link was lost. Since there is no re-transmission at the Fibre Channel layer, recovery depends on the array replication software to perform IO recovery which is considerably slower than the FCIP recovery mechanism.

3.   Neither FCIP circuits nor Fibre Channel ISL trunks halt IO when a new link is added.

4.   FCIP latency was consistently less than Fibre Channel ISL latency for all distances of 25 Km or more. Depending on distance, the FCIP Tunnel reduced latency by 10-18% compared to a Fibre Channel ISL Trunk.
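Conclusion 4 can be sanity-checked against the 2 KB measured latencies reported in the results tables for this test (values below are copied from the 50 Km and 75 Km tables; the script itself is illustrative and was not part of the original test tooling):

```python
# Percentage latency reduction of FCIP vs. native FC ISL, using the
# 2 KB / 1 outstanding IO latencies (ms) from the results tables.
fc_isl = {50: 1.6367, 75: 2.0833}     # Fibre Channel ISL over dark fiber
fcip_10g = {50: 1.4706, 75: 1.7391}   # FCIP 10 GbE via WAN simulator

for km in (50, 75):
    reduction = (fc_isl[km] - fcip_10g[km]) / fc_isl[km] * 100
    print(f"{km} Km: FCIP latency {reduction:.1f}% lower than FC ISL")
```

This yields roughly a 10% reduction at 50 Km and 17% at 75 Km, consistent with the 10-18% range stated above.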

 

Design and Deployment Recommendations

1.   For high availability, it is best practice to use multiple links over diverse routes (dark fiber) or multiple WAN service providers. With FCIP Tunnels, in-order delivery is guaranteed even when circuits use different WAN service providers.

2.   For tape backup between a local host and a remote tape pool, FCIP provides tape pipelining, which reduces latency and sustains full line-rate transmission, reducing the time to complete backup jobs compared to native Fibre Channel ISL connections.

3.    Leasing WAN IP links is usually cheaper than dark fiber or DWDM solutions.

4.    Brocade FCIP Fastwrite lowers latency compared to FCIP without Fastwrite or native Fibre Channel ISL links.

5.    Brocade FCIP compression increases the effective data rate, as it can compress common data streams by a factor of 2 to 3.
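As a back-of-the-envelope illustration of recommendation 5 (the usable payload rate assumed for a 1 GbE circuit is an example figure, not a measured value from these tests):

```python
# Effective data rate of a compressed FCIP circuit: a circuit moving
# ~100 MB/s on the wire carries 2-3x that much application data when
# the stream compresses at 2:1 to 3:1.
wire_rate_mbps = 100  # assumed usable payload rate of a 1 GbE circuit, MB/s

for ratio in (2, 3):
    print(f"{ratio}:1 compression -> ~{wire_rate_mbps * ratio} MB/s effective")
```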

 

Related Documents


 

About Brocade

Brocade® (NASDAQ: BRCD) networking solutions help the world’s leading organizations transition smoothly to a world where applications and information reside anywhere. This vision is designed to deliver key business benefits such as unmatched simplicity, non-stop networking, application optimization, and investment protection.

Innovative Ethernet and storage networking solutions for data center, campus, and service provider networks help reduce complexity and cost while enabling virtualization and cloud computing to increase business agility.

To help ensure a complete solution, Brocade partners with world-class IT companies and provides comprehensive education, support, and professional services offerings. (www.brocade.com)

 

Key Contributors

The content in this guide was provided by the following key contributors.

Test Architect: Haim Gabay, Brocade Strategic Solutions Lab

 

Document History

Date                  Version        Description

2012-08-22         1.0b               Initial Version

2013-08-12         1.1                Test case #3 added. Minor edits.

 

Test Plan

Scope

The goal of the testing is to compare measured latency and IOPS at various block sizes for a variety of media using an FCIP tunnel and a Fibre Channel ISL trunk for a range of distances, 0, 25, 50, 75 and 100 Km. The time required for IO to resume was measured after a link failure and when a new link was added. The test configuration is shown below.

 

Test Configuration

 

DataCenter-ValidationTest_FCIP-FP_TestConfiguration.JPG

   Test Configuration

 

DUT Descriptions

The following table lists the devices under test (DUT).

 

Identifier                       Vendor     Model                                             Notes
Brocade DCX 8510                 Brocade    DCX 8510-8                                        With FC16-48 16 Gbps Fibre Channel port cards
Brocade FX8-24 Extension Blade   Brocade    FX8-24 Distance Extension Card for DCX Backbone

 

 

Brocade DCX 8510

Brocade DCX 8510 Backbones are the industry’s most powerful Fibre Channel switching infrastructure, providing the most reliable, scalable, high-performance foundation for private cloud storage and highly virtualized environments. They are designed to increase business agility while providing non-stop access to information and reducing infrastructure and administrative costs.

 

Brocade FX8-24 Extension Blade

The Brocade FX8-24 Extension Blade for the Brocade DCX Backbone family provides enterprise-class Fibre Channel, FICON, and FCIP performance and availability for remote replication, backup, and migration. Twelve 8 Gbps Fibre Channel ports, ten 1 Gigabit Ethernet (GbE) ports, and up to two optional 10 GbE ports provide maximum bandwidth, port density, and throughput, extending and optimizing fabric connectivity over distance for business continuity and disaster recovery.

 

DUT Specifications

The following table provides specifications of the DUT.

 

Identifier                       Release      Configuration Options                                            Notes
Brocade DCX 8510                 FOS 7.0.1b   FC16-48 port card
Brocade FX8-24 Extension Blade   FOS 7.0.1b   1 GbE and 10 GbE Ethernet ports with Fastwrite and Compression

 

 

Test Equipment

·        A Packetstorm Communications, Inc., WAN simulator, model 4XG.

·        Fiber spools with distance taps for dark fiber tests.

·        Host running KGen IO generation tool with adjustable block size and read/write patterns.

 


 

Test Cases

 

Test Case #1: Measured Latency vs. Distance for FCIP and Native Fibre Channel

The host used the KGen test tool to generate IO to local storage, which was replicated to remote storage over the link using TruCopy. KGen was configured for 25% random IO at 100% writes. This reflects typical application workloads and accounts for the fact that, with synchronous replication, reads are returned from the local array while writes add latency, since each write must complete at the remote array before the application receives confirmation.

 

Protocols and link rates were 1 GbE and 10 GbE Ethernet for FCIP traffic from the FX8-24 card, and 4 Gbps Fibre Channel over an ISL. Distances tested were 0, 25, 50, 75 and 100 Km using various combinations of media, including direct cable and DWDM with ADVA optical transport. A WAN simulator was used to inject delay to simulate distance over a WAN IP link. The following table summarizes the tested combinations.

 

Protocol         Link Rate        KM    Media
FCIP             1 GbE, 10 GbE      0   Direct Cable; WAN Simulator
FCIP             1 GbE, 10 GbE     25   Dark Fiber; WAN Simulator
FCIP             1 GbE, 10 GbE     50   ADVA DWDM; WAN Simulator
FCIP             1 GbE, 10 GbE     75   WAN Simulator
FCIP             1 GbE, 10 GbE    100   WAN Simulator
Fibre Channel    4 Gbps             0   ISL over Direct Cable
Fibre Channel    4 Gbps            25   ISL over Dark Fiber
Fibre Channel    4 Gbps            50   ISL over Dark Fiber; ADVA DWDM
Fibre Channel    4 Gbps            75   ISL over Dark Fiber
Fibre Channel    4 Gbps           100   NA

 

The ADVA optical transport supported a maximum distance of 50 Km and was tested only at that distance. Brocade's long-distance optic for ISL over dark fiber has a maximum distance of 75 Km, so it was not tested at 100 Km. Tests included IO with different block sizes (2 KB, 64 KB and 256 KB); outstanding IO for each block size was 1, except for the 256 KB block size, which also included a test with 50 outstanding IO.
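The WAN simulator delay settings that appear in the results (0.25 ms for 25 Km up to 1.0 ms for 100 Km) follow from round-trip light propagation in fiber, roughly 5 µs per km each way. A small sketch of that arithmetic (the refractive-index figure is a typical assumed value for single-mode fiber):

```python
# Round-trip propagation delay in optical fiber: light travels at c / n,
# about 5 us per km one way, i.e. ~10 us per km round trip.
C = 299_792_458   # speed of light in vacuum, m/s
N = 1.47          # typical refractive index of single-mode fiber (assumed)

def round_trip_ms(km):
    one_way_s = (km * 1000) / (C / N)
    return 2 * one_way_s * 1000   # milliseconds

for km in (25, 50, 75, 100):
    print(f"{km} Km -> ~{round_trip_ms(km):.2f} ms round trip")
```

The computed round trips (about 0.25, 0.49, 0.74 and 0.98 ms) line up with the simulator delays used per distance.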

 

DUT

See the Test Configuration diagram above.

 

Purpose

This test measures the write latency from the host with synchronous replication between the arrays at different block sizes for various distances and media (dark fiber, DWDM and WAN IP) using native Fibre Channel ISL trunks and FCIP circuits on the DCX 8510 configured with the FC16-48 port card and FX8-24 extension blade.

 

Test Procedure

 

Step 1: Configure DCX 8510

Refer to existing documentation to configure the DCX 8510 with the FC16-48 port card.

Step 2: Configure FX8-24

Refer to existing documentation to configure the FX8-24 blade when inserted into the DCX 8510.

Step 3: Configure ADVA FSP 3000 DWDM Optical Transport

Refer to ADVA documentation for configuring the ADVA FSP 3000.

Step 4: Configure D-Port on DCX 8510 port

Refer to the FOS 7.0.1 Administrator Guide for how to configure a D-Port on the DCX 8510.

Step 5: Start IO From The Host with KGen Tool

The KGen IO generator on the host is configured for IO at varying block sizes.

Step 6: Measure Link Latency

Execute the following command to measure the latency and distance of the link between the storage arrays. The example shown is for the 25 Km test.

----------

DCX110:root> portdporttest --show 3/42

D-Port Information:

===================

Slot:           3

Port:           42

Remote WWNN:    10:00:00:05:1e:43:be:00

Remote port:    298

Mode:           Manual

Start time:     Wed Aug  1 08:22:37 2012

End time:       Wed Aug  1 08:31:08 2012

Status:         PASSED    

================================================================================

Test                    Start time      Result          EST(HH:MM:SS)   Comments

================================================================================

Electrical loopback     08:22:38        PASSED          --------        ----------

Optical loopback        08:22:52        PASSED          --------        ----------

Link traffic test       08:26:57        PASSED          --------        ----------

================================================================================

Roundtrip link latency:         247409 nano-seconds  <-- Link Latency

Estimated cable distance:       25128 meters (25.128 Km)

----------
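The distance estimate in the output above can be reproduced from the reported round-trip latency; a sketch of the conversion (the fiber refractive index is an assumed typical value, so the result only approximates the switch's own estimate):

```python
# Estimate cable distance from D-Port round-trip latency:
# distance = (round-trip time / 2) * (speed of light in fiber).
C = 299_792_458   # speed of light in vacuum, m/s
N = 1.47          # assumed refractive index of the fiber

def estimated_km(round_trip_ns):
    one_way_s = round_trip_ns * 1e-9 / 2
    return one_way_s * (C / N) / 1000

print(f"~{estimated_km(247_409):.1f} Km")   # round-trip latency from the 25 Km test
```

This gives about 25.2 Km, close to the 25.128 Km reported by `portdporttest`.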

Step 7: Change Link Distance and Measure Latency

Adjust the distance for each media type using the fiber cable spool or WAN simulator for 0, 25, 50, 75 and 100 Km. Measure the latency of the link in milliseconds (ms) and the IOPS over the link at different block sizes, as shown in the table in the Actual Result section.

 

Expected Result

Link latency will increase with distance and IOPS will decrease. The goal is to determine how much change in latency and IOPS occur for different combinations of link rate, media and protocol.

 

Actual Result

The table below summarizes the IOPS for different block sizes at different distances.

 

0 KM

                                                          IOPS                                Latency (Milliseconds)
Block Size / # Outstanding IO                 2 KB/1  64 KB/1  256 KB/1  256 KB/50    2 KB/1  64 KB/1  256 KB/1  256 KB/50

HDS TruCopy Disaster Recovery "OFF"             6000     1600       530       1560    0.1667   0.6250    1.8868     0.6410
Fibre Channel ISL, short cable*                 1800      715       267       1450    0.5556   1.3986    3.7453     0.6897
FCIP 1 GbE, short cable*                        1120      490       207       1490    0.8929   2.0408    4.8309     0.6711
FCIP 1 GbE, short cable & WAN sim 0 ms delay*    912      430       190       1490    1.0965   2.3256    5.2632     0.6711
FCIP 1 GbE WAN simulator correction                -        -         -          -   -0.2036  -0.2114   -0.2623        -0
FCIP 10 GbE, short cable*                       1205      505       207       1490    0.8299   1.9802    4.8309     0.6711
FCIP 10 GbE, short cable & WAN sim 0 ms delay
(0 Km)*                                         1120      473       200       1490    0.8929   2.1142    5.0000     0.6711
Fibre Channel ISL, short cable**                2400      850       330       1500    0.416    1.176     3.03       0.666

*Sync (TruCopy)

**Async (HUR)
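Because each test runs with a fixed number of outstanding IOs, the latency column in these tables appears to be simply the reciprocal of the IOPS column (1000 / IOPS gives milliseconds per IO), which is a quick way to sanity-check any row. A minimal check using two rows of the 0 Km table:

```python
# With latency reported as milliseconds per IO, latency_ms = 1000 / IOPS.
def latency_ms(iops):
    return 1000 / iops

print(round(latency_ms(6000), 4))   # HDS TruCopy DR "OFF", 2 KB  -> 0.1667
print(round(latency_ms(1800), 4))   # FC ISL short cable, 2 KB    -> 0.5556
```

Both values match the latency column of the 0 Km table.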

 

25 KM

                                                          IOPS                                Latency (Milliseconds)
Block Size / # Outstanding IO                 2 KB/1  64 KB/1  256 KB/1  256 KB/50    2 KB/1  64 KB/1  256 KB/1  256 KB/50

Fibre Channel ISL, dark fiber*                   980      482       207       1450    1.0204   2.0747    4.8309     0.6897
FCIP 1 GbE, dark fiber                           930      430       190       1490    1.0753   2.3256    5.2632     0.6711
FCIP 1 GbE, WAN sim 0.25 ms delay (25 Km) RAW*   750      380       172       1490    1.3333   2.6316    5.8140     0.6711
FCIP 1 GbE, WAN simulator ADJUST*                  -        -         -          -    1.1297   2.4202    5.5508     0.6711
FCIP 10 GbE, WAN sim 0.25 ms delay (25 Km)*      880      425       185       1500    1.1364   2.3529    5.4054     0.6667
Fibre Channel ISL, short cable**                2400      850       330       1500    0.416    1.176     3.03       0.666

*Sync (TruCopy)

**Async (HUR)

 

50 KM

                                                          IOPS                                Latency (Milliseconds)
Block Size / # Outstanding IO                 2 KB/1  64 KB/1  256 KB/1  256 KB/50    2 KB/1  64 KB/1  256 KB/1  256 KB/50

Fibre Channel ISL, dark fiber                    611      370       172       1430    1.6367   2.7027    5.8140     0.6993
Fibre Channel ISL, ADVA DWDM                     611      370       172       1490    1.6367   2.7027    5.8140     0.6711
FCIP 10 GbE, ADVA DWDM                           710      385       172       1490    1.4085   2.5974    5.8140     0.6711
FCIP 1 GbE, WAN sim 0.5 ms delay (50 Km) RAW     640      340       155       1490    1.5625   2.9412    6.4516     0.6711
FCIP 1 GbE, WAN simulator ADJUST                   -        -         -          -    1.3589   2.7298    6.1893     0.6711
FCIP 10 GbE, WAN sim 0.5 ms delay (50 Km)        680      365       165       1500    1.4706   2.7397    6.0606     0.6667

75 KM

                                                          IOPS                                Latency (Milliseconds)
Block Size / # Outstanding IO                 2 KB/1  64 KB/1  256 KB/1  256 KB/50    2 KB/1  64 KB/1  256 KB/1  256 KB/50

Fibre Channel ISL, dark fiber                    480      300       147       1430    2.0833   3.3333    6.8027     0.6993
FCIP 1 GbE, WAN sim 0.75 ms delay (75 Km) RAW    535      310       147       1490    1.8692   3.2258    6.8027     0.6711
FCIP 1 GbE, WAN simulator ADJUST                   -        -         -          -    1.6656   3.0144    6.5404     0.6711
FCIP 10 GbE, WAN sim 0.75 ms delay (75 Km)       575      330       152       1500    1.7391   3.0303    6.5789     0.6667

 

100 KM

                                                          IOPS                                Latency (Milliseconds)
Block Size / # Outstanding IO                 2 KB/1  64 KB/1  256 KB/1  256 KB/50    2 KB/1  64 KB/1  256 KB/1  256 KB/50

FCIP 1 GbE, WAN sim 1.0 ms delay (100 Km) RAW    473      280       135       1490    2.1142   3.5714    7.4074     0.6711
FCIP 1 GbE, WAN simulator ADJUST                   -        -         -          -    1.9106   3.3600    7.1451     0.6711
FCIP 10 GbE, WAN sim 1.0 ms delay (100 Km)       516      300       146       1500    1.9380   3.3333    6.8493     0.6667

 

Test Conclusions

1.   The 0 Km tests provide performance baselines. For example, the "HDS TruCopy Disaster Recovery OFF" test shows the array performance for writes to local disk.

2.   For the FCIP 1 GbE and 10 GbE tests, different WAN simulators were used. The 1 GbE simulator introduced significant latency at all distances with 1 pending IO. A correction is applied to the 25, 50, 75 and 100 Km raw data to remove this bias as shown by the entries labeled “ADJUST”.

3.   The FCIP protocol on the FX8-24 blade adds approximately 0.2 – 0.3 ms of latency.

4.   The path latency (medium, devices and speed of light) is approximated when using small blocks as shown for the 2 KB block size data.

5.   Brocade FCIP compression increases the effective data rate, reducing the time to complete IO. Brocade also provides Fibre Channel compression for ISL links with its 16 Gbps products, but this option was not tested.

6.    With asynchronous disaster recovery, the application is less affected by network latency and sees more consistent write latency. ISL failures also do not impact the application.
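The ADJUST rows in the tables above can be reproduced by subtracting the 0 Km simulator bias from the raw measurements; a sketch using the 2 KB column (all values copied from the tables):

```python
# WAN-simulator bias correction for the FCIP 1 GbE results (2 KB column):
# bias = (0 Km with simulator) - (0 Km short cable); adjusted = raw - bias.
bias = 1.0965 - 0.8929   # = 0.2036 ms, the "simulator correction" row at 0 Km
raw = {25: 1.3333, 50: 1.5625, 75: 1.8692, 100: 2.1142}

for km, ms in raw.items():
    print(f"{km} Km adjusted: {ms - bias:.4f} ms")
```

The results (1.1297, 1.3589, 1.6656 and 1.9106 ms) match the ADJUST rows in the 25-100 Km tables.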

 

Test Case #2: Link Failure Test

 

DUT

See the Test Configuration diagram above.

 

Purpose

To measure the delay to traffic flow for both FCIP tunnels and Fibre Channel ISL trunks when a link is removed or added.

 

Test Procedure

The test includes 1 GbE and 10 GbE links on the FX8-24 card for FCIP tunnel traffic and 4 Gbps for the Fibre Channel ISL trunk traffic. For this test, there are two test scenarios, one for the FCIP tunnel and the other for the Fibre Channel ISL trunk. For the native Fibre Channel test, two ISLs are trunked between the switches and lossless dynamic load sharing is configured (See “Lossless Dynamic Load Sharing on ports” in the FOS 7.0.1 Administrator Guide for details). For the FCIP test, an FCIP tunnel is configured for either two 1 GbE or two 10 GbE circuits. After IO starts from the host, one of the circuits is disconnected.

 

Step 1: Synchronize HDS USPV Volumes

Ensure that the source and destination volumes on the HDS USPV array are already synchronized before conducting this test.

Step 2: Configure FCIP Tunnel

Configure an FCIP tunnel between the FX8-24 cards with two 1 GbE links.

Step 3: Start Host IO To Storage

On the host, configure the KGen IO generation tool for 25% random IO at 100% writes, and start IO.

Step 4: Simulate FCIP Link Failure

1.   Execute the following command before the test to determine the initial state of the FCIP circuits.

 

--------

DCX110:root> portshow fcipcircuit all

-------------------------------------------------------------------------------

Tunnel Circuit  OpStatus  Flags    Uptime  TxMBps  RxMBps ConnCnt CommRt  Met

-------------------------------------------------------------------------------

9/14   0 9/ge5   Up      ---4--s    2m33s   53.37    0.31    5  1000/1000  0

9/14   1 9/ge6   Up      ---4--s    2m34s   53.49    0.32    5  1000/1000  0

--------

2.   Unplug one of the cables from one of the FX8-24 cards to simulate a link failure in the FCIP tunnel.

3.   After the cable is pulled, execute the following command. Notice that the ge5 link in the 9/14 FCIP Tunnel shows an OpStatus of “InProg” indicating the link is down.

--------

DCX110:root> portshow fcipcircuit all

-------------------------------------------------------------------------------

Tunnel Circuit  OpStatus  Flags    Uptime  TxMBps  RxMBps ConnCnt CommRt  Met

-------------------------------------------------------------------------------

9/14   0 9/ge5   InProg  ---4--s    2m48s    0.00    0.00    5  1000/1000  0

9/14   1 9/ge6   Up      ---4--s    9m41s  112.58    0.66    5  1000/1000  0

-------------------------------------------------------------------------------

Flags: circuit: s=sack v=VLAN Tagged x=crossport 4=IPv4 6=IPv6

                 L=Listener I=Initiator

--------

Step 5: Verify IO Performance After Link Failure

Using the KGen Statistics tab, you can see the impact of the link failure in the FCIP Tunnel. At about 31 seconds into the test, the throughput graph at the bottom shows IO halted for about 0.5 seconds and then resumed at 326 MB/s, a 12% drop in throughput. This is because traffic then used only one link instead of being balanced across two.

DataCenter-ValidationTest_FCIP-FP_KGenFCIPCircuitFail.JPG

   KGen Statistics for Losing a Circuit in FCIP Tunnel

 
Step 6: Reconnect the Link

Plug the link back into the FX8-24 port and verify that IO returns to normal across the FCIP Tunnel with no disruption to IO.  Use the KGen Statistics tab to see the IO.

DataCenter-ValidationTest_FCIP-FP_KGenFCIPCircuitAdd.JPG

   KGen Statistics For Adding Circuit to FCIP Tunnel

 

Step 7: Simulate Fibre Channel ISL Link Failure

Unplug one of the cables from the FC16-48 port card to simulate a link failure in the Fibre Channel ISL trunk.

Step 8: Verify IO Performance After Link Failure

Use the KGen Statistics tab to monitor the ISL Trunk behavior and IO performance after the ISL link is removed. Note at 55 seconds into the test the ISL link was removed. IO stopped on the ISL Trunk for 18 seconds before resuming again at the same IO rate as before the link was removed.

 

DataCenter-ValidationTest_FCIP-FP_KGenFCISLFail.JPG

   KGen Statistics for Removing Link from ISL Trunk

 

Step 9: Reconnect ISL Link

Reconnect the ISL link. IO continues without disruption on the ISL Trunk.

 

Expected Result

Recovery from a link failure should be faster for an FCIP Tunnel than for a Fibre Channel trunk configured for lossless recovery.

 

Actual Result

1.   FCIP Test. The IO on the FCIP Tunnel halted for 0.5 seconds and then resumed at a 12% reduction in the IO rate. After the link was reconnected, the IO increased back to the original rate without interruption.

2.   FC Test. The IO on the ISL Trunk halted for approximately 18 seconds when one of the ISL links in the trunk was removed. IO then resumed at the same rate as before the link was removed. After the link was reconnected, no IO disruption occurred on the ISL Trunk.

 

Test Conclusions

1.   Fibre Channel ISL trunks configured for lossless recovery took approximately 18 seconds to resume IO after a link was lost. Recovery depends on the array replication software to detect the failure and recover which is considerably slower than the FCIP failure detection and recovery mechanism.

2.   Neither FCIP tunnels nor Fibre Channel ISL trunks halt IO when a new link is added.

 

Test Case #3: Frame Loss Test on ISL Failure

DUT

DataCenter-ValidationTest_FCIP-FP_UseCase3TestConfiguration.jpg

   Test #3 Configuration

 

Two 8 Gbps Spirent SAN tester ports are connected to each Brocade 6510 switch. The two Brocade 6510 switches are connected by 4x16 Gbps ISLs over 25 Km of dark fiber, configured either as one trunk of 4 ISLs or as two trunks of 2 ISLs each.

 

Purpose

 

To compare the amount of frame loss on ISL failure with the lossless feature enabled or disabled.

Test Procedure

Four test runs are performed for the trunking and lossless on/off combinations with a single trunk, plus two additional runs with two trunks.

Step 1: Initiate Spirent IO

Initiate full line rate bidirectional streams on the Spirent SAN Tester.

Step 2: Disconnect One ISL Cable

Using the portperfshow command, identify the most active of the 4 ISLs and remove its cable from the switch port.

Step 3: Collect the Statistics from the Spirent Tester

 

Results

Trunks   Lossless   Trunking   Frame Loss
1        On         On                 19
1        On         Off             1,748
1        Off        On                203
1        Off        Off            15,872
2        On         On                 51
2        Off        On              1,953

 

Test Conclusions

Enabling both trunking and lossless provides the least amount of frame loss.

It is best practice to enable lossless, and to enable trunking if the DWDM equipment allows it.
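The relative benefit shown in the results table can be quantified directly (frame counts copied from the single-trunk rows of the table; the ratio arithmetic is only illustrative):

```python
# Frame loss for the single 4-ISL trunk group, keyed by (lossless, trunking).
loss = {("On", "On"): 19, ("On", "Off"): 1748,
        ("Off", "On"): 203, ("Off", "Off"): 15872}

best = loss[("On", "On")]
for (lossless, trunking), frames in loss.items():
    print(f"lossless={lossless:3} trunking={trunking:3}: "
          f"{frames:>6} frames ({frames / best:.0f}x the best case)")
```

With both features off, frame loss is roughly 800 times higher than with both on, which is why the best practice above recommends enabling both.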