Design & Build

Data Center Solution, Storage-Validation Test: Brocade Gen 5 Fibre Channel and Violin Memory 6232 All Flash Array

 

 

Preface

 

Overview

The Brocade Solid State Ready (SSR) qualification program verifies interoperability and optimum performance of solid state storage attached to Brocade SAN fabrics.

 

The SSR program uses comprehensive testing and best practice configuration to demonstrate Fibre Channel SAN and IP storage interoperability with flash storage arrays. SSR testing covers multiple fabrics, with servers, NICs, and HBAs from multiple vendors, in a large-port-count Brocade test topology.

 

Purpose of This Document

This document provides the validation of Brocade fabric technology with the Violin 6232 all flash storage array, using multiple switch platforms, HBAs, and server operating systems. This validation shows that the Violin 6232 interoperates properly within a Brocade Fibre Channel fabric, while supporting the performance and low latency associated with solid state storage.

 

Audience

The content in this document is written for a technical audience, including solution architects, system engineers, and technical development representatives.

 

Objectives

  1. Test the Violin 6232 array with Brocade Gen 5 FC fabrics in single and Fibre Channel routed configurations with different stress and error recovery test cases, to validate the interoperability and best practice configuration for the array with Brocade FC fabrics.
  2. Validate the performance of the FC fabric with the solid state storage array to ensure high throughput and low latency.

 

Test Conclusions

  1. Achieved a 100% pass rate on all test cases in the SSR qualification test plan. The network and the storage were able to handle the various stress and error recovery scenarios without any issues.
  2. Different I/O workload scenarios were simulated using the Medusa, vdbench, and VMware IOAnalyzer tools, and sustained performance levels were achieved across all workload types. The results confirm that the Violin 6232 array interoperates with Brocade Fibre Channel fabrics, demonstrating high availability, performance, and low latency.
  3. For optimal availability and performance, consideration should be given to multipath configuration on the host side. While Windows Server 2008 and 2012 provide round-robin behavior by default, Linux systems benefit from adding a custom entry to /etc/multipath.conf, and VMware hosts should be changed from the default ‘Most Recently Used (VMware)’ setting to ‘Round Robin (VMware)’. Actively using all available paths provides a significant improvement in throughput.
  4. Brocade Bottleneck Detection is a recommended tool to proactively monitor fabric performance and ensure high-performance, low-latency storage IO.

 

Related Documents

 

References

 

Document History

Date                  Version        Description

8-8-2014             1.0               Initial Version

 

Key Contributors

The content in this guide was provided by the following key contributors.

  • Test Architects: Mike Astry, Patrick Stander
  • Test Engineers: Randy Lodes, Subhish Pillai

 

About Brocade

Brocade networking solutions help the world’s leading organizations transition smoothly to a world where applications and information reside anywhere. This vision is realized through the Brocade One™ strategy, which is designed to deliver key business benefits such as unmatched simplicity, non-stop networking, application optimization, and investment protection.

 

Innovative Ethernet and storage networking solutions for data center, campus, and service provider networks help reduce complexity and cost while enabling virtualization and cloud computing to increase business agility.

 

To help ensure a complete solution, Brocade partners with world-class IT companies and provides comprehensive education, support, and professional services offerings.

 

To learn more, visit www.brocade.com.

 

About Violin Memory

Business in a Flash. Violin Memory transforms the speed of business with high performance, always available, low cost management of critical business information and applications.

 

Violin’s All-Flash optimized solutions accelerate breakthrough CAPEX and OPEX savings for building the next generation data center. Violin’s Flash Fabric Architecture (FFA) speeds data delivery with chip-to-chassis performance optimization that achieves lower consistent latency and cost per transaction for Cloud, Enterprise and Virtualized mission-critical applications. Violin’s All-Flash Arrays and Appliances, and enterprise data management software solutions enhance agility and mobility while revolutionizing datacenter economics.

 

Founded in 2005, Violin Memory is headquartered in Santa Clara, California.

 

Test Plan

The storage array is connected to two SAN fabrics and multiple server hosts to drive IO in a multipath configuration. Error injection is introduced, and failover and recovery behaviors are observed. IO performance is measured across different workload configurations.

 

Scope

Testing is performed with a mix of generally available (GA) and development versions of Brocade’s Fabric OS (FOS) in a heterogeneous environment. The test topology includes Brocade directors and switches configured with and without Fibre Channel routing (FCR).

 

Test cases include interoperability and optimal-configuration tests. Performance is measured with best practice configuration in place; however, measuring the absolute maximum storage performance is not in the scope of this publication.

 

Details for each test case are covered in the “Test Cases” section. Standard devices under test (DUT) include IBM, HP, and Dell chassis servers with commonly used Brocade, QLogic, and Emulex HBAs. Hosts use two uplinks to different Brocade switches in the Fibre Channel fabric. IO generators include Medusa Labs Test Tools, vdbench, Iometer, and VMware IOAnalyzer.

 

Test Configuration

 

 TestConfiguration.jpg

Test Configuration

 

DUT Descriptions

The following tables provide details about the devices under test (DUT).

 

Storage Array

DUT ID               Model   Vendor                  Description
Violin Memory 6232   6232    Violin Memory Systems   The Violin Memory 6232 is an all-flash array that supports up to 64 MLC VIMMs (Violin Intelligent Memory Modules). The unit under test is populated with 32 VIMMs. Each controller supports 4x 8Gb Fibre Channel connections.

 

Switch

DUT ID       Model        Vendor    Description
6510-1,2,3   BR-6510      Brocade   48-port 16Gb FC switch
5100-1,2,3   BR-5100      Brocade   40-port 8Gb FC switch
DCX-3        DCX          Brocade   8-slot 8Gb FC chassis
DCX-4        DCX-4S       Brocade   4-slot 8Gb FC chassis
DCX-2        DCX 8510-8   Brocade   8-slot 16Gb FC chassis
DCX-1        DCX 8510-4   Brocade   4-slot 16Gb FC chassis
VDX-1,2      VDX 6730     Brocade   60x 10GbE ports and 16x 8Gb FC port switch

 

DUT Specifications

 

Storage                    Version
Violin Memory 6232 array   V6.3.0.2

Brocade switches                                     Version
DCX-4S                                               FOS 7.3.0 development
DCX                                                  FOS 7.3.0 development
DCX 8510-8                                           FOS 7.3.0 development
DCX 8510-4                                           FOS 7.3.0 development
6510 + Integrated Routing, Fabric Vision Licenses    FOS 7.3.0 development
5100 + Integrated Routing, Fabric Vision Licenses    FOS 7.3.0 development
VDX 6730                                             NOS 4.1.2

 

Adapters                             Version
Brocade 1860 2-port 16Gb FC HBA      driver & firmware version 3.2.4.0
QLogic QLE2672 2-port 16Gb FC HBA    driver 8.06.00.10.06.0-k, firmware 6.06.03
Emulex LPe12002 2-port 8Gb FC HBA    driver 10.0.100.1, firmware 1.00A9
Brocade 1020 2-port CNA adapter      driver & firmware version 3.2.4.0

 

DUT ID   Servers                 RAM     Processor            OS
SRV-1    HP ProLiant DL380p G8   160GB   Intel Xeon E5-2640   VMware ESXi 5.5 [cluster]
SRV-2    HP ProLiant DL380p G8   160GB   Intel Xeon E5-2640   VMware ESXi 5.5 [cluster]
SRV-3    IBM System x3630 M4     24GB    Intel Xeon E5-2420   VMware ESXi 5.1u2
SRV-4    Dell PowerEdge R720     64GB    Intel Xeon E5-2640   Windows Server 2012
SRV-5    Dell PowerEdge R720     160GB   Intel Xeon E5-2640   RHEL 6.4 x86_64
SRV-6    HP ProLiant DL385p G8   16GB    AMD Opteron 6212     Windows Server 2008 R2
SRV-7    Dell PowerEdge R720     16GB    Intel Xeon E5-2620   SLES 11.3 x86_64
SRV-8    Dell PowerEdge R720     16GB    Intel Xeon E5-2620   RHEL 6.5 x86_64

 

Test Equipment                 Version
Finisar 16Gb Analyzer/Jammer   XGIG5K2001153
Medusa Labs Test Tools         6.0
Vdbench                        5.0401
Iometer                        1.1.0-rc1
VMware IOAnalyzer              1.6.0

 

Configure Equipment

This section describes how the DUT and test equipment are configured. The following steps are explained in detail.

 

  1. Create zones for each host initiator group
  2. Present LUNs for each initiator group – 8 x 5GB LUNs presented to two initiators from host
  3. Configure multipathing on each host
  4. Apply any additional host tuning
  5. Setup workload generators
  6. Configure Fibre Channel Routing
  7. Enable Bottleneck Detection on switches
  8. Configure Fill Word values

 

Step 1. Create zones for each host initiator group

 

Zoning Example

The following CLI example uses zonecreate to create a new zone, cfgadd to add the new zone to the existing zoning configuration (‘SSR’ in this example), and cfgenable to save and enable the zoning configuration.

 

<==========>

> zonecreate hb067168_violin, "21:00:00:24:ff:51:7d:e8; 21:00:00:24:ff:51:7d:ae;

                21:00:00:24:ff:51:7d:e9; 21:00:00:24:ff:51:7d:af;

                21:00:00:0e:1e:10:51:d0; 21:00:00:0e:1e:10:51:d1"

>cfgadd SSR, hb067168_violin

>cfgenable SSR

<==========>

 

Confirm Zoning

Use the zoneShow command to display the zones connecting the Violin array to the host.

 

<==========>

root> zoneshow hb067168_violin

 zone:  hb067168_violin

                21:00:00:24:ff:51:7d:e8; 21:00:00:24:ff:51:7d:ae;

                21:00:00:24:ff:51:7d:e9; 21:00:00:24:ff:51:7d:af;

                21:00:00:0e:1e:10:51:d0; 21:00:00:0e:1e:10:51:d1

<==========>

 

Step 2. Present LUNs

For each initiator group, 8 x 5GB LUNs are presented to the two initiators of each host from the Violin configuration tool.

 

ViolinLUNPresentationToHosts.jpg 

   Violin LUN Presentation To Hosts

 

Step 3. Configure Multipath IO on Hosts

a. Recommended Multipath Configuration on Linux Systems:

This configuration allows all paths to be used in a round-robin fashion. It provides superior performance to the default Linux settings, which use only a single active path per LUN. Below is the recommended /etc/multipath.conf entry for Linux systems.

 

<==========>

devices {

    device {

        vendor                  "VIOLIN"

        path_selector           "round-robin 0"

        path_grouping_policy    multibus

        rr_min_io               1

        path_checker            tur

        fast_io_fail_tmo        10

        dev_loss_tmo            30

    }

}

<==========>
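 

After editing /etc/multipath.conf, the new device stanza can be applied without a reboot; a minimal sketch, assuming RHEL 6-style service management:

<==========>

# Reload multipathd so it reads the new VIOLIN device stanza
service multipathd reload

# Rebuild the existing multipath maps with the new path policy
multipath -r

<==========>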

 

Confirm Multipath Configuration

The following shows a typical multipath configuration on a Linux host.

 

<==========>

# multipath -ll

mpathbp (36001b970b716b179b716b1795996a912) dm-13 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:3  sdbp 68:48  active ready running

  |- 13:0:6:3  sdbx 68:176 active ready running

  |- 14:0:5:3  sdcj 69:112 active ready running

  |- 13:0:4:3  sdcg 69:64  active ready running

  |- 13:0:7:3  sdcw 70:64  active ready running

  |- 14:0:6:3  sdcz 70:112 active ready running

  |- 14:0:7:3  sddl 71:48  active ready running

  `- 14:0:4:3  sddt 71:176 active ready running

mpathbo (36001b970b716b179b716b1794df3d665) dm-15 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:8  sdbu 68:128 active ready running

  |- 13:0:6:8  sdcc 69:0   active ready running

  |- 14:0:5:8  sdcs 70:0   active ready running

  |- 13:0:4:8  sdcp 69:208 active ready running

  |- 14:0:6:8  sddi 71:0   active ready running

  |- 13:0:7:8  sddh 70:240 active ready running

  |- 14:0:7:8  sddq 71:128 active ready running

  `- 14:0:4:8  sddy 128:0  active ready running

mpathbn (36001b970b716b179b716b179453c26a3) dm-12 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:1  sdbn 68:16  active ready running

  |- 13:0:6:1  sdbv 68:144 active ready running

  |- 13:0:4:1  sdcd 69:16  active ready running

  |- 14:0:5:1  sdcf 69:48  active ready running

  |- 14:0:6:1  sdcv 70:48  active ready running

  |- 13:0:7:1  sdct 70:16  active ready running

  |- 14:0:7:1  sddj 71:16  active ready running

  `- 14:0:4:1  sddr 71:144 active ready running

mpathbm (36001b970b716b179b716b17975c58392) dm-11 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:4  sdbq 68:64  active ready running

  |- 13:0:6:4  sdby 68:192 active ready running

  |- 14:0:5:4  sdcl 69:144 active ready running

  |- 13:0:4:4  sdci 69:96  active ready running

  |- 14:0:6:4  sddb 70:144 active ready running

  |- 13:0:7:4  sdcy 70:96  active ready running

  |- 14:0:7:4  sddm 71:64  active ready running

  `- 14:0:4:4  sddu 71:192 active ready running

mpathbl (36001b970b716b179b716b1798ebafd67) dm-18 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:7  sdbt 68:112 active ready running

  |- 13:0:6:7  sdcb 68:240 active ready running

  |- 13:0:4:7  sdcn 69:176 active ready running

  |- 14:0:5:7  sdcr 69:240 active ready running

  |- 14:0:6:7  sddg 70:224 active ready running

  |- 13:0:7:7  sdde 70:192 active ready running

  |- 14:0:7:7  sddp 71:112 active ready running

  `- 14:0:4:7  sddx 71:240 active ready running

mpathbs (36001b970b716b179b716b17909052d48) dm-16 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:5  sdbr 68:80  active ready running

  |- 13:0:6:5  sdbz 68:208 active ready running

  |- 13:0:4:5  sdck 69:128 active ready running

  |- 14:0:5:5  sdco 69:192 active ready running

  |- 14:0:6:5  sddd 70:176 active ready running

  |- 13:0:7:5  sdda 70:128 active ready running

  |- 14:0:7:5  sddn 71:80  active ready running

  `- 14:0:4:5  sddv 71:208 active ready running

mpathbr (36001b970b716b179b716b179c367a45c) dm-17 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:6  sdbs 68:96  active ready running

  |- 13:0:6:6  sdca 68:224 active ready running

  |- 14:0:5:6  sdcq 69:224 active ready running

  |- 13:0:4:6  sdcm 69:160 active ready running

  |- 13:0:7:6  sddc 70:160 active ready running

  |- 14:0:6:6  sddf 70:208 active ready running

  |- 14:0:7:6  sddo 71:96  active ready running

  `- 14:0:4:6  sddw 71:224 active ready running

mpathbq (36001b970b716b179b716b179072e9a2a) dm-14 VIOLIN,SAN ARRAY

size=5.0G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

  |- 13:0:5:2  sdbo 68:32  active ready running

  |- 13:0:6:2  sdbw 68:160 active ready running

  |- 13:0:4:2  sdce 69:32  active ready running

  |- 14:0:5:2  sdch 69:80  active ready running

  |- 14:0:6:2  sdcx 70:80  active ready running

  |- 13:0:7:2  sdcu 70:32  active ready running

  |- 14:0:7:2  sddk 71:32  active ready running

  `- 14:0:4:2  sdds 71:160 active ready running

<==========>

 

b. Recommended Multipath Configuration on VMware Systems:

This configuration allows all paths to be used in a round-robin fashion. It provides superior performance to the default VMware ‘Most Recently Used’ setting, which uses only a single active path per LUN.

 

VMwareMultipathConfigurationTool.jpg 

   VMware Multipath Configuration Tool
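 

The same change can also be made from the ESXi shell instead of the vSphere Client. The following is a hedged sketch; the naa identifier is a placeholder for an actual Violin LUN, and VMW_SATP_DEFAULT_AA is assumed to be the claiming SATP:

<==========>

# List claimed devices and note the naa identifiers of the Violin LUNs
esxcli storage nmp device list

# Change an individual LUN from Most Recently Used to Round Robin
esxcli storage nmp device set --device naa.6001b970xxxxxxxxxxxxxxxxxxxxxxxx --psp VMW_PSP_RR

# Optionally make Round Robin the default policy for the claiming SATP,
# so that newly presented Violin LUNs inherit it
esxcli storage nmp satp set --satp VMW_SATP_DEFAULT_AA --default-psp VMW_PSP_RR

<==========>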

 

Step 4. Apply any additional host tuning

On Linux hosts, additional tuning is applied through udev rules. The first rule selects the 'noop' I/O scheduler, which has been shown to deliver better performance with lower CPU overhead than the default schedulers (usually 'deadline' or 'cfq') on solid state storage. The second rule disables the collection of entropy for the kernel random number generator, which has high CPU overhead when enabled for devices supporting high IOPS. The third rule sets rq_affinity to 2 so that I/O completions are processed on the CPU that issued the request, spreading the completion load.

 

The following rules are applied at boot from the /etc/udev/rules.d/99-violin-storage.rules file.

 

<==========>

# Use noop scheduler for high-performance solid-state storage

ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="VIOLIN", ATTR{queue/scheduler}="noop"

# Reduce CPU overhead due to entropy collection

ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="VIOLIN", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU

ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="VIOLIN", ATTR{queue/rq_affinity}="2"

<==========>
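 

The rules can be applied to a running system and then spot-checked through sysfs; a short sketch, using sdbp (one of the Violin multipath member devices shown earlier) as an example:

<==========>

# Re-read the udev rules and re-trigger block device events without a reboot
udevadm control --reload-rules
udevadm trigger --subsystem-match=block

# Spot-check one Violin device; the active scheduler is shown in brackets
cat /sys/block/sdbp/queue/scheduler
cat /sys/block/sdbp/queue/add_random
cat /sys/block/sdbp/queue/rq_affinity

<==========>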

 

Step 5. Setup Workload Generators

Several workload generators are installed to provide a variety of IO coverage. On Windows and Linux systems, Medusa Labs Test Tools, vdbench, and Iometer are installed. On VMware systems, VMware IOAnalyzer is installed.
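 

As an illustration of how the block-size and read/write sweeps in the IO workload tests (section 2.0) are driven, a minimal vdbench parameter file might look like the following; the device path, run length, and interval are illustrative rather than the exact values used in this test:

<==========>

# Storage definition: one multipath device, opened with O_DIRECT
sd=sd1,lun=/dev/mapper/mpathbp,openflags=o_direct

# Workload definition: 100% random access across all storage definitions
wd=wd1,sd=sd*,seekpct=100

# Run definition: sweep transfer sizes and read percentages at maximum IO rate
rd=run1,wd=wd1,iorate=max,elapsed=600,interval=10,forxfersize=(512,4k,32k,256k,1m),forrdpct=(100,50,0)

<==========>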

 

Step 6. Configure Brocade FC Fabric

Some of the configuration settings on the Brocade switches in the FC fabric are covered here.

 

a. Configure Brocade Fibre Channel Routing

 

<==========>

> fcrconfigure --bbfid 100

> fosconfig --enable fcr

> portcfgexport [p#] -a1 -m0 -f <edge_fid>

Example of a routed zone, with the zone name prefixed with ‘lsan_’:

> zoneshow lsan_hb067168_violin

 zone:  lsan_hb067168_violin

                21:00:00:24:ff:51:7d:e8; 21:00:00:24:ff:51:7d:ae;

                21:00:00:24:ff:51:7d:e9; 21:00:00:24:ff:51:7d:af;

                21:00:00:0e:1e:10:51:d0; 21:00:00:0e:1e:10:51:d1

<==========>
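 

As a quick sanity check of the routed setup, fcrfabricshow lists the FC routers and their attached edge fabrics, and lsanzoneshow -s lists the LSAN zones together with the import state of their member devices:

<==========>

> fcrfabricshow

> lsanzoneshow -s

<==========>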

 

b. Example output of exported devices

 

<==========>

> fcrproxydevshow

  Proxy           WWN             Proxy      Device   Physical    State

 Created                           PID       Exists     PID

in Fabric                                   in Fabric

----------------------------------------------------------------------------

    10   21:00:00:24:ff:48:b9:6a  02f001       20      551a00   Imported

    10   21:00:00:24:ff:48:b9:6b  02f101       20      541e00   Imported

    10   52:4a:93:7d:f3:5f:61:00  02f201       20      550e00   Imported

    10   52:4a:93:7d:f3:5f:61:01  02f401       20      540400   Imported

<==========>

 

Step 7. Enable Bottleneck Detection on Switches

This will enable reporting of latency and congestion alerts on each switch.

 

<==========>

> bottleneckmon --enable -alert

> bottleneckmon --config -alert -time 150 -qtime 150 -cthresh 0.7 -lthresh 0.2

root> bottleneckmon --status

Bottleneck detection - Enabled

==============================

 

Switch-wide sub-second latency bottleneck criterion:

====================================================

Time threshold                 - 0.800

Severity threshold             - 50.000

 

Switch-wide alerting parameters:

================================

Alerts                          - Yes

Latency threshold for alert     - 0.200

Congestion threshold for alert  - 0.700

Averaging time for alert        - 150 seconds

Quiet time for alert            - 150 seconds

<==========>
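 

Once enabled, per-port statistics can be reviewed on demand; bottleneckmon --show lists bottlenecked ports, and a specific [slot/]port can be given as an argument:

<==========>

> bottleneckmon --show

<==========>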

 

Step 8. Set Fibre Channel Fill Word Value

On Condor2 8Gb switch platforms it is recommended to set portcfgfillword to ‘3’. Prior to the introduction of 8 Gb, IDLEs were used for link initialization, as well as fill words after link initialization. To help reduce electrical noise in copper-based equipment, the use of ARB (FF) instead of IDLEs was standardized. Because this aspect of the standard was published after some vendors had already begun development of 8 Gb interfaces, not all equipment can support ARB (FF). IDLEs are still used with 1, 2, and 4 Gb interfaces. To accommodate the new specifications and different vendor implementations, Brocade developed a user-selectable method to set the fill words to either IDLEs or ARB (FF).

 

<==========>

root> portcfgfillword 0 3 0

root> portcfgfillword 2 3 0

 

root> portcfgshow

Ports of Slot 0        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
-----------------------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---
Speed                 AN  AN  AN  AN  AN  AN  AN  AN  AN  AN  AN  AN  AN  AN  AN  AN
Fill Word(On Active)   3   0   3   0   0   0   0   0   0   0   0   0   0   0   0   2
Fill Word(Current)     3   0   3   0   0   0   0   0   0   0   0   0   0   0   0   2

<==========>

 

Step 9. Configure zones for FCoE initiators on VDX switches

VDX Zoning Example

 

a. Create the zoning configuration

 

<==========>

VDX6730_066_075# config t

VDX6730_066_075(config)# zoning defined-configuration cfg NOS_SSR

VDX6730_066_075(config-cfg-NOS_SSR)#

<==========>

 

b. Create a new zone

<==========>

VDX6730_066_075(config-cfg-NOS_SSR)# member-zone lsan_hb067166_violin

<==========>

 

c. Add WWNs to the new zone

<==========>

VDX6730_066_075(config-cfg-NOS_SSR)# zoning defined-configuration zone lsan_hb067166_violin

VDX6730_066_075(config-zone-lsan_hb067166_violin)# member-entry 10:00:00:05:33:48:77:a8

VDX6730_066_075(config-zone-lsan_hb067166_violin)# member-entry 10:00:00:05:33:48:77:a9

VDX6730_066_075(config-zone-lsan_hb067166_violin)# member-entry 21:00:00:24:ff:51:7d:9d

VDX6730_066_075(config-zone-lsan_hb067166_violin)# member-entry 21:00:00:24:ff:51:7e:87

VDX6730_066_075(config-zone-lsan_hb067166_violin)# member-entry 21:00:00:24:ff:51:7d:9c

VDX6730_066_075(config-zone-lsan_hb067166_violin)# member-entry 21:00:00:24:ff:51:7e:86

<==========>

 

d. Save the configuration

<==========>

VDX6730_066_075(config-zone-lsan_hb067166_violin)# zoning enabled-configuration cfg-action cfg-save

<==========>

 

e. Enable the configuration

<==========>

VDX6730_066_075(config)# zoning enabled-configuration cfg-name NOS_SSR

VDX6730_066_075(config)# exit

<==========>

 

Confirm FCoE Zones Configuration

Use the show zoning command to confirm the FCoE zoning configuration.

<==========>

# show zoning enabled-configuration

zoning enabled-configuration cfg-name NOS_SSR

zoning enabled-configuration enabled-zone lsan_hb067166_violin

 member-entry 10:00:00:05:33:48:77:a8

 member-entry 10:00:00:05:33:48:77:a9

 member-entry 21:00:00:24:ff:51:7d:9d

 member-entry 21:00:00:24:ff:51:7e:87

 member-entry 21:00:00:24:ff:51:7d:9c

 member-entry 21:00:00:24:ff:51:7e:86

<==========>

 

Test Cases

The following table summarizes the test cases. Following it, each test case is described in detail including the results.

 

1.1     FABRIC INITIALIZATION – BASE FUNCTIONALITY
        Confirm basic Fibre Channel functionality of the storage array.
1.1.1   Storage Device – Physical and Logical Login with Speed Negotiation
1.1.2   Zoning and LUN Mapping
1.1.3   Storage Device Fabric IO Integrity
1.1.4   Storage Device – Portcfgfillword Compatibility
1.1.5   Storage Device Multipath Configuration – Path Integrity

1.2     FABRIC – ADVANCED FUNCTIONALITY
        Examine the storage behavior related to more advanced fabric features such as QoS, Bottleneck Detection, and advanced frame recovery.
1.2.1   Storage Device Bottleneck Detection – w/Congested Host
1.2.2   Storage Device Bottleneck Detection – w/Congested Fabric
1.2.3   Storage Device – QOS Integrity
1.2.4   Storage Device – FC Protocol Jammer Test Suite

1.3     STRESS & ERROR RECOVERY WITH DEVICE MULTI-PATH
        Confirm proper HA/failover behavior of storage in a multipath environment.
1.3.1   Storage Device Fabric IO Integrity – Congested Fabric
1.3.2   Storage Device Nameserver Integrity – Device Recovery with Port Toggle
1.3.3   Storage Device Nameserver Integrity – Device Recovery with Device Relocation
1.3.4   Storage Device Nameserver Stress – Device Recovery with Device Port Toggle
1.3.5   Storage Device Recovery – ISL Port Toggle
1.3.6   Storage Device Recovery – ISL Port Toggle (entire switch)
1.3.7   Storage Device Recovery – Director Blade Maintenance
1.3.8   Storage Device Recovery – Switch Offline
1.3.9   Storage Device Recovery – Switch Firmware Download

1.4     STORAGE DEVICE – FIBRE CHANNEL ROUTING (FCR) INTERNETWORKING TESTS
        Confirm proper storage functioning within routed fabrics.
1.4.1   Storage Device InterNetworking Validation w/FC Host
1.4.2   Storage Device InterNetworking Validation w/FCoE Test
1.4.3   Storage Device Edge Recovery after FCR Disruptions
1.4.4   Storage Device BackBone Recovery after FCR Disruptions

1.5     OPTIONAL/ADDITIONAL TESTS
1.5.1   Storage Device Firmware Update

2.0     IO WORKLOADS
        All workload runs are monitored at the host, storage, and fabric to verify that they complete without any I/O errors or faults. Specific IO patterns are run to verify performance across specific dimensions of the IO workload.
2.0.1   (Single host) x (1 initiator port) → 1 target port
2.0.2   (Single host) x (1 initiator port) → 4 target ports
2.0.3   (Single host) x (2 initiator ports) → 4 target ports
2.0.4   (4 hosts) x (2 initiator ports per host) → 4 target ports
2.0.5   (2-host ESX cluster with 2 initiator ports per host) x (8 VMs on cluster) → 4 target ports
2.0.6   Application-specific workloads

 

1.1 Fabric Initialization – Base Functionality

 

1.1.1 Storage Device – Physical and Logical Login with Speed Negotiation

Test Objective

Verify device login to switch and nameserver with all supported speed settings.

 

Procedure

Set switch ports to 2/4/8/Auto_Negotiate speed settings.

<==========>

portcfgspeed <port> [2/4/8/0]

<==========>

 

Result

1. PASS. Storage logs into fabric and is link up at 2Gb/4Gb/8Gb.

2. PASS. Ran additional IO to verify.

 

1.1.2 Zoning and LUN Mapping

Test Objective

Verify host to LUN access exists with valid zoning.

 

Procedure

  1. Create FC zone on the fabric with the initiator and target WWNs.
  2. Create Host Groups and LUNs on the array with access to initiator WWN.

Result

1. PASS. For each host, created a zone containing four storage ports and two host ports.

2. PASS. Verified LUNs are presented to the host.

3. PASS. Verified with IO.

 

1.1.3 Storage Device Fabric IO Integrity

Test Objective

Validate single path host-to-LUN IO with write/read/verify testing. Include short device cable pulls/port toggles to validate device recovery.

 

Procedure

  1. Setup read/write I/O to LUN using Medusa/vdbench
  2. Perform link disruptions with port toggles and cable pulls (a scripted example follows this list).
  3. Verify I/O recovers after short downtime.
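 

For the link disruptions in step 2, port toggles can be scripted against the switch from a management host over SSH. The following is a rough sketch only; the switch address, credentials, wait times, and port numbers are placeholders:

<==========>

# Toggle a set of switch ports in sequence, allowing time for failover and recovery
for p in 10 11 12 13; do
    ssh admin@switch-6510-1 "portdisable $p"
    sleep 30
    ssh admin@switch-6510-1 "portenable $p"
    sleep 60
done

<==========>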

Result

1. PASS. IO integrity is valid and port recovery is successful.

 

1.1.4 Storage Device – Portcfgfillword Compatibility

Test Objective

Validate with IO all portcfgfillword settings and determine optimal settings.

 

Procedure

  1. Set switch ports connecting to array target ports to different settings and verify target port operation.

<==========>

portCfgFillWord PortNumber  Mode 

Mode: 0/-idle-idle   - IDLE in Link Init, IDLE as fill word (default)

      1/-arbff-arbff - ARBFF in Link Init, ARBFF as fill word

      2/-idle-arbff  - IDLE  in Link Init, ARBFF as fill word (SW)

      3/-aa-then-ia  - If ARBFF/ARBFF failed, then do IDLE/ARBFF

<==========>

 

  2. Monitor “er_bad_os – Invalid Ordered Set” counters with portstatsshow.

Result

1. PASS. Tested portcfgfillword modes 0, 1, 2, and 3; verified with IO performance and portstatsshow.

2. PASS. Mode 0 results in 'Bad Ordered Set' counters incrementing. The recommended setting is '3' on 8Gb Condor2 platforms.

 

1.1.5 Storage Device Multipath Configuration – Path integrity

Test Objective

Verify that multipath configures successfully, with each adapter and storage port residing on different switches. For all device paths, consecutively isolate individual paths and validate IO integrity and path recovery.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 2 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform sequential port toggles across initiator and target switch ports to isolate paths.

Result

1. PASS. Additional configuration steps were taken on the Linux and VMware systems.

2. PASS. Path validation, performance, and recovery are verified on RHEL, SLES, VMWare, Windows 2008, and Windows 2012.

 

1.2 Fabric – Advanced Functionality

 

1.2.1 Storage Device Bottleneck Detection – w/Congested Host

Test Objective

Enable Bottleneck Detection in fabric. Create congestion on host adapter port. Verify Storage Device and switch behavior.

 

Procedure

  1. Enable bottleneck detection on all switches.
  2. Start I/O from single host initiator to multiple targets.
  3. Monitor switch logs for Congestion and Latency bottleneck warnings.
  4. Use “bottleneckmon --show” to monitor bottlenecked ports.

Result

1. PASS. Enable monitoring with ‘bottleneckmon --enable’.

2. PASS. Create host port congestion with high throughput workload. Confirm bottleneck detection is reported.

 

1.2.2 Storage Device Bottleneck Detection – w/Congested Fabric

Test Objective

Enable Bottleneck Detection in fabric. Create congestion on switch ISL port. Verify Storage Device and switch behavior.

 

Procedure

  1. Enable bottleneck detection on all switches. Fabric Vision license required.
  2. Isolate single ISL in the fabric.
  3. Start I/O from multiple host initiators to multiple targets.
  4. Monitor switch logs for Congestion and Latency bottleneck warnings.
  5. Use “bottleneckmon --show” to monitor bottlenecked ports.

Result

1. PASS. Enable monitoring with ‘bottleneckmon --enable’.

2. PASS. Simulate ISL port congestion by isolating traffic to a single ISL and running a high throughput workload. Confirm bottleneck detection is reported.

 

1.2.3 Storage Device – QOS Integrity

Test Objective

Enable QOS for devices under test. Verify device behavior and validate traffic characteristics.

 

Procedure

  1. Setup initiator-target pairs with Low/Medium/High QoS zones in the fabric (zone-naming sketch below).
  2. Start I/O across all pairs and verify I/O statistics.
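 

QoS priority in a Brocade fabric is assigned through the zone-name prefix (QOSH_, QOSM_, QOSL_). A brief sketch of step 1 follows; the zone names and WWN members are placeholders, not the exact zones used in this test:

<==========>

> zonecreate "QOSH_hb067168_violin", "21:00:00:24:ff:51:7d:e8; 21:00:00:0e:1e:10:51:d0"

> zonecreate "QOSL_hb067169_violin", "21:00:00:24:ff:51:7d:ae; 21:00:00:0e:1e:10:51:d1"

> cfgadd "SSR", "QOSH_hb067168_violin; QOSL_hb067169_violin"

> cfgenable "SSR"

<==========>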

Result

1. PASS. Create QoS zones with Brocade HBAs

2. PASS. Verify traffic runs in high, medium, and low queues.

 

1.2.4 Storage Device – FC Protocol Jammer Test Suite

Test Objective

Perform FC Jammer Tests including areas such as: CRC corruption, packet corruption, missing frame, host error recovery, target error recovery

 

Procedure

  1. Insert Jammer device in the I/O path on the storage end.
  2. Execute the following Jammer scenarios:
  • Delete one frame
  • Delete R_RDY
  • Replace CRC of data frame
  • Replace EOF of data frame
  • Replace “good status” with “check condition”
  • Replace IDLE with LR
  • Truncate frame
  • Create S_ID/D_ID error of data frame

3.  Verify Jammer operations and recovery with Analyzer.

 

Result

1. PASS. Insert Finisar Jammer/Analyzer between storage port and switch. Introduce packet anomalies and verify proper recovery.

 

1.3 Stress & Error Recovery with Device Multi-Path

 

1.3.1 Storage Device Fabric IO integrity – Congested Fabric

Test Objective

From all initiators start a mixture of READ, READ/WRITE, and WRITE traffic continuously to all their targets for a 60 hour period.  Verify no host application failover or unexpected change in I/O throughput occurs.

 

Procedure

Setup multiple host initiators with array target ports and run Read, Read-Write Mix and Write I/O at different block sizes for a long run.

Result

1. PASS. Long IO ran successfully without issues.

 

1.3.2 Storage Device Nameserver Integrity – Device Recovery with Port Toggle

Test Objective

Sequentially, manually toggle every adapter/device port.  Verify host I/O will failover to alternate path and toggled path will recover.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 4 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform multiple iterations of sequential port toggles across initiator and target switch ports.

Result

1. PASS. Failover between 8 logical paths (2 host ports x 4 storage ports) is successful.

 

1.3.3 Storage Device Nameserver Integrity – Device Recovery with Device Relocation

Test Objective

Sequentially performed for each Storage Device port.

Disconnect and reconnect port to different switch in same fabric. Verify host I/O will failover to alternate path and toggled path will recover. Repeat disconnect/reconnect to validate behavior in all ASIC types.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 2 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Move storage target ports to different switch port in the fabric.

Result

1. PASS. Physical move of storage port shows successful recovery on Condor2 and Condor3 based products.

 

1.3.4 Storage Device Nameserver Stress – Device Recovery with Device Port Toggle

Test Objective

For extended time run. Sequentially Toggle each Initiator and Target ports in fabric.  Verify host I/O will failover to alternate path and toggled path will recover.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 4 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform multiple iterations of sequential port toggles across initiator and target ports on the host and array.

Result

1. PASS. 48-hr run; failover and recovery with repeated port disable successful.

 

1.3.5 Storage Device Recovery – ISL Port Toggle

Test Objective

For extended time run. Sequentially toggle each ISL path on all switches.  Host I/O may pause, but should recover.  Verify host I/O throughout test.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 4 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform multiple iterations of sequential ISL toggles across the fabric.

Result

1. PASS. Repeated ISL disable shows path failover and recovery while running IO.

 

1.3.6 Storage Device Recovery – ISL Port Toggle (entire switch)

Test Objective

For extended time run. Sequentially toggle ALL ISL paths on a switch, isolating the switch from the fabric. Verify host I/O will failover to an alternate path and the toggled path will recover.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 4 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform multiple iterations of sequentially disabling all ISLs on a switch in the fabric.

Result

1. PASS. 48-hr run; repeated ISL disable shows path failover and recovery while running IO.

 

1.3.7 Storage Device Recovery – Director Blade Maintenance

Test Objective

For extended time run. Verify device connectivity to DCX blades.

Sequentially toggle each DCX blade. Verify host I/O will failover to alternate path and toggled path will recover. Include blade disable/enable, blade power on/off, and manual blade removal/insertion.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 2 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform multiple iterations of sequential disable/enable, power on/off of the DCX blades in the fabric.

Result

1. PASS. IO failover and recovery with DCX reboot

2. PASS. Power cycle is successful.

 

1.3.8 Storage Device Recovery – Switch Offline

Test Objective

Toggle each switch in sequential order.   Host I/O will failover to redundant paths and recover upon switch being enabled. 

Include switch enable/disable, power on/off, and reboot testing.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 2 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Perform multiple iterations of sequential disable/enable, power on/off and reboot of all the switches in the fabric.

Result

1. PASS. Enable/disable, reboot, and power cycle successful.

 

1.3.9 Storage Device Recovery – Switch Firmware Download

Test Objective

Sequentially perform firmware maintenance procedure on all device connected switches under test. Verify Host I/O will continue (with minimal disruption) through firmwaredownload and device pathing will remain consistent.

 

Procedure

  1. Setup host with at least 2 initiator ports zoned with 2 target ports on array.
  2. Setup multipath on host
  3. Start I/O
  4. Sequentially perform firmware upgrades on all switches in the fabric (see the sketch below).
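 

A minimal sketch of step 4 on a single switch: firmwareshow confirms the running version before and after, and firmwaredownload run without arguments prompts interactively for the server, path, and credentials.

<==========>

> firmwareshow

> firmwaredownload

> firmwareshow

<==========>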

Result

1. PASS. Firmware download with running IO successful.

 

1.4 Storage Device – Fibre Channel Routing (FCR) InterNetworking Tests

 

1.4.1 Storage Device InterNetworking Validation w/FC host

Test Objective

Configure two FC fabrics with FCR. Verify that edge devices are imported into adjacent nameservers and hosts have access to their routed targets after FC routers are configured.

 

Procedure

  1. Setup FCR in an Edge-Backbone-Edge configuration.
  2. Setup LSAN zoning.
  3. Verify name server and FCR fabric state. fcrproxydevshow; fabricshow
  4. Verify host access to targets.

Result

1. PASS. Configured routed fabrics and LSAN zones.

2. PASS. Verified with IO.

 

1.4.2 Storage Device InterNetworking Validation w/FCoE Test

Test Objective

Configure a FC fabric with FCR while connected to an FCoE fabric. Verify that edge devices are imported into adjacent nameservers and hosts have access to their routed targets after FC routers are configured.

 

Procedure

  1. Add FCoE VCS fabric to FCR setup.
  2. Setup LSAN zoning.
  3. Verify name server and FCR fabric state. fcrproxydevshow; fabricshow 
  4. Verify host access to targets. 

Result

1. PASS. Created configuration and zoning.

2. PASS. Edge devices are imported, LUNs are presented; verified with IO.

 

1.4.3 Storage Device Edge Recovery after FCR Disruptions

Test Objective

Configure FCR for Edge-Backbone-Edge configuration. With IO running, validate device access and pathing. Perform reboots, switch disables, and port-Toggles on Backbone connections to disrupt device pathing and IO. Verify path and IO recovery once switches and ports recover.

 

Procedure

  1. Setup FCR in an Edge-Backbone-Edge configuration.
  2. Setup LSAN zoning.
  3. Start I/O 
  4. Perform sequential reboots, switch disables and ISL port toggles on the switches in the backbone fabric.

Result

1. PASS. Verified path recovery with FCR disruptions while running IO.

1.4.4 Storage Device BackBone Recovery after FCR Disruptions

 

Test Objective

Configure FCR for Edge-Backbone configuration. With IO running, validate device access and pathing. Perform reboots, switch disables, and port-Toggles on Backbone connections to disrupt device pathing and IO. Verify path and IO recovery once switches and ports recover.

 

Procedure

  1. Connect array target ports to backbone fabric in an Edge-Backbone configuration.
  2. Setup LSAN zoning.
  3. Start I/O
  4. Perform sequential reboots, switch disables and ISL port toggles on the switches in the backbone fabric.

Result

1. PASS. Verified path recovery with FCR disruptions while running IO.

 

1.5 Optional/Additional Tests

 

1.5.1 Storage Device firmware update

Test Objective

Execute a non-disruptive firmware update on the array while running IO and confirm there are no IO errors

 

Procedure

  1. Run continuous IO to the array
  2. Execute code update procedure as described in vendor document MGReload_NDU_Procedure.docx
  3. Confirm updated version on all array components and no errors in IO

Result

1. PASS. Update successful (from V6.3.0.2 to V6.3.1) with no IO errors.

 

2.0 Storage and Fabric Performance - IO Workload Tests

 

2.0.1 (Single host) x (1 initiator port) → 1 target port

Test Objective

Run IO on a single path and verify performance characteristics are as expected

 

Procedure

  1. Configure a single path from a single host initiator port to a single storage port
  2. Run IO in a loop at block transfer sizes of 512, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1m. Include a nested loop of 100% read, 100% write, and 50% read/write.

Result

1. PASS. All workload runs were monitored at the host, storage, and fabric and verified to complete without any I/O errors or faults.

2. PASS. Performance behavior is as expected.

 

2.0.2 (Single host) x (1 initiator port) → multiple target ports

Test Objective

Run multipath IO from a single initiator port to multiple target ports and verify performance characteristics are as expected

 

Procedure

  1. Configure paths from a single host initiator port to 4 target ports.
  2. Run IO in a loop at block transfer sizes of 512, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1m. Include a nested loop of 100% read, 100% write, and 50% read/write.

Result

1. PASS. All workload runs were monitored at the host, storage, and fabric and verified to complete without any I/O errors or faults.

2. PASS. Performance behavior is as expected.

 

2.0.3 (Single host) x (2 initiator ports) → multiple target ports

Test Objective

Run multipath IO from two initiator ports on one host to multiple target ports and verify performance characteristics are as expected.

 

Procedure

  1. Configure paths from two host initiator ports to 4 target ports.
  2. Run IO in a loop at block transfer sizes of 512, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1m. Include a nested loop of 100% read, 100% write, and 50% read/write.

Result

1. PASS. All workload runs were monitored at the host, storage, and fabric and verified to complete without any I/O errors or faults.

2. PASS. Performance behavior is as expected.

 

2.0.4 (Multiple hosts) x (2 initiator ports per host) → multiple target ports

Test Objective

Run multipath IO from multiple initiator ports on multiple hosts to multiple target ports and verify performance characteristics are as expected.

 

Procedure

  1. Configure paths from two initiator ports per host on 4 hosts to 4 target ports.
  2. Run IO in a loop at block transfer sizes of 512, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1m. Include a nested loop of 100% read, 100% write, and 50% read/write.

Result

1. PASS. All workload runs were monitored at the host, storage, and fabric and verified to complete without any I/O errors or faults.

2. PASS. Performance behavior is as expected.

 

2.0.5 VMWare Cluster IO Tests

Test Objective

Run multipath IO from a VMWare cluster with multiple initiator ports to multiple target ports.

 

Procedure

Configure a 2-host VMware cluster with multipath on 2 initiator ports per host, 4 target ports, and 8 VMs. Use VMware IOAnalyzer to create the worker VMs and drive the workload. Run IO at large and small block transfer sizes.

 

Result

1. PASS. All workload runs were monitored at the host, storage, and fabric and verified to complete without any I/O errors or faults.

2. PASS. Performance behavior is as expected.

 

2.0.6 Application Specific IO Tests

Test Objective

Use a few real-world, application-specific workloads to confirm behavior. Examples could include the VMware IOAnalyzer trace replay feature, Iometer application-specific workload emulation, or a database workload emulator such as Oracle Orion.

 

Procedure

Configure paths from 2 initiators per host to 4 target ports. Run the following workloads:

  • File Server simulation with Medusa
  • OLTP simulation with Orion
  • Microsoft Exchange Server simulation with Medusa and IOAnalyzer
  • SQL Server simulation with IOAnalyzer and Iometer
  • Video On Demand simulation with IOAnalyzer – only on ESX
  • Workstation simulation with IOAnalyzer – only on ESX

Result

1. PASS. All workload runs were monitored at the host, storage, and fabric and verified to complete without any I/O errors or faults.

2. PASS. Performance behavior is as expected.