
dc1
Occasional Contributor
Posts: 6
Registered: 11-15-2010

Slow Performance on one fabric, not the other.

Hey folks,

I'm having an amazingly annoying problem: if I tell my multipathing software to use only path A, performance is fine, but if I have it use path B, it's horrible. I am using a SQL script that inserts 10,000 rows into a table, and I record the time of each run. On the good path I get 1,000-1,500 inserts/sec; on the bad path I get 100 if I'm lucky. Sometimes, if I let it sit, the first run will be semi-OK, around 800 inserts/sec, and then if I run it again it drops below 100.
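The test described above can be approximated with a small timing harness. This is a hypothetical sketch, not the poster's script: the database engine and SQL are not shown, so Python's built-in sqlite3 stands in, and the table name `bench`, the 100-byte payload, and the one-commit-per-row pattern are all assumptions. For the numbers to mean anything, the database file would have to live on the LUN reached through the path under test.

```python
import sqlite3
import time


def time_inserts(conn, rows=10_000):
    """Insert `rows` rows into a hypothetical `bench` table and return inserts/sec.

    Assumption: each row is committed individually, so every insert waits on a
    synchronous log write and the rate reflects storage latency, not CPU speed.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS bench (id INTEGER, payload TEXT)")
    conn.commit()
    start = time.perf_counter()
    for i in range(rows):
        cur.execute("INSERT INTO bench (id, payload) VALUES (?, ?)", (i, "x" * 100))
        conn.commit()  # one commit per row (assumed test pattern)
    elapsed = time.perf_counter() - start
    return rows / elapsed
```

For example, calling `time_inserts(sqlite3.connect(path))` several times with `path` pointing at a file on the LUN (the path is hypothetical), first with the multipathing software pinned to fabric A and then to fabric B, gives directly comparable inserts/sec figures.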

I've tried this on 2 different servers, same results.  Good steady performance on fabric A, horrible on fabric B.

I've cleared the counters on all the ports involved on fabric B and run the tests multiple times again, and I do not see any errors such as bad transmitted words. I've also tried pathing to the storage over the other storage controller on the NetApp and still got bad performance, which to me eliminates the possibility of a bad SFP or cable to the SAN itself.

The fabric consists of: SAN -> core switch -> BladeCenter integrated switch (3 ISL trunks to the core) -> server.

I'm just out of options and hoping someone can help.

Super Contributor
Posts: 260
Registered: 04-09-2008

Re: Slow Performance on one fabric, not the other.

This could be a case of a slow-drain device in one fabric affecting the entire fabric, and a bad SFP or cable can cause this. It's strange but true, and I have had a similar experience before.

Detecting a slow-drain device is not easy, but newer FOS (6.3+) has a command called bottleneckmon that can help you find bottlenecks.

New Bottleneck Detection Capability
With Fabric OS 6.3, you can now utilize the new Bottleneck Detection capability available with Brocade Advanced Performance Monitoring. Bottleneck Detection identifies and alerts you about "slow-drain" storage devices that can cause latency and I/O timeouts. This capability is particularly valuable for optimizing performance in highly virtualized server environments.

I'm using FOS 6.4.1, where I can enable bottleneck detection for the whole switch with bottleneckmon --enable. In older FOS you need to enable it on specific ports.

bottleneckmon --show
==================================================================
        Tue Nov 16 10:51:12 CET 2010
==================================================================
List of bottlenecked ports in most recent interval:
None
==================================================================
                                                Number of
From                    To                      bottlenecked ports
==================================================================
Nov 16 10:51:02         Nov 16 10:51:12           0
Nov 16 10:50:52         Nov 16 10:51:02           0
Nov 16 10:50:42         Nov 16 10:50:52           0
Nov 16 10:50:32         Nov 16 10:50:42           0
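When bottleneckmon --show history is collected over a long test window, scanning it by eye for non-zero intervals gets tedious. A minimal sketch, assuming the output has exactly the "From / To / Number of bottlenecked ports" row layout shown above (other FOS versions may format it differently); the regex and function name are hypothetical:

```python
import re

# Assumed row layout: "<Mon> <day> <hh:mm:ss>   <Mon> <day> <hh:mm:ss>   <count>"
ROW = re.compile(
    r"^(?P<start>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<end>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<count>\d+)\s*$"
)


def bottlenecked_intervals(output):
    """Return (start, end, count) for every interval reporting count > 0."""
    hits = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m and int(m.group("count")) > 0:
            hits.append((m.group("start"), m.group("end"), int(m.group("count"))))
    return hits
```

Header lines, separator rows, and the "None" line do not match the row pattern, so only interval rows are considered; an empty result means no bottlenecked ports were reported in the captured history.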

Check for tape libraries/drives or backups that might exist in the slow fabric, or for flapping ports, with fabriclog -s. A slow-drain device often logs out of and back in to the fabric.

dc1
Occasional Contributor
Posts: 6
Registered: 11-15-2010

Re: Slow Performance on one fabric, not the other.

This is very good advice. I am running 6.4.0b on my core switch (the one I don't think there's any issue with), and it said no ports are bottlenecked.

The switch that I do think I'm having a problem with was running an old version of the firmware, and I upgraded it today. The problem is that bottleneckmon doesn't want to work on it, because it says none of my ports are F_Ports, even though switchshow says they are. Since this is an IBM integrated BladeCenter module, I think bottleneckmon won't work because Locked_G_Port and Disabled_E_Port are set to ON in portcfgshow. No clue what to do there.

Additionally, I disabled each ISL trunk one at a time and ran the test; performance was still bad.

I am going to replace all the cables in between as well. Anything you can do to help would be great.

Super Contributor
Posts: 635
Registered: 04-12-2010

Re: Slow Performance on one fabric, not the other.

The older 6.3 FOS code has a defect: it doesn't recognize a port as an F_Port if you have changed one of the default port settings, such as disabling E_Port.

This is fixed with 6.4.0b.

Andreas

Super Contributor
Posts: 635
Registered: 04-12-2010

Re: Slow Performance on one fabric, not the other.

What kind of Storage array are you using?

Some midrange arrays have a preferred path, and if I/Os come down the secondary link, the array has to reroute them internally, which takes longer than on the primary path.

If so, please use the storage system's multipathing software.

Andreas

dc1
Occasional Contributor
Posts: 6
Registered: 11-15-2010

Re: Slow Performance on one fabric, not the other.

I have an IBM-branded NetApp (N6070). I discovered that the SFP on the NetApp itself was causing a large number of tim_txcrd_z counts. I had it replaced, along with every SFP involved from the core switch to the edge switch integrated in the BladeCenter.

The tim_txcrd_z count stopped increasing on the link to the SAN itself, but it still persists somewhat on the E_Ports of the edge switch.
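Since a raw tim_txcrd_z value only matters as it grows, one way to quantify "persists somewhat" is to diff two snapshots of the port statistics taken before and after a test run. A minimal sketch, assuming each counter appears on its own line as `<name> <value> <description>` (as in portstatsshow output) and that one tick is 2.5 µs; both are assumptions to verify against your FOS version, and the function names are hypothetical. If the counters are cleared between the two snapshots, the delta is meaningless.

```python
import re

# Assumed line layout: "<counter_name>   <integer value>   <description>"
STAT = re.compile(r"^(?P<name>[a-z_0-9]+)\s+(?P<value>\d+)\b")


def parse_counters(text):
    """Map counter name -> integer value for every line matching the layout."""
    counters = {}
    for line in text.splitlines():
        m = STAT.match(line.strip())
        if m:
            counters[m.group("name")] = int(m.group("value"))
    return counters


def txcrd_z_delta(before, after, tick_us=2.5):
    """Return (tick delta, approx. microseconds spent at zero TX credit).

    tick_us=2.5 is an assumed tick length; check it against the description
    column printed for tim_txcrd_z on your switch.
    """
    delta = (parse_counters(after).get("tim_txcrd_z", 0)
             - parse_counters(before).get("tim_txcrd_z", 0))
    return delta, delta * tick_us
```

Comparing the delta per E_Port across the same test window on fabric A and fabric B shows whether the edge switch's ISLs are spending noticeably more time without buffer credits on the slow fabric.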

The performance issue is still there, and it can't be related to the preferred path: I have used the storage system's multipathing software, and I have also zoned the server only to that port on the SAN, and it still happens with or without the vendor multipathing software enabled.

dc1
Occasional Contributor
Posts: 6
Registered: 11-15-2010

Re: Slow Performance on one fabric, not the other.

Additionally, the performance flip-flops. Some runs I get 600-800 inserts a second, but if I run a test right after, it consistently never breaks 100.
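Because individual runs swing this widely, comparing the two paths over many runs says more than any single run. A minimal sketch, assuming each run inserts 10,000 rows and its wall-clock duration is recorded in seconds (the function name is hypothetical): at 10,000 rows, a 10-second run is 1,000 inserts/sec and a 100-second run is 100 inserts/sec.

```python
def summarize(durations_s, rows=10_000):
    """Convert per-run durations (seconds) to inserts/sec and report the spread."""
    rates = [rows / d for d in durations_s]
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": sum(rates) / len(rates),
    }
```

Running this separately over the recorded durations for path A and path B makes the flip-flopping visible as a large min/max gap on one path only.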
