Fibre Channel (SAN)

Occasional Contributor
Posts: 6
Registered: ‎09-19-2007

I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

We have a SAN fabric built on Brocade 5300s running FOS v6.4.0b.
Each of our two floors (1 & 2) is served by a Brocade 5300 that connects that floor's local hosts and storage, and these
two switches are connected by TWO 8 Gbps ISLs (trunked on port#0 and port#1). All hosts (mostly Windows) in our
environment access storage on either floor, so the ISLs are essential in our design.

We never get close to the throughput limit on the ISL links, but instead we end up with latency bottlenecks.


For example, we see a high number of "tim_txcrd_z" ticks and a high count of C3 frames received (261498201) on the ISL trunk ports.


FL2_BCD5300
tim_rdy_pri                        1042        Time R_RDY high priority
tim_txcrd_z                        3508284216  Time TX Credit Zero (2.5Us ticks)

FL1_BCD5300
tim_rdy_pri                        1041        Time R_RDY high priority
tim_txcrd_z                        4259587257  Time TX Credit Zero (2.5Us ticks)
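
(For anyone comparing numbers: the counter names above are the ones reported per port by portstatsshow, so a minimal re-check on the trunk ports, with ports 0 and 1 being our ISLs, would be:)

    portstatsshow 0      # per-port counters, including tim_txcrd_z
    portstatsshow 1
    portstatsclear 0     # optionally clear and re-sample over a known interval
    portstatsclear 1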

As you'd expect, we see I/O delays and/or timeouts accessing SAN storage (two EVA 8400s, one on each floor).
The EVA 8400s don't appear to show any performance/latency bottleneck themselves, so we imagine it must be
caused by the SAN and/or the ISLs.

Oct 24 2011 09:23:52 GMT   Warning  "Severe latency bottleneck detected at slot 0 port 0".  Switch  259976  1  AN-1010  FL1_BCD5300_SWF-FA

When checking via Storage Essentials, we do not see throughput anywhere near or above 8% on the ISLs.
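
Since the AN-1010 warning comes from the FOS bottleneck monitoring feature, we can also pull its per-port view of how often the latency condition hits. A sketch, on the ISL ports 0 and 1 (option details per the FOS 6.4 command reference):

    bottleneckmon --status     # monitoring state and configured thresholds
    bottleneckmon --show 0     # bottleneck history for the trunk ports
    bottleneckmon --show 1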

If we configured additional ISLs on a different port group (ASIC), would that provide an alternate route for the
SAN traffic, and also provide additional BB credits?

From the documentation we understand that you can connect additional ISLs on a different port group. In that case
those ISLs simply won't be trunked with the existing pair, but will still be used for ISL traffic via DPS (Dynamic Path Selection). Correct?
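
(A sketch of how we would sanity-check the routing and trunking side of this, assuming fabric defaults:)

    aptpolicy     # 3 = exchange-based routing (DPS) is the FOS default
    trunkshow     # which ISLs are trunked together
    islshow       # all ISLs, trunked or not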

Any other ideas for chasing this problem, or thoughts on the underlying issue?

Thanks
SR

Valued Contributor
Posts: 931
Registered: ‎12-30-2009

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

Check your distance if you have long-distance links.

It looks like you don't have enough buffers configured to run your ISLs at 8G.

Do you use CA (Continuous Access)? If so:

- What replication protocol is set on the EVA8400s? (There are two protocols, and they require different settings on all switches participating in the paths that carry CA traffic.)

- Do you see errors in the EVA controller logs with regard to the ISLs, excessive rate changes, or DR groups suspending and resuming?

Occasional Contributor
Posts: 16
Registered: ‎09-28-2011

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

Hi there

I have a problem that sounds a bit like yours: my servers are losing paths. Have you seen that as well?

http://community.brocade.com/message/20126

/kenneth

Occasional Contributor
Posts: 6
Registered: ‎09-19-2007

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

Hi:

Thanks for the quick suggestions. We do not have CA, but we do have something similar: HP SVSP (SAN virtualization), through which we mirror some volumes across the two floors' EVAs (about 10 TB of data, but updates/writes are moderate, not overly write-intensive). From what I hear, though, we had these I/O timeouts even before SVSP (and the mirroring across the two EVAs) was put in place. Before SVSP came into the picture, the configuration was still mostly the same: hosts on either floor accessed data from either floor's EVAs (we understand this setup is not the most optimal, but this is where we are today).

We do not see the data mirrors splitting because of the timeouts experienced by hosts. However, SVSP's setup disks are themselves mirrored across the EVAs, and occasionally those do break and then re-sync a short while later.

Most of the hosts are set to 8 Gb (auto), while the storage runs at 4 Gbps (the EVAs and SVSP can both only do 4 Gbps).

The distance between floors (length of the long-haul cable) is approximately 150 meters. We did not change the BB credits; I think they are set at the default value (26?) on both switches.
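
(For reference, the credit/buffer side can be checked like this, a sketch assuming the ISLs are ports 0 and 1:)

    portbuffershow     # per-port-group buffer allocation and remaining buffers
    portcfgshow 0      # per-port config, including Long Distance and Fill Word settings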

For example, SQL applications report errors like:

SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file in database (117).  The OS file handle is 0x0000000000000FFC.  The offset of the latest long I/O is: 0x00000297fe0000

And these translate into Perfmon latencies as well (we are not able to catch every event through it, but we do see occasional longer latencies on these hosts).

Thanks

SR

Valued Contributor
Posts: 931
Registered: ‎12-30-2009

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

First off, I do not favor auto-negotiated speeds and prefer fixed ones; this lets you spot problems earlier (like a degrading laser getting cut off instead of just negotiating down to a lower speed).

Secondly, how did you set the fill word on your 8G ports? Brocade advises mode 3 for most cases.
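
For reference, the fill word is set per port; a sketch, using port 5 purely as an example (changing the mode toggles the port, so plan a window, and check the FOS 6.4 command reference for the exact syntax at your level):

    portcfgfillword 5, 3     # mode 3: try ARB/ARB, fall back to IDLE/ARB
    portcfgshow 5            # verify the configured fill word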

I don't know the SVSP product (manufactured by LSI until they dropped it) or how it works, but your description of SVSP mirrors breaking and re-syncing could be related to the timeouts your hosts are experiencing.

Are there any hosts that are provisioned directly to an EVA? If so, do they experience timeouts as well?

Perhaps you already mentioned this, but do the timeouts occur on disks on the local floor/EVA/SVSP, the remote one, or both?

Occasional Contributor
Posts: 6
Registered: ‎09-19-2007

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

FillWord is set to "0" (our original default value prior to upgrading to 6.4.0b) on all our Brocade ports, including the 8 Gb ports.

Most of our 8 Gb ports are in use by hosts (Windows/Emulex), and two ports by the ISLs. Most of the storage can only run at 4 Gbps (both the EVAs and SVSP).

Most of the active hosts are on one floor, while the most active storage enclosure (EVA) sits on the other floor.

We do not recall noticing a same-floor host getting timeouts while accessing same-floor EVA storage.

Would it be better to set the hosts to 4 Gbps mode rather than setting the fill word to 3? (I've read that errors can happen if vendors/drives do not adhere strictly to the 8G standard. Ref: http://community.brocade.com/thread/4036?start=45&tstart=0)

Thanks

SR

Contributor
Posts: 39
Registered: ‎10-26-2010

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

With your ISL distance at approximately 150 m, you may be flirting with the upper limit of what your 8Gb ISLs can do, especially depending on the type of cabling you have. This datasheet shows that you need OM3 cabling to even reach 150 m at 8G; OM2 supports much less.

http://www.brocade.com/downloads/documents/data_sheets/product_data_sheets/SFP_8GB_SWL_DS_01.pdf

Perhaps setting the ISLs to 4G speed to see whether your counters stabilize would be a good test, along these lines:
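
A sketch of that test, assuming ports 0 and 1 are the ISLs (fixing the speed bounces the link, so schedule accordingly; verify the syntax against your FOS 6.4 command reference):

    portcfgspeed 0 4     # lock the port at 4 Gbps
    portcfgspeed 1 4
    portstatsclear 0     # clear counters, then watch whether tim_txcrd_z keeps climbing
    portstatsclear 1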

bs
Occasional Contributor
Posts: 10
Registered: ‎10-26-2011

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

Have you made any progress on this problem? It looks like we're having the same issues in our SAN infrastructure.

Occasional Contributor
Posts: 16
Registered: ‎09-28-2011

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

Hi

I had the same kind of error. We set the ISL uplink down to 4 Gbps, and the bottlenecks and latencies stopped.

We may have a problem with our cables; we haven't found out why yet, but for the moment it runs great at 450 MB/s.

/kenneth

bs
Occasional Contributor
Posts: 10
Registered: ‎10-26-2011

Re: I/O Latencies & Timeouts in our SAN Fabric Brocade 5300s - v6.4.0b

Thanks for replying!

I'm curious: did you change any additional settings, like raising the reserved buffers on the ISL ports, or change other FC port settings on your ISL or EVA / FlexFabric (NPIV) host ports?
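
(For context, by "raising the reserved buffers" I mean something along these lines, a sketch only, with syntax to be checked against the relevant FOS command reference:)

    portcfglongdistance 0 LE     # LE mode reserves extra credits, for links up to 10 km
    portbuffershow               # confirm the resulting per-port allocation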
