10-24-2011 10:40 AM
We have a SAN fabric configuration with Brocade 5300s running FOS v6.4.0b.
Each of our two floors (1 & 2) is served by a Brocade 5300 that connects that floor's local hosts and storage, and the
two switches are connected by TWO 8 Gbps ISLs (trunked on port#0 and port#1). All hosts (mostly Windows) in our
environment access storage on either floor, so the ISLs are essential in our design.
We never get close to the throughput limit on the ISL links, but we do end up with latency bottlenecks.
For example, we see a high "tim_txcrd_z" count and a high number of Class 3 frames received (261498201) on the ISL trunk ports:
tim_rdy_pri 1042 Time R_RDY high priority
tim_txcrd_z 3508284216 Time TX Credit Zero (2.5Us ticks)
tim_rdy_pri 1041 Time R_RDY high priority
tim_txcrd_z 4259587257 Time TX Credit Zero (2.5Us ticks)
Obviously, we see I/O delays and/or timeouts accessing SAN storage (two EVA 8400s, one on each floor).
The EVA 8400s don't appear to show any performance/latency bottleneck themselves, so we suspect it must be
caused by the SAN fabric and/or the ISLs.
Oct 24 2011 09:23:52 GMT Warning "Severe latency bottleneck detected at slot 0 port 0". Switch 259976 1 AN-1010 FL1_BCD5300_SWF-FA
When checking via Storage Essentials, we never see throughput anywhere near or above 8% on the ISLs.
Would configuring additional ISLs on a different port group (ASIC) help by providing an alternate route for the
SAN traffic, and also provide additional BB credits?
From the documentation we understand that you can connect additional ISLs on a different port group. In that case
those ISLs simply won't be trunked, but will still be used for ISL traffic via DPS (Dynamic Path Selection). Correct?
Any other ideas to chase this problem, or underlying issues?
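To see how the existing trunk and any future non-trunked ISLs would actually be used, these standard FOS 6.x diagnostic commands are the usual starting point (a hedged sketch run on either 5300, not output from this fabric):

```shell
islshow          # lists ISLs and their negotiated speed/bandwidth
trunkshow        # shows which ISLs are members of a trunk group
aptpolicy        # displays the routing policy; 3 = exchange-based routing (DPS)
portbuffershow   # per-port buffer credit allocation within the ASIC's port group
```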
10-24-2011 11:04 AM
Check your distances if you have long-distance links.
It looks like you don't have enough buffers configured to run your ISLs at 8G.
Do you use CA? If so:
- What replication protocol is set on the EVA 8400s? (There are two protocols, and they require different settings on all switches participating in the paths that carry CA traffic.)
- Do you see errors in the EVA controller logs with regard to the ISLs, or excessive rate changes, or DR groups suspending and resuming?
10-24-2011 11:34 AM
I have a problem that sounds a bit like yours: my servers are losing paths. Have you seen that as well?
10-24-2011 12:23 PM
Thanks for the quick suggestions. We do not have CA, but we have something similar: HP SVSP (SAN Virtualization), through which we mirror some volumes across the EVAs on the two floors (about 10 TB of data; updates/writes are moderate, not overly write-intensive). That said, we had these I/O timeouts even before SVSP (and the mirroring across the two EVAs) was put in place. Before SVSP came into the picture, the configuration was otherwise mostly similar: hosts on either floor still accessed data from the EVAs on either floor (we understand this setup is not optimal, but this is where we are today). We do not see the data mirrors splitting because of the timeouts experienced by hosts. However, SVSP's setup disks, which are themselves mirrored across the EVAs, do occasionally break and re-sync after a short while.
Most of the hosts are set to 8 Gb (auto), while the storage runs at 4 Gbps (the EVAs and SVSP can both only do 4 Gbps).
The distance between floors (length of the long-haul cable) is approximately 150 meters. We did not change the BB credits; I think they are at the default value (26?) on both switches.
For example, SQL applications report errors like:
"SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file in database (117). The OS file handle is 0x0000000000000FFC. The offset of the latest long I/O is: 0x00000297fe0000"
And these show up as Perfmon latencies (we are not able to catch every event that way, but we do see occasional longer latencies on these hosts).
10-24-2011 01:54 PM
First off, I do not favor auto-negotiated (AN) speeds and prefer fixed speeds; this lets you spot problems earlier (e.g., a degrading laser gets cut off instead of just causing a speed change).
Secondly, how did you set the fill word on your 8G ports? Brocade advises mode 3 for most cases.
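For reference, the fill word is changed per port with portcfgfillword; a hedged sketch for port 0 (check your HBA and storage compatibility first, and note the port bounces when the setting changes):

```shell
portcfgshow              # review current port settings, including Fill Word mode
portcfgfillword 0, 3     # mode 3: try ARB/ARB, fall back to IDLE/ARB if that fails
```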
I don't know the SVSP product (manufactured by LSI until they dropped it) or how it works, but your description of the SVSP mirrors breaking and re-syncing could be related to the timeouts your hosts are experiencing.
Are there any hosts provisioned directly to an EVA? If so, do they experience timeouts as well?
Perhaps you already mentioned this, but do the timeouts occur on disks on the local floor, the remote floor/EVA/SVSP, or both?
10-24-2011 03:09 PM
FillWord is set to "0" (our original default value prior to upgrading to 6.4.0b) on all our Brocade ports, including the 8 Gb ports.
Most of our 8 Gb ports are in use by hosts (Windows/Emulex), and two ports by the ISLs. Most of the storage can only run at 4 Gbps (both the EVAs and the SVSP).
Most of the active hosts are on one floor, while the most active storage enclosure (EVA) sits on the other floor.
We do not recall noticing a host getting timeouts while accessing EVA storage on the same floor.
Would it be better to set the hosts to 4 Gbps mode rather than setting the fill word to 3? I've read that errors can happen if vendors/drives do not adhere to the strict 8G standard. Ref: http://community.brocade.com/thread/4036?start=45&
10-24-2011 03:21 PM
With your ISL distance at approximately 150 m, you may be flirting with the upper limits of what your 8 Gb ISLs can do, especially depending on the type of cabling you have. The datasheet shows that you need OM3-type cabling to even reach 150 m at 8G; OM2 supports much less.
Perhaps setting the ISLs to 4G speed to see whether your counters stabilize would be a good test.
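Locking the trunk ports down to 4G would look something like this on each switch (a hedged sketch, using ports 0 and 1 from the original post; disabling the ports first lets the trunk re-form cleanly):

```shell
portdisable 0; portdisable 1
portcfgspeed 0 4      # fix port 0 at 4 Gbps instead of auto-negotiate
portcfgspeed 1 4      # same for the second trunk member
portenable 0; portenable 1
```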
11-09-2011 02:30 AM
I had the same kind of error. We tried setting the ISL uplink down to 4 Gbps, and the bottlenecks and latencies stopped.
We may have a problem with our cables; we haven't found out why yet, but for the moment it runs great at 450 MB/s.
11-10-2011 05:16 AM
Thanks for replying!
I'm curious: did you change any additional settings, like raising the reserved buffers on the ISL ports, or change other FC port settings on your ISL or EVA / FlexFabric (NPIV) host ports?