07-13-2011 06:04 PM
First, I'll explain my setup a bit.
(2) Brocade 4900 (4Gbps) FOS 6.4.0b - Not ISL'd together, they provide dual fabric High Availability at Site A
(1) Brocade 4100 (4Gbps) FOS 6.3.2
(1) Brocade 3900 (2Gbps) FOS 5.3.2c
(1) Brocade 4100 (4Gbps) FOS 6.4.0b - Connected to a site A switch through Cisco ONS
Site A and Site B are connected via an OC-192, transported by (2) Cisco ONS 15454 DWDM. The 10DME card on the ONS presents both Site A and Site B with (2) 2Gbps FC links, carved out of SONET. Link distance is <5 km, and measured latency is around 55us round trip.
Site B has all storage systems on the Brocade 3900, with (2) ISLs from the 3900 to each of the Site B 4100s.
The OC-192 is connected between one Brocade 4900 in Site A, and one of the Brocade 4100's in Site B. I can elaborate more if needed.
What we are seeing is a significant number of Buffer Credit Zero conditions and a significant amount of R_RDY High Priority. The buffer credit zero counters increase by about 100,000 per 5 seconds, and the R_RDY counters by around 300,000-400,000 per 5 seconds.
Each port is assigned 26 buffer credits, with long distance set to L0.
The Cisco ONS 15454 has FEC and EFEC turned off, and buffer credit spoofing is enabled.
I'm at my wits' end trying to figure out what the problem is. None of it seems to explain why we are running out of buffer credits so quickly. Could the different speeds on the switches in Site B cause this? With Cisco ONS buffer credit spoofing on (which we really shouldn't need at this distance), I'm still incredibly confused as to why I would continue having buffer credit issues.
Please, someone point out what I am missing....
07-13-2011 07:35 PM
Disable credit spoofing on the Cisco. At this small distance it's not needed. 55 microseconds RTT is OK.
Are the two 2Gb ISL links on the DWDM side reserved (dedicated) bandwidth, or is it shared with other types of traffic such as TCP/IP? Just because the Cisco side shows 2Gb/s connections doesn't mean the bandwidth is dedicated.
Secondly, do you also see tx-credit-0 on some of the target ports?
It's very hard to determine since you might be dealing with numerous culprits at the same time. A slow drain device might screw things up significantly.
Give it a try.
07-13-2011 08:26 PM
The 2Gbps links off the Cisco, I'm told, are dedicated.
Any good ideas on how to identify and fix a slow drain device? I've come to a similar thought about a slow drain device, but I'm not sure how to identify or fix it.
07-13-2011 10:23 PM
Hmm, yeah, bottleneck detection is very cumbersome on Condor or GoldenEye based switches like the 4100, especially with some lower fw versions.
To be honest, I suspect some issue with credit spoofing on the Cisco, but that's just a gut feeling. Can you try disabling it? This will also remove the need for the Cisco to hook into the FC stream to check FLOGIs and RSCNs. You might save yourself from running into Cisco bugs as well.
07-14-2011 08:37 AM
Please be aware that Brocade uses virtual channels on the ISL and uses VC_RDYs instead of R_RDYs on ISLs. Also, your 26 buffers are spread across the different channels, so you effectively have only 8 on each individual VC, and that per-VC count is what sets your distance limit. For 5 km at 2G, you need at a minimum 5 plus one for the next frame. But this holds only for full frames, so it might not be enough.
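The "5 plus one" figure can be reproduced with a small sketch. It assumes the common rule of thumb of roughly one credit per km at 2 Gbps for full-size frames, scaling linearly with link speed, and it scales by payload size only (header overhead is ignored). The function name and the 1056-byte example are hypothetical, for illustration only; this is not a vendor formula.

```python
import math

# Assumption: maximum FC data field size, used as the "full frame" reference.
FULL_PAYLOAD_BYTES = 2112


def credits_for_distance(km, gbps, avg_payload_bytes=FULL_PAYLOAD_BYTES):
    """Rough estimate of buffer credits needed to keep a link of `km` busy.

    Rule of thumb (assumption): ~1 credit per km at 2 Gbps with full frames.
    Smaller average frames need proportionally more credits.
    """
    in_flight = km * gbps / 2.0 * (FULL_PAYLOAD_BYTES / avg_payload_bytes)
    return math.ceil(in_flight) + 1  # plus one for the next frame


print(credits_for_distance(5, 2))                          # full-size frames
print(credits_for_distance(5, 2, avg_payload_bytes=1056))  # half-size frames
```

With half-size frames the estimate roughly doubles, which is why a per-VC allocation that looks sufficient for full frames can still run dry.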
Before looking for a slow drain device, you first have to correct your long distance settings:
Option 1: Disable the buffer spoofing and change the long distance mode to LE (consolidation of all buffers to one channel - VC2).
Option 2: Change the Long distance mode to LE and enable the R_RDY mode on the ISL port of the Brocade switches on both sites (portcfgislmode).
After that your errors should decrease immediately. At the moment your ONS is sending R_RDYs that the Brocade doesn't understand.
07-15-2011 04:26 AM
Yes, L0 and LE are different. With L0, Brocade uses 4 VCs (VC2-VC5) for data traffic, and the overall port buffers are divided among the individual VCs, so you have only a quarter of them for the distance calculation. With LE, Brocade uses only VC2, so all port buffers belong to the same channel and you can use all of them toward the distance limit.
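A minimal sketch of that difference, assuming an even split of the port buffers across the data VCs. The real firmware allocation may not be an even split (an earlier reply cites 8 per VC for a 26-buffer port), so treat the numbers as illustrative; the function name is hypothetical.

```python
# Data VCs per long distance mode, as described in the post above:
# L0 spreads data over VC2-VC5, LE consolidates everything on VC2.
DATA_VCS = {"L0": 4, "LE": 1}


def credits_per_data_vc(port_buffers, mode):
    """Credits usable for the distance calculation on one data VC.

    Assumption: buffers are divided evenly across the data VCs.
    """
    return port_buffers // DATA_VCS[mode]


print(credits_per_data_vc(26, "L0"))  # a quarter of the port buffers
print(credits_per_data_vc(26, "LE"))  # all port buffers on one channel
```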
08-03-2011 05:26 AM
Just for the sake of maybe helping someone in the future, we learned a few things last night that solved this issue for us.
We are running Cisco 15454 with 10DME cards over a SONET OC-192 line.
Originally, we had incorrectly determined the RTT to be 55us. A couple of things made us realize this was not possible.
1) The 10DME card Source to Destination is 30us by itself.
2) If FEC (Forward error correction) is turned on in the 15454, it's an additional 150us source to destination
3) If EFEC (G709) is turned on in the 15454, it's an additional 5us source to destination
So the lowest possible RTT, with FEC/EFEC off, is 60us (2 x 30us).
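The latency budget above can be added up in a few lines. The per-component figures are the ones listed in this post; the optional `fiber_km` term is an addition that is not in the post, using the usual approximation of about 5us per km of fiber. Names are hypothetical and this is only a sketch.

```python
CARD_ONE_WAY_US = 30    # 10DME card, source to destination (from the post)
FEC_ONE_WAY_US = 150    # extra one-way delay with FEC on (from the post)
EFEC_ONE_WAY_US = 5     # extra one-way delay with EFEC on (from the post)
FIBER_US_PER_KM = 5.0   # assumption: ~5us per km propagation in fiber


def estimated_rtt_us(fec=False, efec=False, fiber_km=0.0):
    """Round-trip time as twice the summed one-way delays."""
    one_way = CARD_ONE_WAY_US + fiber_km * FIBER_US_PER_KM
    if fec:
        one_way += FEC_ONE_WAY_US
    if efec:
        one_way += EFEC_ONE_WAY_US
    return 2 * one_way


print(estimated_rtt_us())              # card latency only, FEC/EFEC off
print(estimated_rtt_us(fec=True))      # with FEC on
print(estimated_rtt_us(fiber_km=5.0))  # adding propagation for 5 km of fiber
```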
So the overall issue was that our actual RTT was 112us, not 55us. This drastically changes the number of buffer credits needed. Case closed.
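To show how much the RTT matters, here is a rough sketch that turns a round-trip time into a credit count for a 2Gbps link. It assumes full-size frames of 2148 bytes on the wire, 8b/10b encoding, and a 2.125 Gbaud line rate, and it ignores inter-frame idles. These constants and the function name are assumptions for illustration, not something taken from Brocade or Cisco documentation.

```python
import math

LINE_RATE_BAUD = 2.125e9    # assumption: 2GFC line rate
FULL_FRAME_BYTES = 2148     # assumption: full frame incl. SOF, header, CRC, EOF
WIRE_BITS_PER_BYTE = 10     # 8b/10b encoding


def credits_for_rtt(rtt_us, frame_bytes=FULL_FRAME_BYTES):
    """Credits needed so the sender never stalls waiting for credit returns."""
    frame_time_us = frame_bytes * WIRE_BITS_PER_BYTE / LINE_RATE_BAUD * 1e6
    return math.ceil(rtt_us / frame_time_us) + 1  # plus one for the next frame


print(credits_for_rtt(55))   # what we thought we needed
print(credits_for_rtt(112))  # what the real RTT calls for
```

Under these assumptions the requirement roughly doubles between the two RTT values, and that is for full-size frames only; smaller frames push it higher.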
Thanks for everyone's input!