Fibre Channel (SAN)

Occasional Contributor
Posts: 8
Registered: 01-10-2014

5300 topologyshow/perfissues

Having experienced some issues lately, I've started digging into the FOS metrics and came across this one. The topologyshow output was interesting: the Bandwidth IN to Bandwidth OUT ratio seems quite high. Of the four domains we have, three show a Bandwidth Demand of 12300% and the fourth shows 4100%.

While watching the graphs at the times our storage array threw CRC errors, I saw the BBCreditZero errors spike on the Network Advisor real-time chart. I also looked at the port statistics (portstatsshow) on the storage port:

tim_txcrd_z             3492915     Time TX Credit Zero (2.5Us ticks)

I read that the stat “tim_txcrd_z” can increase “if this port is an Storage Port, the Storage may not be able to handle the I/O’s in an acceptable time. The port runs out of BB-Credits.”

Tonight I'll be running a 12-hour SAN Health check, from 5 pm to 5 am. The previous Health checks I ran covered only an hour or so.

What are your thoughts on the high tim_txcrd_z count and the very high Bandwidth Demand? Is there anything else I should look out for?

Frequent Contributor
Posts: 141
Registered: 05-26-2009

Re: 5300 topologyshow/perfissues

The Bandwidth Demand in topologyshow only tells you the ratio between the F-Ports (end devices, at their port speed) and the ISL ports (also at their port speed). So if you have 2x 16G ISLs and 4 end devices, each connected at 8G, it would be 100%. This metric doesn't say much, because it does not take into account how much traffic from these devices will actually cross the ISLs. With exchange-based routing, all devices are simply mapped to all the ISLs.
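
A minimal sketch of the ratio as described in this reply, assuming the demand is simply the summed F-Port speed divided by the summed ISL speed. This is not Brocade's FOS code, and the function name bandwidth_demand_pct is hypothetical.

```python
def bandwidth_demand_pct(isl_speeds_gbps, fport_speeds_gbps):
    """Ratio of end-device (F-Port) bandwidth to ISL bandwidth, in percent.

    Hypothetical helper: mirrors the explanation in the reply, not the
    exact computation FOS performs for topologyshow.
    """
    isl_total = sum(isl_speeds_gbps)
    if isl_total <= 0:
        raise ValueError("need at least one ISL with a positive speed")
    fport_total = sum(fport_speeds_gbps)
    return 100.0 * fport_total / isl_total


# Example from the reply: 2x 16G ISLs, 4 end devices at 8G each.
print(bandwidth_demand_pct([16, 16], [8, 8, 8, 8]))
```

Only configured port speeds go into the calculation, so a high percentage points to potential oversubscription, not to measured load.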

As for tim_txcrd_z, it CAN be a problem, depending on how much you get and in which pattern (slowly but steadily increasing? short bursts? ...).
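
One way to tell those patterns apart is to sample tim_txcrd_z at fixed intervals (for example, by reading portstatsshow repeatedly) and compare the per-interval increases. The sketch below is a hypothetical heuristic, not a Brocade tool: the function name and the 5x burst threshold are arbitrary assumptions.

```python
from statistics import median


def classify_credit_zero_pattern(samples, burst_factor=5.0):
    """Classify cumulative tim_txcrd_z samples as 'idle', 'steady' or 'bursty'.

    samples: cumulative counter values read at equal intervals.
    burst_factor: arbitrary, hypothetical threshold; if the largest
    per-interval increase exceeds burst_factor times the median increase,
    the pattern is called bursty.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if any(d < 0 for d in deltas):
        raise ValueError("counter went backwards (was it reset?)")
    if max(deltas) == 0:
        return "idle"
    typical = median(deltas)
    if typical == 0 or max(deltas) > burst_factor * typical:
        return "bursty"
    return "steady"


# Made-up readings taken once a minute after a counter reset.
print(classify_credit_zero_pattern([0, 15000, 31000, 46000]))
```

The median is used as the baseline so that a single large spike does not hide itself by inflating the average.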

CRC errors, on the other hand, are a very clear indication of a physical port problem if you see them coming from a storage array port.

I recommend ordering a SAN health check, including a performance analysis, from your maintenance service provider. Besides their findings about the current state of the fabrics, you also get a feel for what to expect and what to look for. And of course, also have a look at the manuals and the best practices / fabric resiliency guides.

Occasional Contributor
Posts: 8
Registered: 01-10-2014

Re: 5300 topologyshow/perfissues

I had a feeling topologyshow might be showing something along those lines; thanks for clarifying.

With regard to tim_txcrd_z, it is steadily increasing. I reset the counters on two ISL ports and, after 3 minutes, had 46,000 on each port.
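
Because the counter is reported in 2.5 µs ticks (per the portstatsshow description line earlier in the thread), a raw value can be turned into time spent at zero transmit credit and compared with the measurement window. A minimal sketch, assuming that tick size; the function name credit_zero_time is hypothetical.

```python
# Assumption: tim_txcrd_z counts in 2.5 microsecond ticks, as the
# portstatsshow description line reports.
TICK_SECONDS = 2.5e-6


def credit_zero_time(ticks, interval_seconds=None):
    """Convert a tim_txcrd_z value (or delta) to seconds at zero TX credit.

    If interval_seconds is given, also return that time as a percentage of
    the interval over which the counter accumulated; otherwise the
    percentage is None.
    """
    if ticks < 0:
        raise ValueError("tick count cannot be negative")
    seconds = ticks * TICK_SECONDS
    if interval_seconds is None:
        return seconds, None
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return seconds, 100.0 * seconds / interval_seconds


# 46,000 ticks after resetting the counter and waiting 3 minutes (180 s).
secs, pct = credit_zero_time(46_000, 180)
print(f"{secs:.3f} s at zero credit ({pct:.3f}% of the interval)")
```

For 46,000 ticks over 180 s this works out to about 0.115 s at zero credit, roughly 0.064% of the window. The 3,492,915 ticks seen on the storage port correspond to about 8.73 s, over a window the post does not state.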

No CRC errors are logged, if you mean er_crc and er_crc_good_eof. Would disc_c3 (from porterrshow) also be related to this?

Valued Contributor
Posts: 555
Registered: 03-20-2011

Re: 5300 topologyshow/perfissues

You need to get rid of the CRC errors first. The CRC counter means "I see the error, but I'm not the first one to see it." CRC good EOF means "I'm the first to see this error", so you know that the connected port is the guilty one. A CRC error means that the entire SCSI exchange (not just a single frame) has to be thrown away and retried.
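
The rule of thumb above can be written as a small decision helper. This sketch assumes you have already collected er_crc and er_crc_good_eof for a port (for example, from portstatsshow); the function name and the returned messages are hypothetical.

```python
def crc_suspect(er_crc, er_crc_good_eof):
    """Interpret a port's CRC counters following the reply's rule of thumb.

    er_crc_good_eof > 0: this port was the first to see the corrupted
        frames, so the directly connected port (or the link to it) is
        the likely culprit.
    er_crc > 0 only: frames arrived already flagged as bad, so the
        problem lies further upstream.
    """
    if er_crc_good_eof > 0:
        return "first detector: check the connected port and its link"
    if er_crc > 0:
        return "passed-through errors: trace upstream toward the source"
    return "no CRC errors recorded"


print(crc_suspect(12, 12))
print(crc_suspect(12, 0))
```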
