03-10-2011 12:29 PM
I'm monitoring the SW-MIB::swFCPortNoTxCredits value on our FC switches, i.e. the number of times the transmit credit has reached zero.
Our SAN is rather small, and with no geographic distribution: All units are confined in the same server room. There are no inter-switch links.
As far as I know, the concept of tx credits is normally only interesting in relation to SANs with long distances between units. Right?
But how should the graph for swFCPortNoTxCredits be interpreted, then? Does it tell us something about the storage system, or does it mostly reflect something about the servers communicating with the storage system? Or both?
Troels Arvin, Copenhagen
03-11-2011 02:35 PM
In general it tells you that the switch port ran out of transmit buffer credits for a certain amount of time: the switch is waiting for the attached device to return credits before it can send more frames.
This can mean that the storage array is very busy and is not able to receive any further frames.
This can be caused by an overloaded storage port: too many servers connected, too many LUNs, or too high a LUN queue depth on the servers.
Do you see any C3 (Class 3) frame discards?
I hope this helps,
03-12-2011 11:49 PM
No, there are no C3 frame discards.
I just looked at graphs for another port on the switch: This port is connected to a module on an IBM XIV storage system which we think is not overloaded (but we could be wrong). For this port, there are no C3 frame discards, either.
I guess that my question is: What rate of zero transmit credits should be interpreted as reflecting a problem, given that we have no long-distance connections?
03-13-2011 03:20 AM
It is difficult to say which value of buffer credit zero indicates a problem. What matters is that you have no discards on any port in your SAN.
If you currently have no discards in the fabric, then it is no problem at all.
At 3:00 am you have an increased number of buffer credit zero events, which means high load on that device. Some frames (I/Os) cannot be processed or delivered immediately by the ASIC to the connected device. This reduces throughput because response times get longer. This is normal on arrays when ports get busy. You can see that at 3:00: the RX side has a higher load than at other times. Keep in mind that you cannot see every burst, only values averaged over your monitoring interval.
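To make the point about averaging concrete, here is a minimal Python sketch of how a graphing tool typically turns two polls of a counter into a rate. It assumes the counter is a 32-bit wrapping SNMP counter; the function name `counter_rate` and the sample values are made up for illustration.

```python
# Assumption: the polled value is a 32-bit wrapping SNMP counter.
COUNTER32_MODULUS = 2 ** 32


def counter_rate(prev_value, curr_value, interval_seconds,
                 modulus=COUNTER32_MODULUS):
    """Average increments per second between two polls of a wrapping counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # The modulo handles a single counter wrap between the two polls.
    delta = (curr_value - prev_value) % modulus
    return delta / interval_seconds


# Hypothetical samples taken 300 seconds apart.
print(counter_rate(1_000, 4_000, 300))  # 10.0
```

Whether those 3000 increments happened in one second or were spread evenly over the five minutes, the graph shows the same 10 per second, so short bursts are invisible at this polling interval.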
If you have no discards in the fabric, then you have no problem. But if there are discards somewhere in the fabric, you have to find the ports with a high buffer credit zero value and then reduce the load on those ports. Keep in mind that very often the port showing the discards is not the cause of the discards in the fabric.
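The rule of thumb above could be sketched roughly like this in Python. The function name `triage`, the dictionary keys, and the port names and numbers are all hypothetical; the inputs are assumed to be per-port counter increases over the same monitoring interval.

```python
def triage(port_deltas):
    """Return ports to investigate, most credit-starved first.

    port_deltas maps a port name to a dict holding the increase of its
    'c3_discards' and 'no_tx_credits' counters over one interval.
    """
    fabric_discards = sum(d["c3_discards"] for d in port_deltas.values())
    if fabric_discards == 0:
        # No discards anywhere in the fabric: nothing to chase.
        return []
    # Discards exist somewhere: rank all ports by buffer credit zero,
    # since the port with the discards is often not the cause.
    return sorted(port_deltas,
                  key=lambda port: port_deltas[port]["no_tx_credits"],
                  reverse=True)


# Hypothetical data: port1 shows the discards, port2 is the likely cause.
samples = {
    "port0": {"c3_discards": 0, "no_tx_credits": 120},
    "port1": {"c3_discards": 5, "no_tx_credits": 10},
    "port2": {"c3_discards": 0, "no_tx_credits": 9000},
}
print(triage(samples))  # ['port2', 'port0', 'port1']
```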
03-13-2011 08:53 AM
Does it make sense to monitor swFCPortNoTxCredits at all, then?
It sounds like swFCPortC3Discards is more relevant, if one wants to be on the watch for capacity problems.