07-19-2012 11:02 AM
I have been closely monitoring the ISL ports from our blade SAN switches to the DCX. The tim_txcrd_z and tim_txcrd_z_vc values are increasing constantly. I reset the port statistics for the ISL ports and found that tim_txcrd_z increases by 1500 per second on certain ISL ports. The tim_txcrd_z_vc values for the data VCs 2, 3, 4 and 5 increment by anywhere from 0 to 1100 per second, and there is no increment in the tim_rdy_pri values. All the ISLs are 8 Gbps with QoS in AE state (we do not have a QoS license) and Long Distance mode disabled. There are no errors on the SAN switches that would indicate a possible SFP or cable issue. The servers connected to these SAN switches experience occasional HBA resets and application I/O response alerts.
I have been digging around for a clarification of these values, with no luck.
Can anyone tell me whether I need to worry about these values and the rapid increase of tim_txcrd_z and tim_txcrd_z_vc?
Thanks and Regards,
07-26-2012 01:42 AM
In your case I wouldn't analyze it from a pure statistics point of view. (The tim_txcrd_z counter is checked 400,000 times a second, i.e. once every 2.5 microseconds, so it can increase by up to 400,000 per second.)
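To put the reported rates in proportion, here is a rough sketch of the arithmetic in Python. It assumes the 400,000 ticks per second mentioned above is the ceiling for the counter; the function name is just illustrative.

```python
# Assumption: tim_txcrd_z can increase by at most 400,000 per second
# (one tick per 2.5 microseconds), as stated in this reply.
TICKS_PER_SECOND = 400_000


def credit_zero_fraction(ticks_per_second, max_ticks=TICKS_PER_SECOND):
    """Return the fraction of each second the port spent at zero TX credit."""
    return ticks_per_second / max_ticks


# Rates reported in the original post.
port_fraction = credit_zero_fraction(1500)  # tim_txcrd_z on the ISL port
vc_fraction = credit_zero_fraction(1100)    # worst-case tim_txcrd_z_vc on a data VC

print(f"port: {port_fraction:.4%} of the time, {port_fraction * 1000:.2f} ms per second")
print(f"VC:   {vc_fraction:.4%} of the time, {vc_fraction * 1000:.2f} ms per second")
```

So 1500 ticks per second means the port was waiting for credit for roughly 3.75 ms out of every second. On its own that number does not tell you whether there is a problem, which is why the counter needs to be looked at together with other data.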
I would enable bottleneckmon first (http://seb-t.de/bneck). Also, the maintenance provider for your SAN switches should be able to offer a performance analysis to find bottlenecks and their causes in your SAN.
07-26-2012 06:13 AM
It is related to a lack of buffers (BB credit zero).
There may be two reasons:
1) The channel really cannot cope with the load and there are bottlenecks in the fabric; you need to monitor stat_ftx and stat_frx.
2) There is a slow device in the fabric, such as a 2 Gb port (this is more likely). In this case you need to keep that device's traffic from passing through the ISL. If you have a trunk on the ISL, it may be better to split it into separate ISLs; there will then be more Virtual Channels, so slow traffic will go through one VC and the others will be available for faster traffic.
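For the first case, one way to watch stat_ftx and stat_frx alongside tim_txcrd_z is to take two counter snapshots a known interval apart and compare the per-second rates. The sketch below is only an illustration: the snapshot text and its numbers are made up, and it assumes each line has the form "name value description". It also ignores counter wrap, which you would need to handle on a real switch.

```python
def parse_counters(text):
    """Parse 'name value description' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines whose second field is a plain non-negative integer.
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def per_second_rates(before, after, interval_s):
    """Return the per-second increase of every counter present in both snapshots."""
    return {name: (after[name] - before[name]) / interval_s
            for name in after if name in before}


# Hypothetical snapshots taken 10 seconds apart (values invented for illustration).
SNAP_T0 = """\
stat_ftx 1000000 Frames transmitted
stat_frx 900000 Frames received
tim_txcrd_z 50000 Time TX Credit Zero (2.5Us ticks)
"""
SNAP_T10 = """\
stat_ftx 1800000 Frames transmitted
stat_frx 1500000 Frames received
tim_txcrd_z 65000 Time TX Credit Zero (2.5Us ticks)
"""

rates = per_second_rates(parse_counters(SNAP_T0), parse_counters(SNAP_T10), 10.0)
# Credit-zero ticks per transmitted frame: a rough indicator, not a threshold.
ticks_per_frame = rates["tim_txcrd_z"] / rates["stat_ftx"]

for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.0f} per second")
print(f"credit-zero ticks per transmitted frame: {ticks_per_frame:.5f}")
```

If the frame rates are close to what the link can carry while tim_txcrd_z climbs, the first explanation is more plausible; if the credit-zero rate is high while the link is lightly loaded, that points more toward a slow device holding up credits.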