07-27-2016 06:11 AM
I am running FOS 7.4.1b on DCX directors in 2 datacenters - with TrueCopy and, later, GAD.
MAPS is enabled on the switches that support it.
According to the manuals, the Port Health Violations widget can display C3 discards.
I have not seen any (0) so far and am wondering whether this is a bug or a misconfiguration ...
I do see MAPS-1003 delay messages ....
07-27-2016 06:42 PM
You can make some general guesses about the Class 3 discards by running the porterrshow command and checking the "C3 Disc" column, then running mapsdb --show all and, in the 'port health' subsection, locating the counter named "C3TXTO". These are equivalent counters. However, I say 'general guess' because the data in porterrshow is cumulative since the switch was last rebooted or the stats were last cleared. There is NO WAY to determine when a C3 discard happened except through the output of mapsdb --show details, which provides the date and time if the discard happened while the MAPS rule was active.
To gather accurate counters from each command, first run portstatsclear and portstats64clear, and then run mapsdb --clear. This resets all the counters for port health and for port traffic, so from that point forward you can try to match the two counters against each other.
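Because porterrshow only gives cumulative values, comparing two readings taken some time apart is what tells you whether discards are still happening. Here is a minimal Python sketch of that comparison. It assumes you have already copied the per-port C3 discard values from two porterrshow runs into dictionaries by hand; the function name, port labels and numbers are all made up for illustration and are not output from any Brocade tool.

```python
def c3_discard_deltas(before, after):
    """Return the ports whose cumulative C3 discard counter grew between two snapshots."""
    deltas = {}
    for port, now in after.items():
        prev = before.get(port, 0)
        # A smaller value than before means the counters were cleared in between,
        # so the current value is all that can be attributed to this interval.
        grown = now - prev if now >= prev else now
        if grown > 0:
            deltas[port] = grown
    return deltas


# Hypothetical snapshots (port -> cumulative C3 discards), for illustration only.
before = {"1/0": 120, "1/1": 0, "2/5": 4300}
after = {"1/0": 120, "1/1": 7, "2/5": 15}

print(c3_discard_deltas(before, after))  # {'1/1': 7, '2/5': 15}
```

Port 1/0 drops out because its counter did not move, and port 2/5 shows what a counter clear between the two readings looks like.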
While it was not asked: Class 3 discards occurring anywhere in a fabric are a serious performance and data integrity issue and should be resolved completely. The recovery method for Class 3 discards is sort of like radioactive decay - the more it happens, the more retransmission it generates, which in turn causes more C3 discards. At some point it will go super-critical, and a Class 3 frame storm (meltdown) will occur, bringing the entire fabric to a grinding halt. We call that "A Bad Thing" :-)
(Note the double dash after the mapsdb command and the space. It must be a double dash, not a single one.)
07-27-2016 11:22 PM
Thanks for your feedback!
The problem I am facing is that porterrshow reflects C3 discards while the BNA GUI does not.
Probably I have screwed up the BNA config ;-(
The problem is having too much information without an immediate (possible) action, which I am trying to reduce ....
There might be a note like "C3 discards are only alerted through the SNMP trap mechanism", which I am trying to find.
mapsdb --show is a good starting point.
I see IO_PERF_IMPACT and IO_LATENCY_CLEAR entries.
porterrshow shows me the C3 discards as they occur.
One problem I have solved already ...
The customer cleared the error counters too often ....
So whenever I looked, most of the counters were 0.
07-28-2016 07:08 AM - edited 07-28-2016 07:10 AM
Yes, you are seeing C3 discards, which MAPS calls "C3TXTO" (Class 3 transmit frame timeout). Once the latency on the port goes below a defined level, the counter is reset and monitoring resumes for the affected port. That is the IO_LATENCY_CLEAR entry in the Dashboard.
The MAPS Dashboard is telling you there is a performance impact on IO due to latency. I would use the port information in MAPS to determine which device is causing the latency impact. Sometimes it comes down to trial and error until you find the offending port, without spending a lot of time chasing down port buffer credit starvation. It's also useful to know that the port being affected by latency is typically NOT the port that is causing it.
Investigating the traffic flow and any other issues on the slower devices logged into the fabric can be a good starting point. In an 8Gbps fabric, common devices which cause latency are: 1) Legacy 2 or 4Gbps HBAs in high read (RX) situations. 2) Legacy tape devices in a disk-based environment getting mostly writes (TX). 3) A third-party Storage Volume Controller which is designed to 'trap' read and write requests, and causes IO thrashing by intercepting the data exchanges and routing them to the various real storage devices.
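As a first pass at picking out the slower devices, you can list everything that is logged in at less than the fabric speed. This is only a rough Python sketch: the inventory below is hypothetical, typed in by hand for illustration, and the function and field names are my own rather than anything MAPS or BNA exports.

```python
FABRIC_SPEED_GBPS = 8  # the 8Gbps fabric used as the example above


def slower_than_fabric(devices, fabric_speed=FABRIC_SPEED_GBPS):
    """Return the devices whose link speed is below the fabric speed, slowest first."""
    slow = [d for d in devices if d["speed_gbps"] < fabric_speed]
    return sorted(slow, key=lambda d: d["speed_gbps"])


# Hypothetical inventory, for illustration only.
devices = [
    {"port": "1/4", "name": "host-a-hba", "speed_gbps": 8},
    {"port": "1/9", "name": "old-host-hba", "speed_gbps": 4},
    {"port": "3/2", "name": "tape-drive", "speed_gbps": 2},
]

for d in slower_than_fabric(devices):
    print(d["port"], d["name"], f'{d["speed_gbps"]}G')
```

A slow link by itself does not prove a device is the culprit, but it gives you a short list of ports to look at first.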
I would strongly suggest getting a free SAN Health report of your fabric and all attached devices. It's a very useful tool for designing and debugging typical SAN fabric topology faults, and it often leads to improved overall performance and fabric health.
The link is near the bottom of the my.brocade.com web portal; look for "SAN Health and NET Health reports". You download the latest utility, configure your settings for the report, then run it, and within 24 hours a well-formatted XLS report will be emailed to you for investigation.
Best of luck,