03-10-2011 07:52 AM
We had an incident where some work around a SAN switch resulted in a connector to one of the ports being loosely connected.
Fortunately, we saw the problem through messages from the affected server in our centralized syslog. But syslog messages are easy to miss. We would like to be able to detect a similar situation in another way, so that it can show up in one of our alerting tools (Nagios, preferably).
So I performed an SNMP walk of the SW-MIB::swFCPortEntry branch on one of our Brocade 5100 FC switches. Then I partly pulled a connector from a port; according to the LEDs on the switch, the connector was still connected, but the affected server's syslog filled up with complaints. I subsequently performed a number of SNMP walks at various intervals. Finally, I compared the counters between the walks.
What I saw: Right after the incident, the value for SW-MIB::swFCPortRxBadOs rose sharply, but then it leveled out, even though the connector was still loosely connected (and the server was still complaining).
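To make the comparison between walks concrete, here is a minimal sketch in Python. It assumes the walks were saved as net-snmp text output with lines of the form "SW-MIB::swFCPortRxBadOs.5 = Counter32: 1234"; the function names parse_walk and deltas are made up for illustration. It also shows why a check based on the increase alone would clear again once the counter levels out, even though the connector is still loose.

```python
import re

# Matches net-snmp output lines such as:
#   SW-MIB::swFCPortRxBadOs.5 = Counter32: 1234
# (assumed output format; adjust if your snmpwalk options differ)
LINE_RE = re.compile(r"^SW-MIB::(\w+)\.(\d+) = Counter32: (\d+)$")


def parse_walk(text):
    """Return {(column, index): value} for every Counter32 line in a walk."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counters[(m.group(1), int(m.group(2)))] = int(m.group(3))
    return counters


def deltas(before, after, column="swFCPortRxBadOs"):
    """Per-index increase of one column between two walks.

    The modulo handles a 32-bit counter wrapping around between samples.
    """
    result = {}
    for (col, index), new in after.items():
        if col != column or (col, index) not in before:
            continue
        result[index] = (new - before[(col, index)]) % 2**32
    return result
```

With three walks taken before the incident, shortly after it, and some time later, deltas(walk1, walk2) would show the sharp rise on the affected index, while deltas(walk2, walk3) would be back at zero.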
In the Port Administration web/Java tool, I saw that the port's speed was now "N2"; I believe it was N4 before the change. But as some of our connectors may very well communicate at N2 speed under normal conditions, this cannot be used to detect loose connectors.
I also looked at counters in the FIBRE-CHANNEL-FE-MIB::fcFeMIB part of the SNMP tree, but found no useful counters there: No counter seemed to be able to provide a snapshot of current loose connectors.
Is there something I'm overlooking? Is there a way for us to discover when there is a loose connector in one of our Brocade switches?
Troels Arvin, Copenhagen
03-10-2011 09:02 AM
I think this would be difficult, as there is no mechanism in place that measures, or triggers on, whether a connector is seated properly.
But if you really want this, your best bet would be to monitor link losses, link failures, etc. from the SW-MIB::swEndDevice tree, in combination with the error counters and port state (order, discard, BadOs, port speed, etc.) from the SW-MIB::swFCPort tree.
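As a rough sketch of how such a combination could feed Nagios, the Python function below maps per-port counter increases to a Nagios status. Everything here is an assumption for illustration: the evaluate function, the threshold values, and the counter names other than swFCPortRxBadOs, which should be checked against the SW-MIB on your switch. Collecting the increases (for example from two SNMP walks) is left out.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes

# Hypothetical thresholds: (warning, critical) increase per polling interval.
# Counter names other than swFCPortRxBadOs are assumed; verify them in SW-MIB.
THRESHOLDS = {
    "swFCPortRxBadOs": (10, 1000),
    "swFCPortRxCrcs": (1, 100),
    "swFCPortRxEncOutFrs": (1, 100),
}


def evaluate(port_deltas, thresholds=THRESHOLDS):
    """Map {port: {counter_name: increase}} to (exit_code, message)."""
    worst = OK
    problems = []
    for port in sorted(port_deltas):
        for name, increase in sorted(port_deltas[port].items()):
            if name not in thresholds:
                continue  # counter we have no opinion about
            warn, crit = thresholds[name]
            if increase >= crit:
                level = CRITICAL
            elif increase >= warn:
                level = WARNING
            else:
                continue
            worst = max(worst, level)
            problems.append(f"port {port} {name} +{increase}")
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[worst]
    detail = "; ".join(problems) if problems else "no error counters rising"
    return worst, f"{label}: {detail}"
```

A plugin wrapper would print the returned message and exit with the returned code. Because the counters level out after the initial burst, a check like this only fires during the polling interval in which the errors occur, so the Nagios side would need to hold the alert until someone acknowledges it.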
Another option is to fix the port speed at the maximum speed the FC HBA supports. Your 4G port, for example, would then not step down in speed but would lose sync instead (which can be measured). I'm not sure whether that works for a 2G port, as the loose connector in your example was happy at 2G.