10-30-2012 05:53 AM
We have a number of DCXs in our data centres.
We have observed that the er_bad_os counters on ports attached to IBM P7s running at 8 Gbps are incrementing at an astronomical rate.
We have changed the fill word value to 3 on these ports and the counter now rests at 0.
So what I would like to know is exactly what er_bad_os is. I know it stands for invalid ordered set, but I want to know what that means in terms of how it affects the node plugged into the switch and how it affects the switch.
10-30-2012 05:05 PM
That's the normal behavior. Most probably you set the fillword to 0 in the past as a workaround to get the P7 up and running on a Fabric OS version that had no fillword mode 3. (Otherwise mode 3 would have been the natural choice anyway.)
Mode 0 uses IDLE as the fillword as well as for the transition to the active state (AC) during link initialization. As the P7 implements the latest version of the FC spec, it needs those IDLEs to complete link initialization. But after the link is established, ARBff should be used as the fillword to maintain synchronization, rather than the IDLE from 4 Gb times (mode 0, IDLE/IDLE, was standard then). Because you used mode 0, the switch used IDLE as the fillword and expected the P7 to do the same. But the P7 uses ARBff, in line with the FC spec, and the switch counts up er_bad_os on the grounds of "Oh, I expected a different fillword here!" It was not a problem, because any properly encoded ordered set keeps the link synchronized anyway (leaving aside the case where you have set up port fencing, but that's another story).
So with the change to mode 3 you basically asked the switch to follow the FC protocol. As it now receives the same fillword from the P7 that it uses itself (ARBff), everything is fine for the switch and the counter no longer increases.
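To make the counting logic concrete, here is a toy Python model of the behavior described above. It is only a sketch, not how Fabric OS is implemented: the function name count_bad_os and the ACTIVE_FILL_WORD table are made up for illustration, and mode 3 is represented only by the fillword it ends up using on this link (ARBff).

```python
# Toy model only: names and structure are hypothetical, not Fabric OS internals.
# Fillword the switch expects on an active link, per mode.
# Assumption: mode 3 is reduced to its end state on this link (ARBff).
ACTIVE_FILL_WORD = {0: "IDLE", 3: "ARBff"}


def count_bad_os(mode, received):
    """Count received fillwords that differ from the one the mode expects."""
    expected = ACTIVE_FILL_WORD[mode]
    return sum(1 for word in received if word != expected)


# An 8 Gbps P7 sends ARBff as its fillword once the link is up.
p7_stream = ["ARBff"] * 1000

print(count_bad_os(0, p7_stream))  # mode 0 expects IDLE: every fillword is counted
print(count_bad_os(3, p7_stream))  # mode 3 expects ARBff: nothing is counted
```

In this model the counter climbs only because of a mismatch in expectation. The received words are still valid ordered sets, which is why the link itself stays synchronized.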