12-28-2014 02:19 PM
We are using Fabric Watch and Bottleneckmon to highlight issues, but this is all after the event. We can address the hardware issues that cause the latency afterwards, but meanwhile the SAN-wide issue causes Oracle RAC to drop disks. How can we prevent the impact in the first place? We are running FOS 7.1.0c across the fabrics.
12-29-2014 07:22 AM
You can configure port fencing to disable an F-port that misbehaves and starts discarding frames on the switch port.
12-30-2014 02:52 AM
Thanks for your response. I am already tracking C3 discards, and we seem to see isolated events within the same minute. In a recent example we had 41 C3 discard errors from a port in one minute. The Fabric Watch timebase only seems to allow granularity down to one minute, so I suspect that by the time the port was fenced the damage (the latency event) would already have been done. We then see no further errors on the port, but we still get the link checked out.
I am not sure if MAPS in the next version will allow greater granularity than one minute.
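To illustrate the granularity concern, here is a rough Python sketch, not anything taken from FOS itself. It assumes a monitor that only compares a counter against its threshold at the end of each fixed window, which is my reading of how a timebase behaves. The function name, the threshold of 10, and the burst timing are all hypothetical.

```python
def fixed_window_trigger(event_times, threshold, window_s):
    """Return the time (in seconds) at which a monitor that evaluates its
    counter only at the end of each fixed window would first see more than
    `threshold` events in one window, or None if it never does.

    Assumption: evaluation happens at window boundaries only.
    """
    counts = {}
    for t in event_times:
        idx = int(t // window_s)          # which window this event falls in
        counts[idx] = counts.get(idx, 0) + 1
    for idx in sorted(counts):
        if counts[idx] > threshold:
            return (idx + 1) * window_s   # end of the offending window
    return None


# Hypothetical burst: 41 discards spread over about 2 seconds, starting at t=5s.
burst = [5 + i * 0.05 for i in range(41)]

minute_trigger = fixed_window_trigger(burst, threshold=10, window_s=60)
fine_trigger = fixed_window_trigger(burst, threshold=10, window_s=5)
print(minute_trigger, fine_trigger)
```

Under these assumptions the one-minute window would not react until the 60-second mark, long after the burst has ended, while a five-second window would react at the 10-second mark. Either way the reaction comes after the burst, which is the point of my question.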
01-01-2015 02:14 PM
01-04-2015 07:24 AM
Unfortunately we are just as likely to see the latency caused by top-tier servers – the issues we see are not caused by spikes in workload but by random link errors. As you suggest, isolating the important stuff to the same ASIC would be good, but unfortunately we are running our critical RAC clusters across two sites, which means that we have to use a lot of shared infrastructure.