Fibre Channel (SAN)

Contributor
Posts: 30
Registered: ‎08-15-2007

We get Latency issues which then cause Oracle RAC to force its disks offline

We are using Fabric Watch and bottleneckmon to highlight issues, but this is all after the event. We can address the hardware issues that cause the latency afterwards, but meanwhile the SAN-wide issue causes Oracle RAC to drop disks. How can we prevent the impact in the first place? We are running FOS 7.1.0c across the fabrics.

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: We get Latency issues which then cause Oracle RAC to force its disks offline

Hi,

You can configure port fencing to disable an F-port that misbehaves and starts discarding frames on the switch port.

Rgds,

Felipon

Contributor
Posts: 30
Registered: ‎08-15-2007

Re: We get Latency issues which then cause Oracle RAC to force its disks offline

Felipon,

Thanks for your response. I am already tracking C3 discards; we tend to see isolated events within a single minute. In a recent example, one port logged 41 C3 discard errors in a minute. The Fabric Watch timebase only seems to go as granular as one minute, so I suspect that by the time the port was fenced, the damage (the latency event) would already have been done. We then see no further errors on the port, though we still get the link checked out.

I am not sure if MAPS in the next version will allow greater granularity than one minute.
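
To illustrate the granularity point, here is a minimal sketch in Python. It assumes cumulative C3 discard counter readings for one port are being collected by some external poller at sub-minute intervals; that poller, the function name `discard_bursts`, and the sample values are all hypothetical and not a Fabric Watch or FOS feature. It only shows how a shorter trailing window would flag the burst earlier than a one-minute timebase.

```python
from typing import List, Tuple

def discard_bursts(samples: List[Tuple[float, int]],
                   window_s: float,
                   threshold: int) -> List[Tuple[float, int]]:
    """Flag samples where the cumulative counter rose by at least
    `threshold` within the trailing `window_s` seconds.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value).
    Returns a list of (timestamp_seconds, delta) for each flagged sample.
    """
    flagged = []
    start = 0  # index of the oldest sample still inside the window
    for t, count in samples:
        # Slide the window start forward until it is within window_s of t.
        while t - samples[start][0] > window_s:
            start += 1
        delta = count - samples[start][1]
        if delta >= threshold:
            flagged.append((t, delta))
    return flagged

# Hypothetical readings: 41 discards arrive between t=5s and t=10s.
readings = [(0, 0), (5, 0), (10, 41), (15, 41), (60, 41)]

fine = discard_bursts(readings, window_s=10, threshold=20)
coarse = discard_bursts(readings[::4], window_s=60, threshold=20)
print(fine)    # burst visible at the 10 s sample
print(coarse)  # with one-minute sampling it is only visible at 60 s
```

With 5-second readings the burst is flagged at the 10-second mark; with only the 0 s and 60 s readings it cannot be flagged before the minute is over, which matches the concern that fencing would come too late.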

Valued Contributor
Posts: 547
Registered: ‎03-20-2011

Re: We get Latency issues which then cause Oracle RAC to force its disks offline

In order to avoid this, all your top-critical systems (some people call it "tier 0"; I believe that RAC is something like this) should be connected "locally", i.e. host and storage on the same switch, and maybe even on the same ASIC if you are using multi-ASIC switches.
For "tier 1" systems, it is better to use QoS "high" zoning. This will isolate them in dedicated VCs and thus minimize their exposure to buffer-to-buffer credit issues in the lower QoS zones.
All the test/dev/tmp systems should be pushed down to QoS "low" zones. If they generate some unexpected activity, that will be their own problem and will not affect prod environments.
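
A rough sketch of how that tiering could be turned into zone names, in Python. It assumes the FOS convention that QoS priority is selected by a zone-name prefix (`QOSH_` for high, `QOSL_` for low, no prefix for the default medium priority); please verify this against the Fabric OS Administrator's Guide for your release. The tier labels, the function name `plan_zone_name`, and the zone names are made up for illustration.

```python
# Assumed FOS zone-name prefixes for QoS priority (verify for your release).
QOS_PREFIX = {"high": "QOSH_", "low": "QOSL_"}

# Hypothetical mapping from a site's tier labels to a QoS priority.
# Tier 0 is handled by local connectivity rather than QoS here, so it
# keeps the default (medium, unprefixed) priority.
TIER_PRIORITY = {
    "tier0": "medium",
    "tier1": "high",
    "test": "low",
    "dev": "low",
    "tmp": "low",
}

def plan_zone_name(base_name: str, tier: str) -> str:
    """Return the zone name with the QoS prefix implied by the tier."""
    priority = TIER_PRIORITY.get(tier, "medium")
    return QOS_PREFIX.get(priority, "") + base_name

for name, tier in [("erp_db_array1", "tier1"),
                   ("sandbox_array2", "dev"),
                   ("rac_node1_array1", "tier0")]:
    print(tier, "->", plan_zone_name(name, tier))
```

The resulting names would still have to be created and enabled through the normal zoning workflow; the script only plans the naming.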
Contributor
Posts: 30
Registered: ‎08-15-2007

Re: We get Latency issues which then cause Oracle RAC to force its disks offline

Thanks Alexey,

Unfortunately, we are just as likely to see the latency caused by top-tier servers: the issues we see are caused not by spikes in workload but by random link errors. As you suggest, isolating the important systems on the same ASIC would be good, but we run our critical RAC clusters across two sites, which means that we have to use a lot of shared infrastructure.

Regards

Tony
