02-20-2010 01:35 PM
Hi, we recently updated all our SAN fabric switches from V5.2.2 to V5.3.2c. This all went well; however, since then a number of Solaris 10 hosts have been reporting SCSI timeouts (sometimes with 100 percent wait states) coming from their remote mirrors.
These errors are happening across different clusters using different hardware (both SPARC and x86) and different HBAs.
They are also being seen across storage arrays from different vendors.
To keep things simple, I will give an example from just one of the clusters.
Hardware: Sun Fire T200.
OS: Solaris 10, update 4.
HBA: QLogic Corp. Model: 375-3356-02, Firmware Version: 4.04.01, FCode/BIOS Version: BIOS: 1.24; fcode: 1.24; EFI: 1.8.
Both local and remote storage are on the same type of hardware and RAID (IBM DS4800 FC disk, RAID 5; we are also seeing the same errors on a different cluster running off EMC AX4 SAS RAID 10).
Fabric: 2xSilkworm48K in separate sites connected via 4x2GB trunked ISLs.
The SUN cluster nodes each connect via two fibres to the 48Ks which both have storage arrays connected to them.
Through software mirroring, every time the host does a write it writes to both its local storage and the storage at the remote site. The SUN sysadmins tell me that what they are seeing are wait states from the remote storage site. This is getting so bad that the only way to cure it has been to stop the mirroring and operate only off the local storage, which leaves us in a vulnerable state.
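As a rough illustration of why a slow remote side stalls the host, here is a minimal sketch. It assumes the mirror is synchronous, i.e. a write is acknowledged only once both sides have completed. The function name and the latency figures are hypothetical, not measurements from our hosts.

```python
def mirrored_write_latency_ms(local_ms, remote_ms):
    """Latency the host sees for one mirrored write.

    Assumption: the write is issued to both sides in parallel and is
    acknowledged only once BOTH sides have completed, so the slower
    side sets the latency.
    """
    return max(local_ms, remote_ms)


# Hypothetical figures, for illustration only.
healthy = mirrored_write_latency_ms(local_ms=2.0, remote_ms=3.0)

# Remote side stalls until a (hypothetical) 60 second SCSI timeout fires.
degraded = mirrored_write_latency_ms(local_ms=2.0, remote_ms=60000.0)

print("healthy write:", healthy, "ms")
print("degraded write:", degraded, "ms")
```

Under that assumption the local array can be perfectly healthy and the host will still wait on every write, which matches stopping the mirror being the only thing that helps.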
I ran a SAN Health Check report to investigate ISL oversubscription, but all looks fine on the ISL ratios.
Bandwidth usage on the ISL links peaks at only 50 percent.
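For reference, the arithmetic behind those two checks can be sketched as below. This assumes each ISL is a 2 Gbit/s Fibre Channel link with a nominal rate of about 200 MB/s per direction. The host port count and host port speed in the oversubscription example are hypothetical placeholders, not our real figures.

```python
# Nominal data rate of one 2 Gbit/s FC link, per direction (assumption).
FC_2G_MB_PER_S = 200


def trunk_capacity_mb_s(links, per_link_mb_s=FC_2G_MB_PER_S):
    """Aggregate nominal capacity of a trunk of identical ISLs."""
    return links * per_link_mb_s


def oversubscription_ratio(device_ports, device_port_mb_s,
                           isl_links, isl_mb_s=FC_2G_MB_PER_S):
    """Aggregate edge bandwidth divided by aggregate ISL bandwidth."""
    return (device_ports * device_port_mb_s) / (isl_links * isl_mb_s)


capacity = trunk_capacity_mb_s(4)   # four trunked ISLs
peak_usage = 0.50 * capacity        # 50 percent peak utilisation
headroom = capacity - peak_usage

# Hypothetical: 16 host ports at 4 Gbit/s (about 400 MB/s) using the trunk.
ratio = oversubscription_ratio(16, 400, 4)

print("trunk capacity:", capacity, "MB/s")
print("peak usage:", peak_usage, "MB/s, headroom:", headroom, "MB/s")
print("oversubscription ratio:", ratio, ": 1")
```

The ratio is simply the bandwidth that could be offered by the edge ports divided by what the trunk can carry, so it is a worst-case figure rather than a measurement.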
I downloaded the Brocade Compatibility Matrix (Dec 2009), but the QLogic HBA above is not even mentioned.
I have also been investigating back-end performance and configuration. However, the fact that the same errors are being seen on different storage hardware tends to eliminate this as the cause; besides, this only started happening after we upgraded to V5.3.2c.
We found a SUN support document that describes exactly the errors we're getting ( http://sunsolve.sun.com/search/document.do?assetkey=1-26-102194-1 ). However, we have different switches and FOS.
Can anyone suggest a way forward, please, or other things we should be looking at?
02-21-2010 05:35 AM
--->>> I downloaded Brocade Compatibility Matrix (Dec2009) ...but the QLogic HBA above is not even mentioned.
The HBA you described here, Model: 375-3356-02, is a genuine QLogic QLE2462, and it is listed, supported and certified in both Brocade Compatibility Matrices.
02-21-2010 06:42 PM
I've seen that the workaround in the Sun document mentions a "reboot" to be issued on the switch. I'm assuming you have done this and are still getting errors?
02-24-2010 08:47 AM
The ports reboot has now been completed on 3 different nodes from different clusters. Errors are still being seen. A support call has now been opened with Brocade direct support.
Thank you for your suggestions.