03-04-2010 07:14 AM
I have two Brocade 3850 switches (FOS - 5.3.2a), which have a lot of "Link Failure", "Loss of Sync" and "Loss of Signal" errors on their ports.
Ports no. 2 and 8 on switch no. 1 and ports no. 4 and 8 on switch no. 2 are especially "dirty" (look at the "porterrshow" output attached below).
What do these errors mean, and how can I eliminate them? The SFP modules and FC cables do not look like the culprits (not a single "enc out" or "crc err" error on any port).
WWN-based point-to-point zoning was implemented on both switches.
03-04-2010 08:41 AM
1. Is the address error count on the E port increasing? It's not something you should ignore on an E port, but I would only worry if it is increasing steadily. That's just my opinion, and I may be wrong.
2. For the loss of sync errors, I would run fabriclog -s and see whether any ports have been flapping (offline/online) in the past few days. If so, I would check the hosts on ports 2 and 8 for errors. Loss of sync and loss of signal mean something is wrong between the two endpoints, i.e. somewhere from the HBA to the switch port.
This could be the HBA, the driver, patch panels, a bent cable, or even an SFP. FC is tricky; you cannot easily rule anything out. But since you see errors on multiple ports, I too would rule out the cable and SFP.
Three important things that come to mind:
>> fabriclog -s
>> Host errors? Check eventvwr, errpt for fcs errors, or the Solaris messages.
>> Are the errors increasing? Do a statsclear and then observe.
03-04-2010 06:51 PM
I've just had some fun and games with some QLogic cards under Windows and the Storport driver.
I replaced SFPs and cables and was still getting link errors; it ended up being a number of MS hotfixes that needed to be applied.
Just to throw that into the picture for you.
03-05-2010 12:21 AM
I had the same issue with Solaris SPARC machines with HBAs fixed at 2 Gbps and switch ports fixed at 4 Gbps.
I got 'In_sync' and 'No_sync' for the port status and an enormous number of link fail, loss of sync and loss of signal errors too.
Some HBAs were even fixed to loop at this CU site!!
I don't know if this helps, but it's probably worth checking.
03-05-2010 02:19 AM
Hi to all,
these 4 ports are connected to one IBM p570 server with 4 Emulex 4 Gbps HBAs. OS - AIX 6.1 (with LPAR enabled).
The HBA and switch ports are set to "auto-negotiate". AIX reports some FC-related errors only during server reboot.
After "statsclear", the character and rate of the errors have not changed (see PortErrShow_Fresh.txt attached below).
I have also attached the output from "fabriclog -s".
03-05-2010 09:18 AM
1. From fabriclog -> Ports 14 and 15 on both switches flapped on 4th March 2010, but this could be something you scheduled or did manually. If not then you should probe further.
2. I have worked in 7 PB+ AIX environments, and my experience is that if there are major problems with the fabric, you should see fcs errors in errpt.
You should also check for temporary disk errors, if any. Frequent temporary disk errors can affect performance and can sometimes be attributed to flapping ports in the fabric.
03-08-2010 04:47 AM
BTW, port no. 3 on switch H16_1 flapped several times last Saturday (see the attached log below).
Maybe these port flaps are the cause of the "link fail", "loss sync" and "loss sig" errors? There are also earlier entries in this log that show other ports flapping.
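One way to test that idea would be to count how often each port went offline and compare the result with how much its link counters grew. Below is a rough Python sketch. It assumes the port state changes have already been pulled out of the fabriclog output into (timestamp, port, state) tuples; that tuple layout, the function name and the sample events are all hypothetical and are not the real log format.

```python
from collections import Counter


def count_flaps(events):
    """Count online -> offline transitions per port.

    events: iterable of (timestamp, port, state) tuples in time order,
    where state is "online" or "offline" (an assumed, simplified layout).
    """
    last_state = {}
    flaps = Counter()
    for _timestamp, port, state in events:
        # A flap is counted only when a port that was online drops offline.
        if state == "offline" and last_state.get(port) == "online":
            flaps[port] += 1
        last_state[port] = state
    return dict(flaps)


if __name__ == "__main__":
    # Made-up events for illustration only.
    sample = [
        ("10:00:01", 3, "online"),
        ("10:05:12", 3, "offline"),
        ("10:05:20", 3, "online"),
        ("10:09:47", 3, "offline"),
        ("10:09:55", 3, "online"),
        ("10:11:00", 14, "online"),
    ]
    print(count_flaps(sample))  # {3: 2}
```

If the flap count for a port roughly tracks the growth of its "link fail" and "loss sync" counters, the counters are probably just recording those flaps rather than a separate problem.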
03-08-2010 08:30 AM
Do a portstatsclear on all the ports, and then see whether the errors are increasing by running portstatsshow with the port number. Do it again and again. Also check porterrshow from time to time. If the counters keep increasing, there are several things to look into. First, if enc out is increasing, change the cable, then change the switch port; if the issue is still there, it is probably on the HBA side. Check the server logs for HBA errors.
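To make the "check again and again" step less manual, here is a minimal Python sketch that compares two sets of per-port counters taken some time apart and reports only the ones that grew. It assumes the values have already been copied out of the switch output into dictionaries, by hand or with your own script. The function names, the dictionary layout and the sample numbers are made up for illustration, and the k/m/g suffix handling is an assumption about how large counters are abbreviated.

```python
# Sketch only: names, layout and sample values are illustrative assumptions.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def to_count(value):
    """Convert a counter such as '12' or '1.5k' to an integer."""
    text = str(value).strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(round(float(text[:-1]) * SUFFIXES[text[-1]]))
    return int(text)


def counter_deltas(before, after):
    """Return {port: {counter: increase}} for counters that grew."""
    growing = {}
    for port, counters in after.items():
        for name, value in counters.items():
            # Counters missing from the first snapshot are treated as 0.
            old = before.get(port, {}).get(name, 0)
            delta = to_count(value) - to_count(old)
            if delta > 0:
                growing.setdefault(port, {})[name] = delta
    return growing


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    first = {2: {"link_fail": "3", "loss_sync": "10", "enc_out": "0"},
             8: {"link_fail": "1", "loss_sync": "4", "enc_out": "0"}}
    later = {2: {"link_fail": "5", "loss_sync": "10", "enc_out": "0"},
             8: {"link_fail": "1", "loss_sync": "4", "enc_out": "0"}}
    print(counter_deltas(first, later))  # {2: {'link_fail': 2}}
```

Abbreviated values lose precision, so a small increase on a counter that is already shown with a suffix may not be visible; clearing the statistics first keeps the numbers small enough to compare exactly.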