04-16-2012 04:29 AM
We are using two Brocade 5470 switches in an IBM BladeCenter H chassis. The HBA redundancy of one of the blade servers has suddenly become degraded. The only relevant entry in the Brocade switch logs is the warning message "Port 1 Faulted because of many Link Failures". I checked the SAN side, and the WWNs for the second HBA are registered but not logged in. The server has only one QLogic 8Gb CFFh expansion card, so I am not sure the problem is HBA related: there is only one HBA card in the server, and in the ESX console I can see that vmhba1 is working fine while vmhba2 keeps losing its connection. (This means I can see LUNs through the first switch but not through the second.)
I reset the counters but I still get link failures. I rescanned the HBAs in vCenter, but that made no difference for this server.
brocade8Gb_ALT:root> portshow 1
portHealth: No Fabric Watch License
portFlags: 0x24b03 PRESENT ACTIVE F_PORT G_PORT LOGICAL_ONLINE LOGIN NOELP LED ACCEPT FLOGI
POD Port: Port is licensed
portState: 1 Online
portPhys: 6 In_Sync
portScn: 1 Online
port generation number: 376
portWwn of device(s) connected:
LE domain: 0
FC Fastwrite: OFF
Interrupts:     0        Link_failure:  42       Frjt:  0
Unknown:        0        Loss_of_sync:  42       Fbsy:  0
Lli:            228      Loss_of_sig:   0
Proc_rqrd:      417      Protocol_err:  0
Timed_out:      0        Invalid_word:  146747
Rx_flushed:     0        Invalid_crc:   0
Tx_unavail:     0        Delim_err:     0
Free_buffer:    0        Address_err:   0
Overrun:        0        Lr_in:         19
Suspended:      0        Lr_out:        19
Parity_err:     0        Ols_in:        19
2_parity_err:   0        Ols_out:       19
How can I determine the root cause of this? Is the problem related to the HBA, the Brocade switch, or the cabling? There is another host in the same chassis and it has no problem, so I would say that neither the HBA nor the switch has failed.
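For anyone who wants to pull the error counters out of a saved portshow capture instead of reading them by eye, here is a minimal Python sketch. The function name and the list of counters treated as link-level errors are my own assumptions, not anything defined by Fabric OS, and the sample text is copied from the output above.

```python
import re

# Counters treated here as link-level errors. This selection is an
# assumption made for illustration, not an official Brocade list.
LINK_ERROR_COUNTERS = ("Link_failure", "Loss_of_sync", "Loss_of_sig",
                       "Invalid_word", "Invalid_crc", "Protocol_err")

def parse_portshow_counters(text):
    """Return {name: int} for every 'Name: <integer>' pair in the text."""
    pairs = re.findall(r"([A-Za-z0-9_]+):\s+(\d+)\b", text)
    return {name: int(value) for name, value in pairs}

# Sample lines copied from the portshow output in this thread.
sample = """\
Interrupts: 0 Link_failure: 42 Frjt: 0
Unknown: 0 Loss_of_sync: 42 Fbsy: 0
Lli: 228 Loss_of_sig: 0
Proc_rqrd: 417 Protocol_err: 0
Timed_out: 0 Invalid_word: 146747
Rx_flushed: 0 Invalid_crc: 0
"""

counters = parse_portshow_counters(sample)
# Keep only the link-level counters that are above zero.
suspects = {name: counters[name]
            for name in LINK_ERROR_COUNTERS
            if counters.get(name, 0) > 0}
print(suspects)
```

With the values posted here, only Link_failure, Loss_of_sync and Invalid_word come back non-zero, while Invalid_crc and Loss_of_sig stay at 0. The counters alone do not say whether the HBA port, the chassis midplane or the switch port is at fault, so this only narrows down where to look.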
04-16-2012 06:03 AM
As port 1 is an internal port that is hardwired to a server bay, the only way to rule out the (internal) wiring is to move the blade to another bay.
As a first troubleshooting step, try portdisable 1; portenable 1 to force a login.
I recently had a DOA unit that didn't log in properly.
The WWN showed up in the portshow output, but a nodefind against that WWN reported that it was unknown whether the device was a target or an initiator, and portloginshow listed the WWN but no registration for SCRs etc.
Perhaps you are experiencing the same thing.
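To judge whether bouncing the port or moving the blade changed anything, it helps to compare the counters before and after rather than look at the absolute numbers. A small Python sketch of that comparison follows. The function name is made up, the "before" values are the ones posted above, and the "after" values are purely hypothetical, only there to show what the output looks like.

```python
def counter_deltas(before, after):
    """Return the counters whose value rose between two snapshots."""
    return {name: after[name] - before.get(name, 0)
            for name in after
            if after[name] > before.get(name, 0)}

# Values taken from the portshow output earlier in this thread.
before = {"Link_failure": 42, "Loss_of_sync": 42, "Invalid_word": 146747}
# Hypothetical later snapshot, invented for illustration only.
after = {"Link_failure": 44, "Loss_of_sync": 44, "Invalid_word": 146747}

rising = counter_deltas(before, after)
print(rising)
```

If the same counters keep rising after the blade has been moved to another bay, the internal wiring of the original bay becomes a less likely suspect.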
04-30-2012 05:39 AM
Thanks for your recommendations. According to VMware, it seems to be an ESX bug or something similar. I am posting the KB link in case other VMware users run into the same problem.