Fibre Channel (SAN)

Reply
New Contributor
Posts: 2
Registered: ‎01-20-2015

Switch Issues

[ Edited ]

Hi There,

 

We had a SAN connectivity issue recently where our production servers were not able to access the SAN storage. From the host logs, we can see the path is flapping in both fabrics during the incident but on the SAN switches the event log don't have anything. This was fixed by doing a "hafailover" on both fabrics and right afer the CP failover the original active CP became "Faulty" on one of the switches.

 

We're trying to find the root cause about the SAN connecitvity issue. 

 

I've collected supportsave info from the swithces ( a day after the failover).. found some messages which might be related to the CP HA state. I am not sure what these "iswha" messages are related to (believe these are HA messages) and these messages are gone after the hafailover was executed.

 

Need your help in identifying the actual root cause for failure.. Didn't get much help from Vendor Smiley Sad

 

Brocade Switch:

Model: DCX 8510
FOS 7.2.1a

 

At the time of issue, below messages appeared (repetitive) on Primary CP, :

sw0:iswha_recv_update_real:multi-update handler entered 16
sw0:iswha_recv_update_realSmiley Surprisedoo-rscn-update handler entered 4

 

After the manual failover to standby CP, above messages disappeared..

10:01:49.501 sw0:iswha_recover_cb:FSS stage(1003) entering.
10:01:49.501 sw0:iswha_recover_cb:sw 0 vf 0 base -1
10:01:49.501 sw0:iswha_set_state:sw 0 set HA state to 8
10:01:50.394 sw0:iswha_recover_cb:FSS stage(1) entering.
10:01:50.394 sw0:iswha_wb_recov_lsan_used_list:used list recovered 0 for 0
10:01:50.397 sw0:iswl_local_fabric_tag_set:setting bb fabric tag to 128
10:01:50.414 sw0:iswrp_local_bb_hlo_count_zero:setting fast bb hlo rate
10:01:50.414 sw0:iswrp_isw_bb_hlo_count_zero:domain 15 hlo count zero
10:01:50.414 sw0:iswrp_isw_bb_hlo_count_zero:setting fast bb hlo rate
10:01:50.414 sw0:iswha_recover_cbSmiley Tongueort recovery complete
10:01:50.522 sw0:iswha_recover_cb:FSS stage(1001) entering.
10:01:50.523 sw0:iswha_warm_recover:starting warm recovery
10:01:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 0 is un-initialized
10:01:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 0 does not need WWNs
10:01:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 1 is un-initialized
10:01:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 1 index 0
10:01:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 1 index 0

 

########################################################################

 

18:17:34.381 sw0:iswha_recv_update_real:multi-update handler entered 16
18:17:34.381 sw0:iswha_recv_update_realSmiley Surprisedoo-rscn-update handler entered 4
18:17:39.381 sw0:iswha_recv_update_real:multi-update handler entered 16
18:17:39.381 sw0:iswha_recv_update_realSmiley Surprisedoo-rscn-update handler entered 4
18:17:44.381 sw0:iswha_recv_update_real:multi-update handler entered 16
18:17:44.381 sw0:iswha_recv_update_realSmiley Surprisedoo-rscn-update handler entered 4
18:17:49.501 sw0:iswha_recover_cb:FSS stage(1003) entering.
18:17:49.501 sw0:iswha_recover_cb:sw 0 vf 0 base -1
18:17:49.501 sw0:iswha_set_state:sw 0 set HA state to 8
18:17:50.394 sw0:iswha_recover_cb:FSS stage(1) entering.
18:17:50.394 sw0:iswha_wb_recov_lsan_used_list:used list recovered 0 for 0
18:17:50.397 sw0:iswl_local_fabric_tag_set:setting bb fabric tag to 128
18:17:50.414 sw0:iswrp_local_bb_hlo_count_zero:setting fast bb hlo rate
18:17:50.414 sw0:iswrp_isw_bb_hlo_count_zero:domain 15 hlo count zero
18:17:50.414 sw0:iswrp_isw_bb_hlo_count_zero:setting fast bb hlo rate
18:17:50.414 sw0:iswha_recover_cbSmiley Tongueort recovery complete
18:17:50.522 sw0:iswha_recover_cb:FSS stage(1001) entering.
18:17:50.523 sw0:iswha_warm_recover:starting warm recovery
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 0 is un-initialized
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 0 does not need WWNs
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 1 is un-initialized
18:17:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 1 index 0
18:17:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 1 index 0
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:For model 77 slot 1 populate base wwn with 100000xxxxxxxxx range 32
18:17:50.523 sw0:fcr_set_pwwn_last_paramsSmiley Tonguewwn starting loc = 0x00000000
18:17:50.523 sw0:fcr_set_nwwn_last_params:nwwn starting loc = 0x00000000
18:17:50.523 sw0:iswl_xlate_phantom_name_init:slot 1 is initialized
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 2 is un-initialized
18:17:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 2 index 0
18:17:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 2 index 0
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:For model 77 slot 2 populate base wwn with 100000xxxxxxxxx range 32
18:17:50.523 sw0:fcr_set_pwwn_last_paramsSmiley Tonguewwn starting loc = 0x00000000
18:17:50.523 sw0:fcr_set_nwwn_last_params:nwwn starting loc = 0x00000000
18:17:50.523 sw0:iswl_xlate_phantom_name_init:slot 2 is initialized
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 3 is un-initialized
18:17:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 3 index 0
18:17:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 3 index 0
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:For model 77 slot 3 populate base wwn with 100000xxxxxxxxxx range 32
18:17:50.523 sw0:fcr_set_pwwn_last_paramsSmiley Tonguewwn starting loc = 0x00000000
18:17:50.523 sw0:fcr_set_nwwn_last_params:nwwn starting loc = 0x00000000
18:17:50.523 sw0:iswl_xlate_phantom_name_init:slot 3 is initialized
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 4 is un-initialized
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 4 does not need WWNs
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 5 is un-initialized
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 5 does not need WWNs
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 6 is un-initialized
18:17:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 6 index 0
18:17:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 6 index 0
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:For model 77 slot 6 populate base wwn with 100000xxxxxxxxxx range 32
18:17:50.523 sw0:fcr_set_pwwn_last_paramsSmiley Tonguewwn starting loc = 0x00000000
18:17:50.523 sw0:fcr_set_nwwn_last_params:nwwn starting loc = 0x00000000
18:17:50.523 sw0:iswl_xlate_phantom_name_init:slot 6 is initialized
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 7 is un-initialized
18:17:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 7 index 0
18:17:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 7 index 0
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:For model 77 slot 7 populate base wwn with 100000xxxxxxxxxxx range 32
18:17:50.523 sw0:fcr_set_pwwn_last_paramsSmiley Tonguewwn starting loc = 0x00000000
18:17:50.523 sw0:fcr_set_nwwn_last_params:nwwn starting loc = 0x00000000
18:17:50.523 sw0:iswl_xlate_phantom_name_init:slot 7 is initialized
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 8 is un-initialized
18:17:50.523 sw0:iswplat_config_nwwn_index_get:sw 0 slot 8 index 0
18:17:50.523 sw0:iswplat_config_pwwn_index_get:sw 0 slot 8 index 0
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:For model 77 slot 8 populate base wwn with 100000xxxxxxxxxxx range 32
18:17:50.523 sw0:fcr_set_pwwn_last_paramsSmiley Tonguewwn starting loc = 0x00000000
18:17:50.523 sw0:fcr_set_nwwn_last_params:nwwn starting loc = 0x00000000
18:17:50.523 sw0:iswl_xlate_phantom_name_init:slot 8 is initialized
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 9 is un-initialized
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 9 does not need WWNs
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 10 is un-initialized
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 10 does not need WWNs
18:17:50.523 sw0:iswl_xlate_phantom_name_uninit:slot 11 is un-initialized
18:17:50.523 sw0:iswplat_xlate_wwn_info_retrieve:slot 11 does not need WWNs
18:17:50.524 sw0:iswl_xlate_phantom_name_uninit:slot 12 is un-initialized
18:17:50.524 sw0:iswplat_xlate_wwn_info_retrieve:slot 12 does not need WWNs
18:17:50.524 sw0:iswha_warm_recover:slot recovery complete after 0 msec
18:17:50.524 sw0:iswha_warm_recover:domain list recovery complete
18:17:51.455 sw0:iswha_warm_recover:timer recovery complete
18:17:51.455 sw0:iswha_wb_recov_fabs:entering cur ver 11 remote ver 11
18:17:51.455 sw0:iswha_warm_recover:fabrics recovery complete
18:17:51.455 sw0:iswrp_isw_bb_hlo_count_zero:domain 15 hlo count zero
18:17:51.455 sw0:iswrp_isw_bb_hlo_count_zero:setting fast bb hlo rate
18:17:51.455 sw0:iswha_warm_recover:warm recovery complete after 932 msec
18:17:51.455 sw0:isw_sendResumeList:sending Resume frames: start
18:17:51.455 sw0:isw_sendResumeList:done sending Resume frames
18:17:51.455 sw0:iswha_set_state:sw 0 set HA state to 1
18:17:51.455 sw0:iswProcessSCN:received base fabric unstable scn
18:17:51.455 sw0:iswProcessHAScn:Rcvd HA SCN subtype HA_EVENT_SYNC fss_remote_ver 11
18:17:51.455 sw0:iswProcessSCN:slot scn: slot 5 state 16 reason 53
18:17:51.455 sw0:iswProcessHAScn:Rcvd HA SCN subtype HA_EVENT_OUT_OF_SYNC
18:17:51.471 sw0:iswProcessSCNSmiley Tongueort scn subtype 0

 

Some more information from LOGS:

 

First FABRIC:
2014/12/07-18:19:51, [PLAT-1001], 71, SLOT 5 | CHASSIS, INFO, DCXFB1-Chassis, CP1 resetting other CP (double reset may occur).
Internet Systems Consortium DHCP Server 4.0.0
Copyright 2004-2007 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Wrote 0 class decls to leases file.
Wrote 0 leases to leases file.
Listening on LPF/eth2/00:XX:XX:XX HA HTBT down, reset standby!
a:5b:49/backplane
Sending on LPF/eth2/00:XX:XX:XX:XX:XX/backplane
Sending on Socket/fallback/fallback-net
Dec 7 12:19:52 Heartbeat to 4d0004 Down!
Service instances out of sync
2014/12/07-18:19:52, [HAMK-1004], 72, SLOT 5 | CHASSIS, INFO, DCXFB1-Chassis, Resetting standby CP (double reset may occur).
2014/12/07-18:19:52, [FSSM-1003], 73, SLOT 5 | CHASSIS, WARNING, DCXFB1-Chassis, HA State out of sync.
ki_state_set: redundant ACTIVATE for instance 1
ki_state_set: redundant ACTIVATE for instance 1
Dec 7 12:19:59 All service instances on Active
2014/12/07-18:20:00, [EM-1033], 74, SLOT 5 | CHASSIS, ERROR, DCXFB1-Chassis, CP in Slot 4 set to faulty because CP ERROR asserted.
2014/12/07-18:20:00, [LOG-1000], 75, SLOT 5 | CHASSIS, INFO, DCXFB1-Chassis, Previous message repeated 1 time(s).
2014/12/07-18:20:10, [FV-1002], 76, SLOT 5 | FID 128, INFO, EB_SAN_FABRIC_A, Flow Vision Config Replay Completed Successfully.
2014/12/07-18:20:43, [FW-1424], 77, SLOT 5 | FID 128, WARNING, EB_SAN_FABRIC_A, Switch status changed from HEALTHY to DOWN.
2014/12/07-18:20:43, [FW-1433], 78, SLOT 5 | FID 128, WARNING, EB_SAN_FABRIC_A, Switch status change contributing factor CP: CP non-redundant (Slot4/CP0) faulty.
2014/12/07-18:21:10, [SEC-1203], 79, SLOT 5 | FID 128, INFO, EB_SAN_FABRIC_A, Login information: Login successful via TELNET/SSH/RSH. IP Addr: 10.10.10.211

 

Second FABRIC:
DCX8510-FB2-EB console login: Proxy: ip2ipc started (run=1, rc=0), debug=0
Proxy: ipc2ip started (rc=0)
2014/12/07-18:17:37, [HAM-1004], 379, SLOT 5 | CHASSIS, INFO, DCXFB2-Chassis, Processor rebooted - Hafailover.
Open heartbeat: 4d0004
Open heartbeat: 3f0105
Dec 7 18:17:43 Heartbeat to 4d0004 Up!

SNMP Research EMANATE/Lite Agent Version 16.2.0.9
Copyright 1989-2006 SNMP Research, Inc.
2014/12/07-18:17:52, [FV-1001], 380, SLOT 5 | CHASSIS, INFO, DCX8510-FB2-EB, Flow Vision daemon initialized.
All service instances in sync
2014/12/07-18:18:16, [FSSM-1002], 381, SLOT 5 | CHASSIS, INFO, DCXFB2-Chassis, HA State is in sync.

 

Thanks in advance!

 

Regards

SR

Regards
SR

Join the Community

Get quick and easy access to valuable resource designed to help you manage your Brocade Network.