12-18-2016 08:37 AM
Our company has an old EVA storage array which we cannot really turn off. It's not in use (all data was migrated a while ago), but each and every time we tried to get rid of it, a good half of the devices connected to the SAN lost connectivity to a different storage array.
By now all zones containing the EVA aliases have been removed from the config in both fabrics, so nothing should be able to see the EVA. However, when I try to disable the switch ports going to the EVA, all servers behind the downstream switches lose connectivity to a new storage array, which is connected to the same switches as the EVA (the master switches in both fabrics). Connectivity is restored after the servers reboot (this, however, requires the EVA ports to be re-enabled first). The master switches (in both fabrics) are connected to the downstream switches over an ISL trunk (2x 8Gbit ports).
Would anybody here have an idea what could cause this and where to look?
Any ideas welcome, thanks in advance.
12-20-2016 05:58 AM
What is the FOS level on your switches?
The EVA storage port and WWPN should not have any entries in the active zoning configuration; to verify this,
please share the output from
portloginshow <EVA PORT>
portcamshow <EVA PORT>
nszonemember <EVA WWPN>
I would also like to verify the routing settings. Please provide the following from the switch the EVA storage is connected to:
12-30-2016 03:31 AM
Do you see any activity on the EVA ports or on the EVA itself?
01-04-2017 05:22 AM
The FOS version is v7.4.1a on the 80-port switches (these are the main switches, where 99% of the servers are connected); the other switches you'll see in the topology are on v6.4.0b (and waiting to be retired soon, with the last 2 servers still hanging there).
The output of the commands you asked for is in the attached files, together with a simple topology diagram.
01-04-2017 05:27 AM
No, we don't see any activity on the EVA ports, only on the ISL trunk ports, where we see C3 discards. And there's no activity on the EVA itself; it's half dead.
01-05-2017 03:47 AM
Concerning the management server for the EVA: did you also disable its port when you tried to remove the EVA?
Can you also run:
portloginshow <Management server of EVA>
portcamshow <Management server of EVA>
nszonemember <Management server of EVA>
02-19-2017 05:35 AM
First of all, thank you for your answers.
We finally got permission to try to remove the EVA storage again, so today we did the following:
1] reset error counters (portstatsclear)
2] shut down the EVA ports and the ports to the management server too (all at the same time)
Everything connected to the downstream switches lost connection to the new storage.
3] checked the error counters and saw the C3 discards count growing on the ISL trunk ports on the upstream switches, where the old EVA, the new storage and a whole bunch of servers are connected. No errors growing on the downstream switches, all zeros.
4] We've shut down all servers connected to the downstream switches.
5] Reset error counters.
6] Saw no error counters growing.
This looked promising. So we started powering up servers one by one to see which one could possibly cause it, but none of them was able to find the new storage. Also the error counters on the upstream switches started growing again. We started to run out of time.
7] re-enabled ports going to the EVA
8] everything came up normally, all servers were able to see their storage.
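The "reset counters, wait, check which ones grow" comparison in steps 1], 3], 5] and 6] can be sketched as a small script. This is only an illustration: the port labels and counter values below are made up, and it assumes the counts were transcribed by hand from two readings of the switch counters.

```python
# Hypothetical snapshots: port label -> C3 discard count.
# The labels and numbers are invented for illustration; they are
# NOT readings from the switches discussed in this thread.
def growing_ports(before, after):
    """Return {port: increase} for ports whose counter grew between snapshots."""
    return {
        port: after[port] - before.get(port, 0)
        for port in after
        if after[port] > before.get(port, 0)
    }

before = {"isl0": 0, "isl1": 0, "eva0": 0}
after = {"isl0": 1520, "isl1": 1498, "eva0": 0}

# Only ports whose counter increased are reported.
print(growing_ports(before, after))
```

Ports that stay at zero (like the hypothetical `eva0` here) drop out of the result, which makes it easier to see that only the ISL trunk ports are affected.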
Attached are the settings from the port going to the EVA management server, but again: it only shows the management server and the EVA.
02-19-2017 06:43 AM
Just did an errdump and found this in both fabrics:
2017/02/19-05:50:50, [RTWR-1003], 3507, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 8 to domain 2, iu_data 1000000.
2017/02/19-05:51:36, [RTWR-1003], 3508, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 12 to domain 2, iu_data 1000000.
2017/02/19-05:52:23, [RTWR-1003], 3509, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 16 to domain 2, iu_data 1000000.
2017/02/19-05:53:09, [RTWR-1003], 3510, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 20 to domain 2, iu_data 1000000.
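For anyone who wants to tally these entries across a longer errdump, here is a minimal sketch of a parser. It assumes the lines have exactly the layout shown above; the function name `rtwr_retries` is my own, not anything from FOS.

```python
import re

# Matches RTWR-1003 lines in the layout shown in the errdump excerpt above.
RTWR = re.compile(
    r"^(?P<ts>\S+), \[RTWR-1003\], (?P<seq>\d+), .*?"
    r"RTWR retry (?P<retry>\d+) to domain (?P<domain>\d+)"
)

def rtwr_retries(lines):
    """Return a list of (timestamp, domain, retry_count) tuples."""
    found = []
    for line in lines:
        m = RTWR.search(line)
        if m:
            found.append((m["ts"], int(m["domain"]), int(m["retry"])))
    return found

log = """\
2017/02/19-05:50:50, [RTWR-1003], 3507, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 8 to domain 2, iu_data 1000000.
2017/02/19-05:51:36, [RTWR-1003], 3508, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 12 to domain 2, iu_data 1000000.
2017/02/19-05:52:23, [RTWR-1003], 3509, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 16 to domain 2, iu_data 1000000.
2017/02/19-05:53:09, [RTWR-1003], 3510, FID 128, INFO, GDC-FL4R11-SWF-FA, msd0: RTWR retry 20 to domain 2, iu_data 1000000.
""".splitlines()

for ts, domain, retry in rtwr_retries(log):
    print(ts, "domain", domain, "retry", retry)
```

In the four entries above the retry count goes up by 4 roughly every 46 seconds, always towards domain 2.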
Domain 2 is the downstream switch. So I did a bit of reading on the forums and found this in a really old post:
In most case RTWR Error message stay indicated the Switch cannot or not get correct response from ISL parent Switch, due Port Incompatibility setting or as mentioned wrong settings in Switch Parameters.
So I wanted to see if the ports have same settings and got this (again same behavior in both fabrics):
GDC-FL4R11-SWF-FA:admin> portcfgshow 0
Area Number: 0
GDC-FL4R11-SWF-FA:admin> portcfgshow 1
Area Number: 1
GDC-FL3R06-SWF-FA:FID128:admin> portcfgshow 0
Area Number: 63
GDC-FL3R06-SWF-FA:FID128:admin> portcfgshow 1
Area Number: 61
Not sure this is related, but I've found in other articles that the area number can be fixed by portaddress --bind/unbind (http://community.brocade.com/t5/Fibre-Channel-SAN/What-is-Area-Number-in-portcfgshow/td-p/32297).
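To make the difference easier to spot, here is a small sketch that lists the ports whose Area Number differs from the port number, using only the four values from the portcfgshow output above. Whether such a difference is actually a problem is an open question, not something this check proves; the function name `area_mismatches` is made up for the example.

```python
# (switch, port) -> Area Number, copied from the portcfgshow output above.
areas = {
    ("GDC-FL4R11-SWF-FA", 0): 0,
    ("GDC-FL4R11-SWF-FA", 1): 1,
    ("GDC-FL3R06-SWF-FA", 0): 63,
    ("GDC-FL3R06-SWF-FA", 1): 61,
}

def area_mismatches(areas):
    """Return sorted (switch, port, area) tuples where area != port number."""
    return sorted(
        (switch, port, area)
        for (switch, port), area in areas.items()
        if area != port
    )

for switch, port, area in area_mismatches(areas):
    print(f"{switch} port {port}: area {area}")
```

Only the two ports on GDC-FL3R06-SWF-FA are listed; on GDC-FL4R11-SWF-FA the area numbers match the port numbers.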
On top of that, the time is set differently on each of these switches (already fixed on the downstream switches, but it requires a reboot). Both downstream switches show FID128 in the name (and all ports show FID128), while the upstream switches throw 'lscfg: requires VF to be enabled'. No idea who set it up like that, and especially not why. At the moment I think the only correct settings on the switches are the defined aliases and zones.
Any ideas welcome, thank you.
02-21-2017 02:39 AM
Do all your switches have VF enabled?
How many switches are connected together in each fabric?
02-21-2017 02:56 AM
It's only the downstream switches (5300s) where I found VF enabled; the rest don't have it.
There are 4 switches in each fabric, two 5300s connected on an ISL trunk and two small ones (in blade enclosures, waiting to be thrown out) connected to the upstream 5300.