04-09-2010 05:56 AM
Okay, I'm not an expert on Brocade switches, nor do I pretend to be one. We typically set up our switches and add aliases and zones whenever needed. We monitor them using NetApp DFM.
Anyway, I cut over to a second new switch yesterday for a short time before moving all of my connections back. On my Unix hosts I saw path failures, growing run queues, high CPU I/O wait, and abnormally busy disks. Looking at the port statistics, most connections showed tens of thousands of errors, and several showed hundreds of thousands, within a short 45-minute window. In setting up this switch, I attempted to dump the current config of a 4100, which is on FOS 5.0.1d, and apply it to the 5100, which was on 6.1.0c, but that did not appear to be supported. So, to avoid fat-fingering, I wrote a script to parse all the zones, aliases, and configs and recreate the overall zoning configuration. Note that before the cutover this switch was upgraded to 6.2.1b because of a firmware bug in 6.1.0c affecting the power supplies. I know the script works because it was used for another 5100 switch that is on FOS 6.1.0c. The sequence was: I configured all of the aliases and zones, then the upgrade occurred, and then I attempted the cutover.
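For anyone wanting to do something similar, here is a minimal sketch of that kind of script, not the one actually used. It assumes you have saved the "Defined configuration" part of cfgshow output to text, that each entry starts with cfg:, zone:, or alias: followed by a name and semicolon-separated members, and that members may wrap onto following lines. The sample names and WWNs are made up. It emits alicreate, zonecreate, and cfgcreate lines to paste into the new switch.

```python
import re

def parse_cfgshow(text):
    """Parse cfgshow-style text into {"cfg": {}, "zone": {}, "alias": {}}.

    Each inner dict maps an object name to its list of members.
    Lines without a 'cfg:/zone:/alias:' prefix are treated as
    continuations of the previous entry.
    """
    db = {"cfg": {}, "zone": {}, "alias": {}}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("Defined configuration"):
            continue
        if stripped.startswith("Effective configuration"):
            break  # only the defined configuration is recreated
        m = re.match(r"(cfg|zone|alias):\s*(\S+)\s*(.*)", stripped)
        if m:
            kind, name, rest = m.groups()
            current = db[kind].setdefault(name, [])
        else:
            rest = stripped
        if current is not None:
            current.extend(p.strip() for p in rest.split(";") if p.strip())
    return db

def build_commands(db):
    """Emit create commands: aliases first, then zones, then configs."""
    order = (("alias", "alicreate"), ("zone", "zonecreate"), ("cfg", "cfgcreate"))
    cmds = []
    for kind, verb in order:
        for name, members in db[kind].items():
            cmds.append('%s "%s", "%s"' % (verb, name, "; ".join(members)))
    return cmds

# Hypothetical sample input; names and WWNs are invented for illustration.
SAMPLE = """Defined configuration:
 cfg:   prod_cfg   zone_a; zone_b
 zone:  zone_a     host1_hba0; array1_p0
 zone:  zone_b     host2_hba0;
                   array1_p1
 alias: host1_hba0 10:00:00:00:c9:00:00:01
 alias: host2_hba0 10:00:00:00:c9:00:00:02
 alias: array1_p0  50:0a:09:80:00:00:00:01
 alias: array1_p1  50:0a:09:80:00:00:00:02
"""

if __name__ == "__main__":
    for cmd in build_commands(parse_cfgshow(SAMPLE)):
        print(cmd)
```

The generated lines only define the objects. You would still need to run cfgsave and cfgenable on the switch afterwards, and it is worth comparing cfgshow on both switches before trusting the result.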
Anyone have any ideas?
04-09-2010 07:02 AM
A couple of thoughts - not sure how useful they'll be.
1. Are your errors on all ports or on just one or two? Sometimes I don't get connectors to seat properly and have to pull them out and reconnect them. You might also try cleaning the ends of the fiber connectors - that helps me when I have errors on individual ports. If you are taking errors on *all* your ports then you have something else going on.
2. I'm wondering if I've been doing it wrong all these years - when putting a new switch in my fabric (no port zoning here), I just clear out the config on the switch I'm about to put in, plug it into the fabric, and it pulls down all the zoning and configuration information from other switches in the fabric. Takes about 20 seconds.
To clear out the config on an old switch (WARNING - ONLY DO THIS ON A SWITCH NOT IN USE - AND DISCONNECTED FROM OTHER SWITCHES AND HOSTS) I do:
* optionally do:
* switchname blah
* configure... domain <domain id> (if you want a certain number, but not required)
* licenseidshow, licenseadd... if you need to add licenses
* ipaddrset... if you need to set the IP address
* tstimezone US/Central (or whatever)
Switch is now clear - plug it into the fabric - it should pull down the zoning, configs, etc. and join up. Then you can plug other devices into it, add zones, etc.
Is that not the way to do this?
04-09-2010 07:29 AM
Thanks for your post.
We're taking errors on all ports. What's odd, too, is that one of my Unix hosts is a Solaris box multipathed across two different switches, and when I did this hardware swap it seemed to drop its LUNs on all paths. All connectors were cleaned before being plugged into the new switch.
Your point number 2 sounds interesting. You must be using the Fabric Manager software, which we do not have.
04-09-2010 07:53 AM
Pete - we do use fabric manager software, but the commands for cleaning out a switch are not related to that - they are just command line entries. No software required, anyone should be able to do them.
What worries me is your errors on all ports and that your multipath host dropped LUNs as well. Sounds like some sort of conflict in your switch configs, perhaps. Same domain ID perhaps - not sure. Also - if you have even one small difference between the zoning databases, the switches won't join into a fabric - that's why I like to clear switches out before joining them to a working fabric: I don't have to make sure the zoning DBs are the same, the switches do it for me, and they get it right. (Doesn't work or help if you are doing port-based zoning.)
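If you do want to compare two zoning databases by hand before joining switches, a rough sketch like the following lists where they differ. It assumes you have already reduced each database to a dict mapping zone (or alias) name to its members; the names below are hypothetical. It only reports differences. Which of them would actually block a merge is up to FOS, not this script.

```python
def zoning_diff(db_a, db_b):
    """Compare two {name: [members]} mappings, ignoring member order."""
    diff = {"only_in_a": [], "only_in_b": [], "different": []}
    for name in sorted(set(db_a) | set(db_b)):
        if name not in db_b:
            diff["only_in_a"].append(name)
        elif name not in db_a:
            diff["only_in_b"].append(name)
        elif set(db_a[name]) != set(db_b[name]):
            diff["different"].append(name)
    return diff

# Hypothetical zone definitions from two switches.
OLD_SWITCH = {
    "zone_a": ["host1_hba0", "array1_p0"],
    "zone_b": ["host2_hba0", "array1_p1"],
}
NEW_SWITCH = {
    "zone_a": ["array1_p0", "host1_hba0"],   # same members, different order
    "zone_b": ["host2_hba0", "array1_p2"],   # one member differs
    "zone_c": ["host3_hba0", "array1_p0"],   # exists only on the new switch
}

if __name__ == "__main__":
    for kind, names in zoning_diff(OLD_SWITCH, NEW_SWITCH).items():
        print(kind, names)
```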
If you type "errdump" on your switches do you see any complaints about zone conflicts, duplicate domain ID's, etc...?
04-09-2010 08:33 AM
Just a note here at the beginning, there is no ISL link.
Okay. Thinking more about your comments, the two switches in question do have the same domain ID. Other switches at other sites have unique domain IDs. I'm looking back through the FOS admin guide, as I thought I read that domain IDs could be the same in the same fabric without ISLs. I cannot find it, but I'm still looking.
Looking at the domain configuration on the switches being replaced (which I did not set up originally), they both have the same domain ID of 1, so I do not see how this would affect it, but again I'm no expert. I'm going to move forward on the assumption that there is some sort of zoning problem, unless someone tells me I'm off my rocker about the domain IDs. I will probably research the zoning first, then clear the configuration from the switch, then try to dump/load from the old switch again, as maybe I missed a step when I originally tried this. Then I'm going to change the domain ID anyway so it is unique.
Any other thoughts?
04-09-2010 08:48 AM
The problem basically is that I'm replacing two 4100s (both with domain ID 1) with two 5100s (both with domain ID 1), with no ISL links. I replaced the first one without incident. On the second one, errors occurred on all connected ports at an alarming rate. Unix hosts were incurring heavy I/O wait and the performance problems that go with it, and a multipathed Solaris server (by the looks of the messages file) appeared to lose all connectivity to its LUNs. By multipathed I mean two fiber connections, each one to a different switch. After reverting to the old switch, port activity on the old switch was normal. I'm trying to figure out why I saw the above symptoms, as it seems to be switch related.
04-09-2010 09:08 AM
--->>> Okay, I'm not an expert on Brocade Switches, nor pretend to be one.
--->>>The problem basically is I'm replacing two 4100's....
It is very simple.
First of all, never use more than one switch with the same DID (domain ID) in the same SAN / fabric.
At the moment you have 4 switches, all with DID 1. This is not a good idea, and you risk crashing the fabric.
Secondly, you just need to upgrade the 4100 to any current compatible FOS release...
*FOS 5.0.x on the 4100 is not compatible with 6.1.x; a minimum of FOS 5.1.x is required.
Thirdly, change the DID on the two new 5100s to 2 and 3, then connect the new 5100s via ISL to the 4100s. The zones and configs will be merged.
*FOS = Fabric Operating System
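A quick sanity check for the two points above (unique DIDs, minimum FOS level) could be sketched like this. The switch names are hypothetical, the inventory is typed in by hand rather than read from the switches, and the 5.1 minimum is simply the figure quoted in this post, not something the script verifies.

```python
import re
from collections import defaultdict

MIN_FOS = (5, 1)  # minimum quoted in this post; treat as an assumption

def parse_fos(version):
    """Turn a string like '5.0.1d' or 'v6.2.1b' into (5, 0, 1, 'd')."""
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)([a-z]?)$", version.strip())
    if not m:
        raise ValueError("unrecognised FOS version: %r" % version)
    major, minor, patch, letter = m.groups()
    return (int(major), int(minor), int(patch), letter)

def premerge_problems(switches):
    """Return a list of human-readable problems found in the inventory."""
    problems = []
    by_domain = defaultdict(list)
    for sw in switches:
        by_domain[sw["domain_id"]].append(sw["name"])
        if parse_fos(sw["fos"])[:2] < MIN_FOS:
            problems.append(
                f"{sw['name']}: FOS {sw['fos']} is below {MIN_FOS[0]}.{MIN_FOS[1]}"
            )
    for did, names in sorted(by_domain.items()):
        if len(names) > 1:
            problems.append(f"domain ID {did} shared by {', '.join(names)}")
    return problems

# Hypothetical inventory of the two switches that would be ISL-connected.
SWITCHES = [
    {"name": "sw4100_b", "domain_id": 1, "fos": "5.0.1d"},
    {"name": "sw5100_b", "domain_id": 1, "fos": "6.2.1b"},
]

if __name__ == "__main__":
    for problem in premerge_problems(SWITCHES):
        print(problem)
```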
--->>>...errors on all connected ports occured at an alarming rate.
Please collect the following output from the switch that caused the errors and post it here:
"switchshow", "fabricshow", "nsshow", "errshow"
04-09-2010 09:11 AM
--->>>your point number 2 sounds interesting, you must be using the Fabric manager software, which we do not have.
DCFM Professional is free and bundled with the new switches.
Some features that were in Web Tools under the old FOS 5.x were migrated to DCFM starting with FOS 6.0 and are no longer available in Web Tools.