06-02-2016 02:23 AM
Everybody knows about single-initiator zoning. Brocade even recommends only one initiator and one target per zone.
The reason for this is to eliminate potentially disruptive RSCNs. The Brocade documentation only says they "have potential to disrupt storage traffic". But what kind of disruption can these RSCNs actually cause? Will the host just get SCSI timeouts and that's it, or can data be corrupted?
06-02-2016 04:22 AM
It was a long time ago, but I have experienced issues with multiple initiators in the same zone. We also discovered that the company had two different HBAs in the same zone, and one HBA became very slow in some situations.
There is no use for (nor support of) multiple initiators in the same zone, but you can very well have multiple targets, as long as they belong to the same kind of device (with the same firmware levels).
I have been using single-initiator, multiple-target zoning for several years with different storage array vendors and models.
In my experience, RSCNs are not a problem with newer SAN switch firmware and HBA driver versions. At least I have never seen corrupt data due to RSCNs.
06-02-2016 11:27 AM - edited 06-02-2016 02:16 PM
The switch sends out an RSCN when it needs to notify the attached devices about fabric changes. The devices receive the RSCN and react accordingly. What this means in practice:
Switch firmware has very little influence on this process, unless there is a FOS bug that sends out wrong RSCNs at the wrong times and/or to the wrong recipients.
I think target ports are very tolerant of RSCNs, if not entirely indifferent to them.
Host HBAs are where things can go wrong. They have to rescan what they see and also notify the upper layers about any changes. This can take time, so yes, it is rather common to see I/O freeze for a while. And if something in the I/O stack (HBA firmware, HBA driver, MPIO driver, SCSI driver, volume manager, file system, etc.) isn't working as it should, it's quite possible to have more serious issues than just I/O freezes.
06-02-2016 02:05 PM - edited 06-02-2016 02:06 PM
Theoretically, an RSCN carries an event type as well as the addresses that changed. While you can see the addresses, the event type is usually not set by the switches. So a host (that registered for RSCNs) goes to the nameserver and asks about its targets. It usually even ignores the addresses in the RSCN and just asks for everything. That takes some time. As it's not sure about its targets (well, it got an RSCN, didn't it?), it will suspend traffic during that time.
Imagine a zone with ports from 100 different hosts plus some storage ports. (Definitely not best practice...)
Host 1 is rebooted and briefly drops from the fabric. Hosts 2-100 registered for RSCNs and receive them. Usually the event type is not set, so they don't know exactly what happened, but they *could* see the affected address. They *could* think "Hey, I know this address. The nameserver told me about it before, but when I tried to PLOGI/PRLI to it earlier it turned out not to be a target, so I shouldn't care." But what really happens is that each host goes to the nameserver and asks about *ALL* the devices it is zoned with. So, one after another, the switch's CPU tells all 99 remaining hosts about each other plus the storage ports. And many of them will even log in again to all the addresses they got from the nameserver.
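To put rough numbers on this scenario, here is a back-of-the-envelope sketch in Python. It is only a simplified model: it assumes every remaining host re-queries the nameserver for its whole zone, and the count of 4 storage ports is an assumption picked for illustration. The function name is made up.

```python
def flat_zone_requery_load(num_hosts, num_storage_ports):
    """Return (queries, entries) after one host drops out of a single flat zone.

    Simplified model: every remaining host registered for RSCNs and asks the
    nameserver about everything it is zoned with, ignoring the affected address.
    """
    remaining = num_hosts - 1  # hosts still logged in to the fabric
    # Each answer lists the other remaining hosts plus all storage ports.
    entries_per_query = (remaining - 1) + num_storage_ports
    return remaining, remaining * entries_per_query


# 100 hosts as in the example; 4 storage ports is an assumed value.
queries, entries = flat_zone_requery_load(num_hosts=100, num_storage_ports=4)
print(queries, "nameserver queries,", entries, "entries returned")
```

The entry count grows roughly with the square of the number of hosts in the zone, and that is before counting any re-logins the hosts decide to do afterwards.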
Even with that kind of bad zoning, we shouldn't see real RSCN storms anymore like in the old days, when this situation led to recursive, self-multiplying nameserver spamming with more and more RSCNs, eventually causing awful outages. But you can still have a nasty impact from all the processing and the suspended traffic during relogins.
Peer zoning helps a lot with this. Hosts won't get RSCNs about other hosts anymore.
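A small Python sketch of why that is, under a simplified model: in a regular zone every member is zoned with every other member, while in a peer zone the peer members (hosts) are only zoned with the principal members (storage). The dictionary layout, the device names and the function name are all made up for illustration, and real RSCN delivery has more conditions than this.

```python
def rscn_recipients(changed, zones):
    """Devices that are zoned with `changed` and so would be told about it."""
    recipients = set()
    for zone in zones:
        if "principals" in zone:
            # Peer zone: peers are zoned with principals only, not each other.
            principals, peers = set(zone["principals"]), set(zone["peers"])
            if changed in principals:
                recipients |= peers
            elif changed in peers:
                recipients |= principals
        elif changed in zone["members"]:
            # Regular zone: every member is zoned with every other member.
            recipients |= set(zone["members"])
    recipients.discard(changed)
    return recipients


hosts = [f"host{i}" for i in range(1, 101)]
storage = ["stor1", "stor2"]  # assumed: two storage ports

flat_zone = [{"members": hosts + storage}]
peer_zone = [{"principals": storage, "peers": hosts}]

print(len(rscn_recipients("host1", flat_zone)))  # other hosts plus storage
print(sorted(rscn_recipients("host1", peer_zone)))  # storage ports only
```

In this model, a reboot of host1 in the flat zone touches every other host, while in the peer zone only the storage ports hear about it.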