Occasional Contributor
Posts: 5
Registered: ‎02-19-2013
Accepted Solution

Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Multi-location long distance fabrics

Setup
Three data centres
2 primary sites (A & B) each with an IBM SAN768B-2 (DCX 8510-8)
1 secondary site (C) with an IBM SAN40B-4 (5100)
DWDM connectivity between all sites

LISL configured between A & B (AtoB=74km)

Question
Can the LISL also be connected via site C (AtoC=60km BtoC=13km) to provide a diverse path for fault tolerance?

Valued Contributor
Posts: 521
Registered: ‎03-20-2011

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Not sure why you call it LISL... But yes, I would definitely connect all three sites together. If A-B fails, traffic will start going A-C-B. Moreover, if all links are good and stable, and given that the 5100 is a good single-ASIC switch, you could decrease the cost of the A-C and C-B links (or increase the cost of A-B) so that A-B = A-C + C-B. That way your traffic will use both paths at once!
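To illustrate the cost rule above (A-B = A-C + C-B), here is a minimal Python sketch of how a lowest-cost route selection behaves on the three-site triangle. It is only a toy model, not Fabric OS code: the cost values (500 and 1000) and the names `best_paths`, `default_costs` and `tuned_costs` are assumptions made up for the example.

```python
def build_graph(links):
    """Turn {(a, b): cost} into a symmetric adjacency map."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def all_simple_paths(graph, src, dst, path=None):
    """Yield every loop-free path from src to dst (fine for a 3-node fabric)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph[src]:
        if nxt not in path:
            yield from all_simple_paths(graph, nxt, dst, path)


def best_paths(links, src, dst):
    """Return (lowest cost, all paths that share that cost)."""
    graph = build_graph(links)
    costed = []
    for p in all_simple_paths(graph, src, dst):
        cost = sum(graph[u][v] for u, v in zip(p, p[1:]))
        costed.append((cost, p))
    if not costed:
        return None, []
    best = min(c for c, _ in costed)
    return best, sorted(p for c, p in costed if c == best)


# Hypothetical costs: every link equal, so A-B is the only best path.
default_costs = {("A", "B"): 500, ("A", "C"): 500, ("B", "C"): 500}
# Hypothetical tuning: A-B raised so that A-B = A-C + C-B.
tuned_costs = {("A", "B"): 1000, ("A", "C"): 500, ("B", "C"): 500}

print(best_paths(default_costs, "A", "B"))
print(best_paths(tuned_costs, "A", "B"))
print(best_paths(tuned_costs, "A", "C"))
```

With the tuned costs the A-to-B lookup returns two equal-cost paths (direct and via C), while A-to-C still prefers the direct link.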

Occasional Contributor
Posts: 5
Registered: ‎02-19-2013

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Hi Alexey, thank you for the response.

 

I think I oversimplified the solution in my original post.

Currently the base switches in the two DCX switches form an XISL link. Virtual switches at both sites are enabled for XISL, and Logical ISL (LISL) connections exist between virtual switches with the same FID at each site.

I have attached a drawing to try to illustrate the solution I am trying to achieve.

If we created a base switch on the 5100 and connected it to the base switches at A and B, would the path still fail over from A to B via C if the A-to-B link failed?

Valued Contributor
Posts: 521
Registered: ‎03-20-2011

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Andy, that's a good clarification, and the picture looks different now. However, the answer is still the same. Yes, all the base FID traffic (including XISL and LISL) will reroute to A-C-B if the A-B link fails. Moreover, if your C site is just a transit point and doesn't have any non-base FID devices, then you don't have to partition that switch at all. You can just leave it VF-disabled and it will happily join the base FID99 and keep carrying all the required traffic between the sites. However, configuring it as VF-enabled is a good idea, because converting from non-VF to VF requires a reboot.

Occasional Contributor
Posts: 5
Registered: ‎02-19-2013

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Thanks Alexey, that is looking promising. The final piece of our puzzle is the location of resources.

Resources at A and B need to communicate with one another, and resources at A and B each need to be able to access a resource at C (to support a distributed cluster environment with the quorum at site C).

Assumption:

Based on what you have told me so far, link cost means that resources at A and B will communicate directly with one another and, for the same reason, A and B will each access C directly.

Questions:

If the link between A and C is lost, will access from A to the resource at C automatically route via B?

Would the selection of the alternate path be instantaneous?

 

I have updated my drawing (attached) to hopefully illustrate clearly.

 

Valued Contributor
Posts: 521
Registered: ‎03-20-2011

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Andy, that's not difficult. Every time the number of E-Ports in the fabric changes (i.e. when an ISL goes online or offline), FSPF recalculates the routes between domains. So when ANY of the links in your triangle disappears, FSPF will work out that the traffic for the now-missing link should be rerouted over the two-hop connection via the third location. And when the lost link recovers, FSPF immediately restores all the single-hop routes.

So, in short - yes, it will reroute, and yes, it will be almost instantaneous. Almost - because it still requires a fabric rebuild. Also bear in mind that a number of frames - those that are "in flight" at that moment - will be lost. This will trigger some recovery at the upper layer, most likely SCSI timeouts and retries. But that is unavoidable in long-distance implementations.
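The rerouting described above can be sketched in a few lines of Python. This is a toy shortest-path recalculation on the A/B/C triangle, not the real FSPF implementation: the link costs of 500 and the names `shortest_path`, `healthy` and `a_c_down` are assumptions for illustration only, and the sketch says nothing about how long the real fabric takes to converge.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over {(a, b): cost}; returns (cost, path) or None if unreachable."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, link_cost in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None


# Hypothetical equal costs on all three inter-site links.
healthy = {("A", "B"): 500, ("A", "C"): 500, ("B", "C"): 500}
# The A-C link goes offline: recompute routes without it.
a_c_down = {link: cost for link, cost in healthy.items() if link != ("A", "C")}

print("A->C, all links up:", shortest_path(healthy, "A", "C"))
print("A->C, A-C link down:", shortest_path(a_c_down, "A", "C"))
print("A->B, A-C link down:", shortest_path(a_c_down, "A", "B"))
# When the link comes back, recomputing on `healthy` restores the one-hop route.
```

Losing A-C moves the A-to-C route onto the two-hop path via B, while the A-to-B route is unchanged.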

I'm just curious: what kind of cluster with quorum are you deploying?

Occasional Contributor
Posts: 5
Registered: ‎02-19-2013

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

This is an IBM SVC Stretched Cluster implementation supporting a VMware Metro Storage Cluster.

 

If a path is lost in the above environment, we need to be certain that the fabric rebuild will happen quickly. Does a fabric rebuild mean that all fabrics are affected, so that node-to-quorum as well as node-to-node access is disrupted for a short while? How long might this process take?

Any significant delay in the rebuild would mean the cluster nodes lose access to each other and to the quorum at the third site. They would go offline to protect against split brain, and all hosts would be disconnected until the links were re-established (the complete opposite of what the HA VMware solution requires).

Valued Contributor
Posts: 521
Registered: ‎03-20-2011

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

My experience shows that a fabric rebuild takes up to 10 seconds, usually 2-3 seconds for a small three-domain fabric like yours. You are right about the impact on all the devices, including those that do not have any access to the remote sites.

In a config like yours, I would not use any stretched FIDs with XISLs. I'd rather create an FCR backbone in the base FID and leave all other FIDs unique and separate. All the devices accessing the remote sites would be put in LSAN zones. Should there be any turbulence in the long-distance links, the rebuild will only happen in the FCR backbone. All the affected LSAN-zoned devices will receive RSCNs about their partner devices going offline, but the edge fabrics will stay stable, so at least all the local traffic will keep running without any trouble.
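As a rough way to picture the difference between the two designs, here is a small Python toy model. It only encodes the rule stated above (a fabric that contains a long-distance link rebuilds when that link flaps); the FID numbers, fabric names and the `fabrics_that_rebuild` helper are hypothetical and not taken from any real configuration.

```python
def fabrics_that_rebuild(fabrics):
    """Names of fabrics that span sites, i.e. that rebuild when an inter-site link flaps."""
    return sorted(name for name, spans_sites in fabrics.items() if spans_sites)


# Design 1 (hypothetical FIDs): logical fabrics stretched across sites over XISL/LISL.
stretched_design = {
    "FID 99 (base)": True,   # carries the XISL between sites
    "FID 10": True,          # stretched via LISL
    "FID 20": True,          # stretched via LISL
}

# Design 2 (hypothetical FIDs): FCR backbone in the base FID, edge fabrics kept local.
fcr_design = {
    "FID 99 (backbone)": True,  # only fabric containing the long-distance links
    "FID 10 at A": False,
    "FID 11 at B": False,
    "FID 12 at C": False,
}

print("Stretched FIDs:", fabrics_that_rebuild(stretched_design))
print("FCR backbone:  ", fabrics_that_rebuild(fcr_design))
```

In the second design only the backbone shows up in the list; the LSAN-zoned devices would still see their remote partners go offline, which this sketch does not model.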

Occasional Contributor
Posts: 5
Registered: ‎02-19-2013

Re: Three site: A.DCX8510-8 B.DCX8510-8 C.5100 can LISL between A&B be connected via C for fault toleran

Thanks Alexey, plenty for me to work with.
