
Campus Network Infrastructure-Best Practices: High Availability Design with Brocade Hitless Stacking

Posted on 04-04-2013 07:26 AM; edited on 04-28-2015 03:44 PM by Community Manager

Synopsis: An overview of the high-availability stacking features in Brocade FCX and Brocade ICX Series switches, including best practices for deploying stacking in Brocade’s HyperEdge Architecture for campus networks.

 


Preface

 

Overview

Brocade sets the industry standard for resiliency and high availability by enabling non-stop networking in data center and campus environments. The Brocade FCX and Brocade ICX Series switches support switch stacking with features that deliver continuous uptime for campus networks. Hitless stack failover is a high availability (HA) feature used in networks that must reduce downtime. It provides the following benefits:

  • Zero downtime with no Layer 2 / Layer 3 network service interruption in the event of an active stack controller failure.
  • Real-time Layer 2 / Layer 3 state synchronization between the active and standby controllers.
  • Hitless insertion and removal of switches, so failing units can be replaced with no downtime.
  • Zero-touch software and configuration synchronization between stack members, which simplifies management and administration.
  • Distributed stacking, which allows the members of a stack to be located in more than one wiring closet for deployment flexibility, with resiliency and a single point of configuration and management.

 

Purpose

This document discusses Brocade’s hitless stacking feature and best practices for achieving high availability with stacking in the Brocade HyperEdge Architecture for campus networks.

 

 

Audience

Network architects and designers who want a better understanding of how to design and apply mixed stacking to wired and wireless campus networks.

 

 

Related Documents

Designers should consult the FastIron Configuration Guide and the hardware installation guide for each switch type, both referenced later in this guide, and should review any Brocade release notes that have been published for the FastIron operating system.

 

 

About Brocade

Brocade® (NASDAQ: BRCD) networking solutions help the world’s leading organizations transition smoothly to a world where applications and information reside anywhere. This vision is designed to deliver key business benefits such as unmatched simplicity, non-stop networking, application optimization, and investment protection.

Innovative Ethernet and storage networking solutions for data center, campus, and service provider networks help reduce complexity and cost while enabling virtualization and cloud computing to increase business agility.

To help ensure a complete solution, Brocade partners with world-class IT companies and provides comprehensive education, support, and professional services offerings. (www.brocade.com)

 

 

Key Contributors

The content in this guide was developed by the following key contributors.

  • Lead Designer: Simon Pollard, Campus Product Management

 

Document History

Date          Version   Description
2013-04-08    1.0       Initial Release
2013-05-17    1.1       Corrected missing content

 

 

Understanding Stacking

Enterprise campus network wiring closets typically contain stacks of Ethernet switches. Stacking links small-form-factor switches together through short proprietary copper cables connected to dedicated stacking ports, or through copper or optical high-speed Ethernet links. All Brocade switches use Ethernet as the stacking medium. The stack of switches then appears and behaves as a single logical switch, simplifying management and increasing resiliency. When a new switch joins the stack, it automatically inherits the operating software and configuration of the stack without manual intervention.

Stacking provides equal value at the edge of data center networks. There, the main difference is that the switches are not physically stacked on top of each other; instead, longer cables logically unify the switches at the top of each server rack. For example, a row of Top-of-Rack (ToR) switches can appear as a single logical switch, significantly reducing the management overhead of the data center access layer.

Brocade stackable switches are linked together using either shared or dedicated stacking ports depending on model.

Switches can be connected in a variety of stack topologies; the most common are the “daisy-chained ring” and the “braided ring,” in which alternating switches are connected to each other. Brocade’s stacking technology also supports 10 Gigabit Ethernet (GbE) XFP or SFP+ fiber optic ports, which allow the switches in a stack to be situated farther apart from each other. The following member types make up a stack:

  • Active Controller. The stack member with the highest priority. It handles stack management.
  • Standby Controller. The stack member with the second-highest priority. It takes over active controller duties if the active controller fails.
  • Member. The device functioning as a stack member that is neither the active nor the standby controller. The device is eligible to become the standby or active controller if necessary.
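The role rules above can be sketched as a small election function. This is a simplified model, not FastIron's election code: it ranks units by stack priority (highest first) and breaks ties by the lowest Unit ID, the rule given later under Failover Process. The `Unit` type and `elect_roles` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Unit:
    """Hypothetical model of a stack unit: its Unit ID and stack priority."""
    unit_id: int
    priority: int


def elect_roles(units: List[Unit]) -> Dict[int, str]:
    """Map each Unit ID to "active", "standby", or "member".

    Simplified model: higher priority wins; equal priorities are broken
    by the lowest Unit ID.
    """
    ranked = sorted(units, key=lambda u: (-u.priority, u.unit_id))
    roles = {}
    for position, unit in enumerate(ranked):
        if position == 0:
            roles[unit.unit_id] = "active"
        elif position == 1:
            roles[unit.unit_id] = "standby"
        else:
            roles[unit.unit_id] = "member"
    return roles


if __name__ == "__main__":
    stack = [Unit(1, 128), Unit(2, 0), Unit(3, 0)]
    print(elect_roles(stack))  # {1: 'active', 2: 'standby', 3: 'member'}
```

Sorting on the tuple `(-priority, unit_id)` expresses both rules in one key, so the first two entries are always the Active and Standby controllers.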

The LEDs on the front of the switch make it easy to identify members of the stack. On FCX switches the LED is labeled AS, and on ICX switches it is labeled MS. The stack role of each unit is indicated as follows:

  • Green: The device is the Active controller.
  • Amber: The device is the Standby controller.
  • Off: The device is operating as a stack member, or is in standalone mode.

NOTE: If a Brocade switch is configured as a standalone unit, meaning the stacking protocol is disabled, it will not function as a member of a stack and will operate independently even if it is connected to other switches in a stack.


 

Example Topologies

Figure: Common Stacking Topologies

For full details about stack topologies with recommended cable layouts, refer to the hardware installation guide for each switch type.

 

 

Distributed Stacking for Deployment Flexibility

Because Brocade switches use Ethernet for the inter-switch stack connections, the deployment options are greatly increased. With standard copper stacking cables, the inter-switch connections can be up to 5 meters long, which is usually sufficient for locally distributed stacks such as ToR applications. For wider distribution, fiber-optic cables should be used; these allow a stack to be deployed across multiple physical locations, such as the wiring closets of an office building. The table below shows the approved optics and stacking distance combinations.

Table: Connectivity Options for Stacking with Brocade FCX and ICX Series Switches
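As a rough planning aid, the cabling rule above can be expressed as a small helper. Only the 5-meter copper limit comes from this guide; fiber reach depends on the optic and is deliberately not modeled, so the helper simply points to the approved optics table. The function and constant names are hypothetical.

```python
# From this guide: standard copper stacking cables reach up to 5 meters.
COPPER_STACK_CABLE_MAX_M = 5.0


def stack_link_medium(distance_m: float) -> str:
    """Suggest a stack-link medium for a given inter-switch distance.

    Assumption: fiber distances are optic-specific, so they are not checked here.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m <= COPPER_STACK_CABLE_MAX_M:
        return "copper stacking cable"
    return "fiber optic (check the approved optics table)"


if __name__ == "__main__":
    print(stack_link_medium(3))    # copper stacking cable
    print(stack_link_medium(120))  # fiber optic (check the approved optics table)
```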

By using stack connections to link distributed switches together, rather than standard inter-switch links with Layer 2 STP or Layer 3 routing, several significant advantages can be realized:

  • Layer 2 simplicity
    Stack links do not need to be considered as part of the overall network topology, so they can provide resiliency without the need for Layer 3 routing to manage traffic flows.
  • No shut links
    Stack links are internal to the stack and are not seen as part of the Layer 2 network, so none of them are blocked by STP. All links remain open and carry traffic simultaneously, maximizing throughput.
  • Fast failover
    Because of the rapid detection and recovery techniques used on stack links, failure of a link or a switch results in hitless failover with no impact on user services.
  • Simplified management
    Even when physically distributed, all the switches within a stack can be managed as a single entity, enabling one-touch configuration changes via a single IP address.

 

Hitless Stacking

Hitless stacking is supported on Brocade FCX and ICX Series switches. It is a high-availability feature set that ensures sub-second or no loss of data traffic during the following events:

  • Active Controller failure or role change
  • Software failure
  • Addition or removal of units in a stack
  • Removal or disconnection of the stacking cable between the Active and Standby Controllers

During such events, the Standby Controller takes over the active role and the system continues to forward traffic seamlessly, as if no failure or topology change had occurred.

The following hitless stacking features are supported:

 

Hitless Failover

Hardware or software failures can take a device offline and potentially disrupt the entire network until the issue is resolved. Hitless failover reduces device downtime by using active and standby controllers (switches) within a switch stack. When the active (master) controller fails unexpectedly, the standby controller automatically takes over and becomes the active controller. This failover process is “hitless,” meaning that it occurs with zero downtime and no interruption of L2/L3 network services. If a controller needs to be taken offline for maintenance or repair, the same transfer of control can be performed manually as a hitless switchover.

Hitless recovery is also triggered in the event of a stack link failure, for example if a stack cable was removed or accidentally damaged.

To understand how hitless failover occurs within a Brocade switch stack it is important to consider:

  • How the members of a stack relate to one another
  • How synchronization occurs
  • What needs to be synchronized between specific members of a stack
  • How hitless failover from one member to another occurs

Hitless Switchover Process

Switchover is a planned change of assignment of the Active and Standby controllers in a stack. It can be initiated manually from the CLI or automatically by the stack, and it completes without reloading the stack configuration and without any packet loss to the services and protocols within the stack.

 

Hitless Failover Process

Failover is the automatic, or forced, switchover between the Active and Standby controllers. The failure or abnormal termination of the Active controller triggers hitless failover: the Active controller abruptly leaves the stack and the Standby controller immediately assumes the active role. Unlike a switchover, a failover generally happens without warning and may cause sub-second packet loss, because packets traversing the stacking link at the time of failure may be lost.

The following events are supported with hitless stacking:

  • Failover
  • Switchover
  • Priority change
  • Role change

 

Stack Synchronization

Ensuring that the Active and Standby controllers are synchronized is a critical component of hitless stacking. Synchronization is an integral part of Brocade’s stacking technology and is automated and transparent. For the Standby controller to take over immediately, the data and control planes must be synchronized with the Active controller. The Standby controller stores the necessary information for assuming control in its database, including spanning tree states, route information, Media Access Control (MAC) address tables, Virtual LANs (VLANs), etc.

When a stack is created and the stack member switches reboot, the Active controller assigns a Standby controller within 60 seconds. The Active controller configuration is then copied to the Standby controller through the baseline synchronization process which is completed within 70 seconds.

After the baseline synchronization is complete, the Standby controller is ready for hitless failover. The Active and Standby controllers then remain synchronized in real time through dynamic synchronization. As a result, switch stacks with synchronized Active and Standby controllers maintain system integrity when a hitless failover occurs.
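The two upper bounds quoted above give a simple worst-case estimate of how long a freshly booted stack runs without hitless protection. This sketch assumes the two phases run back to back, which is how the text describes them; the function name is hypothetical.

```python
# Upper bounds quoted in this guide, in seconds.
STANDBY_ASSIGNMENT_MAX_S = 60  # Active controller assigns a Standby
BASELINE_SYNC_MAX_S = 70       # baseline synchronization completes


def worst_case_hitless_ready_s(assign_s: int = STANDBY_ASSIGNMENT_MAX_S,
                               sync_s: int = BASELINE_SYNC_MAX_S) -> int:
    """Worst-case seconds after stack boot before hitless failover is available.

    Assumes Standby assignment and baseline synchronization are sequential.
    """
    return assign_s + sync_s


if __name__ == "__main__":
    print(worst_case_hitless_ready_s())  # 130
```

In other words, plan for roughly two minutes after a stack reload before an Active controller failure can be absorbed hitlessly.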

The following processes ensure synchronization of Active and Standby controllers:

  • Baseline Synchronization. When a stack is created and the Standby controller comes back online after the initial reboot, the Active controller synchronizes its running configuration with the Standby controller. Individual applications synchronize their required databases to the Standby controller.
  • Dynamic Synchronization. After the baseline synchronization occurs, the Standby controller receives updates in real time whenever any change occurs on the Active controller. Control plane packets are sent to the Standby controller as they are received by the Active controller.
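The difference between the two processes can be illustrated with a toy model: baseline synchronization copies a full snapshot of the Active controller's state once, and dynamic synchronization then mirrors each later change as it happens. This is a conceptual sketch, not FastIron's implementation; the class names, keys, and port labels are hypothetical.

```python
import copy
from typing import Dict


class Controller:
    """Hypothetical model of a controller's state database."""

    def __init__(self, name: str):
        self.name = name
        self.db: Dict[str, object] = {}


class SyncedPair:
    """Toy model of baseline plus dynamic synchronization."""

    def __init__(self, active: Controller, standby: Controller):
        self.active = active
        self.standby = standby
        self.baseline_done = False

    def baseline_sync(self) -> None:
        # Baseline: copy a full snapshot of the Active controller's state.
        self.standby.db = copy.deepcopy(self.active.db)
        self.baseline_done = True

    def update(self, key: str, value: object) -> None:
        # Dynamic: once baseline is done, every change is mirrored immediately.
        self.active.db[key] = value
        if self.baseline_done:
            self.standby.db[key] = copy.deepcopy(value)

    def ready_for_hitless_failover(self) -> bool:
        return self.baseline_done and self.standby.db == self.active.db


if __name__ == "__main__":
    pair = SyncedPair(Controller("unit1"), Controller("unit2"))
    pair.update("vlan:10", {"ports": ["1/1/1", "2/1/1"]})
    print(pair.ready_for_hitless_failover())  # False: no baseline yet
    pair.baseline_sync()
    pair.update("mac:0000.1111.2222", "1/1/5")
    print(pair.ready_for_hitless_failover())  # True
```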

Failover Process

After the controllers are synchronized, any failure of the Active controller triggers a dynamic failover to the Standby. Typical events that will trigger dynamic failover include:

  • The active CPU crashes due to a software or hardware failure
  • The Active controller is powered down
  • The stacking cable between the Active and Standby controller is disconnected

When a hitless failover event occurs, management control is transferred from the Active controller to the Standby controller with zero downtime and no Layer 2 / Layer 3 network service interruption.

In a Brocade switch stack, the stack priority number determines the role of each switch in the stack: active, standby, or member. If priority numbers are equal, the unit with the lowest Unit ID number takes precedence. Hitless failover uses these factors when assigning and reassigning roles during the failover process.

In the following example, a failover event occurs in a three-switch stack with hitless failover enabled.

  1. Unit 1, Priority 128 is the Active controller, and it fails.
  2. Unit 2, Priority 0 is the Standby controller, and it immediately assumes the role of the Active controller when Unit 1 fails.
  3. Thirty seconds after the active role was reassigned to Unit 2, the new Active controller assigns a new Standby controller, which is Unit 3, Priority 0. The new Active controller synchronizes its data to the new Standby controller for 70 seconds. After that process is completed, the new Standby controller is ready for hitless failover.
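The three-switch example can be replayed with a short model. It promotes the existing Standby immediately, picks the next Standby from the remaining members by priority (lowest Unit ID on a tie), and adds the 30-second and 70-second intervals quoted in the example to estimate when hitless protection returns. This is an illustrative sketch; `Unit` and `promote_after_failure` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Timings quoted in the example, in seconds after the failover.
NEW_STANDBY_AFTER_S = 30  # new Active assigns a new Standby
BASELINE_SYNC_S = 70      # new Standby synchronizes before it is ready


@dataclass(frozen=True)
class Unit:
    """Hypothetical model of a stack unit."""
    unit_id: int
    priority: int


def promote_after_failure(
    standby: Unit, members: List[Unit]
) -> Tuple[Unit, Optional[Unit], List[Unit], Optional[int]]:
    """Model an Active controller failure.

    Returns (new_active, new_standby, other_members, seconds_until_ready).
    """
    ranked = sorted(members, key=lambda u: (-u.priority, u.unit_id))
    if not ranked:
        # Only one unit left: no Standby, so no hitless protection.
        return standby, None, [], None
    return standby, ranked[0], ranked[1:], NEW_STANDBY_AFTER_S + BASELINE_SYNC_S


if __name__ == "__main__":
    # Unit 1 (priority 128) was Active and has just failed.
    active, new_standby, others, ready = promote_after_failure(Unit(2, 0), [Unit(3, 0)])
    print(active.unit_id, new_standby.unit_id, ready)  # 2 3 100
```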

Note: In order to achieve hitless failover in a stack containing only two switches you must have the same stack priority set on both devices. If you want to assign the same priority to the Active and Standby Controllers, you must do so after the stack is formed. This prevents the intended Standby Controller from becoming the Active Controller during stack construction.

 

Persistent MAC Address

The Active controller uses its MAC address as the MAC address for the entire stack. This ensures that other network elements recognize the stack as a single logical switch, simplifying management and increasing resiliency. The stack MAC address is generated automatically from the MAC address of the first port of the Active controller, which keeps the MAC address consistent across stack reboots and prevents topology changes that would otherwise result from protocol enable, disable, and configuration changes. The MAC address of the Active controller is the Bridge ID for Layer 2 protocols.

If the Active controller is disconnected from the rest of the stack, the MAC address of the stack changes based on the election of a new Active controller. This causes the forwarding database to be reset, creating a topology change event and a minor network outage.

Even a minor outage can be significant for critical hosts and applications such as IP phones and VDI clients. The outage can be avoided by using the “stack persistent-mac” command, which configures the stack to keep using the MAC address of the original Active controller. The administrator can then choose when an outage is acceptable and reset the forwarding database manually at that time, avoiding any impact on end users.

An alternative recommendation is to use the “stack mac” command to set the stack MAC address manually; that address is then used regardless of which switch is currently the Active controller. As a result, stack management and membership changes never reset the forwarding database or cause the associated topology-change outage.
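The three MAC address behaviors described in this section can be compared with a small model: by default the stack MAC follows the current Active controller, persistent MAC keeps the original Active's address, and a manually set stack MAC always wins. This is a simplified sketch of the behavior as described here, not of the commands' full semantics; the function names and MAC values are hypothetical.

```python
from typing import Optional


def stack_mac(active_first_port_mac: str,
              original_active_mac: Optional[str] = None,
              persistent: bool = False,
              manual_mac: Optional[str] = None) -> str:
    """Return the MAC address the stack presents to the network (simplified)."""
    if manual_mac is not None:
        # Manually set stack MAC ("stack mac"): used regardless of the Active unit.
        return manual_mac
    if persistent and original_active_mac is not None:
        # Persistent MAC ("stack persistent-mac"): keep the original Active's MAC.
        return original_active_mac
    # Default: follow the current Active controller's first-port MAC.
    return active_first_port_mac


def causes_topology_change(old_mac: str, new_mac: str) -> bool:
    # A changed stack MAC (Bridge ID) resets the forwarding database.
    return old_mac != new_mac


if __name__ == "__main__":
    orig = "0000.aaaa.0001"  # hypothetical first-port MAC of the original Active
    new = "0000.bbbb.0001"   # hypothetical first-port MAC of the new Active
    print(causes_topology_change(orig, stack_mac(new)))                         # True
    print(causes_topology_change(orig, stack_mac(new, orig, persistent=True)))  # False
    manual = "0000.cccc.0001"
    print(causes_topology_change(manual, stack_mac(new, manual_mac=manual)))    # False
```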

 

 

Configuring a Stack

Configuring a Brocade switch stack using Brocade’s stacking technology is a simple process.

  1. Ensure all switches designated to become members of the stack are running the same firmware version.
  2. Connect the switches with the appropriate stack cables to create the desired stacking topology.
  3. Once the switches are connected, connect a console cable to the switch designated as the Active controller, enter configuration mode (configure terminal), and enter the command stack enable.
  4. Next, type exit, and then run the stack secure-setup script. It is critical that stacking is enabled on the Active controller before running the script.
  5. The CLI shows all switches in the stack and the stack topology and then asks if the information is correct.
  6. Accept the defaults and the stack forms automatically; no additional configuration is required. The lower-priority switches (all units except the Active controller) reboot and assume their new stack ID numbers. At that point the stack is fully functional.
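Steps 1 and 4 are the preconditions most easily missed, so a pre-flight check over a switch inventory can catch them before running stack secure-setup. This is a hypothetical helper for planning purposes; it does not talk to the switches, and the inventory fields and version strings are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Switch:
    """Hypothetical pre-stacking inventory record."""
    name: str
    firmware: str
    stack_enabled: bool = False


def precheck(switches: List[Switch], intended_active: str) -> List[str]:
    """Return problems that would block stack secure-setup; empty means ready."""
    problems = []
    # Step 1: every switch must run the same firmware version.
    versions = {s.firmware for s in switches}
    if len(versions) > 1:
        problems.append(f"firmware mismatch: {sorted(versions)}")
    # Step 4: stacking must be enabled on the intended Active controller first.
    active = next((s for s in switches if s.name == intended_active), None)
    if active is None:
        problems.append(f"{intended_active} not found")
    elif not active.stack_enabled:
        problems.append(f"run 'stack enable' on {intended_active} first")
    return problems


if __name__ == "__main__":
    closet = [Switch("sw1", "7.2.0", stack_enabled=True), Switch("sw2", "7.2.0")]
    print(precheck(closet, "sw1"))  # []
```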

Refer to the FastIron Configuration Guide for more details about stacking and available configuration options.


 

Replacing a Failed Stack Controller

The replacement and failback process of a failed controller is simple and hitless. The replacement controller must have the same model number as the failed controller, and the device must be running a clean configuration on the same version of code that the stack is running. After the replacement controller is added to the stack and brought online, it re-joins the stack in place of the previously failed controller.

This automated process works in the following manner:

  • The new controller reboots to get its new configuration without impacting the stack.
  • After the new controller reboots with its new configuration, stack election occurs.
  • During an election, the stack looks at the configuration and priority number of each stack member and makes adjustments accordingly.
  • Initially, when the replacement controller is connected to the stack, it is assigned the standby role so that it can synchronize all of the runtime data, configurations, and protocol states in preparation for assuming the active controller role.
  • After the synchronization is complete, the stack performs an automatic switchover, which swaps the active and standby controller duties. In Brocade FastIron software release 7.2.0 or later, this switchover is hitless.
  • Immediately after the switchover is complete, the replacement functions as the active controller, and the stack is still ready for a hitless failover if necessary.
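Before inserting a replacement, the three prerequisites stated at the start of this section (same model number, same software version as the stack, clean configuration) can be checked mechanically. This is a hypothetical planning helper, not a Brocade tool; the model names and versions in the example are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Hypothetical record describing a controller."""
    model: str
    software: str
    has_clean_config: bool


def can_replace(failed: Device, replacement: Device, stack_software: str) -> List[str]:
    """Check the replacement prerequisites; an empty list means they are met."""
    issues = []
    if replacement.model != failed.model:
        issues.append("model number differs from the failed controller")
    if replacement.software != stack_software:
        issues.append("software version differs from the stack")
    if not replacement.has_clean_config:
        issues.append("configuration is not clean")
    return issues


if __name__ == "__main__":
    failed = Device("model-X", "7.2.0", has_clean_config=False)
    spare = Device("model-X", "7.2.0", has_clean_config=True)
    print(can_replace(failed, spare, "7.2.0"))  # []
```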

 

Upgrading Switch Firmware

To function correctly, every switch in a stack must run the same version of FastIron software; this ensures that all features and functions are consistent across all devices. Within a Brocade switch stack, the Auto Image Copy function is enabled by default and ensures that every stack member runs the same software version. The master image is taken from the Active Controller and copied automatically to any switch in the stack that is not running the same version.

For maximum flexibility, Auto Image Copy can be disabled. If it is, any switch added to the stack that runs different software from the Active Controller will not function and will have all its ports disabled automatically. To bring such a switch into the stack, its software must be updated manually.

The Auto Image Copy feature ensures that all units in a stack run the same flash image after events such as the addition of a switch to a stack, the replacement of a failed device, or a stack merge. Brocade recommends leaving it in its default (enabled) configuration.
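The effect of enabling or disabling Auto Image Copy can be summarized per member in a short model. It reflects only the outcomes described above (image copied from the Active Controller, or ports disabled pending a manual upgrade) and omits details such as the reboot that follows an image copy. The function name and version strings are hypothetical.

```python
from typing import Dict


def reconcile_images(active_image: str, members: Dict[int, str],
                     auto_image_copy: bool = True) -> Dict[int, str]:
    """Return the resulting state of each member unit (simplified model)."""
    result = {}
    for unit_id, image in members.items():
        if image == active_image:
            result[unit_id] = "in sync"
        elif auto_image_copy:
            # Default behavior: the Active Controller's image is copied over.
            result[unit_id] = f"copied {active_image}"
        else:
            # Auto Image Copy disabled: mismatched unit is isolated.
            result[unit_id] = "ports disabled: manual upgrade required"
    return result


if __name__ == "__main__":
    members = {2: "7.2.0", 3: "7.0.0"}  # hypothetical image versions
    print(reconcile_images("7.2.0", members))
    # {2: 'in sync', 3: 'copied 7.2.0'}
    print(reconcile_images("7.2.0", members, auto_image_copy=False))
    # {2: 'in sync', 3: 'ports disabled: manual upgrade required'}
```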
