VCS Principal Switch: First Among Equals

by lcaywood on 08-20-2013 12:48 PM

Lately I’ve found myself explaining a minor paradox fairly frequently, so I thought I’d capture it here for easy bookmarking.

Brocade VCS fabrics were designed with a distributed control plane as well as a logically centralized management construct. The former means all nodes are aware of one another and share information about their health and state, which allows the fabric as a whole to operate with a high degree of autonomy. The latter eliminates per-device management, which clearly streamlines both deployment and troubleshooting; the VCS control plane distributes policy rapidly and consistently across the fabric.

Now for the paradox: VCS fabrics are designed to be masterless, which helps ensure resilience in the event of a node failure. Yet the simplicity of centralized management depends on there being a single point from which policy is defined and distributed. Some approaches achieve this via a separate controller and management network, which introduces its own resilience concerns. In VCS Logical Chassis mode, by contrast, a "principal switch" is assigned by the administrator, and the designation can be reassigned to a different node at any time.

However, this is not a Darwinian, Roman Triumvirate-style primus inter pares; rather, the election process more closely resembles a Witenagemot, with a generally understood succession plan ratified and implemented at need by peer nodes. Here's how it works: the administrator decides the principal switch should have certain characteristics, for example hardware high availability (HA) or larger scale. (In fabrics containing VDX 8770s, those devices would be preferred as principal switch candidates; fabrics with leaf-spine topologies would generally designate a spine switch.) The administrator then assigns the principal switch, as well as a priority order of backup principals, based on those policy parameters. In the event of a principal switch failure, management automatically fails over to the designated successor switch to avoid disruption.
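Conceptually, the election rule amounts to picking the healthy node with the best administrator-assigned priority. The sketch below is a minimal, hypothetical model in Python; the Node attributes, the priority ordering, and the elect_principal function are illustrative assumptions for explanation only, not Brocade's actual Network OS implementation or CLI.

```python
# Hypothetical sketch of priority-based principal succession.
# Names and attributes are illustrative assumptions, not the
# actual Brocade Network OS implementation or CLI.
from dataclasses import dataclass

@dataclass
class Node:
    rbridge_id: int
    model: str
    priority: int          # lower value = preferred principal (assumption)
    healthy: bool = True

def elect_principal(nodes):
    """Return the healthy node with the best (lowest) priority.

    Ties break on rbridge_id, so every peer applying the same rule
    to the same shared fabric state arrives at the same answer --
    no separate controller or management network required.
    """
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy nodes in fabric")
    return min(candidates, key=lambda n: (n.priority, n.rbridge_id))
```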

The "next of kin" succession can also be altered, either automatically, by moving down the line of succession when multiple nodes are affected, or by direct administrator intervention, for example when upgrades or policy changes shift the preferred type of switch. This flexibility to adjust the fabric management scheme at need, generally but not strictly within a clear, predefined succession process, means fabrics can be tuned and optimized organically without the massive disruption of a 1066-type event.
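Continuing the hypothetical sketch above, the same election rule covers both cases: peers simply re-run it when the principal fails, and an administrator re-ranking priorities changes the succession plan without waiting for a failure. The fabric composition and priority values here are invented for illustration.

```python
fabric = [
    Node(rbridge_id=1, model="VDX 8770", priority=1),   # preferred principal
    Node(rbridge_id=2, model="VDX 8770", priority=2),   # first backup
    Node(rbridge_id=3, model="VDX 6740", priority=10),  # leaf, last resort
]

principal = elect_principal(fabric)          # rbridge 1

# Principal fails: peers re-run the same rule, and management
# fails over to the next node in the succession order.
fabric[0].healthy = False
principal = elect_principal(fabric)          # rbridge 2

# Administrator intervention: re-rank priorities to change the
# succession plan directly, without any node having failed.
fabric[2].priority = 0
principal = elect_principal(fabric)          # rbridge 3
```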

[Image: the Bayeux Tapestry, depicting the Battle of Hastings]

There are a number of other interesting aspects of the VCS Logical Chassis construct I haven't touched on here. Please take a look at the Logical Chassis whitepaper, which also goes over the details of zero-touch discovery, simplified firmware updates and other useful features.