Networks are the most sophisticated and complex technology in today’s data centers, yet they are also hard and messy to operate. The key reason for this imbalance lies in the way the networking industry has evolved. The industry has been very good at developing protocol standards that made network devices interoperable and allowed complex networks to scale. When it came to managing these networks, however, the interface was for a very long time the human-oriented CLI. As complexity increased and more parts of the network required automation, the industry’s answer was yet another protocol: NETCONF, which has become a default programmable interface to today’s network devices. While NETCONF helps establish a connection with a variety of network devices and provides some standard operations, it doesn’t provide or mandate a standard data model. Protocols like NETCONF provide programmability at the device level, but programmability is not the same as automation.
Because there is no common network automation framework, current network automation practice falls broadly into the following two scenarios:
Use vendor-supplied management products to manage some features or device types that make up a slice of the overall data center network. These products tend to be clunky and rarely fit well with the customer’s workflows. In addition, customers typically need several of these products, one for each function (for example, configuration and monitoring), and must develop their own integration layer on top of them. In this approach, network automation becomes a huge software integration problem.
More often, customers handle automation by interfacing directly with network devices on their own, using whatever tools and scripting languages they are comfortable with and developing automation scripts from the ground up. This ad hoc development doesn’t enable reuse across use cases or across organizations.
Both approaches are inefficient and expensive, and they are key reasons why the network is the least agile part of the data center infrastructure. Network automation can’t be addressed with protocols alone! The missing piece of the network automation puzzle is a software framework that addresses the inefficiencies of today’s model and has the following characteristics:
Enable reuse – monolithic automation scripts serve only specific scenarios and cannot be reused. An automation framework must enable building units of automation that can be reused for many scenarios and across many organizations. These building blocks are developed and validated against specific device versions.
Flexible and customizable – automation workflows vary widely from customer to customer, so the framework cannot assume a fixed process.
Easy to integrate with the customer’s ecosystem – customers have already deployed many tools and applications to automate some parts or functions of the network. For example, there are many event and log collector products in use today, and an automation framework must be able to integrate with these existing tools out of the box.
Programming language and tool agnostic – a variety of programming and scripting languages are in use today, along with popular automation tools such as Chef, Puppet and Ansible, and new and better ones will appear in the future.
Open source – for wider adoption, collaboration and innovation.
StackStorm, an open source platform developed by data center automation experts, is well suited to supply the missing piece in network automation. It breaks the automation problem down into wiring together Actions (reusable building blocks) with flexible Workflows that run in response to events from Sensors connected to the ecosystem. By combining DevOps and ChatOps tools and culture, network automation can reach parity with automation in the other cloud domains, as explained below. The framework leverages existing investments in automation and does not require a forklift upgrade.
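The Sensor, Action and Workflow wiring described above can be illustrated with a minimal plain-Python sketch. It does not use the StackStorm API; the `Event` and `Rule` classes, the `dispatch` function, the `link_down` event name and the `collect_diagnostics` action are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Payload = Dict[str, str]


@dataclass
class Event:
    """Something a sensor observed, e.g. a parsed syslog message (hypothetical)."""
    name: str
    payload: Payload


@dataclass
class Rule:
    """Wires an event to an action when the criteria match (hypothetical)."""
    event_name: str
    criteria: Callable[[Payload], bool]
    action: Callable[[Payload], str]


def collect_diagnostics(payload: Payload) -> str:
    # Stand-in for a reusable action; a real one would talk to the device.
    return f"collected diagnostics from {payload['device']}"


def dispatch(event: Event, rules: List[Rule]) -> List[str]:
    """Run every action whose rule matches the event and return the results."""
    return [
        rule.action(event.payload)
        for rule in rules
        if rule.event_name == event.name and rule.criteria(event.payload)
    ]


# Only react to link failures on spine switches (illustrative policy).
RULES = [
    Rule("link_down", lambda p: p.get("role") == "spine", collect_diagnostics),
]
```

Calling `dispatch(Event("link_down", {"device": "sw1", "role": "spine"}), RULES)` runs the action, while the same event from a leaf switch matches no rule. The point of the separation is that the action stays reusable and only the rule changes per customer.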
Some of the key concepts behind StackStorm are explained below:
Currently, network automation for most customers is based on ad hoc, monolithic scripts developed in-house that are hard to customize and maintain and are not reusable across organizations. StackStorm’s Events, Actions and Workflows model enables the development of reusable building blocks for network automation. An Action is a unit of automation, such as configuring a device, collecting diagnostic data or automatically remediating an issue. Key benefits of this approach include:
StackStorm’s Actions can be shared across organizations through an open source community. As more Actions built by vendors and the community become available, customers can automate their networks by simply wiring these Actions into their own workflows.
Customers can also build their own Actions if needed. Actions can be written in any programming language, and customers can continue to use automation technologies such as Ansible, Chef or Puppet that they already have in place. With this approach customers leverage their existing investments and expertise without being tied to any specific technology, which lets them evolve with new and emerging automation technologies.
Building network configuration scripts can be quite complex, because network automation may require multiple configuration steps to be executed in a particular order across multiple devices and device types. A significant part of the complexity lies in the “scaffolding code” that coordinates multiple tasks and handles error conditions. A modern open source workflow engine such as Mistral, which is used in StackStorm, provides much of this scaffolding for free and lets the user focus on the actual tasks, or on wiring together tasks that are readily available. The Brocade enterprise version will include Actions for the network domain in addition to the hundreds of Actions currently available in the open source version of StackStorm.
Figure 1 Creating Workflows in StackStorm
StackStorm enables users to describe network automation as a set of actions and the relations between them, either through a graphical UI or directly in YAML. Using this description, Mistral takes care of state management, correct execution order, parallelism, synchronization and high availability. Mistral also provides flexible task scheduling, so a process can run on a specified schedule (e.g. every Sunday at 4:00 pm) instead of immediately.
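To make the idea of deriving execution order and parallelism from a description concrete, here is a small sketch using Python’s standard `graphlib` module (Python 3.9+). It is not how Mistral is implemented, and the task names in `WORKFLOW` are hypothetical; the sketch only assumes that a workflow can be written down as tasks plus the tasks each one depends on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow description: task name -> tasks it depends on.
WORKFLOW: Dict[str, Set[str]] = {
    "backup_config": set(),
    "configure_vlan_leaf1": {"backup_config"},
    "configure_vlan_leaf2": {"backup_config"},
    "verify_connectivity": {"configure_vlan_leaf1", "configure_vlan_leaf2"},
}


def execution_batches(workflow: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; tasks in one batch could run in parallel."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches
```

For the workflow above, the backup runs first, both leaf switches can then be configured in parallel, and the verification runs only after both have finished. The author of the workflow states the relations once and never writes the ordering logic by hand.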
An execution of these workflows can be rerun after an error, either from the beginning or from the task(s) that failed. The latter is useful for long-running workflows that hit temporary service or network outages.
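Rerunning from the failed task depends on the engine remembering which tasks have already succeeded. The following plain-Python sketch shows that idea under simple assumptions: tasks run strictly in sequence, and `run_workflow` and its `state` dictionary are hypothetical names, not part of StackStorm or Mistral.

```python
from typing import Callable, Dict, List, Optional, Tuple

Task = Tuple[str, Callable[[], None]]


def run_workflow(
    tasks: List[Task], state: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Run tasks in order and return the per-task state.

    Tasks already marked "succeeded" in `state` are skipped, so passing the
    state of a failed run resumes from the task that failed. Passing no
    state reruns the workflow from the beginning.
    """
    state = dict(state or {})
    for name, func in tasks:
        if state.get(name) == "succeeded":
            continue  # finished in an earlier run, do not repeat it
        try:
            func()
        except Exception as exc:
            state[name] = f"failed: {exc}"
            break  # stop here; later tasks depend on this one
        state[name] = "succeeded"
    return state
```

If a push to a device fails because of a short outage, calling `run_workflow(tasks, previous_state)` after the outage clears repeats only the push and the steps after it, which matters when the earlier steps took hours.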
DevOps & ChatOps
Network reliability comes from combining network design, automation and operational practices. DevOps-style tools and methodologies for cross-domain collaboration have become prevalent in the application and compute space. The same tools have the potential to speed up troubleshooting in network automation. StackStorm exposes automation workflows as commands that can be executed in a team chat room, shortening the feedback loop.
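As a rough illustration of exposing a workflow as a chat command, the sketch below maps a chat message onto a workflow name and its parameters. It is a plain-Python approximation, not StackStorm’s implementation; the `ALIASES` table, the command format and the workflow name `network.show_bgp_summary` are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical mapping: chat command format -> workflow to run.
ALIASES: Dict[str, str] = {
    "show bgp summary on {device}": "network.show_bgp_summary",
}


def _to_regex(fmt: str) -> "re.Pattern[str]":
    """Turn 'text {name} text' into a regex with one named group per placeholder."""
    parts = re.split(r"\{(\w+)\}", fmt)
    # After the split, odd indices hold placeholder names, even ones literal text.
    pattern = "".join(
        f"(?P<{part}>\\S+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def match_command(message: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (workflow, parameters) for a chat message, or None if no alias fits."""
    for fmt, workflow in ALIASES.items():
        found = _to_regex(fmt).match(message.strip())
        if found:
            return workflow, found.groupdict()
    return None
```

Typing `show bgp summary on leaf1` in the chat room would resolve to the workflow with `device` set to `leaf1`, and the whole team sees both the command and its result in the same channel.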
In addition, StackStorm has many other features, such as Rules, Policies and Security, that will be covered in future blog posts.
Over the last decade the networking industry slowly migrated from the CLI to programmability. Programmability alone is not sufficient, however, and automation is the next frontier. Automation is about reliably eliminating manual, repetitive steps in any customer scenario. As a framework, StackStorm has the potential to eliminate the inefficiencies in the current network automation model and bring network automation to parity with the automation of the rest of the data center infrastructure.