As data center networks scale to support thousands of servers running a variety of different services, a new network architecture using the Border Gateway Protocol (BGP) as a data center routing protocol is gaining popularity among cloud service providers. BGP has traditionally been thought of as only usable as the protocol for large-scale Internet routing, but it can also be used as an IGP between data center network layers. The concept is pretty simple and has a number of advantages over using an IGP such as OSPF or IS-IS.
Large-Scale BGP is Simpler than Large-Scale IGP
While BGP in itself may take some heavy learning to fully grok, BGP as a data center IGP uses basic BGP functionality without the complexity of full-scale Internet routing and traffic engineering. BGP is especially suited for building really big hierarchical autonomous networks, such as the Internet. So, introducing hierarchy with EBGP and private ASNs into data center aggregation and access layers down to the top of rack behaves just like you would expect. We’re not talking about carrying full Internet routes down to the top of rack here, just IGP-scale routes, so even lightweight BGP implementations that run on 1RU top of rack routers will just work fine in this application.
The hierarchy and aggregation abilities of an IGP are certainly quite extensive, but each different OSPF area type, for example, introduces different behaviors between routers, areas and how different LSA types are propagated. There’s a lot of complexity to consider when designing large-scale IGP hierarchy, and a lot of information that is flooded and computed when the topology changes. The other advantages of BGP are the traffic engineering and troubleshooting abilities. With BGP you know exactly what prefix is sent and received to each peer, what path attributes are sent and received, and you even have the ability to modify path attributes. Using AS paths you can tell precisely where the prefix originated and how it propagated, which can be invaluable in troubleshooting routing problems.
How it Works
What you basically do is divide the network into modular building blocks made up of top of rack access routers, aggregation routers, and data center core routers. Each component uses its own private ASN, with EBGP peering between blocks to distribute routing information. The top of rack component doesn’t necessarily need to be a single rack; it could certainly be a set of racks and a BGP router.
Petr Lapukhov of Microsoft gave a great overview of the concept at a NANOG conference recently in a presentation called “Building Scalable Data Centers: BGP is the Better IGP”, which goes into a lot more background on their design goals and implementation details. If you’d like to experiment with the network design as Petr describes, the commands for the BGP features on slide 23 for the Brocade NetIron software are:
An alternative that takes the design even further from top of rack down into the virtual server layer for high-density multitenant applications is to also use the Brocade Vyatta vRouter. In this design, EBGP would be run from the data center core at each layer to a virtual server that routes for a set of servers in the rack. This addition gives customers a lot of flexibility in controlling their own routing, for example, if they wanted to announce their own IP address blocks to their hosting provider as part of their public cloud. Customers could also use some of the other vRouter VPN and firewall features to control access into their private cloud.
In addition to using BGP to manage routing information, you can also build an OpenFlow overlay to add application-level PBR to the network. Using the Brocade hybrid port features that enables routers to forward using both OpenFlow rules and Layer 3 routing on the same port, introducing SDN into this network as an overlay is easy. In fact, this is exactly what Internet2 is doing in production on their AL2S (Advanced Layer 2 Services) network to enable dynamically provisioned Layer 2 circuits.
So is BGP better as a data center IGP? I think the design lends itself especially well to building modular data center networks with independent and autonomous modular components that can be built all the way down to the virtual server level. Perhaps you even have different organizations running their own pieces of the network, or servers that you’d rather not invite into your OSPF or IS-IS IGP.
For more information on Brocade’s high density 10 GbE, 40 GbE and 100 GbE routing solutions, please visit the Brocade MLX Series product page.