Design & Build

Best Practice: Configuring BGP Route Reflection

by brcd-sp.expert on 01-25-2012, edited 05-12-2014


 

 

Introduction

This best practice document is focused on the configuration aspects of deploying redundant BGP route reflectors (RRs) in service provider networks using NetIron OS. It will cover the following topics:
 
  • Overview of Route Reflection (RR)
    • Full-mesh BGP topologies
    • Description of RR
    • Why RRs are commonly deployed (scale and full-mesh configuration)
    • How routing loops are avoided with RRs
  • Best Practice for deploying and configuring redundant route reflection
    • CLUSTER_ID configuration
 

 

This best practice document is Service Provider focused, as most enterprises do not need to run BGP inside their networks. The focus is on the NetIron series of products, such as the MLX and CER service provider platforms. The CES product may be applicable, but due to its limited IP scale (compared to the CER), it is unlikely to be deployed as a BGP RR. The CER-RT, with its increased IP routing scale, is very well positioned to function as a BGP RR.

 

Before You Begin

Prerequisites

It is assumed that the reader has an understanding of IP routing and BGP operation and configuration. 

 

 

Topic of Discussion

 

BGP Route Reflection (RFC 4456)

Overview of BGP Full-Mesh Topologies

Border Gateway Protocol (BGP) v4 has been massively deployed in the Internet for many years. All IP service provider and carrier networks deploy BGP both inside and outside their networks. When BGP is used between SP networks, it is external BGP (eBGP); when it is used inside an SP network, it is internal BGP (iBGP). This BCP is focused on iBGP topologies, and more specifically, on iBGP topologies that use route reflection (RR).

 

An iBGP full-mesh is the standard BGP design and topology. A full-mesh is required to ensure all externally received routes are propagated to all the internal BGP speakers inside an Autonomous System (AS). The common best practice for iBGP peering is to use the routers’ primary loopback IP address as the BGP peering endpoint. This provides additional resiliency for iBGP peering in case of an interface or link failure. For eBGP, the common best practice is to use the external interface IP address as the BGP peering endpoint. If that interface or link fails, the eBGP peering session will go down, which is the expected result. There are alternatives to this, but they are beyond the scope of this document.

 

 

Figure 1: iBGP Full-Mesh Topology
 

 

To avoid BGP routing loops, a BGP router will not re-advertise an internally learned route to another internal BGP router. For instance, if an iBGP router receives a route from an iBGP peer, it cannot re-advertise that route to other iBGP peers. Doing so would create a BGP routing loop. Since re-advertisement of these routes is prohibited, a BGP router that receives external routing information (Network Layer Reachability Information, or NLRI) must directly advertise the NLRI to every other iBGP router in the AS. This is shown in Figure 2 below.

 

Figure 2: iBGP Advertisements

 


 

For example, if R2 were to re-advertise 10.1/16 to R4, R4 to R3, and R3 back to R1, a routing loop would result. The standard BGP loop-prevention mechanism, the AS_PATH attribute, does not apply in iBGP topologies, because all the routers are in the same AS (Autonomous System).

 

For proper loop prevention, these BGP rules require a full-mesh topology. Without a full mesh, not every iBGP router will receive all the NLRI, which results in black-hole forwarding.

 

Description of BGP Route Reflection

The result of the required full-mesh topology can be a serious scaling concern in large IP networks. The full-mesh topology results in the common “N-squared” problem. Every iBGP router needs to be configured to peer and exchange routes with every other iBGP router in the AS. This is not only configuration intensive, but the large numbers of BGP peerings (BGP establishes TCP sessions, using TCP Port 179) also results in large numbers of TCP sessions in the network. This adds additional control plane load on each router, and further increases network operational complexity in maintaining and troubleshooting the large numbers of BGP peerings. Imagine a network with 100 iBGP routers; this would result in each router having 99 BGP peering sessions for a total of 4950 BGP sessions (N*(N-1)/2) in the AS!
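The full-mesh arithmetic above can be sketched in a few lines (Python used purely for illustration):

```python
def ibgp_full_mesh_sessions(n: int) -> int:
    """Total iBGP sessions in a full mesh of n routers: each router
    peers with the other n-1, and each session is shared by two routers,
    giving N*(N-1)/2 sessions AS-wide."""
    return n * (n - 1) // 2

# The 100-router example from the text: 99 peerings per router,
# 4950 TCP sessions in the AS.
print(ibgp_full_mesh_sessions(100))  # 4950
```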

 

BGP Route Reflection (RFC 4456) relaxes the rule, described above, that prevents routing loops in iBGP topologies: an iBGP router may re-advertise NLRI that it receives from another iBGP router. This iBGP router is called a route reflector (RR), which is an apt description of its operation. An RR reflects NLRI learned from one iBGP router to other iBGP routers. This is shown in the diagram below.

 

Figure 3: Basic Route Reflection

 


 

As shown above, one or more RRs are deployed in the AS (RR1 is the route reflector in this figure). R5 in this topology is called an “RR client”. R5, the RR client, peers only with the RR(s); it does not peer with other iBGP routers. R5 receives external routing information from R10, its eBGP peer, and advertises it to RR1. RR1 receives this routing information, performs its standard BGP loop detection and best-path decision process, and re-advertises its best paths to all of its iBGP neighbors and to all of its other clients (if it has other RR clients). The required iBGP full mesh is now drastically reduced, while proper routing information is still maintained in all of the iBGP routers in the AS. Without route reflection, R5 would need to be part of the iBGP full-mesh topology.

 

The resulting BGP topology is now more manageable from an operations, configuration and routing state (number of peers and advertised routes) perspective. Full mesh iBGP peering is no longer required and the routing table (RIB) in the client router is reduced, often significantly. This is because the RR only reflects the best path that it has selected due to its BGP best-path decision process. There may be additional backup paths available to the RR but the client only receives the best path.

 

Another key advantage of route reflection appears when a new iBGP router is added to the network: the new router can become an RR client of one or more RRs with a few simple configuration additions. The new iBGP router no longer needs to be configured to peer with every other iBGP router in the AS, and only the RR(s) need their BGP configuration updated to include the new peer; the remaining iBGP routers in the AS do not need their configurations modified. It should be noted that an RR client router is not aware that it is functioning as a “client”, nor does it need any special configuration. Only the RR needs configuration beyond the basic BGP setup.
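The reflection behavior described above can be sketched as a small decision function. This is a simplification of the RFC 4456 re-advertisement rules; the peer names and the set-based representation are illustrative only:

```python
def reflect_targets(source, clients, nonclients, ebgp):
    """Sketch of RFC 4456 re-advertisement rules on an RR.

    A best path learned from a non-client iBGP peer is reflected to
    clients only; a best path learned from a client (or an eBGP peer)
    is sent to all clients and non-client iBGP peers. eBGP peers always
    receive the best path. The route is never sent back to its source."""
    if source in nonclients:
        out = set(clients)
    else:  # learned from a client or from an eBGP peer
        out = set(clients) | set(nonclients)
    out |= set(ebgp)
    return out - {source}

# R5 (a client) advertises a route to RR1; RR1 has no eBGP peers here.
print(sorted(reflect_targets("R5", {"R5"}, {"R2", "R4", "RR3"}, set())))
# ['R2', 'R4', 'RR3']
```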

 

Loop Prevention BGP Attributes

While BGP topologies benefit from route reflection in terms of operational simplicity and scale, there must still be a proper loop prevention or detection capability. Inside an AS, BGP routing loops are prevented by the full-mesh topology and the rule that an iBGP router cannot re-advertise routes learned from one iBGP peer to another. Outside an AS (i.e., in the Internet), BGP routing loops are prevented with the BGP AS_PATH attribute.

 

When a BGP router receives NLRI, it first checks the AS_PATH attribute to determine if its configured ASN (AS Number) is included in the AS_PATH. If it is, the route has looped and the router discards the route. In other words, the route was either initially advertised by the local AS, or the route has been received through the local AS. In either case, the route has looped and must be ignored to prevent forwarding loops.
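The AS_PATH check just described amounts to a simple membership test. A minimal sketch (the ASN values are hypothetical):

```python
def as_path_loops(local_asn, as_path):
    """eBGP loop detection: a route whose AS_PATH already contains the
    local ASN was originated by, or has already passed through, the
    local AS, so it has looped and must be discarded."""
    return local_asn in as_path

# A router in AS 100 receives a route that already traversed AS 100:
print(as_path_loops(100, [200, 100, 300]))  # True  -> discard
print(as_path_loops(100, [200, 300]))       # False -> accept
```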

 

Inside an AS, the AS_PATH attribute is not useful as a loop detection mechanism because all the routers in the local AS are configured with the same ASN. This is why an iBGP full mesh is required inside an AS. However, BGP route reflection relaxes this full-mesh rule, which may lead to a routing loop inside the local AS. RFC 4456 therefore introduces new BGP attributes to detect and avoid routing loops inside an AS when route reflection is used.

 

  • ORIGINATOR_ID

This attribute contains the identifier of the router that originates the route into the local AS. The identifier used is the Router-ID (RID) of the originating router.

 

The RR creates the ORIGINATOR_ID attribute and inserts the RID of the originating router; which would typically be one of its configured RR clients. If an RR originates a route into the local AS (RRs are not often deployed in this manner, but it is possible), it will insert its RID into the ORIGINATOR_ID attribute. If an iBGP router receives a routing update with its RID contained in the ORIGINATOR_ID attribute, then the route has looped and it will be discarded.

  • CLUSTER_LIST

 

This attribute contains the list of clusters that the route has traversed inside the local AS. Each cluster is identified by its CLUSTER_ID, which by default is the Router-ID (RID) of the RR but can be explicitly configured.

 

When an RR reflects a route, it prepends its local CLUSTER_ID to the CLUSTER_LIST attribute (creating the attribute if it does not yet exist). If an RR receives a routing update with its own configured CLUSTER_ID contained in the CLUSTER_LIST attribute, then the route has looped and it will be discarded.
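Taken together, the two RFC 4456 checks can be sketched as one function. The addresses in the example are the loopback values used later in this document; the function itself is an illustration, not an implementation:

```python
def route_has_looped(my_rid, my_cluster_id, originator_id, cluster_list):
    """RFC 4456 loop detection inside an AS.

    A router discards a reflected route if its own Router-ID appears in
    ORIGINATOR_ID; an RR additionally discards it if its configured
    CLUSTER_ID appears anywhere in CLUSTER_LIST."""
    if originator_id == my_rid:
        return True
    if my_cluster_id in cluster_list:
        return True
    return False

# RR1 (RID and CLUSTER_ID 192.168.1.1) receives back a route it already
# reflected: its CLUSTER_ID is in the CLUSTER_LIST, so the route loops.
print(route_has_looped("192.168.1.1", "192.168.1.1",
                       "192.168.1.5", ["192.168.1.1", "192.168.1.3"]))  # True
```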

 

BCP on Redundant BGP Route Reflection

When only a single RR is deployed in a BGP topology, the concerns described in this document largely do not arise; even so, the best practice recommended here should still be followed. The primary focus of this document is on networks with more than one RR, and more specifically, on a BGP RR “cluster” that has more than one RR. There are two scenarios to discuss:

 

  1. A BGP topology with many RRs but only a single RR per cluster.
  2. A BGP topology with many RRs and more than one RR per cluster (i.e., multi-cluster)

 

The first scenario was shown in Figure 3; the second is shown below in Figure 4.

 

Figure 4: Redundant Route Reflection

 


 
 
 

 

Figure 4 shows RR client R5 peered to two RRs, RR1 and RR3. If either RR1 or RR3 fails, the 10.1/16 prefix from R10 will still be advertised to R2 and R4 by the remaining RR. This is the most common deployment scenario for an iBGP topology using route reflection. Redundancy is provided, and the forwarding performance of the network is minimally affected if a single RR fails: the routes advertised by the failed RR are withdrawn by the other routers, and the IGP re-routes the underlying IP topology around the failed RR.
 

 

Some BGP implementations create the CLUSTER_ID automatically while allowing it to be configured manually (as NetIron OS does); other implementations may require a manually configured CLUSTER_ID. In scenario 1, the BCP is to manually configure the CLUSTER_ID of the RR to be the same as the Router-ID (RID) of the router. The RID is a unique value, and the common practice is to set the RID to the configured primary loopback IP address. The net result is that the loopback IP address, the RID, and the CLUSTER_ID are all the same value. This makes network operations and troubleshooting simpler and is quite straightforward.

 

 
The second scenario, with more than one RR per cluster, is the more interesting one. Assume a cluster with two RRs (for redundancy), as shown in Figure 4. The client, R5, is peered to both RRs. If one of the RRs fails, the other RR already has all the routes, so routing consistency and forwarding are maintained.

As a side note: in the event of a single RR failure, packet forwarding performance should not be affected, even while the routing convergence event is underway. This is because the “BGP next-hop” of a route is typically not the RR itself but an RR client router (R5 in this example). Per RFC 4456, an RR must not modify the BGP next-hop attribute when it reflects routes. So, while there is a routing convergence event when one RR fails, the BGP next-hop of a route is unchanged, and forwarding toward that next-hop should not be affected.

In Figure 4, R2 receives two routes to 10.1/16, one from each RR. R2 performs its best-path decision process and installs one of those routes into its local FIB, keeping the secondary, backup route in its local RIB. R2 will most likely install the route received from RR1 as its best path, since R2 and RR1 are adjacent and that IGP metric is lower than the metric from R2 to RR3. If RR1 fails, R2 should not drop packets when it updates its FIB with the secondary route, since the BGP next-hop of that route has not changed; the secondary route simply becomes the primary, preferred route. Referring again to Figure 4, R5 is the BGP next-hop for external destination 10.1/16, and R2 can reach R5 via RR1 or via R4-RR3. If RR1 fails, the BGP next-hop does not change; it is still R5. R2’s FIB should be updated automatically so that R2 sends packets toward R4 to reach R5 and no longer sends packets toward the failed RR1. So, while there will be a short control-plane routing convergence event in the local AS, data-plane packet forwarding should be minimally affected, if at all.
 

 

When two (or more) RRs are deployed for redundancy, a decision must be made about which CLUSTER_ID to configure on the RRs. Following the recommendations in this document, each RR is configured with a CLUSTER_ID equal to its primary loopback IP address, which makes the CLUSTER_ID of each RR unique. The alternative is to configure both RRs of a cluster with the same CLUSTER_ID value. This is shown here.

 

 

Figure 5:  Redundant Route-Reflection with same CLUSTER_ID

 


 

The problem with configuring both RRs with the same CLUSTER_ID is that the RRs can no longer accept reflected routes from each other: each RR sees its configured CLUSTER_ID in the CLUSTER_LIST attribute of routes received from the other RR and discards them due to loop prevention rules. With unique CLUSTER_IDs on each RR, by contrast, each RR can receive the other’s reflected routes; the CLUSTER_LIST of a reflected route contains the CLUSTER_ID of the other RR, which is unique in the AS, so the receiving router does not discard the route. While this slight difference in behavior may appear harmless, configuring the same CLUSTER_ID on both RRs can lead to corner cases of black-hole forwarding.
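The difference between the two CLUSTER_ID choices reduces to a single membership test. A minimal sketch, using the addresses from the figures (the function is illustrative, not an implementation):

```python
def rr_accepts(my_cluster_id, cluster_list):
    """An RR discards a reflected route whose CLUSTER_LIST already
    carries its own configured CLUSTER_ID (RFC 4456 loop prevention)."""
    return my_cluster_id not in cluster_list

# Same CLUSTER_ID on both RRs: RR1 rejects RR3's reflected route,
# because the shared ID is already in the CLUSTER_LIST.
shared = "192.168.1.10"
print(rr_accepts(shared, [shared]))                 # False -> discarded

# Unique CLUSTER_IDs (each RR uses its own loopback): route accepted.
print(rr_accepts("192.168.1.1", ["192.168.1.3"]))   # True  -> kept
```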

 

Potential Black-Hole Forwarding Scenario

Assume the network topology shown below.

 

Figure 6: Redundant RRs and Black-Hole Scenario

 


 

R5 receives external routing information from R10 via eBGP. It advertises the NLRI to RR1 and RR3. RR1 and RR3 both reflect the NLRI to R2 and R4, and also to each other. Since they are both configured with the same CLUSTER_ID (192.168.1.10), they will both ignore the reflected NLRI from each other due to loop prevention rules. Their configured CLUSTER_ID is present in the CLUSTER_LIST attribute. Proper routing and forwarding behavior is still maintained under normal operating conditions. R5 is the BGP next-hop for 10.1/16 and R5 is reachable from all other iBGP routers via the IGP. Route recursion provides this reachability: Destination 10.1/16 has a BGP next-hop of 192.168.1.5; where 192.168.1.5 is reachable via the IGP. R5 is configured with “next-hop-self” (which is a standard best practice), so that when it advertises 10.1/16 it sets the BGP next-hop to itself. The address used for this would be its loopback IP address.
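The recursive lookup described above can be sketched with two toy tables. The table contents mirror the figure (10.1/16 with BGP next-hop 192.168.1.5); the "via-RR1" path label is purely a placeholder:

```python
# Hypothetical tables for R2 in the Figure 6 topology.
bgp_rib = {"10.1.0.0/16": "192.168.1.5"}   # prefix -> BGP next-hop (R5)
igp_rib = {"192.168.1.5": "via-RR1"}       # next-hop -> IGP path label

def resolve(prefix):
    """Route recursion: find the BGP next-hop for the prefix, then ask
    the IGP how to reach that next-hop. If either lookup fails, there is
    no usable forwarding entry (a black hole)."""
    nh = bgp_rib.get(prefix)
    if nh is None:
        return None            # no BGP route at all
    return igp_rib.get(nh)     # IGP path toward the BGP next-hop

print(resolve("10.1.0.0/16"))  # via-RR1
```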

 

If RR1 were to fail, as previously described, the network will re-converge with minimal or no impact to packet forwarding. This is because R2 has two paths in its RIB for destination 10.1/16, one from each of the RRs. R2 will select one of those as its best route and install that in its FIB. The BGP nexthop of 10.1/16 is 192.168.1.5 for both paths. 192.168.1.5 is reachable via the IGP from R2-RR1-R5 and from R2-R4-RR3-R5. It selects the path R2-RR1-R5 due to that path having a lower (better) IGP metric. So, even during a failure of an RR, routing and forwarding continues to function.

 

However, if the BGP peering session between R5 and RR1 fails, there is a black-hole problem. In this scenario, RR1 is still active in the network and still peers with the other iBGP routers; only its iBGP session to R5 is down. When R2 forwards packets to destination 10.1/16, the IGP directs those packets through RR1 to ultimately reach R5 and R10. As previously described, this is due to IGP metrics: the shortest path to the BGP next-hop 192.168.1.5 is R2-RR1-R5, not R2-R4-RR3-R5, which is the backup path still contained in R2’s RIB. Now that the BGP peering session between R5 and RR1 is down, RR1 no longer has the external destination 10.1/16 in its routing or forwarding tables. Although RR3 reflected that route to RR1, RR1 discarded it due to loop prevention rules, having seen its configured CLUSTER_ID in the CLUSTER_LIST attribute. So when RR1 receives packets from R2 destined to external destination 10.1/16, it finds no matching route in its RIB or FIB and drops them. This is the black-hole problem that results from configuring the same CLUSTER_ID on both RRs.

 

The scenario of the BGP peering session between R5 and RR1 failing, while both routers remain active in the network IGP, can be considered a corner case. However, it is possible, whether through mis-configuration or some other circumstance: a network operator configuring either R5 or RR1 could accidentally break the BGP peering session, or either router could run its management/control-plane CPU at a high level for some reason, dropping BGP keepalives. In such cases, any number of BGP peering sessions could fail or flap.

 

If each RR had its loopback IP address configured as its CLUSTER_ID, the uniqueness of loopback IP addresses ensures there is no similar black-hole problem in this failure scenario. RR1 would accept the reflected routes from RR3; even though its peering session to R5 is down, the destination routes from R10 are maintained in its RIB and FIB. When RR1 receives packets for destination 10.1/16, it finds the route in its FIB, along with its associated BGP next-hop of 192.168.1.5, which is R5. The IGP tells RR1 how to reach next-hop 192.168.1.5, and the packets are forwarded to R5.

 

It should be noted that this configuration will increase the BGP RIB size on both RRs, because each RR will hold the routes received directly from R5 plus the same routes, with different path attributes, received from the other RR. In most cases, however, the benefit of avoiding the black-hole scenario outweighs this increase in RIB size. Based on the black-hole forwarding problem described here, the recommended best common practice is to configure each RR with a unique CLUSTER_ID; furthermore, both the CLUSTER_ID and the BGP Router-ID should be configured as the same value as the primary loopback IP address.

 

If RIB size in the RRs is not a large concern, then the best practice recommended here is to configure the CLUSTER_ID of each RR to be unique. If there are memory or RIB constraints, it may be acceptable to configure the same CLUSTER_ID on each RR, but at the risk of the black-hole forwarding problem.

 

RR Configuration

These examples are for RR1 only.

 

To configure the Router-ID to match the local primary loopback IP address:

 

NetIron(config)# ip router-id 192.168.1.1

 

To enable BGP, configure the local-AS number:

 

NetIron(config)# router bgp
BGP4: Please configure 'local-as' parameter in order to enable BGP4.
NetIron(config-bgp)# local-as 100

 

Configure a BGP peer group for the iBGP peers (iBGP peers to RR1 are R2, RR3, R4):

 

NetIron(config-bgp)# neighbor PeerGroup1 peer-group
NetIron(config-bgp)# neighbor PeerGroup1 description "iBGP Peers"
NetIron(config-bgp)# neighbor PeerGroup1 remote-as 100

 

Now add the iBGP peers to the peer group:

 

 

NetIron(config-bgp)# neighbor 192.168.1.2 peer-group PeerGroup1

NetIron(config-bgp)# neighbor 192.168.1.3 peer-group PeerGroup1

NetIron(config-bgp)# neighbor 192.168.1.4 peer-group PeerGroup1

 

To enable route reflection, the router must be configured with the peer address of each RR client. It is also a best practice to put the RR clients in a different BGP peer group from the regular iBGP peers. To add R5 as an RR client of RR1, enter the following commands on RR1:

 

 

NetIron(config-bgp)# neighbor PeerGroup2 peer-group
NetIron(config-bgp)# neighbor PeerGroup2 description "RR Clients"
NetIron(config-bgp)# neighbor PeerGroup2 remote-as 100
NetIron(config-bgp)# neighbor 192.168.1.5 peer-group PeerGroup2
NetIron(config-bgp)# neighbor 192.168.1.5 route-reflector-client

 

To configure RR1 with a CLUSTER_ID of 192.168.1.1, which matches the Router-ID and local primary loopback IP address:

 

NetIron(config-bgp)# cluster-id 192.168.1.1

 

Summary

 

The best current practice for configuration of the CLUSTER_ID on route reflectors is to make the value the same as the local router’s primary loopback IP address. This value should also be the configured BGP Router-ID. This ensures uniqueness of the CLUSTER_ID and Router-ID of each RR in the AS. This makes network operations and troubleshooting simpler. The primary loopback IP address of each router should always be advertised into the IGP of the network, using standard IGP configurations.

 

Configuring unique CLUSTER_IDs on redundant RRs serving the same clients also prevents the corner-case black-hole forwarding scenario described in detail in this paper. This configuration provides the RR client with redundant RR clusters, since each RR is in its own cluster.

 

If memory or RIB size on the RRs is a concern, it may be necessary to configure the same CLUSTER_ID on both RRs. Doing so will reduce the memory / RIB consumption on each of the RRs. However, the network operations staff should be aware of the black-hole forwarding problem of this type of configuration. Most deployed RRs today do not have much of a memory constraint, so it is expected that the best practice of configuring unique CLUSTER_IDs on each of the RRs would be applicable in most cases.

 

Related Information

For more information, see the documents below:

 

  • RFC 4271 A Border Gateway Protocol 4 (BGP4), January 2006
  • RFC 4456 BGP Route Reflection: An Alternative to Full Mesh Internal BGP (iBGP), April 2006.
  • Brocade NetIron OS Configuration Guide, September 2010.

 

 
