
Service-Aware Transport for Multi-site NFV Resiliency

by kevin.shatzkamer on 02-17-2016 01:22 PM - last edited on 02-17-2016 03:10 PM by Community Manager

This blog is co-authored by Kevin Shatzkamer (Mobile CTO, Brocade), Antti Pappila (CTO, Creanord), and Rory Browne (SDN/NFV Architect, Intel).

 

Service assurance accurately measures and reports on the infrastructure (network and platform) key performance indicators (KPIs) that might affect a specific service. Although service assurance is well understood in the traditional networking environment, applying similar mechanisms to network function virtualization (NFV) infrastructure (NFVI), and using principles from software-defined networking (SDN) to automatically restore service levels, remains a work in progress.

 

Brocade, Creanord, and Intel have partnered to develop an approach to Virtual Service Assurance Management (vSAM), enabling more open and deterministic service deployment and resource usage in a virtual environment. The initial environment for demonstration is the mobile S/Gi-LAN, but the approach is generic enough to apply to various other NFV-based use cases, including virtual CPE. A static demonstration of the capabilities of this solution will be on display at Mobile World Congress 2016 (Brocade booth Hall 2, Suite 2G29, Creanord in Team Finland Pavilion Hall 1 E04), with a live demonstration at NFV World Congress (19-22 April 2016, San Jose). In addition, the vSAM work is an approved ETSI NFV PoC (PoC #39).

 

Why is Service Assurance Needed in an NFV Environment?

Service providers continue to have concerns about the determinism and manageability of services on virtual infrastructure and across programmable networks. While the physical appliance era tightly integrated the hardware resources with the application software and OSS stack, the promise of the NFV era is to decouple and modularize this architecture, and to use APIs to gain visibility into network metadata and re-provision infrastructure and services as required. Doing this requires a horizontal service assurance capability that interfaces agnostically with the NFVI, VNF, and service management layers.

 

Telco workloads are typically driven by Service Level Agreements (SLAs), often with regulations associated with real-time, always-on service delivery. To deliver these services, operators must have mechanisms to track the KPIs used to guarantee SLAs. According to the Heavy Reading NFV Market Tracker shown below, five of the top eight challenges are directly impacted by the service assurance capabilities of the SDN/NFV system.

 

What are the biggest technical challenges related to NFV that still must be resolved?

Brocade, Creanord, and Intel believe that resolving the operator challenge related to visibility, measurement, reporting, and assurance of virtual services is an important step in accelerating NFV deployments. By combining Brocade’s strength in SDN and NFV, Creanord’s SLA, virtualized service assurance, and probe technology, and Intel’s carrier-grade hardware resources and deep understanding of computing architectures, the partners address many of the challenges listed above.

 

The Solution Approach

The virtualized service assurance management (vSAM) system is an end-to-end (E2E) service assurance fabric that spans NFVI-PoPs and the WAN underlay. vSAM may be configured with KPI violation thresholds for the NFVI and the underlay. When an NFVI KPI such as CPU load, or an underlay KPI such as latency, stays in violation for a configurable length of time, vSAM informs the service orchestration layers.
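
The threshold behaviour described above can be illustrated with a minimal Python sketch. It is not the vSAM implementation: the names KpiThreshold, ThresholdMonitor, and the notify callback are hypothetical, and the sketch assumes a KPI counts as violated when its value exceeds a limit and that the orchestrator is notified once per sustained violation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KpiThreshold:
    """Hypothetical threshold definition; field names are illustrative."""
    kpi: str             # e.g. "cpu_load" or "wan_latency_ms"
    limit: float         # values above this count as a violation
    hold_seconds: float  # how long the violation must persist


class ThresholdMonitor:
    """Calls notify once when a KPI stays above its limit for hold_seconds."""

    def __init__(self, threshold: KpiThreshold,
                 notify: Callable[[str, float, float], None]) -> None:
        self.threshold = threshold
        self.notify = notify
        self._violating_since: Optional[float] = None
        self._notified = False

    def observe(self, value: float, now: float) -> None:
        if value <= self.threshold.limit:
            # KPI is back within bounds: reset the hold timer.
            self._violating_since = None
            self._notified = False
            return
        if self._violating_since is None:
            self._violating_since = now
        elapsed = now - self._violating_since
        if elapsed >= self.threshold.hold_seconds and not self._notified:
            self._notified = True
            self.notify(self.threshold.kpi, value, elapsed)


# Usage: CPU load above 90% must persist for 10 seconds before notification.
events = []
monitor = ThresholdMonitor(
    KpiThreshold(kpi="cpu_load", limit=0.9, hold_seconds=10.0),
    notify=lambda kpi, value, duration: events.append((kpi, value, duration)),
)
for t, load in [(0, 0.95), (5, 0.97), (10, 0.96), (15, 0.98)]:
    monitor.observe(load, now=t)
```

In this sketch a single notification is recorded at t=10; the sample at t=15 does not trigger a second one because the violation has already been reported.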

 

In addition to the interrupt-driven model, the vSAM may also poll NFVI and underlay resources at any stage to give a service orchestration entity a real-time accurate view of network resources across the NFV architecture. This allows the service orchestrator to make deterministic and efficient use of resources when provisioning or modifying services.
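
The poll mode can be sketched in the same spirit. The KpiPoller class and its register and snapshot methods are assumptions made for illustration, not a vSAM or vendor API; each reader is simply a callable that returns the current value of one KPI.

```python
from typing import Callable, Dict

# Hypothetical reader type: a zero-argument callable returning a KPI value.
KpiReader = Callable[[], float]


class KpiPoller:
    """Collects an on-demand snapshot of KPIs across registered resources."""

    def __init__(self) -> None:
        self._readers: Dict[str, Dict[str, KpiReader]] = {}

    def register(self, resource: str, kpi: str, reader: KpiReader) -> None:
        self._readers.setdefault(resource, {})[kpi] = reader

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        # Read every KPI at call time so the caller sees current values.
        return {
            resource: {kpi: read() for kpi, read in kpis.items()}
            for resource, kpis in self._readers.items()
        }


# Usage with stubbed readers standing in for real NFVI and underlay probes.
poller = KpiPoller()
poller.register("gi-lan-b/sbc", "cpu_load", lambda: 0.40)
poller.register("wan/a-to-b", "latency_ms", lambda: 8.0)
view = poller.snapshot()
```

An orchestrator could call snapshot before provisioning or modifying a service, independently of any threshold notifications.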

 

Making this work requires Intel’s deep understanding of and expertise in x86 and software architectures, along with Brocade and Creanord software components. The following diagram illustrates the solution architecture, followed by a list of components provided by each vendor.

Virtualized Service Assurance Management solution architecture with Brocade, Creanord, Intel.

 

 

Brocade Components

·       Brocade vRouter 5600

·       Brocade SDN Controller

·       Brocade Service Function Chaining (SFC) SDN Application (for demonstration)

Creanord Components

·       Creanord EchoVault SLA Performance Assurance System: Creanord EchoVault system provides powerful performance reporting and SLA management with multi-vendor support.

·       Creanord vProbe: The Creanord vProbe provides virtualized, powerful and accurate end-to-end network performance measurements and monitoring in data centers and other central locations as well as at the network edge.

Intel Components

·       Dual Intel® Xeon® Processor E5-2600 v3

·       2x Intel® 82599 10 GbE

·       Collectd-based VNF KPI Monitoring Agent

·       VNF and DCI Underlay Impairment Agent (for test purposes)

 

 

An Example of How This Works

Consider the diagram above. It shows three NFV locations representing three Gi-LAN sites with SDN-based service chaining, in which subscriber traffic coming from the mobile core GGSNs and PGWs is selectively and dynamically routed onto service chains for processing.

 

Now, let’s understand the impact of a failure within a service chain.

 

Under normal circumstances, service chains are served by the local Gi-LAN PoP. In the figure below, however, a failure has occurred on the pink (voice) service chain: a Session Border Controller (SBC) on Gi-LAN A has either failed or become overloaded (that is, it has exceeded a KPI threshold defined for that service chain).

 

Impact of a failure within a service chain

 

When the failure occurs, vSAM immediately notifies the Brocade service chaining application. The service chaining application then decides how to reroute the service chain based on network-wide NFV and underlay KPI state. In this case, it decides to route the chain over the WAN link to the SBC in Gi-LAN B, after checking that the SBC on Gi-LAN B has enough spare capacity to take the traffic and that the underlay WAN is exhibiting the right characteristics (as reported by the Creanord vProbe) to ensure service integrity (for example, very low latency for the voice service).
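
The rerouting decision can be sketched as a filter over candidate sites. This is only an illustration under stated assumptions: SiteState and pick_reroute_target are hypothetical names, the limits are made-up values, and preferring the lowest-latency eligible site is one possible policy rather than the policy of the Brocade service chaining application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteState:
    """Hypothetical view of a candidate site for the failed VNF."""
    name: str
    vnf_load: float        # utilization of the candidate VNF, 0.0 to 1.0
    wan_latency_ms: float  # measured latency from the failed site


def pick_reroute_target(candidates: List[SiteState],
                        max_load: float,
                        max_latency_ms: float) -> Optional[SiteState]:
    """Return the lowest-latency site that meets both limits, or None."""
    eligible = [
        site for site in candidates
        if site.vnf_load <= max_load and site.wan_latency_ms <= max_latency_ms
    ]
    # Assumed policy: prefer low latency, then low load as a tie-breaker.
    return min(eligible,
               key=lambda site: (site.wan_latency_ms, site.vnf_load),
               default=None)


# Usage: the SBC on Gi-LAN A has failed; a voice chain needs low latency.
candidates = [
    SiteState("Gi-LAN B", vnf_load=0.40, wan_latency_ms=8.0),
    SiteState("Gi-LAN C", vnf_load=0.20, wan_latency_ms=35.0),
]
target = pick_reroute_target(candidates, max_load=0.8, max_latency_ms=20.0)
```

With these illustrative numbers, Gi-LAN C is excluded because its WAN latency exceeds the limit for the voice chain, so Gi-LAN B is selected even though its SBC is more heavily loaded.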

 

In addition, similar actions would need to be taken in the event of WAN link failure, WAN link overload, VNF failures, or a combination of multiple simultaneous conditions. In any of these scenarios, vSAM notifies the service chaining application, which checks VNF resource and network state and reroutes the chain appropriately.

 

Features and Benefits

·       Virtualized Probes: May be deployed and relocated at will across the NFVI

·       Combined NFVI and underlay KPIs: One holistic view of service and network KPI state without the expense of external proprietary correlation

·       Configurable thresholds: Thresholds may be set in accordance with service descriptors defined at service creation time, and modified at any stage in the service life cycle

·       Interrupt and Poll mode architecture: Enables fast responses to resource overloads or failures to ensure SLAs, and also allows background OSS processes to mine vSAM network status as needed

·       Integration with Service Chaining Application: Ensures E2E service integrity and determinism while making best use of NFVI resources, avoiding site over-dimensioning

·       Open Source Architecture: Delivers multi-vendor capabilities leveraging open source platforms with open APIs from vendor infrastructure

·       Multi-Site Architecture: vSAM can scale to many locations, making it applicable to vCPE, MEC, C-RAN, and distributed EPC applications, for example

 

Conclusion

By implementing a vSAM horizontal architecture, whereby service assurance KPIs are abstracted and open, service providers can remove tight dependencies between upper layer service management systems and NFVI. This ensures NFVI presents an open interface from a platform and network point of view, which helps address operator challenges around vendor modularity, on-site resource usage optimization, service integrity and velocity across the whole SDN/NFV architecture.

 

Resources: