This is a co-authored blog by: Kevin Shatzkamer, Mobile CTO, Brocade - Antti Pappila, CTO, Creanord - Rory Browne, SDN/NFV Architect, Intel
Service assurance accurately measures and reports on the infrastructure (network and platform) key performance indicators (KPIs) that might affect a specific service. Although service assurance is well understood in traditional networking environments, applying similar mechanisms to network function virtualization (NFV) infrastructure (NFVI), and using principles from software-defined networking (SDN) to automatically restore service levels, remains a work in progress.
Brocade, Creanord, and Intel have partnered to develop an approach to Virtual Service Assurance Management (vSAM), enabling more open and deterministic service deployment and resource usage in a virtual environment. The initial environment for demonstration is the mobile S/Gi-LAN, but the approach is generic enough to apply to various other NFV-based use cases, including virtual CPE. A static demonstration of the capabilities of this solution will be on display at Mobile World Congress 2016 (Brocade booth Hall 2, Suite 2G29, Creanord in Team Finland Pavilion Hall 1 E04), with a live demonstration at NFV World Congress (19-22 April 2016, San Jose). In addition, the vSAM work is an approved ETSI NFV PoC (PoC #39).
Why is Service Assurance Needed in an NFV Environment?
Service providers continue to have concerns about the determinism and manageability of services on virtual infrastructure and across programmable networks. While the physical appliance era tightly integrated the hardware resources with application software and the OSS stack, the promise of the NFV era is to decouple and modularize this architecture, and to use APIs to gain visibility into network metadata and re-provision infrastructure and services as required. To do this, a horizontal service assurance capability is required that interfaces agnostically with the NFVI, VNF, and service management layers.
Telco workloads are typically driven by Service Level Agreements (SLAs), often with regulations associated with real-time, always-on service delivery. In order to deliver these services, operators must have mechanisms to track the Key Performance Indicators (KPIs) that are used to guarantee SLAs. According to the Heavy Reading NFV Market Tracker shown below, five of the top eight challenges are directly impacted by the service assurance capabilities of the SDN/NFV system.
Brocade, Creanord, and Intel believe that resolving the operator challenge related to visibility, measurement, reporting, and assurance of virtual services is an important step in accelerating the deployment of NFV. By combining Brocade’s strength in SDN and NFV, Creanord’s SLA, virtualized service assurance, and probe technology, and Intel’s carrier-grade hardware resources and deep understanding of computing architectures, the three companies address many of the challenges described above.
The Solution Approach
Virtual Service Assurance Management (vSAM) is an end-to-end (E2E) service assurance fabric that spans NFVI-PoPs and the WAN underlay. The vSAM may be configured with KPI violation thresholds for the NFVI and the underlay. When an NFVI KPI such as CPU load, or an underlay KPI such as latency, violates its threshold for a configurable length of time, the vSAM informs the service orchestration layers.
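The "threshold violated for a configurable length of time" behaviour can be illustrated with a minimal Python sketch. This is not the vSAM implementation; the class name `KpiThreshold`, the `notify` callback, and all sample values are hypothetical, chosen only to show how a hold time keeps short spikes from triggering a notification.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class KpiThreshold:
    """Hypothetical rule: notify once a KPI stays above `limit` for `hold_seconds`."""
    name: str
    limit: float
    hold_seconds: float
    notify: Callable[[str, float], None]
    _breach_start: Optional[float] = field(default=None, init=False)
    _notified: bool = field(default=False, init=False)

    def observe(self, value: float, now: float) -> None:
        if value <= self.limit:
            # KPI is back within bounds: forget the breach and re-arm the rule.
            self._breach_start = None
            self._notified = False
            return
        if self._breach_start is None:
            # First sample above the limit: start timing the violation.
            self._breach_start = now
        if not self._notified and now - self._breach_start >= self.hold_seconds:
            # Violation has persisted long enough: notify exactly once.
            self._notified = True
            self.notify(self.name, value)


# Illustrative samples as (time in seconds, CPU load in percent).
events = []
cpu = KpiThreshold("cpu_load_pct", limit=80.0, hold_seconds=10.0,
                   notify=lambda name, value: events.append((name, value)))
for t, v in [(0, 50.0), (5, 85.0), (10, 90.0), (15, 95.0), (20, 97.0), (25, 60.0)]:
    cpu.observe(v, now=float(t))

print(events)  # [('cpu_load_pct', 95.0)]
```

The load first exceeds the limit at t=5, but the notification fires only at t=15, once the violation has lasted the full 10 seconds. In a real deployment the callback would presumably hand the event to the service orchestration layer rather than append to a list.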
In addition to the interrupt-driven model, the vSAM may also poll NFVI and underlay resources at any stage to give a service orchestration entity a real-time accurate view of network resources across the NFV architecture. This allows the service orchestrator to make deterministic and efficient use of resources when provisioning or modifying services.
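The poll mode can be pictured as a store of the most recent KPI samples that an orchestrator queries on demand. The sketch below assumes a simple in-memory design; `KpiStore`, the resource names, and the values are all hypothetical and are not taken from the vSAM software.

```python
from typing import Dict


class KpiStore:
    """Hypothetical in-memory store holding the latest sample of each KPI per resource."""

    def __init__(self) -> None:
        self._latest: Dict[str, Dict[str, float]] = {}

    def update(self, resource: str, kpi: str, value: float) -> None:
        # Called by monitoring agents; newer samples overwrite older ones.
        self._latest.setdefault(resource, {})[kpi] = value

    def poll(self, resource: str) -> Dict[str, float]:
        # Called by the orchestrator; returns a copy so callers cannot mutate the store.
        return dict(self._latest.get(resource, {}))


store = KpiStore()
store.update("gilan-a/sbc", "cpu_load_pct", 92.0)
store.update("gilan-b/sbc", "cpu_load_pct", 35.0)
store.update("wan/a-b", "latency_ms", 8.0)
store.update("gilan-b/sbc", "cpu_load_pct", 40.0)  # a newer sample replaces the old one

print(store.poll("gilan-b/sbc"))  # {'cpu_load_pct': 40.0}
```

Keeping the latest view separate from the notification path lets background processes read state at their own pace without affecting the time-critical alerting.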
Making this work requires Intel’s deep understanding of and expertise in x86 and software architectures, along with Brocade and Creanord software components. The following diagram illustrates the solution architecture, followed by a list of components provided by each vendor.
· Creanord vProbe: The Creanord vProbe provides virtualized, powerful and accurate end-to-end network performance measurements and monitoring in data centers and other central locations as well as at the network edge.
· Dual Intel® Xeon® Processor E5-2600 v3
· 2x Intel® 82599 10 GbE
· Collectd-based VNF KPI Monitoring Agent
· VNF and DCI Underlay Impairment Agent (for test purposes)
An Example of How This Works
Consider the diagram above. We have three NFV locations representing three Gi-LAN sites with SDN-based service chaining, whereby we selectively and dynamically route subscriber traffic coming from the mobile core GGSNs and PGWs onto service chains for processing.
Now, let’s understand the impact of a failure within a service chain.
Under normal circumstances, service chains are served by the local Gi-LAN PoP. In the figure below, however, a failure has occurred on the pink (voice) service chain: a Session Border Controller (SBC) on Gi-LAN A has either failed or become overloaded (that is, it has exceeded a KPI threshold we have defined for that service chain).
When the failure occurs, vSAM immediately notifies the Brocade service chaining application. The service chaining application then decides how to reroute the service chain based on network-wide NFV and underlay KPI state. In this case, it decides to route the chain over the WAN link to the SBC in Gi-LAN B, after checking that the SBC on Gi-LAN B is not too heavily loaded to take the traffic and that the underlay WAN is exhibiting the right characteristics (as reported by Creanord vProbe) to ensure service integrity (very low latency for the voice service, for example).
In addition, similar actions would need to be taken in the event of WAN link failure, WAN link overload, VNF failures, or a combination of multiple simultaneous conditions. In any of these scenarios, vSAM notifies the service chaining application, which checks VNF resource and network state and reroutes the chain appropriately.
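The rerouting decision described above, which checks both the VNF load at the candidate site and the underlay WAN characteristics, might be sketched as follows. This is an illustration under stated assumptions, not the Brocade service chaining application's logic: `Candidate`, `pick_site`, the limits, and the KPI values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical view of one site that could host the service chain."""
    site: str
    vnf_cpu_load_pct: float  # load of the SBC instance at this site
    wan_latency_ms: float    # latency of the path to this site (0.0 for the local site)


def pick_site(candidates: List[Candidate], max_cpu: float,
              max_latency: float) -> Optional[str]:
    """Return the lowest-latency site meeting both limits, or None if none qualifies."""
    eligible = [c for c in candidates
                if c.vnf_cpu_load_pct <= max_cpu and c.wan_latency_ms <= max_latency]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.wan_latency_ms).site


# Illustrative state: the local SBC on Gi-LAN A is overloaded.
candidates = [
    Candidate("Gi-LAN A", vnf_cpu_load_pct=98.0, wan_latency_ms=0.0),
    Candidate("Gi-LAN B", vnf_cpu_load_pct=40.0, wan_latency_ms=8.0),
    Candidate("Gi-LAN C", vnf_cpu_load_pct=30.0, wan_latency_ms=25.0),
]
chosen = pick_site(candidates, max_cpu=80.0, max_latency=20.0)
print(chosen)  # Gi-LAN B
```

Gi-LAN A is excluded by the load limit and Gi-LAN C by the latency limit assumed for voice, so the chain moves to Gi-LAN B. A `None` result would signal that no site can currently satisfy the service's constraints.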
May be deployed and relocated at will across the NFVI
Combined NFVI and underlay KPIs
One holistic view of service and network KPI state without the expense of external proprietary correlation
Thresholds may be set in accordance with service descriptors defined at service creation time, and modified at any stage in the service life cycle
Interrupt and Poll mode architecture
Enables fast responses to resource overloads or failure to ensure SLAs but also allows background OSS processes to mine vSAM network status as needed
Integration with Service Chaining Application
Ensures E2E service integrity and determinism whilst making best use of NFVI resources, avoiding the need for site over-dimensioning
Open Source Architecture
Delivers multi-vendor capabilities leveraging open source platforms with open APIs from vendor infrastructure
vSAM can scale to many locations, making it well suited to applications such as vCPE, MEC, C-RAN, and distributed EPC
By implementing a vSAM horizontal architecture, whereby service assurance KPIs are abstracted and open, service providers can remove tight dependencies between upper layer service management systems and NFVI. This ensures NFVI presents an open interface from a platform and network point of view, which helps address operator challenges around vendor modularity, on-site resource usage optimization, service integrity and velocity across the whole SDN/NFV architecture.