05-11-2017 04:31 AM
I installed Vyatta 5600 as a VM and connected it to 2 interfaces that are directly connected to the physical 82599 NICs on the host.
I am using qemu-kvm 2.5 and virtio drivers for the NICs.
Then I send traffic with pktgen to one port, expecting it on the other port.
I always send the same UDP packet at a rate of 2 Gbps, but the VM cannot forward 100% of the packets (only ~99.9% are forwarded).
The host machine is: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
The VM is deployed with 8G RAM and 3 vCPUs (I also tried with 5 vCPUs).
Any idea what the problem could be? I am trying to reach at least 6 Gbps.
05-12-2017 11:10 AM
There are multiple conditions to meet in order to achieve the maximum throughput with vRouter in a virtualized environment, including, but not limited to:
- use PCI Passthrough or SR-IOV modes
- assign 3 dataplane vCPUs per 10GE interface + 1 for the Control Plane
- do not cross NUMA node boundaries (QPI)
- do CPU pinning
- use 1G Huge Pages for Memory Backing
Also, make sure pktgen can sustain the throughput you expect from the DUT with the frame size you're using.
I have a server with Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz and I can achieve line rate forwarding on 2 x 10GE ports (20Gbps total) with 128-Byte Frames by running an RFC-2544 test on Spirent with 4 x 9's Frame Loss Tolerance.
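To make the sizing rule and the line-rate figure above concrete, here is a minimal Python sketch. It assumes the usual 20 bytes of per-frame Ethernet overhead on the wire (preamble, start-of-frame delimiter, inter-frame gap); the function names are made up for illustration and are not part of any vRouter tooling.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
ETH_WIRE_OVERHEAD_BYTES = 20


def total_vcpus(num_10ge_ports, dataplane_per_port=3, control_plane=1):
    """vCPUs suggested by the rule of thumb: 3 dataplane per 10GE port + 1 control."""
    return num_10ge_ports * dataplane_per_port + control_plane


def line_rate_fps(link_bps, frame_bytes):
    """Frames per second needed to fill a link with frames of the given size."""
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    print("vCPUs for 2 x 10GE:", total_vcpus(2))
    print("fps per 10GE port at 128-byte frames:", round(line_rate_fps(10e9, 128)))
```

For 2 x 10GE ports the rule gives 7 vCPUs, and 128-byte frames at 10GE line rate work out to roughly 8.45 million frames per second per port, which is the load the traffic generator has to sustain as well.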
05-14-2017 02:47 AM
Fine-tuning the hugepages, as you suggested, improved the performance by an order of magnitude, but it is still not enough.
At 4 Gbps (pkt size 1400) I still see packet loss (e.g. rx/tx 0.99998), while I expect 0 packet loss.
- the guest includes two 10G interfaces (82599) connected in passthrough mode.
- the guest is assigned with 8G RAM
- the guest is assigned with 7 vCPUs all in pinning mode (using libvirt 'vcpupin' option)
- the host includes one single CPU (hence no NUMA is in use)
- guest's memory is backed with 1G page size, see:
<page size="1" unit="G"/>
- when just forwarding traffic between interfaces on the host (using the DPDK l3fwd example), pktgen is capable of full throughput with different packet sizes.
In our test we are using a packet size of 1400.
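For reference, the pinning and hugepage settings listed above correspond to libvirt domain XML along the lines of what this small Python sketch generates. The host CPU numbers are purely hypothetical, and the fragments only cover the `cputune` and `memoryBacking` elements, not a complete domain definition.

```python
import xml.etree.ElementTree as ET


def tuning_fragments(host_cpus, page_size="1", page_unit="G"):
    """Build <cputune> and <memoryBacking> elements for a libvirt domain.

    host_cpus[i] is the host CPU that guest vCPU i is pinned to.
    """
    cputune = ET.Element("cputune")
    for vcpu, host_cpu in enumerate(host_cpus):
        # One <vcpupin> per guest vCPU, each pinned to exactly one host CPU.
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(host_cpu))

    backing = ET.Element("memoryBacking")
    hugepages = ET.SubElement(backing, "hugepages")
    # Same element as shown in the post: 1G pages backing the guest memory.
    ET.SubElement(hugepages, "page", size=page_size, unit=page_unit)
    return cputune, backing


if __name__ == "__main__":
    # Hypothetical layout: 7 guest vCPUs pinned to host CPUs 2..8.
    for element in tuning_fragments(range(2, 9)):
        print(ET.tostring(element, encoding="unicode"))
```

Comparing the generated fragments against the output of `virsh dumpxml` for the guest is a quick way to confirm that every vCPU really has its own pin and that the 1G page setting is present.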
05-15-2017 08:15 AM
How much are you getting at 100% load (2x10GE)?
What is the duration of your test runs?
Also, please keep in mind that you're almost getting a 5 x 9's figure in your test (rx/tx 0.99998), and vRouter is spec'ed with 4 x 9's only, as I mentioned in my previous reply. Please refer to our data sheet at this link http://www.brocade.com/en/backend-content/pdf-page.html?/content/dam/common/documents/content-types/datasheet/brocade-vrouter-ds.pdf
There are a number of other fine-tuning tips, many of them specific to the server vendor and model, that you need to apply in order to achieve the highest throughput with the lowest frame loss rate.
05-15-2017 08:58 AM
I reach 100% only at about 500-800 Mbps.
I run each test for ~1 minute.
I went over this spec some time ago, but I assumed the 4 9's applies when you are close to the performance limit.
Should I expect such performance at any rate? I would expect to reach 100% at least at low rates.
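To put the numbers from this thread in perspective, the following Python sketch estimates how many packets a given rx/tx ratio translates to over one test run. It assumes the 1400-byte packet size is the full on-wire packet and ignores preamble and inter-frame gap, so the counts are approximate.

```python
def expected_drops(rate_bps, packet_bytes, rx_tx_ratio, duration_s):
    """Return (packets sent, packets lost) for a constant-rate test run."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    sent = packets_per_second * duration_s
    lost = sent * (1.0 - rx_tx_ratio)
    return sent, lost


if __name__ == "__main__":
    # Figures quoted in the thread: 4 Gbps, 1400-byte packets,
    # rx/tx ratio 0.99998, roughly one minute per run.
    sent, lost = expected_drops(4e9, 1400, 0.99998, 60)
    print(f"sent ~{sent:,.0f} packets, lost ~{lost:,.0f}")
```

With those inputs the run sends about 21.4 million packets and loses roughly 430 of them, so a short run at a low rate can easily show zero loss simply because too few packets were sent to hit a rare drop.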
05-15-2017 09:37 AM
You should be able to get no frame loss at low input rates, but it usually takes a lot of effort and tuning (be prepared). Here are a few more tips:
- search for your specific server vendor's recommendations on how to change their server settings for high performance use cases. At the very least, this should include changing the CPU power profile to "Maximum Performance" instead of the default, which is usually "Balanced Performance and Power" or something similar
- isolate the dataplane CPUs to avoid them being interrupted
- use only x16 or x8 PCIe slots and use only one 10GE port per NIC to maintain a low bitrate on PCIe slots
Please refer to this paper from Intel for more instructions about fine tuning the host server: https://builders.intel.com/docs/networkbuilders/numa_aware_hypervisor_and_impact_on_brocade_vrouter.pdf
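One common way to isolate CPUs on a Linux host is the `isolcpus=` kernel boot parameter. Whether that is the right mechanism for a given server is an assumption here, since the tips above do not name one. As a minimal sketch, this hypothetical Python helper turns a list of host CPU ids into the compact range syntax that parameter expects.

```python
def cpu_list(cpus):
    """Format CPU ids as a compact list, e.g. [2, 3, 4, 9] -> '2-4,9'."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu  # still inside the current contiguous run
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def isolcpus_param(cpus):
    """Kernel command-line fragment that isolates the given host CPUs."""
    return "isolcpus=" + cpu_list(cpus)


if __name__ == "__main__":
    # Hypothetical example: host CPUs 2..7 carry the pinned dataplane vCPUs.
    print(isolcpus_param([2, 3, 4, 5, 6, 7]))
```

The CPUs passed in should be the same host CPUs the guest's dataplane vCPUs are pinned to, so that the host scheduler keeps other tasks off them.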