..  BSD LICENSE
    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Poll Mode Driver for Emulated Virtio NIC
========================================

Virtio is a para-virtualization framework initiated by IBM and supported by the KVM hypervisor.
In the Data Plane Development Kit (DPDK),
we provide a virtio Poll Mode Driver (PMD) as a software solution, in contrast to the SR-IOV hardware solution,
for fast guest VM to guest VM communication and guest VM to host communication.

Vhost is a kernel acceleration module for the virtio qemu back end.
The DPDK extends kni to support the vhost raw socket interface,
which enables vhost to directly read/write packets from/to a physical port.
With this enhancement, virtio can achieve promising performance.

In a future release, we will also enhance the vhost back end
to unlock the peak performance of the virtio PMD.

For basic qemu-KVM installation and the Intel EM poll mode driver in a guest VM,
please refer to the chapter "Driver for VM Emulated Devices".

In this chapter, we demonstrate usage of the virtio PMD with two back ends:
the standard qemu vhost back end and the vhost kni back end.

Virtio Implementation in DPDK
-----------------------------

For details about the virtio spec, refer to the Virtio PCI Card Specification written by Rusty Russell.

As a PMD, virtio provides the packet reception and transmission callbacks ``virtio_recv_pkts`` and ``virtio_xmit_pkts``.

In ``virtio_recv_pkts``, the indexes in the range ``[vq->vq_used_cons_idx, vq->vq_ring.used->idx)`` of the vring are available for virtio to burst out.

In ``virtio_xmit_pkts``, the same index range in the vring is available for virtio to clean.
Virtio enqueues the packets to be transmitted into the vring, advances ``vq->vq_ring.avail->idx``,
and then notifies the host back end if necessary.

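The half-open index range described above can be illustrated with a minimal sketch.
It is not the driver's code: ``struct vq_sketch``, ``vq_nused`` and ``vq_consume`` are hypothetical names
that model only the two used-ring indexes, assuming both are free-running 16-bit counters
and that the ring size is a power of two.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical, trimmed-down stand-in for the driver's virtqueue state.
     * Only the two indexes discussed above are modelled. */
    struct vq_sketch {
        uint16_t used_cons_idx; /* next used entry the driver will consume */
        uint16_t used_idx;      /* written by the back end: one past the last used entry */
    };

    /* Number of entries in [used_cons_idx, used_idx). The subtraction is
     * performed modulo 2^16, so it stays correct when used_idx wraps. */
    static inline uint16_t vq_nused(const struct vq_sketch *vq)
    {
        return (uint16_t)(vq->used_idx - vq->used_cons_idx);
    }

    /* Consume up to max_burst entries, as a burst would, and return how many
     * were taken. slots[] receives the ring slot of each consumed entry;
     * ring_size is assumed to be a power of two. */
    static inline uint16_t vq_consume(struct vq_sketch *vq, uint16_t ring_size,
                                      uint16_t *slots, uint16_t max_burst)
    {
        uint16_t n = vq_nused(vq);

        if (n > max_burst)
            n = max_burst;
        for (uint16_t i = 0; i < n; i++)
            slots[i] = (uint16_t)((vq->used_cons_idx + i) & (ring_size - 1));
        vq->used_cons_idx = (uint16_t)(vq->used_cons_idx + n);
        return n;
    }

Because the count is taken as a 16-bit difference, a consumer index of 65534 and a used index of 2
still yield four pending entries, which map to slots 254, 255, 0 and 1 of a 256-entry ring.
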
Features and Limitations of virtio PMD
--------------------------------------

In this release, the virtio PMD provides the basic functionality of packet reception and transmission.

*   It supports mergeable buffers per packet when receiving packets and scattered buffers per packet
    when transmitting packets. The supported packet size ranges from 64 to 1518 bytes.

*   It supports multicast packets and promiscuous mode.

*   The descriptor number for the Rx/Tx queue is hard-coded to be 256 by qemu.
    If given a different descriptor number by the upper application,
    the virtio PMD generates a warning and falls back to the hard-coded value.

*   The MAC/VLAN filter features are supported, but negotiation with the vhost back end is needed to enable them.
    When the back end can't support the VLAN filter, the virtio application on the guest should disable the VLAN filter to make sure
    the virtio port is configured correctly, for example by specifying ``--disable-hw-vlan`` on the testpmd command line.

*   ``RTE_PKTMBUF_HEADROOM`` should be defined larger than ``sizeof(struct virtio_net_hdr)``, which is 10 bytes.

*   Virtio does not support runtime configuration.

*   Virtio supports Link State interrupt.

*   Virtio supports software VLAN stripping and inserting.

*   Virtio supports using port IO to get PCI resource when the uio/igb_uio module is not available.

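Two of the constraints listed above, the descriptor-count fallback and the headroom requirement,
can be sketched as follows. This is an illustration only: ``virtio_net_hdr_sketch``, ``headroom_ok``
and ``effective_nb_desc`` are hypothetical names, and the structure simply mirrors the 10-byte basic
virtio-net header layout rather than reusing the driver's own definition.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Mirror of the basic virtio-net header: two 8-bit fields followed by
     * four 16-bit fields, 10 bytes in total with no padding. */
    struct virtio_net_hdr_sketch {
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
    };

    _Static_assert(sizeof(struct virtio_net_hdr_sketch) == 10,
                   "basic virtio-net header is expected to be 10 bytes");

    /* True when the configured mbuf headroom is larger than the header,
     * following the wording of the list above. */
    static inline bool headroom_ok(size_t headroom)
    {
        return headroom > sizeof(struct virtio_net_hdr_sketch);
    }

    /* Return the descriptor count the queue will actually use. The size
     * fixed by the device wins; a different request only triggers a warning. */
    static inline uint16_t effective_nb_desc(uint16_t requested, uint16_t device_size)
    {
        if (requested != device_size)
            fprintf(stderr, "warning: requested %u descriptors, using %u\n",
                    (unsigned)requested, (unsigned)device_size);
        return device_size;
    }

With a device ring of 256 entries, a request for 512 descriptors prints the warning and returns 256,
while a request for 256 returns silently.
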
Prerequisites
-------------

The following prerequisites apply:

*   In the BIOS, turn VT-x and VT-d on.

*   Linux kernel with the KVM module; vhost module loaded and ioeventfd supported.
    The qemu standard back end without vhost support isn't tested, and probably isn't supported.

Virtio with kni vhost Back End
------------------------------

This section demonstrates an example kni vhost back end setup for Phy-VM communication.

.. _figure_host_vm_comms:

.. figure:: img/host_vm_comms.*

   Host2VM Communication Example Using kni vhost Back End


Host2VM communication example

#.  Load the kni kernel module:

    .. code-block:: console

        insmod rte_kni.ko

    Other basic DPDK preparations, such as enabling hugepages and binding ports to uio, are not listed here.
    Please refer to the *DPDK Getting Started Guide* for detailed instructions.

#.  Launch the kni user application:

    .. code-block:: console

        examples/kni/build/app/kni -c 0xf -n 4 -- -p 0x1 -P --config="(0,1,3)"

    This command generates one network device, vEth0, for the physical port.
    If more physical ports are specified, the generated network devices will be vEth1, vEth2, and so on.

    For each physical port, kni creates two user threads.
    One thread loops to fetch packets from the physical NIC port into the kni receive queue.
    The other user thread loops to send packets in the kni transmit queue.

    For each physical port, kni also creates a kernel thread that retrieves packets from the kni receive queue,
    places them onto kni's raw socket's queue and wakes up the vhost kernel thread to exchange packets with the virtio virt queue.

    For more details about kni, please refer to :ref:`kni`.

#.  Enable the kni raw socket functionality for the specified physical NIC port,
    get the generated file descriptor and set it in the qemu command line parameter.
    Always remember to set ``ioeventfd=on`` and ``vhost=on``.

    Example:

    .. code-block:: console

        echo 1 > /sys/class/net/vEth0/sock_en
        fd=`cat /sys/class/net/vEth0/sock_fd`
        exec qemu-system-x86_64 -enable-kvm -cpu host \
        -m 2048 -smp 4 -name dpdk-test1-vm1 \
        -drive file=/data/DPDKVMS/dpdk-vm.img \
        -netdev tap,fd=$fd,id=mynet_kni,script=no,vhost=on \
        -device virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x3,ioeventfd=on \
        -vnc :1 -daemonize

    In the above example, virtio port 0 in the guest VM will be associated with vEth0, which in turn corresponds to a physical port.
    This means that received packets come from vEth0, and transmitted packets are sent to vEth0.

#.  In the guest, bind the virtio device to the uio_pci_generic kernel module and start the forwarding application.
    When the virtio port in the guest bursts Rx, it gets packets from the
    raw socket's receive queue.
    When the virtio port bursts Tx, it sends packets to the tx_q.

    .. code-block:: console

        modprobe uio
        echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
        modprobe uio_pci_generic
        python tools/dpdk-devbind.py -b uio_pci_generic 00:03.0

    We use testpmd as the forwarding application in this example.

    .. figure:: img/console.*

       Running testpmd

#.  Use an IXIA packet generator to inject a packet stream into the KNI physical port.

    The packet reception and transmission flow path is:

    IXIA packet generator -> 82599 PF -> KNI Rx queue -> KNI raw socket queue -> Guest
    VM virtio port 0 Rx burst -> Guest VM virtio port 0 Tx burst -> KNI Tx queue
    -> 82599 PF -> IXIA packet generator

Virtio with qemu virtio Back End
--------------------------------

.. _figure_host_vm_comms_qemu:

.. figure:: img/host_vm_comms_qemu.*

   Host2VM Communication Example Using qemu vhost Back End


.. code-block:: console

    qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 \
    -mem-path /dev/hugepages -mem-prealloc \
    -drive file=/data/DPDKVMS/dpdk-vm1 \
    -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on \
    -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on \
    -device pci-assign,host=04:10.1

In this example, the packet reception flow path is:

    IXIA packet generator -> 82599 PF -> Linux Bridge -> TAP0's socket queue -> Guest
    VM virtio port 0 Rx burst -> Guest VM 82599 VF port1 Tx burst -> IXIA packet
    generator

The packet transmission flow is:

    IXIA packet generator -> Guest VM 82599 VF port1 Rx burst -> Guest VM virtio
    port 0 Tx burst -> tap -> Linux Bridge -> 82599 PF -> IXIA packet generator


Virtio PMD Rx/Tx Callbacks
--------------------------

The virtio driver has 3 Rx callbacks and 2 Tx callbacks.

Rx callbacks:

#. ``virtio_recv_pkts``:
   Regular version without mergeable Rx buffer support.

#. ``virtio_recv_mergeable_pkts``:
   Regular version with mergeable Rx buffer support.

#. ``virtio_recv_pkts_vec``:
   Vector version without mergeable Rx buffer support. It also fixes the available
   ring indexes and uses vector instructions to optimize performance.

Tx callbacks:

#. ``virtio_xmit_pkts``:
   Regular version.

#. ``virtio_xmit_pkts_simple``:
   Vector version that fixes the available ring indexes to optimize performance.


By default, the non-vector callbacks are used:

*   For Rx: If mergeable Rx buffers are disabled then ``virtio_recv_pkts`` is
    used; otherwise ``virtio_recv_mergeable_pkts``.

*   For Tx: ``virtio_xmit_pkts``.


Vector callbacks will be used when:

*   ``txq_flags`` is set to ``VIRTIO_SIMPLE_FLAGS`` (0xF01), which implies:

    *   Single segment is specified.

    *   No offload support is needed.

*   Mergeable Rx buffers are disabled.

The corresponding callbacks are:

*   For Rx: ``virtio_recv_pkts_vec``.

*   For Tx: ``virtio_xmit_pkts_simple``.

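The selection rules above can be summarized in a small decision function.
This is a sketch rather than the driver's code: ``choose_callbacks``, the two enums and
``VIRTIO_SIMPLE_FLAGS_SKETCH`` are hypothetical names, and treating any ``txq_flags`` value that
has all of the 0xF01 bits set as qualifying is an assumption, since the text only mentions the exact value.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Value quoted in the text above; the name is local to this sketch. */
    #define VIRTIO_SIMPLE_FLAGS_SKETCH 0xF01u

    enum rx_cb { RX_REGULAR, RX_MERGEABLE, RX_VECTOR }; /* virtio_recv_pkts, _mergeable_pkts, _pkts_vec */
    enum tx_cb { TX_REGULAR, TX_SIMPLE };               /* virtio_xmit_pkts, virtio_xmit_pkts_simple */

    struct cb_choice {
        enum rx_cb rx;
        enum tx_cb tx;
    };

    /* Pick the Rx/Tx callback pair from the Tx queue flags and the
     * negotiated mergeable Rx buffer setting. */
    static inline struct cb_choice choose_callbacks(uint32_t txq_flags, bool mergeable_rx)
    {
        struct cb_choice c;
        /* Assumption: all simple-path bits must be present in txq_flags. */
        bool simple_flags = (txq_flags & VIRTIO_SIMPLE_FLAGS_SKETCH) == VIRTIO_SIMPLE_FLAGS_SKETCH;

        if (simple_flags && !mergeable_rx) {
            /* Vector path: both directions switch together. */
            c.rx = RX_VECTOR;
            c.tx = TX_SIMPLE;
        } else {
            /* Default path: Rx depends only on mergeable buffers. */
            c.rx = mergeable_rx ? RX_MERGEABLE : RX_REGULAR;
            c.tx = TX_REGULAR;
        }
        return c;
    }

Under these assumptions, ``choose_callbacks(0xF01, false)`` selects the vector pair, while enabling
mergeable Rx buffers or clearing the flags falls back to the regular callbacks.
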

Example of using the vector version of the virtio poll mode driver in
``testpmd``::

   testpmd -c 0x7 -n 4 -- -i --txqflags=0xF01 --rxq=1 --txq=1 --nb-cores=1