..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

DPDK Xen Based Packet-Switching Solution
========================================

Introduction
------------

DPDK provides a para-virtualization packet switching solution based on the Xen hypervisor's Grant Table (see Note 1 below),
which provides simple and fast packet switching between guest domains and the host domain based on MAC address or VLAN tag.

This solution consists of two components:
a Poll Mode Driver (PMD) as the front end in the guest domain and a switching back end in the host domain.
XenStore is used to exchange configuration information between the PMD front end and the switching back end,
including grant reference IDs for the shared Virtio RX/TX rings,
the MAC address, the device state, and so on. XenStore is an information storage space shared between domains;
see further information on XenStore below.

The front end PMD can be found in the DPDK directory lib/librte_pmd_xenvirt and the back end example in examples/vhost_xen.

The PMD front end and the switching back end use shared Virtio RX/TX rings as the para-virtualized interface.
The Virtio ring is created by the front end, and the Grant Table references for the ring are passed to the host.
The switching back end maps those Grant Table references and creates the shared rings in a mapped address space.

The following diagram describes the functionality of the DPDK Xen Packet-Switching Solution.


.. _figure_dpdk_xen_pkt_switch:

.. figure:: img/dpdk_xen_pkt_switch.*

   Functionality of the DPDK Xen Packet Switching Solution.


Note 1: The Xen hypervisor uses a mechanism called a Grant Table to share memory between domains
(`http://wiki.xen.org/wiki/Grant Table <http://wiki.xen.org/wiki/Grant%20Table>`_).

A diagram of the design is shown below, where "gva" is the Guest Virtual Address,
which is the data pointer of the mbuf, and "hva" is the Host Virtual Address:


.. _figure_grant_table:

.. figure:: img/grant_table.*

   DPDK Xen Layout


In this design, a Virtio ring is used as the para-virtualized interface for better performance than a Xen private ring
when switching packets to and from a VM.
The additional performance comes from avoiding the system call and memory map that a Xen private ring requires for each memory copy.

Device Creation
---------------

Poll Mode Driver Front End
~~~~~~~~~~~~~~~~~~~~~~~~~~

*   Mbuf pool allocation:

    To use the Xen switching solution, the DPDK application should use rte_mempool_gntalloc_create()
    to reserve mbuf pools during initialization (a minimal allocation sketch is shown after this list).
    rte_mempool_gntalloc_create() creates a mempool with objects from memory allocated and managed via gntalloc/gntdev.

    The DPDK supports construction of mempools from already-allocated virtual memory through the rte_mempool_xmem_create() API.

    This front end constructs its mempools from memory allocated through the xen_gntalloc driver.
    rte_mempool_gntalloc_create() allocates Grant pages, maps them to a contiguous virtual address space,
    and calls rte_mempool_xmem_create() to build the mempools.
    The Grant IDs for all Grant pages are passed to the host through XenStore.

*   Virtio Ring Creation:

    The Virtio queue size is defined as 256 by default in the VQ_DESC_NUM macro.
    Using the queue setup function,
    Grant pages are allocated based on the ring size and are mapped to a contiguous virtual address space to form the Virtio ring.
    Normally, one ring comprises several pages.
    Their Grant IDs are passed to the host through XenStore.

    There is no requirement that this memory be physically contiguous.

*   Interrupt and Kick:

    There are no interrupts in DPDK Xen Switching as both the front and back ends work in polling mode.
    There is no requirement for notification.

*   Feature Negotiation:

    Currently, feature negotiation through XenStore is not supported.

*   Packet Reception & Transmission:

    With the mempools and Virtio rings created, the front end can operate Virtio devices
    as it does in the Virtio PMD for KVM Virtio devices, with the exception that the host
    neither requires notifications nor deals with interrupts.

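The following is a minimal sketch of the mbuf pool allocation step referenced above.
It assumes that rte_mempool_gntalloc_create() mirrors the rte_mempool_create() signature;
check rte_mempool_gntalloc.h in your DPDK version for the exact prototype, and treat the
pool name and sizing constants as illustrative values only.

.. code-block:: c

    #include <rte_mempool.h>
    #include <rte_mbuf.h>
    #include <rte_lcore.h>

    #define NB_MBUF   8192   /* typical pool size used by the DPDK examples */
    #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

    static struct rte_mempool *
    xen_guest_pool_create(void)
    {
        /* Objects come from pages allocated through xen_gntalloc; the PMD
         * publishes the resulting Grant references in XenStore. */
        return rte_mempool_gntalloc_create("xen_mbuf_pool",
                                           NB_MBUF, MBUF_SIZE,
                                           32,  /* per-lcore cache size */
                                           sizeof(struct rte_pktmbuf_pool_private),
                                           rte_pktmbuf_pool_init, NULL,
                                           rte_pktmbuf_init, NULL,
                                           rte_socket_id(), 0);
    }
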
XenStore is a database that stores guest and host information in the form of (key, value) pairs.
The following is an example of the information generated during the startup of the front end PMD in a guest VM (domain ID 1):

.. code-block:: console

        xenstore -ls /local/domain/1/control/dpdk
        0_mempool_gref="3042,3043,3044,3045"
        0_mempool_va="0x7fcbc6881000"
        0_tx_vring_gref="3049"
        0_rx_vring_gref="3053"
        0_ether_addr="4e:0b:d0:4e:aa:f1"
        0_vring_flag="3054"
        ...

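An individual key can also be read back directly with the standard XenStore tools; for example,
for the device shown above (guest domain 1, device index 0):

.. code-block:: console

    xenstore-read /local/domain/1/control/dpdk/0_ether_addr
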
Multiple mempools and multiple Virtio devices may exist in the guest domain; the leading number is the index, starting from zero.

The idx#_mempool_va node stores the guest virtual address for mempool idx#.

The idx#_ether_addr node stores the MAC address of the guest Virtio device.

For idx#_rx_vring_gref, idx#_tx_vring_gref, and idx#_mempool_gref, the value is a list of Grant references.
Taking the idx#_mempool_gref node as an example, the host maps those Grant references to a contiguous virtual address space.
The real Grant reference information is stored in this virtual address space,
where (gref, pfn) pairs follow each other with -1 as the terminator.


.. _figure_grant_refs:

.. figure:: img/grant_refs.*

   Mapping Grant references to a contiguous virtual address space


After all gref# IDs are retrieved, the host maps them to a contiguous virtual address space.
With the guest mempool virtual address, the host establishes a 1:1 address mapping.
With multiple guest mempools, the host establishes multiple address translation regions.
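
The sketch below illustrates the two pieces of bookkeeping described above: walking the
(gref, pfn) list up to its -1 terminator, and the 1:1 guest-to-host address translation.
The structure and field names are illustrative only; they are not the API of the example code.

.. code-block:: c

    #include <stdint.h>
    #include <stddef.h>

    struct gref_pfn {                /* hypothetical view of one (gref, pfn) entry */
        int32_t  gref;
        uint32_t pfn;
    };

    struct guest_mempool_map {       /* hypothetical per-mempool translation record */
        uintptr_t guest_va;          /* idx#_mempool_va read from XenStore */
        uintptr_t host_va;           /* base of the host mapping of the same pages */
        size_t    len;               /* size of the mapped region in bytes */
    };

    /* Count (gref, pfn) entries until the -1 terminator. */
    static size_t
    count_grefs(const struct gref_pfn *tbl)
    {
        size_t n = 0;

        while (tbl[n].gref != -1)
            n++;
        return n;
    }

    /* 1:1 address translation: hva = gva - guest_base + host_base. */
    static void *
    gva_to_hva(const struct guest_mempool_map *m, uintptr_t gva)
    {
        if (gva < m->guest_va || gva >= m->guest_va + m->len)
            return NULL;             /* address not backed by this mempool */
        return (void *)(m->host_va + (gva - m->guest_va));
    }
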

Switching Back End
~~~~~~~~~~~~~~~~~~

The switching back end monitors changes in XenStore.
When the back end detects that a new Virtio device has been created in a guest domain, it will:

#.  Retrieve Grant and configuration information from XenStore.

#.  Map and create a Virtio ring.

#.  Map the mempools in the host and establish address translation between the guest address and the host address.

#.  Select a free VMDQ pool, set its affinity with the Virtio device, and set the MAC/VLAN filter.

Packet Reception
~~~~~~~~~~~~~~~~

When packets arrive from an external network, the MAC/VLAN filter classifies them into queues in one VMDQ pool.
As each pool is bound to a Virtio device in some guest domain, the switching back end will:

#.  Fetch an available entry from the Virtio RX ring.

#.  Get the gva, and translate it to an hva.

#.  Copy the contents of the packet to the memory buffer pointed to by the gva.

The DPDK application in the guest domain, based on the PMD front end,
polls the shared Virtio RX ring for available packets and receives them on arrival.
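
On the guest side, this polling is just the standard ethdev receive loop over the xenvirt port.
A minimal sketch is shown below; it assumes the port has already been configured and started
(rte_eth_dev_configure(), rte_eth_rx_queue_setup(), rte_eth_dev_start()) elsewhere.

.. code-block:: c

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void
    poll_rx(uint8_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb, i;

        for (;;) {
            /* No interrupts or kicks: the PMD simply polls the shared RX ring. */
            nb = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);

            for (i = 0; i < nb; i++) {
                /* ... process the packet ... */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }
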

Packet Transmission
~~~~~~~~~~~~~~~~~~~

When a Virtio device in one guest domain is to transmit a packet,
it puts the virtual address of the packet's data area into the shared Virtio TX ring.

The packet switching back end continuously polls the Virtio TX ring.
When new packets are available for transmission from a guest, it will:

#.  Fetch an available entry from the Virtio TX ring.

#.  Get the gva, and translate it to an hva.

#.  Copy the packet from the hva to the host mbuf's data area.

#.  Compare the destination MAC address with the MAC addresses of all the Virtio devices it manages
    (see the lookup sketch after this list).
    If a match exists, it copies the packet directly into the matched Virtio RX ring.
    Otherwise, it sends the packet out through hardware.
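
The destination lookup in the last step amounts to a linear scan over a per-device MAC table.
The sketch below is illustrative only; the structure and function names are not part of the
example code, and it uses the classic rte_ether.h names of that DPDK generation.

.. code-block:: c

    #include <string.h>
    #include <rte_mbuf.h>
    #include <rte_ether.h>

    struct virtio_dev_entry {              /* hypothetical per-device record */
        uint8_t mac[ETHER_ADDR_LEN];
        /* ... RX ring handle, VMDQ pool, and so on ... */
    };

    /* Return the index of the matching Virtio device, or -1 to send to hardware. */
    static int
    find_local_dst(struct rte_mbuf *m,
                   const struct virtio_dev_entry *devs, int nb_devs)
    {
        const struct ether_hdr *eth = rte_pktmbuf_mtod(m, const struct ether_hdr *);
        int i;

        for (i = 0; i < nb_devs; i++) {
            if (memcmp(&eth->d_addr, devs[i].mac, ETHER_ADDR_LEN) == 0)
                return i;                  /* copy into this device's RX ring */
        }
        return -1;                         /* no match: transmit on the physical port */
    }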

.. note::

    The packet switching back end is provided for demonstration purposes only.
    Users can implement their own switching logic based on this example.
    In this example, only one physical port on the host is supported.
    Multiple segments are not supported. The largest supported mbuf is 4 KB.
    When the back end is restarted, all front ends must also be restarted.

Running the Application
-----------------------

The following describes the steps required to run the application.

Validated Environment
~~~~~~~~~~~~~~~~~~~~~

Host:

    Xen-hypervisor: 4.2.2

    Distribution: Fedora release 18

    Kernel: 3.10.0

    Xen development package (including Xen, Xen-libs, xen-devel): 4.2.3

Guest:

    Distribution: Fedora 16 and 18

    Kernel: 3.6.11

Xen Host Prerequisites
~~~~~~~~~~~~~~~~~~~~~~

Note that the following commands might not be the same on different Linux* distributions.

*   Install the xen-devel package:

    .. code-block:: console

        yum install xen-devel.x86_64

*   Start xend if not already started:

    .. code-block:: console

        /etc/init.d/xend start

*   Mount xenfs if not already mounted:

    .. code-block:: console

        mount -t xenfs none /proc/xen

*   Enlarge the limit for the xen_gntdev driver:

    .. code-block:: console

        modprobe -r xen_gntdev
        modprobe xen_gntdev limit=1000000

.. note::

    The default limit for earlier versions of the xen_gntdev driver is 1024.
    That is insufficient to support the mapping of multiple Virtio devices into multiple VMs,
    so it is necessary to enlarge the limit by reloading this module.
    The default limit of recent versions of xen_gntdev is 1048576.
    A rough calculation of the required limit is:

        limit = nb_mbuf# * VM#

    In the DPDK examples, nb_mbuf# is normally 8192.
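
    For example, with 8192 mbufs per guest mempool and four guest VMs, the limit should be at
    least 8192 * 4 = 32768, so a reload along the following lines (values illustrative) is
    sufficient:

    .. code-block:: console

        modprobe -r xen_gntdev
        modprobe xen_gntdev limit=32768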

Building and Running the Switching Backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#.  Edit config/common_linuxapp, and change the default configuration value for the following two items:

    .. code-block:: console

        CONFIG_RTE_LIBRTE_XEN_DOM0=y
        CONFIG_RTE_LIBRTE_PMD_XENVIRT=n

#.  Build the target:

    .. code-block:: console

        make install T=x86_64-native-linuxapp-gcc

#.  Ensure that RTE_SDK and RTE_TARGET are correctly set. Build the switching example:

    .. code-block:: console

        make -C examples/vhost_xen/

#.  Load the Xen DPDK memory management module and preallocate memory:

    .. code-block:: console

        insmod ./x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/xen_dom0/rte_dom0_mm.ko
        echo 2048 > /sys/kernel/mm/dom0-mm/memsize-mB/memsize

    .. note::

        On Xen Dom0, there is no hugepage support.
        Under Xen Dom0, the DPDK uses a special memory management kernel module
        to allocate chunks of physically contiguous memory.
        Refer to the *DPDK Getting Started Guide* for more information on memory management in the DPDK.
        In the above command, 4 GB of memory is reserved (2048 pages of 2 MB each) for the DPDK.

#.  Load uio_pci_generic and bind one Intel NIC controller to it:

    .. code-block:: console

        modprobe uio_pci_generic
        python tools/dpdk-devbind.py -b uio_pci_generic 0000:09:00.0

    In this case, 0000:09:00.0 is the PCI address of the NIC controller.
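
    To confirm that the device is now bound to uio_pci_generic, the status option of the same
    script can be used:

    .. code-block:: console

        python tools/dpdk-devbind.py --status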

#.  Run the switching back end example:

    .. code-block:: console

        examples/vhost_xen/build/vhost-switch -c f -n 3 --xen-dom0 -- -p1

.. note::

    The --xen-dom0 option instructs the DPDK to use the Xen kernel module to allocate memory.

Other Parameters:

*   --vm2vm

    The vm2vm parameter enables/disables packet switching in software.
    Disabling vm2vm means that packets transmitted by a VM always go out through the Ethernet port
    and are not switched to another VM.

*   --stats

    The stats parameter controls the printing of Virtio-net device statistics.
    The parameter specifies the interval (in seconds) at which to print statistics;
    an interval of 0 seconds disables the printing of statistics.
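
For example, to disable software VM-to-VM switching and print statistics every two seconds,
an invocation along the following lines could be used (the exact option syntax is an assumption;
check the usage output of the example for the form your version accepts):

.. code-block:: console

    examples/vhost_xen/build/vhost-switch -c f -n 3 --xen-dom0 -- -p1 --vm2vm 0 --stats 2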

Xen PMD Frontend Prerequisites
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#.  Install the xen-devel package for accessing XenStore:

    .. code-block:: console

        yum install xen-devel.x86_64

#.  Mount xenfs, if it is not already mounted:

    .. code-block:: console

        mount -t xenfs none /proc/xen

#.  Enlarge the default limit for the xen_gntalloc driver:

    .. code-block:: console

        modprobe -r xen_gntalloc
        modprobe xen_gntalloc limit=6000

.. note::

    Before Linux kernel version 3.8-rc5 (January 15th, 2013),
    a critical defect occurred when a guest heavily allocated Grant pages.
    The Grant driver allocated fewer pages than expected, which caused kernel memory corruption.
    This happened, for example, when a guest used the v1 format of a Grant Table entry and allocated
    more than 8192 Grant pages (this number might be different on different hypervisor versions).
    To work around this issue, set the limit for the gntalloc driver to 6000.
    (The kernel normally allocates hundreds of Grant pages with one Xen front end per virtualized device.)
    If the kernel allocates a lot of Grant pages, for example, if the user uses multiple net front devices,
    it is best to upgrade the Grant alloc driver.
    This defect has been fixed in kernel version 3.8-rc5 and later.

Building and Running the Front End
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#.  Edit config/common_linuxapp, and change the default configuration value:

    .. code-block:: console

        CONFIG_RTE_LIBRTE_XEN_DOM0=n
        CONFIG_RTE_LIBRTE_PMD_XENVIRT=y

#.  Build the package:

    .. code-block:: console

        make install T=x86_64-native-linuxapp-gcc

#.  Enable hugepages. Refer to the *DPDK Getting Started Guide* for instructions on
    how to use hugepages in the DPDK.

#.  Run TestPMD. Refer to the *DPDK TestPMD Application User Guide* for detailed parameter usage.

    .. code-block:: console

        ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 --vdev="eth_xenvirt0,mac=00:00:00:00:00:11"
        testpmd> set fwd mac
        testpmd> start

    As an example, to run two TestPMD instances over two Xen Virtio devices, use the following vdev arguments:

    .. code-block:: console

        --vdev="eth_xenvirt0,mac=00:00:00:00:00:11" --vdev="eth_xenvirt1,mac=00:00:00:00:00:22"


Usage Examples: Injecting a Packet Stream Using a Packet Generator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Loopback Mode
^^^^^^^^^^^^^

Run TestPMD in a guest VM:

.. code-block:: console

    ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 --vdev="eth_xenvirt0,mac=00:00:00:00:00:11" -- -i --eth-peer=0,00:00:00:00:00:22
    testpmd> set fwd mac
    testpmd> start

Example output of the vhost-switch back end would be:

.. code-block:: console

    DATA:(0) MAC_ADDRESS 00:00:00:00:00:11 and VLAN_TAG 1000 registered.

The above message indicates that device 0 has been registered with MAC address 00:00:00:00:00:11 and VLAN tag 1000.
Any packet received on the NIC with these values is placed on the device's receive queue.

Configure a packet stream in the packet generator, setting the destination MAC address to 00:00:00:00:00:11 and the VLAN to 1000;
the guest Virtio device receives these packets and sends them out with destination MAC address 00:00:00:00:00:22.

Inter-VM Mode
^^^^^^^^^^^^^

Run TestPMD in guest VM1:

.. code-block:: console

    ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 --vdev="eth_xenvirt0,mac=00:00:00:00:00:11" -- -i --eth-peer=0,00:00:00:00:00:22

Run TestPMD in guest VM2:

.. code-block:: console

    ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 --vdev="eth_xenvirt0,mac=00:00:00:00:00:22" -- -i --eth-peer=0,00:00:00:00:00:33

Configure a packet stream in the packet generator, and set the destination MAC address to 00:00:00:00:00:11 and VLAN to 1000.
The packets received in Virtio in guest VM1 will be forwarded to Virtio in guest VM2 and
then sent out through hardware with destination MAC address 00:00:00:00:00:33.

The packet flow is:

packet generator -> Virtio in guest VM1 -> switching back end -> Virtio in guest VM2 -> switching back end -> wire
471