Integration Tests
=================

Abstract
--------

FD.io VPP software data plane technology has become very popular across
a wide range of VPP eco-system use cases, putting higher pressure on
continuous verification of VPP software quality.

This document describes a proposal for the design and implementation of
extended continuous VPP testing by extending existing test environments.
Furthermore, it describes and summarizes implementation details of the
Integration and System tests platform *1-Node VPP_Device*. It aims to provide
a complete end-to-end view of the *1-Node VPP_Device* environment in order to
improve extensibility and maintenance, under the guidance of the VPP core
team.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in :rfc:`8174`.

Overview
--------

.. only:: latex

    .. raw:: latex

        \begin{figure}[H]
            \centering
                \graphicspath{{../_tmp/src/vpp_device_tests/}}
                \includegraphics[width=0.90\textwidth]{vpp_device}
                \label{fig:vpp_device}
        \end{figure}

.. only:: html

    .. figure:: vpp_device.svg
        :alt: vpp_device
        :align: center

Physical Testbeds
-----------------

All :abbr:`FD.io (Fast Data Input/Output)` :abbr:`CSIT (Continuous System
Integration and Testing)` vpp-device tests are executed on physical testbeds
built with bare-metal servers hosted by the :abbr:`LF (Linux Foundation)` FD.io
project. Two 1-node testbed topologies are used:

- **2-Container Topology**: Consisting of one Docker container acting as SUT
  (System Under Test) and one Docker container as TG (Traffic Generator), both
  connected in ring topology via physical NIC cross-connecting.

Current FD.io production testbeds are built with servers based on one
processor generation of Intel Xeons: Skylake (Platinum 8180). Testbeds built
with servers based on Arm processors are in the process of being added to FD.io
production.

The following section describes the existing production 1n-skx testbed.

1-Node Xeon Skylake (1n-skx)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 1n-skx testbed is based on a single SuperMicro SYS-7049GP-TRT server
equipped with two Intel Xeon Skylake Platinum 8180 2.5 GHz 28-core processors.
The physical testbed topology is depicted in the figure below.

.. only:: latex

    .. raw:: latex

        \begin{figure}[H]
            \centering
                \graphicspath{{../_tmp/src/vpp_device_tests/}}
                \includegraphics[width=0.90\textwidth]{vf-2n-nic2nic}
                \label{fig:vf-2n-nic2nic}
        \end{figure}

.. only:: html

    .. figure:: vf-2n-nic2nic.svg
        :alt: vf-2n-nic2nic
        :align: center

The server is populated with the following NIC models:

#. NIC-1: x710-da4 4p10GE Intel.
#. NIC-2: x710-da4 4p10GE Intel.

All Intel Xeon Skylake servers run with Intel Hyper-Threading enabled,
doubling the number of logical cores exposed to Linux, with 56 logical
cores and 28 physical cores per processor socket.

NIC interfaces are shared using Linux vfio_pci and VPP VF drivers:

- DPDK VF driver,
- Fortville AVF driver.

The provided Intel x710-da4 4p10GE NICs support 32 VFs per interface, 128 per
NIC.
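
As a quick cross-check of the figures above, the VF capacity can be computed
with shell arithmetic. This is a minimal sketch; the variable names are
illustrative and not taken from CSIT scripts, and the values come from the NIC
list above (two NICs, four ports each, 32 VFs per port).

::

    # Values taken from the testbed description above.
    vfs_per_port=32
    ports_per_nic=4
    nics_per_testbed=2

    # VFs available on one NIC and on the whole testbed.
    vfs_per_nic=$(( vfs_per_port * ports_per_nic ))
    vfs_per_testbed=$(( vfs_per_nic * nics_per_testbed ))

    echo "VFs per NIC: ${vfs_per_nic}"
    echo "VFs per testbed: ${vfs_per_testbed}"

This gives 128 VFs per NIC and 256 VFs per 1n-skx testbed.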

The complete 1n-skx testbed specification is available on the `CSIT LF Testbeds
<https://wiki.fd.io/view/CSIT/Testbeds:_Xeon_Skx,_Arm,_Atom.>`_ wiki page.

A total of two 1n-skx testbeds are in operation in FD.io labs.

1-Node VirtualBox (1n-vbox)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 1n-skx testbed can run in a single VirtualBox VM. This solution replaces
the previously used Vagrant environment based on 3 VMs.

The VirtualBox VM MAY be created by Vagrant and MUST have 4 additional virtio
NICs, each pair attached to a separate private network to simulate
back-to-back connections. Each NIC SHOULD use the 82545EM device model
(otherwise the model can be changed in the bootstrap scripts). Example of
Vagrant configuration:

::

    Vagrant.configure(2) do |c|
      c.vm.network "private_network", type: "dhcp", auto_config: false,
          virtualbox__intnet: "port1", nic_type: "82545EM"
      c.vm.network "private_network", type: "dhcp", auto_config: false,
          virtualbox__intnet: "port2", nic_type: "82545EM"

      c.vm.provider :virtualbox do |v|
        v.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
        v.customize ["modifyvm", :id, "--nicpromisc3", "allow-all"]
        v.customize ["modifyvm", :id, "--nicpromisc4", "allow-all"]
        v.customize ["modifyvm", :id, "--nicpromisc5", "allow-all"]
      end
    end

The Vagrant VM is populated with the following NIC models:

#. NIC-1: 82545EM Intel.
#. NIC-2: 82545EM Intel.
#. NIC-3: 82545EM Intel.
#. NIC-4: 82545EM Intel.

Containers
----------

It was agreed on the :abbr:`TWS (Technical Work Stream)` call to continue with
Ubuntu 18.04 LTS as the baseline system, with an OPTIONAL extension to CentOS 7
and SuSE on demand [TWSLink]_.

All :abbr:`DCR (Docker container)` images are REQUIRED to be hosted on a Docker
registry available from the LF network, publicly available and trackable. For
backup, tracking and contributing purposes all Dockerfiles (including files
needed for building containers) MUST be available and stored in the
[fdiocsitgerrit]_ repository under appropriate folders. This allows the peer
review process to be done for every change of infrastructure related to the
scope of this document.
Currently only the **csit-shim-dcr** and **csit-sut-dcr** containers will be
stored and maintained under the CSIT repository by CSIT contributors.

At the time of designing the solution described in this document, the
interconnection between [dockerhub]_ and [fdiocsitgerrit]_ for automated build
purposes and image hosting cannot be established with the level of trust and
security required by the FD.io project. Unless this is addressed, :abbr:`DCR`
images will be placed in the custom registry service [fdioregistry]_. Automated
Jenkins jobs will be created in line with the long-term solution for container
lifecycle and the ability to build new versions of Docker images.

In parallel, an effort has started to find an outsourced Docker registry
service.

Versioning
~~~~~~~~~~

As of the initial version of vpp-device, only a single latest version of the
Docker image is hosted on [dockerhub]_. This will be addressed as a further
improvement with proper semantic versioning.

jenkins-slave-dcr
~~~~~~~~~~~~~~~~~

This :abbr:`DCR` acts as the Jenkins slave (also known as Jenkins minion). It
can connect over the SSH protocol to TCP port 6022 of **csit-shim-dcr** and
execute the non-interactive reservation script. Nomad is responsible for
scheduling this container's execution onto a specific **1-Node VPP_Device**
testbed. It executes the :abbr:`CSIT` environment including the :abbr:`CSIT`
framework.

All software dependencies, including VPP/DPDK, that are not present in the
**csit-sut-dcr** container image and/or need to be compiled prior to running on
**csit-sut-dcr** SHOULD be compiled in this container.

- *Container Image Location*: Docker image at snergster/vpp-ubuntu18.

- *Container Definition*: Docker file specified at [JenkinsSlaveDcrFile]_.

- *Initializing*: Container is initialized from within *Consul by HashiCorp*
  and *Nomad by HashiCorp*.

csit-shim-dcr
~~~~~~~~~~~~~

This :abbr:`DCR` acts as an intermediate layer running the script responsible
for orchestrating topologies under test and reservation. It is responsible for
managing VF resources and their allocation to :abbr:`DUT (Device Under Test)`
and :abbr:`TG (Traffic Generator)` containers. This MUST be done on
**csit-shim-dcr**. This image also acts as the generic reservation arbiter to
make sure that only Y number of simulations are spawned on any given HW node.

- *Container Image Location*: Docker image at snergster/csit-shim.

- *Container Definition*: Docker file specified at [CsitShimDcrFile]_.

- *Initializing*: Container is initialized from within *Consul by HashiCorp*
  and *Nomad by HashiCorp*. Required docker parameters, to be able to run
  nested containers with the VF reservation system, are: privileged, net=host,
  pid=host.

- *Connectivity*: Over SSH only, using <host>:6022 format. Currently using
  the *root* user account as primary. The Jenkins slave connects using an
  environment variable, since the Jenkins slave does not actually know what
  host it is running on.

  ::

      ssh -p 6022 root@10.30.51.node

csit-sut-dcr
~~~~~~~~~~~~

This :abbr:`DCR` acts as an :abbr:`SUT (System Under Test)`. Any :abbr:`DUT` or
:abbr:`TG` application is installed there. It is RECOMMENDED to install DUT and
all DUT dependencies via the commands ``rpm -ihv`` on RedHat based OS or
``dpkg -i`` on Debian based OS.

The container is designed to be a very lightweight Docker image that only
installs packages and executes binaries (previously built or downloaded on
**jenkins-slave-dcr**) and contains libraries necessary to run the CSIT
framework, including those required by DUT/TG.

- *Container Image Location*: Docker image at snergster/csit-sut.

- *Container Definition*: Docker file specified at [CsitSutDcrFile]_.

- *Initializing*:
  ::

    # Parameters collected in dcr_stc_params are passed to "docker run".
    # Run the container in the background and print the new container ID.
    dcr_stc_params="--detach=true "
    # Give extended privileges to this container. A "privileged" container is
    # given access to all devices and able to run nested containers.
    dcr_stc_params+="--privileged "
    # Publish all exposed ports to random ports on the host interfaces.
    dcr_stc_params+="--publish-all "
    # Automatically remove the container when it exits.
    dcr_stc_params+="--rm "
    # Size of /dev/shm.
    dcr_stc_params+="--shm-size 512M "
    # Override access to PCI bus by attaching a filesystem mount to the
    # container.
    dcr_stc_params+="--mount type=tmpfs,destination=/sys/bus/pci/devices "
    # Mount vfio to be able to see bound interfaces. We cannot use
    # --device=/dev/vfio as this does not see newly bound interfaces.
    dcr_stc_params+="--volume /dev/vfio:/dev/vfio "
    # Mount docker.sock to be able to use docker daemon of the host.
    dcr_stc_params+="--volume /var/run/docker.sock:/var/run/docker.sock "
    # Mount /opt/boot/ where VM kernel and initrd are located.
    dcr_stc_params+="--volume /opt/boot/:/opt/boot/ "
    # Mount host hugepages for VMs.
    dcr_stc_params+="--volume /dev/hugepages/:/dev/hugepages/ "

  The container name is concatenated from the **csit-** prefix and a UUID
  generated uniquely for each container instance.

- *Connectivity*: Over SSH only, using <host>[:<port>] format. Currently using
  the *root* user account as primary.
  ::

    ssh -p <port> root@10.30.51.<node>
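
The naming and connectivity conventions above can be sketched as follows. This
is an illustration only: it assumes a Linux host (for
``/proc/sys/kernel/random/uuid``) and that the container exposes SSH on port
22. The helper ``published_port`` is hypothetical and not part of CSIT.

::

    # Build a container name from the "csit-" prefix and a random UUID.
    dcr_name="csit-$(cat /proc/sys/kernel/random/uuid)"

    # Extract the host port from one line of "docker port" output,
    # e.g. "0.0.0.0:32768" gives "32768".
    function published_port () {
        local mapping="${1}"
        # Strip everything up to and including the last colon.
        echo "${mapping##*:}"
    }

With a running container, the helper could be fed from
``docker port "${dcr_name}" 22/tcp``, and the result used as ``<port>`` in the
``ssh`` command above.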

The container is required to run as ``--privileged`` so that it can create
nested containers and has full read/write access to sysfs (for bind/unbind).
Docker automatically picks a free network port (``--publish-all``) to allow
connections over SSH. To limit access to the PCI bus, the container creates a
tmpfs mount in the PCI bus tree. The CSIT reservation script dynamically links
only the PCI devices (NICs) that are reserved for the particular container, so
that it does not collide with other containers. To make vfio work, access to
``/dev/vfio`` must be granted.
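
The linking step described above can be sketched as a small helper. This is a
hedged illustration rather than the CSIT reservation script itself:
``link_pci_devices`` and its arguments are hypothetical, and the source and
destination directories are parameters so that the function can be tried on
scratch directories instead of the real sysfs tree.

::

    # Link only the reserved PCI devices into a container-visible directory.
    # Arguments: source directory, destination directory, PCI addresses.
    function link_pci_devices () {
        local src_dir="${1}"
        local dst_dir="${2}"
        shift 2
        local pci_addr
        for pci_addr in "$@"; do
            # Skip addresses that do not exist in the source directory.
            [[ -e "${src_dir}/${pci_addr}" ]] || continue
            ln -s -f -n "${src_dir}/${pci_addr}" "${dst_dir}/${pci_addr}"
        done
    }

Devices that are not passed to the helper never appear in the destination
directory, which is the isolation property the tmpfs mount relies on.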

.. todo: Change default user to a non-privileged testuser and install sudo.

Environment initialization
--------------------------

All 1-node servers are to be managed and provisioned via the [ansiblelink]_ set
of playbooks with the *vpp-device* role. Full playbooks can be found under the
[fdiocsitansible]_ directory. This way we are able to track all configuration
changes of physical servers in gerrit (in structured yaml format), extend
*vpp-device* to additional servers with less effort, and re-stage servers in
case of failure.

SR-IOV VF initialization is done via a ``systemd`` service during host system
boot. A service named *csit-initialize-vfs.service* is created under the
systemd system context (``/etc/systemd/system/``). By default the service calls
``/usr/local/bin/csit-initialize-vfs.sh`` with a single parameter, one of:

- **start**: Creates the maximum number of :abbr:`VFs (Virtual Functions)`
  (detected from ``sriov_totalvfs``) for each whitelisted PCI device.
- **stop**: Removes all :abbr:`VFs` for all whitelisted PCI devices.

The service is considered active even when all of its processes have exited
successfully. Stopping the service automatically removes the :abbr:`VFs`.

::

    [Unit]
    Description=CSIT Initialize SR-IOV VFs
    After=network.target

    [Service]
    Type=oneshot
    RemainAfterExit=True
    ExecStart=/usr/local/bin/csit-initialize-vfs.sh start
    ExecStop=/usr/local/bin/csit-initialize-vfs.sh stop

    [Install]
    WantedBy=default.target

The script is driven by two array variables, ``pci_blacklist`` and
``pci_whitelist``. They MUST store all PCI addresses in
**<domain>:<bus>:<device>.<func>** format, where:

- **pci_blacklist**: PCI addresses to be skipped from :abbr:`VFs`
  initialization (useful e.g. for excluding management network interfaces).
- **pci_whitelist**: PCI addresses to be included for :abbr:`VFs`
  initialization.
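
The required address format can be checked before the arrays are used. The
sketch below is an assumption-laden example: ``is_pci_addr`` is a hypothetical
helper, the addresses are made-up sample values, and the pattern assumes the
common four-digit hexadecimal domain.

::

    # Return success if the argument looks like <domain>:<bus>:<device>.<func>,
    # e.g. 0000:18:00.0 (hexadecimal fields, function number 0-7).
    function is_pci_addr () {
        local re='^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
        [[ "${1}" =~ ${re} ]]
    }

    # Sample values only; real lists depend on the server.
    pci_blacklist=("0000:00:1f.6")
    pci_whitelist=("0000:18:00.0" "0000:18:00.1")

    # Report every entry that does not match the expected format.
    for pci_addr in "${pci_whitelist[@]}" "${pci_blacklist[@]}"; do
        is_pci_addr "${pci_addr}" || \
            echo "Malformed PCI address: ${pci_addr}" >&2
    done

A short form such as ``18:00.0`` (without the domain) is rejected by this
check, which matches the MUST requirement above.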

VF reservation
--------------

During the topology initialization phase of the script, a mutex is used to
prevent multiple instances of the script from interfering with each other
during resource allocation. Mutual exclusion ensures that no two distinct
instances of the script get the same resource list.
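
One common way to get such mutual exclusion in a shell script is ``flock``
from util-linux. Whether the CSIT script uses this exact mechanism is not
stated here, so the following is only a minimal sketch; ``with_mutex`` and the
lock file path are hypothetical.

::

    # Run a command while holding an exclusive lock on a lock file.
    # Arguments: lock file path, then the command and its arguments.
    function with_mutex () {
        local lock_file="${1}"
        shift
        (
            # Block until the exclusive lock on descriptor 9 is acquired.
            flock --exclusive 9 || exit 1
            "$@"
        ) 9> "${lock_file}"
    }

A reservation step could then be wrapped as
``with_mutex /tmp/vf-reservation.lock <command>``. The lock is released when
the subshell exits, so a second instance waits until the first one has
finished its allocation.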

The reservation function reads the list of all available virtual function
network devices in the system:

::

    # Find the first ${device_count} number of available TG Linux network
    # VF device names. Only allowed VF PCI IDs are filtered.
    for netdev in ${tg_netdev[@]}
    do
        for netdev_path in $(grep -l "${pci_id}" \
                             /sys/class/net/${netdev}*/device/device \
                             2> /dev/null)
        do
            if [[ ${#TG_NETDEVS[@]} -lt ${device_count} ]]; then
                tg_netdev_name=$(dirname ${netdev_path})
                tg_netdev_name=$(dirname ${tg_netdev_name})
                TG_NETDEVS+=($(basename ${tg_netdev_name}))
            else
                break
            fi
        done
        if [[ ${#TG_NETDEVS[@]} -eq ${device_count} ]]; then
            break
        fi
    done

Here ``${pci_id}`` is the PCI ID of a white-listed VF. For more information
please see [pciids]_. This acts as a security constraint to prevent taking
other unwanted interfaces.

The output list of all VF network devices is split into two lists for the TG
and SUT sides of the connection. The first two items from each TG or SUT
network device list are exposed directly to the namespace of the container.
This can be done via the commands:

::

    $ ip link set ${netdev} netns ${DCR_CPIDS[tg]}
    $ ip link set ${netdev} netns ${DCR_CPIDS[dut1]}
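
The selection of the first two devices can be illustrated with a dry-run
helper that only prints the commands instead of executing them. The function
name, the process ID and the device names used with it are hypothetical.

::

    # Print (not execute) the commands that would move the first two
    # network devices of a list into the namespace of a container process.
    # Arguments: container process ID, then the network device names.
    function print_netns_moves () {
        local container_pid="${1}"
        shift
        local netdev
        # "${@:1:2}" expands to at most the first two remaining arguments.
        for netdev in "${@:1:2}"; do
            echo "ip link set ${netdev} netns ${container_pid}"
        done
    }

Calling ``print_netns_moves 4242 enp24s2 enp24s2f1 enp24s2f2`` prints two
``ip link set`` lines, one for each of the first two devices; the third device
stays available for other containers.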

In this stage, symbolic links to PCI devices under the sysfs bus directory tree
are also created in running containers. Once VF devices are assigned to the
container namespace and PCI devices are linked to running containers, the mutex
is released. The selected VF network devices automatically disappear from the
parent container namespace, so another instance of the script will not find
them under that namespace.

Once the Docker container exits, the network device is returned to the parent
namespace and can be reused.

Network traffic isolation - Intel i40evf
----------------------------------------

In a virtualized environment, on Intel(R) Server Adapters that support SR-IOV,
the virtual function (VF) may be subject to malicious behavior.
Software-generated layer two frames, like IEEE 802.3x (link flow control), IEEE
802.1Qbb (priority based flow-control), and others of this type, are not
expected and can throttle traffic between the host and the virtual switch,
reducing performance. To resolve this issue, configure all SR-IOV enabled ports
for VLAN tagging. This configuration allows unexpected, and potentially
malicious, frames to be dropped. [inteli40e]_

To configure VLAN tagging for the ports on an SR-IOV enabled adapter,
use the following command. The VLAN configuration SHOULD be done
before the VF driver is loaded or the VM is booted. [inteli40e]_

::

    $ ip link set dev <PF netdev id> vf <id> vlan <vlan id>

For example, the following instructions will configure PF eth0 and
the first VF on VLAN 10.

::

    $ ip link set dev eth0 vf 0 vlan 10

VLAN Tag Packet Steering allows sending all packets with a specific VLAN tag to
a particular SR-IOV virtual function (VF). Further, this feature allows
designating a particular VF as trusted, and allows that trusted VF to request
selective promiscuous mode on the Physical Function (PF). [inteli40e]_

To set a VF as trusted or untrusted, enter the following command in the
Hypervisor:

::

    $ ip link set dev eth0 vf 1 trust [on|off]

Once the VF is designated as trusted, use the following commands in the VM
to set the VF to promiscuous mode. [inteli40e]_

- For promiscuous all:
  ::

      $ ip link set eth2 promisc on

- For promiscuous Multicast:
  ::

      $ ip link set eth2 allmulti on

.. note::

    By default, the ethtool priv-flag vf-true-promisc-support is set to
    *off*, meaning that promiscuous mode for the VF will be limited. To set the
    promiscuous mode for the VF to true promiscuous and allow the VF to see
    all ingress traffic, use the following command:

    ::

        $ ethtool --set-priv-flags p261p1 vf-true-promisc-support on

    The vf-true-promisc-support priv-flag does not enable promiscuous mode;
    rather, it designates which type of promiscuous mode (limited or true)
    you will get when you enable promiscuous mode using the ip link commands
    above. Note that this is a global setting that affects the entire device.
    However, the vf-true-promisc-support priv-flag is only exposed to the first
    PF of the device. The PF remains in limited promiscuous mode (unless it
    is in MFP mode) regardless of the vf-true-promisc-support setting.
    [inteli40e]_

The *csit-initialize-vfs.service* service described earlier is responsible for
assigning 802.1Q VLAN tagging to each virtual function via the physical
function from the list of white-listed PCI addresses, using the following
(simplified) code.

::

    SCRIPT_DIR="$(dirname $(readlink -e "${BASH_SOURCE[0]}"))"
    source "${SCRIPT_DIR}/csit-initialize-vfs-data.sh"

    # Initialize whitelisted NICs with maximum number of VFs.
    pci_idx=0
    for pci_addr in ${PCI_WHITELIST[@]}; do
        if ! [[ ${PCI_BLACKLIST[*]} =~ "${pci_addr}" ]]; then
            pci_path="/sys/bus/pci/devices/${pci_addr}"
            # SR-IOV initialization
            case "${1:-start}" in
                "start" )
                    sriov_totalvfs=$(< "${pci_path}"/sriov_totalvfs)
                    ;;
                "stop" )
                    sriov_totalvfs=0
                    ;;
            esac
            echo ${sriov_totalvfs} > "${pci_path}"/sriov_numvfs
            # SR-IOV 802.1Q isolation
            case "${1:-start}" in
                "start" )
                    pf=$(basename "${pci_path}"/net/*)
                    for vf in $(seq "${sriov_totalvfs}"); do
                        # PCI address index in array (pairing siblings).
                        if [[ -n ${PF_INDICES[@]} ]]
                        then
                            vlan_pf_idx=${PF_INDICES[$pci_addr]}
                        else
                            vlan_pf_idx=$((pci_idx % (${#PCI_WHITELIST[@]}/2)))
                        fi
                        # 802.1Q base offset.
                        vlan_bs_off=1100
                        # 802.1Q PF PCI address offset.
                        vlan_pf_off=$(( vlan_pf_idx * 100 + vlan_bs_off ))
                        # 802.1Q VF PCI address offset.
                        vlan_vf_off=$(( vlan_pf_off + vf - 1 ))
                        # VLAN string.
                        vlan_str="vlan ${vlan_vf_off}"
                        # MAC string.
                        mac5="$(printf '%x' ${pci_idx})"
                        mac6="$(printf '%x' $(( vf - 1 )))"
                        mac_str="mac ba:dc:0f:fe:${mac5}:${mac6}"
                        # Set 802.1Q VLAN id and MAC address
                        ip link set ${pf} vf $(( vf - 1)) ${mac_str} ${vlan_str}
                        ip link set ${pf} vf $(( vf - 1)) trust on
                        ip link set ${pf} vf $(( vf - 1)) spoof off
                    done
                    pci_idx=$(( pci_idx + 1 ))
                    ;;
            esac
            rmmod i40evf
            modprobe i40evf
        fi
    done

Assignment starts at VLAN 1100 and increments by 1 for each VF and by 100 for
each white-listed PCI address up to the middle of the PCI list. The second half
of the list is assumed to be directly (cable) paired siblings and is assigned
the same 802.1Q VLANs as its siblings.
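
The arithmetic can be followed with a small worked example. The helpers
``vf_vlan`` and ``vf_mac`` below are hypothetical; they only restate the
formulas of the simplified script for the case where ``PF_INDICES`` is not
set, and the whitelist length of four is an assumed sample value.

::

    # Compute the VLAN id for a VF.
    # Arguments: PCI index in whitelist, whitelist length, VF number (from 1).
    function vf_vlan () {
        local pci_idx="${1}" pci_count="${2}" vf="${3}"
        local vlan_pf_idx=$(( pci_idx % (pci_count / 2) ))
        echo $(( vlan_pf_idx * 100 + 1100 + vf - 1 ))
    }

    # Compute the MAC address assigned to the same VF.
    # Arguments: PCI index in whitelist, VF number (from 1).
    function vf_mac () {
        local pci_idx="${1}" vf="${2}"
        printf 'ba:dc:0f:fe:%x:%x\n' "${pci_idx}" "$(( vf - 1 ))"
    }

    # Four whitelisted ports: indices 0 and 2 are siblings, as are 1 and 3.
    for pci_idx in 0 1 2 3; do
        echo "pci_idx=${pci_idx} vf=1 vlan=$(vf_vlan "${pci_idx}" 4 1)" \
             "mac=$(vf_mac "${pci_idx}" 1)"
    done

For the first VF this prints VLANs 1100, 1200, 1100 and 1200 for PCI indices 0
to 3, so each sibling pair shares a VLAN while the MAC addresses stay distinct.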

Open tasks
----------

Security
~~~~~~~~

.. note::

    Switch to non-privileged containers: As of now all three container
    flavors use privileged containers to make them work. Explore options
    to switch containers to non-privileged with explicit rather than implicit
    privileges.

.. note::

    Switch to testuser account instead of root.

Maintainability
~~~~~~~~~~~~~~~

.. note::

    Docker image distribution: Create Jenkins jobs with a full CI/CD pipeline
    for CSIT Docker images.

Stability
~~~~~~~~~

.. note::

    Implement queueing mechanism: Currently there is no mechanism that
    would place starving jobs in a queue when no resources are available.

.. note::

    Replace reservation script with Docker network plugin written in
    GOLANG/SH/Python - platform independent.

Links
-----

.. [TWSLink] `TWS <https://wiki.fd.io/view/CSIT/TWS>`_
.. [dockerhub] `Docker hub <https://hub.docker.com/>`_
.. [fdiocsitgerrit] `FD.io/CSIT gerrit <https://gerrit.fd.io/r/CSIT>`_
.. [fdioregistry] `FD.io registry <registry.fdiopoc.net>`_
.. [JenkinsSlaveDcrFile] `jenkins-slave-dcr-file <https://github.com/snergfdio/multivppcache/blob/master/ubuntu18/Dockerfile>`_
.. [CsitShimDcrFile] `csit-shim-dcr-file <https://github.com/snergfdio/multivppcache/blob/master/csit-shim/Dockerfile>`_
.. [CsitSutDcrFile] `csit-sut-dcr-file <https://github.com/snergfdio/multivppcache/blob/master/csit-sut/Dockerfile>`_
.. [ansiblelink] `ansible <https://www.ansible.com/>`_
.. [fdiocsitansible] `FD.io/CSIT ansible <https://git.fd.io/csit/tree/resources/tools/testbed-setup/ansible>`_
.. [inteli40e] `Intel i40e <https://downloadmirror.intel.com/26370/eng/readme.txt>`_
.. [pciids] `pci ids <http://pci-ids.ucw.cz/v2.2/pci.ids>`_