keep_alive.rst revision 97f17497

..  BSD LICENSE
    Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Keep Alive Sample Application
=============================

The Keep Alive application is a simple example of a
heartbeat/watchdog for packet processing cores. It demonstrates how
to detect 'failed' DPDK cores and notify a fault management entity
of this failure. Its purpose is to ensure that the failure of a core
does not go undetected by the management entity.


Overview
--------

The application demonstrates how to protect against 'silent outages'
on packet processing cores. A Keep Alive Monitor Agent Core (master)
monitors the state of the packet processing cores (worker cores) by
dispatching pings at a regular time interval (default is 5ms) and
monitoring the state of the cores. Core states are: Alive, MIA, Dead
or Buried. MIA indicates a missed ping, and Dead indicates two missed
pings within the specified time interval. When a core is Dead, a
callback function is invoked to restart the packet processing core.
A real-life application might use this callback function to notify a
higher level fault management entity of the core failure in order to
take the appropriate corrective action.
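The state transitions described above can be illustrated with a small per-core state machine. The following is a minimal sketch in plain C, without DPDK. The names ``enum core_state``, ``struct ka_model``, ``model_mark_alive()`` and ``model_dispatch_pings()`` are hypothetical. Treating Buried as "Dead and already reported" is an assumption of this model, not a description of the rte_keepalive internals.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical, simplified model of the core states described above;
     * this is not the rte_keepalive library's internal implementation. */
    enum core_state { CORE_ALIVE, CORE_MIA, CORE_DEAD, CORE_BURIED };

    #define NUM_CORES 2

    typedef void (*dead_core_cb)(void *data, int core);

    struct ka_model {
        enum core_state state[NUM_CORES];
        int responded[NUM_CORES]; /* set by the worker, cleared by the agent */
        int missed[NUM_CORES];    /* consecutive missed pings */
        dead_core_cb cb;          /* invoked once when a core is declared Dead */
        void *cb_data;
    };

    /* Worker side: called from the packet processing loop. */
    static void model_mark_alive(struct ka_model *ka, int core)
    {
        ka->responded[core] = 1;
    }

    /* Agent side: called once per check period. */
    static void model_dispatch_pings(struct ka_model *ka)
    {
        for (int core = 0; core < NUM_CORES; core++) {
            if (ka->state[core] == CORE_BURIED)
                continue;                      /* failure already reported */
            if (ka->responded[core]) {
                ka->state[core] = CORE_ALIVE;
                ka->missed[core] = 0;
            } else if (++ka->missed[core] == 1) {
                ka->state[core] = CORE_MIA;    /* one missed ping */
            } else {
                ka->state[core] = CORE_DEAD;   /* two missed pings */
                if (ka->cb != NULL)
                    ka->cb(ka->cb_data, core);
                ka->state[core] = CORE_BURIED; /* report only once */
            }
            ka->responded[core] = 0;           /* re-arm for the next period */
        }
    }

    static int dead_core_count;
    static int last_dead_core = -1;

    static void on_dead_core(void *data, int core)
    {
        (void)data;
        dead_core_count++;
        last_dead_core = core;
        printf("core %d declared dead\n", core);
    }

    int main(void)
    {
        struct ka_model ka = { .cb = on_dead_core, .cb_data = NULL };

        for (int period = 0; period < 6; period++) {
            model_mark_alive(&ka, 0);      /* core 0 stays healthy */
            if (period < 3)
                model_mark_alive(&ka, 1);  /* core 1 stalls after period 2 */
            model_dispatch_pings(&ka);
        }

        /* Core 1 went MIA in period 3, Dead in period 4, then Buried. */
        assert(ka.state[0] == CORE_ALIVE);
        assert(ka.state[1] == CORE_BURIED);
        assert(dead_core_count == 1 && last_dead_core == 1);
        return 0;
    }

In this model the worker's only job is a single store into its own per-core slot, so the packet processing loop never waits on the agent. The agent alone decides when a run of missed pings becomes a failure. The sketch is single-threaded. A real multi-core version would also have to consider memory ordering between the worker and agent cores.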

Note: Only the worker cores are monitored. A local (on the host) mechanism
or agent that supervises the Keep Alive Monitor Agent Core itself is required
to detect a failure of that core.

Note: This application is based on the :doc:`l2_forward_real_virtual`. As
such, the initialization and run-time paths are very similar to those
of the L2 forwarding application.

Compiling the Application
-------------------------

To compile the application:

#.  Go to the sample application directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/keep_alive

#.  Set the target (a default target is used if not specified). For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        make

Running the Application
-----------------------

The application has a number of command line options:

.. code-block:: console

    ./build/l2fwd-keepalive [EAL options] \
            -- -p PORTMASK [-q NQ] [-K PERIOD] [-T PERIOD]

where,

* ``-p PORTMASK``: A hexadecimal bitmask of the ports to configure.

* ``-q NQ``: The number of queues (=ports) per lcore (default is 1).

* ``-K PERIOD``: Heartbeat check period in ms (5ms default; 86400 max).

* ``-T PERIOD``: Statistics will be refreshed every PERIOD seconds (0 to
  disable, 10 default, 86400 maximum).
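The ``-K`` and ``-T`` options take decimal periods with the documented limits. The plain-C sketch below shows one way such a value can be range-checked. ``parse_period()`` is a hypothetical helper, not the sample's own parser. Rejecting 0 for ``-K`` is an assumption, since the list above gives only the default and the maximum.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: parse a decimal period and accept it only if it
     * lies in [min, max]. Returns 0 on success and -1 on bad input. */
    static int parse_period(const char *s, unsigned long min, unsigned long max,
                            unsigned long *out)
    {
        char *end = NULL;
        unsigned long v;

        if (s == NULL || *s == '\0')
            return -1;
        errno = 0;
        v = strtoul(s, &end, 10);
        if (errno != 0 || *end != '\0')
            return -1;              /* overflow or trailing characters */
        if (v < min || v > max)
            return -1;              /* outside the documented range */
        *out = v;
        return 0;
    }

    int main(void)
    {
        unsigned long check_ms = 5;    /* -K default */
        unsigned long stats_s = 10;    /* -T default */

        /* -K: assumed valid range 1..86400 ms (0 is rejected here). */
        if (parse_period("10", 1, 86400, &check_ms) != 0)
            return 1;
        /* -T: 0 disables statistics, 86400 is the maximum. */
        if (parse_period("0", 0, 86400, &stats_s) != 0)
            return 1;
        /* Out-of-range input is rejected and leaves check_ms unchanged. */
        assert(parse_period("90000", 1, 86400, &check_ms) == -1);

        printf("check period: %lu ms, stats period: %lu s\n", check_ms, stats_s);
        return 0;
    }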

To run the application in a linuxapp environment with 4 lcores, 16 ports,
8 RX queues per lcore and a ping interval of 10ms, issue the command:

.. code-block:: console

    ./build/l2fwd-keepalive -c f -n 4 -- -q 8 -p ffff -K 10
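In the command above, the EAL coremask ``-c f`` and the portmask ``-p ffff`` are both hexadecimal bitmasks. ``f`` has four bits set (lcores 0-3) and ``ffff`` has sixteen (ports 0-15). The following plain-C sketch shows that arithmetic. ``parse_hex_mask()`` and ``count_bits()`` are hypothetical helpers, not functions of the sample application or of DPDK.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: parse a hexadecimal mask such as "ffff".
     * Returns 0 (nothing selected) for empty, malformed or overflowing input. */
    static unsigned long parse_hex_mask(const char *s)
    {
        char *end = NULL;
        unsigned long mask;

        if (s == NULL || *s == '\0')
            return 0;
        errno = 0;
        mask = strtoul(s, &end, 16);
        if (errno != 0 || *end != '\0')
            return 0;
        return mask;
    }

    /* Count the set bits, i.e. the number of selected lcores or ports. */
    static unsigned int count_bits(unsigned long mask)
    {
        unsigned int n = 0;

        for (; mask != 0; mask &= mask - 1)  /* clear the lowest set bit */
            n++;
        return n;
    }

    int main(void)
    {
        unsigned long coremask = parse_hex_mask("f");     /* EAL -c f */
        unsigned long portmask = parse_hex_mask("ffff");  /* -p ffff */

        assert(count_bits(coremask) == 4);
        assert(count_bits(portmask) == 16);

        printf("enabled ports:");
        for (unsigned int port = 0; port < 8 * sizeof(portmask); port++)
            if (portmask & (1UL << port))
                printf(" %u", port);
        printf("\n");
        return 0;
    }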

Refer to the *DPDK Getting Started Guide* for general information on
running applications and the Environment Abstraction Layer (EAL)
options.


Explanation
-----------

The following sections provide some explanation of the
Keep-Alive/'Liveliness' conceptual scheme. As mentioned in the
overview section, the initialization and run-time paths are very
similar to those of the :doc:`l2_forward_real_virtual`.

The Keep-Alive/'Liveliness' conceptual scheme:

* A keep-alive agent runs every N milliseconds.

* DPDK cores respond to the keep-alive agent.

* If the keep-alive agent detects time-outs, it notifies the
  fault management entity through a callback function.

The following sections provide some explanation of the code aspects
that are specific to the Keep Alive sample application.

The keepalive functionality is initialized by calling
rte_keepalive_create(), which returns a pointer to a struct
rte_keepalive. Its first argument is the callback function to invoke
in the case of a timeout. The second is an opaque data pointer that is
passed back to that callback (unused here, hence NULL).

.. code-block:: c

    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
    if (rte_global_keepalive_info == NULL)
        rte_exit(EXIT_FAILURE, "keepalive_create() failed\n");

The function that issues the pings, rte_keepalive_dispatch_pings(),
is registered as a periodic DPDK timer callback that runs every
check_period milliseconds. The period is converted from milliseconds
to timer ticks using rte_get_timer_hz(), which returns the number of
timer ticks per second.

.. code-block:: c

    if (rte_timer_reset(&hb_timer,
            (check_period * rte_get_timer_hz()) / 1000,
            PERIODICAL,
            rte_lcore_id(),
            &rte_keepalive_dispatch_pings,
            rte_global_keepalive_info
            ) != 0)
        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
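The timer period expression converts milliseconds into timer ticks by multiplying before dividing by 1000. The plain-C sketch below reproduces that arithmetic under stated assumptions. ``ms_to_ticks()`` is a hypothetical helper, the period is assumed to be an integer number of milliseconds, and the 2.3 GHz frequency is an illustrative value. The application obtains the real frequency from rte_get_timer_hz().

.. code-block:: c

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper mirroring the expression passed to rte_timer_reset():
     * convert a period in milliseconds into ticks at timer_hz ticks per second.
     * Multiplying before dividing keeps the sub-kHz part of timer_hz. */
    static uint64_t ms_to_ticks(uint64_t period_ms, uint64_t timer_hz)
    {
        return (period_ms * timer_hz) / 1000;
    }

    int main(void)
    {
        /* Assumed timer frequency, for illustration only. */
        const uint64_t timer_hz = 2300000000ULL; /* 2.3 GHz */

        uint64_t default_ticks = ms_to_ticks(5, timer_hz);   /* -K default */
        uint64_t max_ticks = ms_to_ticks(86400, timer_hz);   /* -K maximum */

        assert(default_ticks == 11500000ULL);
        /* 86400 * 2.3e9 is about 2e14, far below UINT64_MAX, so the
         * intermediate product cannot overflow at this frequency. */
        assert(max_ticks == 198720000000ULL);

        printf("5 ms = %" PRIu64 " ticks, 86400 ms = %" PRIu64 " ticks\n",
               default_ticks, max_ticks);
        return 0;
    }

Dividing the frequency by 1000 first would truncate it. For example, at 1999 ticks per second, a 1000 ms period would come out as 1000 ticks instead of 1999.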

The rest of the initialization and run-time path follows
the same paths as the L2 forwarding application. The only
additions to the main processing loop are the call that marks the
core alive and the simulated random failures used for demonstration.

.. code-block:: c

    rte_keepalive_mark_alive(rte_global_keepalive_info);
    cur_tsc = rte_rdtsc();

    /* Die randomly within 7 secs for demo purposes. */
    if (cur_tsc - tsc_initial > tsc_lifetime)
        break;
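The demo failure above compares the elapsed TSC count against a lifetime chosen at start-up. The plain-C sketch below models that mechanism, with a simulated tick counter standing in for rte_rdtsc(). ``pick_lifetime_ticks()``, ``lifetime_expired()`` and the 1-7 second range are assumptions for illustration, not the sample's actual code.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: choose a random lifetime of 1..7 seconds and
     * express it in ticks of a clock running at hz ticks per second. */
    static uint64_t pick_lifetime_ticks(uint64_t hz)
    {
        uint64_t seconds = (uint64_t)(rand() % 7) + 1;

        return seconds * hz;
    }

    /* Same test as the loop above. Unsigned subtraction gives the elapsed
     * tick count even if the 64-bit counter wraps between the readings. */
    static int lifetime_expired(uint64_t cur, uint64_t initial, uint64_t lifetime)
    {
        return cur - initial > lifetime;
    }

    int main(void)
    {
        const uint64_t hz = 1000;       /* simulated clock: 1 tick per ms */
        uint64_t tsc_initial = 0;
        uint64_t tsc_lifetime;
        uint64_t cur_tsc = tsc_initial;

        srand(42);
        tsc_lifetime = pick_lifetime_ticks(hz);
        assert(tsc_lifetime >= 1 * hz && tsc_lifetime <= 7 * hz);

        /* Simulated main loop: "work" until the lifetime has elapsed. */
        for (;;) {
            cur_tsc++;                  /* stands in for rte_rdtsc() */
            if (lifetime_expired(cur_tsc, tsc_initial, tsc_lifetime))
                break;
        }
        assert(cur_tsc == tsc_lifetime + 1);
        printf("simulated core exited after %llu ticks\n",
               (unsigned long long)cur_tsc);
        return 0;
    }

When the simulated core leaves its loop it stops marking itself alive. That is what the keep-alive agent then observes as missed pings.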

The rte_keepalive_mark_alive function simply sets the state of the
calling lcore to alive.

.. code-block:: c

    static inline void
    rte_keepalive_mark_alive(struct rte_keepalive *keepcfg)
    {
        keepcfg->state_flags[rte_lcore_id()] = ALIVE;
    }