hash_lib.rst revision 97f17497
..  BSD LICENSE
    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _Hash_Library:

Hash Library
============

The DPDK provides a Hash Library for creating hash tables used for fast lookups.
The hash table is a data structure optimized for searching through a set of entries that are each identified by a unique key.
For increased performance, the DPDK Hash requires that all keys have the same number of bytes, which is set at hash creation time.

Hash API Overview
-----------------

The main configuration parameters for the hash are:

*   Total number of hash entries

*   Size of the key in bytes

The hash also allows the configuration of some low-level implementation-related parameters, such as:

*   Hash function to translate the key into a bucket index

The main methods exported by the hash are:

*   Add entry with key: The key is provided as input. If a new entry is successfully added to the hash for the specified key,
    or there is already an entry in the hash for the specified key, then the position of the entry is returned.
    If the operation was not successful, for example due to lack of free entries in the hash, then a negative value is returned.

*   Delete entry with key: The key is provided as input. If an entry with the specified key is found in the hash,
    then the entry is removed from the hash and the position where the entry was found in the hash is returned.
    If no entry with the specified key exists in the hash, then a negative value is returned.

*   Lookup for entry with key: The key is provided as input. If an entry with the specified key is found in the hash (lookup hit),
    then the position of the entry is returned, otherwise (lookup miss) a negative value is returned.

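The return-value contract above can be sketched with a minimal toy table (illustrative only, not the DPDK implementation; all `toy_*` names are hypothetical): add and delete return the entry's position on success, and every operation returns a negative value on failure.

```c
#include <string.h>

/* Toy model of the hash API's return-value contract. Keys all have the
 * same fixed size, as required at creation time in the real library. */
#define TOY_ENTRIES 8
#define TOY_KEY_LEN 4

static unsigned char toy_keys[TOY_ENTRIES][TOY_KEY_LEN];
static int toy_used[TOY_ENTRIES];

/* Returns the position of the key, or -1 on lookup miss. */
int toy_lookup(const void *key)
{
    for (int i = 0; i < TOY_ENTRIES; i++)
        if (toy_used[i] && memcmp(toy_keys[i], key, TOY_KEY_LEN) == 0)
            return i;
    return -1;
}

/* Returns the position of the new or existing entry, or -1 if full. */
int toy_add(const void *key)
{
    int pos = toy_lookup(key);
    if (pos >= 0)
        return pos;              /* already present: return its position */
    for (int i = 0; i < TOY_ENTRIES; i++)
        if (!toy_used[i]) {
            memcpy(toy_keys[i], key, TOY_KEY_LEN);
            toy_used[i] = 1;
            return i;
        }
    return -1;                   /* no free entries left */
}

/* Returns the position the key was found at, or -1 on miss. */
int toy_delete(const void *key)
{
    int pos = toy_lookup(key);
    if (pos >= 0)
        toy_used[pos] = 0;
    return pos;
}
```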
Apart from the methods explained above, the API allows the user three more options:

*   Add / lookup / delete with key and precomputed hash: Both the key and its precomputed hash are provided as input. This allows
    the user to perform these operations faster, as the hash is already computed.

*   Add / lookup with key and data: A key-value pair is provided as input. This allows the user to store
    not only the key, but also its data, which may be either an 8-byte integer or a pointer to external data (if the data size is more than 8 bytes).

*   Combination of the two options above: The user can provide the key, precomputed hash and data.

Also, the API contains a method to allow the user to look up entries in bursts, achieving higher performance
than looking up individual entries, as the function prefetches the next entries while it is operating
on the first ones, which significantly reduces the impact of the necessary memory accesses.
Notice that this method uses a pipeline of 8 entries (4 stages of 2 entries), so it is highly recommended
to use at least 8 entries per burst.

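The benefit of burst lookup comes from overlapping memory accesses. A minimal sketch of the idea, using a toy one-entry-per-bucket table and a made-up hash (all names invented here; DPDK's real implementation pipelines 4 stages of 2 entries):

```c
/* Sketch of burst lookup: first pass prefetches every bucket the burst
 * will touch, second pass does the compares, so the memory latencies
 * overlap instead of serializing. __builtin_prefetch is a GCC/Clang
 * builtin used as a stand-in for DPDK's prefetch helpers. */
#define N_BUCKETS 64

struct bucket {
    unsigned int sig;   /* signature of the stored key (0 = empty here) */
    int pos;            /* position of the stored entry */
};

static struct bucket table[N_BUCKETS];

static unsigned int toy_sig(unsigned int key)
{
    return key * 2654435761u;   /* Knuth's multiplicative hash */
}

void toy_insert(unsigned int key, int pos)
{
    struct bucket *b = &table[toy_sig(key) % N_BUCKETS];
    b->sig = toy_sig(key);
    b->pos = pos;
}

/* Look up a whole burst of keys; write each position, or -1 on miss. */
void lookup_burst(const unsigned int *keys, int n, int *positions)
{
    /* Stage 1: issue prefetches for every bucket in the burst. */
    for (int i = 0; i < n; i++)
        __builtin_prefetch(&table[toy_sig(keys[i]) % N_BUCKETS]);
    /* Stage 2: by now the buckets should already be in cache. */
    for (int i = 0; i < n; i++) {
        const struct bucket *b = &table[toy_sig(keys[i]) % N_BUCKETS];
        positions[i] = (b->sig == toy_sig(keys[i])) ? b->pos : -1;
    }
}
```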
The actual data associated with each key can be either managed by the user using a separate table that
mirrors the hash in terms of number of entries and position of each entry,
as shown in the Flow Classification use case described in the following sections,
or stored in the hash table itself.

The example hash tables in the L2/L3 Forwarding sample applications define which port to forward a packet to based on a packet flow identified by the five-tuple lookup.
However, this table could also be used for more sophisticated features and provide many other functions and actions that could be performed on the packets and flows.

Multi-process support
---------------------

The hash library can be used in a multi-process environment, bearing in mind that only lookups are thread-safe.
The one function that can be used exclusively in single-process mode is rte_hash_set_cmp_func(), which sets up
a custom key compare function assigned to a function pointer (and is therefore not supported in
multi-process mode).

Implementation Details
----------------------

The hash table has two main tables:

* The first table is an array of entries which is further divided into buckets,
  with the same number of consecutive array entries in each bucket. Each entry contains the computed primary
  and secondary hashes of a given key (explained below), and an index into the second table.

* The second table is an array of all the keys stored in the hash table, along with the data associated with each key.

The hash library uses the cuckoo hash method to resolve collisions.
For any input key, there are two possible buckets (primary and secondary/alternative location)
where that key can be stored in the hash, so only the entries within those two buckets need to be examined
when the key is looked up.
The lookup speed is achieved by reducing the number of entries to be scanned from the total
number of hash entries down to the number of entries in the two hash buckets,
as opposed to the basic method of linearly scanning all the entries in the array.
The hash uses a (configurable) hash function to translate the input key into a 4-byte key signature.
The bucket index is the key signature modulo the number of hash buckets.

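As a sketch, with FNV-1a standing in for the configurable hash function (all `toy_*` names are invented for illustration), signature computation and bucket selection look like this:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in 4-byte hash function (FNV-1a). The real library lets the
 * user configure the hash function at creation time. */
uint32_t toy_signature(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* The bucket index is simply the signature modulo the number of
 * buckets (with a power-of-two bucket count this is a cheap mask). */
uint32_t toy_bucket_index(uint32_t sig, uint32_t num_buckets)
{
    return sig % num_buckets;
}
```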
Once the buckets are identified, the scope of the hash add,
delete and lookup operations is reduced to the entries in those buckets (it is very likely that entries are in the primary bucket).

To speed up the search logic within the bucket, each hash entry stores the 4-byte key signature together with the full key.
For large key sizes, comparing the input key against a key from the bucket can take significantly more time than
comparing the 4-byte signature of the input key against the signature of a key from the bucket.
Therefore, the signature comparison is done first and the full key comparison is done only when the signatures match.
The full key comparison is still necessary, as two input keys from the same bucket can still potentially have the same 4-byte signature,
although this event is relatively rare for hash functions providing good uniform distributions for the set of input keys.

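The signature-first comparison can be sketched as follows (a toy entry layout, not the library's internal structure):

```c
#include <string.h>
#include <stdint.h>

struct toy_entry {
    uint32_t sig;        /* 4-byte signature stored with the entry */
    const void *key;     /* full key, kept for the final comparison */
};

/* Compare signatures first; fall back to the (more expensive) full key
 * comparison only on a signature match, since two distinct keys can
 * still share the same 4-byte signature. Returns 1 on a match. */
int toy_entry_matches(const struct toy_entry *e, uint32_t sig,
                      const void *key, size_t key_len)
{
    if (e->sig != sig)
        return 0;                      /* cheap reject, the common case */
    return memcmp(e->key, key, key_len) == 0;
}
```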
Example of lookup:

First of all, the primary bucket is identified, where the entry is likely to be stored.
If the signature is stored there, its key is compared against the one provided, and the position
where it was stored and/or the data associated with that key are returned if there is a match.
If the signature is not in the primary bucket, the secondary bucket is looked up, where the same procedure
is carried out. If there is no match there either, the key is considered not to be in the table.

Example of addition:

Like lookup, the primary and secondary buckets are identified. If there is an empty slot in
the primary bucket, the primary and secondary signatures are stored in that slot, the key and data (if any) are added to
the second table, and an index to the position in the second table is stored in the slot of the first table.
If there is no space in the primary bucket, one of the entries in that bucket is pushed to its alternative location,
and the key to be added is inserted in its position.
To find the alternative bucket of the evicted entry, its secondary signature is looked up and the alternative bucket index
is calculated from the modulo operation, as seen above. If there is room in the alternative bucket, the evicted entry
is stored in it. If not, the same process is repeated (one of the entries gets pushed) until a non-full bucket is found.
Notice that despite all the entry movement in the first table, the second table is not touched, which would otherwise impact
performance greatly.

In the very unlikely event that the table enters a loop where the same entries are evicted indefinitely,
the key is considered unable to be stored.
With random keys, this method allows the user to reach around 90% table utilization, without
having to drop any stored entry (as in an LRU scheme) or allocate more memory (extended buckets).

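The eviction chain can be sketched with a toy cuckoo table holding one entry per bucket (all names are illustrative; DPDK's buckets hold several entries each, and its two bucket choices come from the primary and secondary signatures rather than the made-up functions below):

```c
#include <stdint.h>

#define CK_BUCKETS 8
#define CK_MAX_DISPLACEMENTS 16   /* bound on the eviction chain */
#define CK_EMPTY 0                /* keys must be nonzero in this toy */

static uint32_t ck_table[CK_BUCKETS];

static uint32_t ck_primary(uint32_t key)   { return key % CK_BUCKETS; }
static uint32_t ck_secondary(uint32_t key) { return (key / CK_BUCKETS) % CK_BUCKETS; }

/* Insert with cuckoo eviction: if the chosen bucket is full, push the
 * resident key to its alternative bucket and take its place; repeat
 * until a free slot is found or the displacement limit is hit.
 * Returns 0 on success, -1 if entries were cycling indefinitely. */
int ck_insert(uint32_t key)
{
    uint32_t idx = ck_primary(key);
    for (int n = 0; n < CK_MAX_DISPLACEMENTS; n++) {
        if (ck_table[idx] == CK_EMPTY) {
            ck_table[idx] = key;
            return 0;
        }
        /* Evict the resident key and send it to its other bucket. */
        uint32_t evicted = ck_table[idx];
        ck_table[idx] = key;
        key = evicted;
        idx = (idx == ck_primary(key)) ? ck_secondary(key)
                                       : ck_primary(key);
    }
    return -1;   /* considered not storable */
}
```

Note that only the bucket array moves during evictions; a real implementation leaves the key/data table untouched, as described above.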
Entry distribution in hash table
--------------------------------

As mentioned above, the cuckoo hash implementation pushes elements out of their bucket
when a new entry whose primary location coincides with their current bucket has to be added,
moving them to their alternative location.
Therefore, as the user adds more entries to the hash table, the distribution of the hash values
in the buckets changes: most entries stay in their primary location and a few sit in
their secondary location, whose share increases as the table gets busier.
This information is quite useful, as performance may be lower when more entries
are evicted to their secondary location.

See the tables below showing example entry distribution as table utilization increases.

.. _table_hash_lib_1:

.. table:: Entry distribution measured with an example table with 1024 random entries using jhash algorithm

   +--------------+-----------------------+-------------------------+
   | % Table used | % In Primary location | % In Secondary location |
   +==============+=======================+=========================+
   |      25      |         100           |           0             |
   +--------------+-----------------------+-------------------------+
   |      50      |         96.1          |           3.9           |
   +--------------+-----------------------+-------------------------+
   |      75      |         88.2          |           11.8          |
   +--------------+-----------------------+-------------------------+
   |      80      |         86.3          |           13.7          |
   +--------------+-----------------------+-------------------------+
   |      85      |         83.1          |           16.9          |
   +--------------+-----------------------+-------------------------+
   |      90      |         77.3          |           22.7          |
   +--------------+-----------------------+-------------------------+
   |      95.8    |         64.5          |           35.5          |
   +--------------+-----------------------+-------------------------+

|

.. _table_hash_lib_2:

.. table:: Entry distribution measured with an example table with 1 million random entries using jhash algorithm

   +--------------+-----------------------+-------------------------+
   | % Table used | % In Primary location | % In Secondary location |
   +==============+=======================+=========================+
   |      50      |         96            |           4             |
   +--------------+-----------------------+-------------------------+
   |      75      |         86.9          |           13.1          |
   +--------------+-----------------------+-------------------------+
   |      80      |         83.9          |           16.1          |
   +--------------+-----------------------+-------------------------+
   |      85      |         80.1          |           19.9          |
   +--------------+-----------------------+-------------------------+
   |      90      |         74.8          |           25.2          |
   +--------------+-----------------------+-------------------------+
   |      94.5    |         67.4          |           32.6          |
   +--------------+-----------------------+-------------------------+

.. note::

   The last values in the tables above are the average maximum table
   utilization with random keys and using the Jenkins hash function.

Use Case: Flow Classification
-----------------------------

Flow classification is used to map each input packet to the connection/flow it belongs to.
This operation is necessary as the processing of each input packet is usually done in the context of its connection,
so the same set of operations is applied to all the packets from the same flow.

Applications using flow classification typically have a flow table to manage, with each separate flow having an entry associated with it in this table.
The size of the flow table entry is application specific, with typical values of 4, 16, 32 or 64 bytes.

Each application using flow classification typically has a mechanism defined to uniquely identify a flow based on
a number of fields read from the input packet that make up the flow key.
One example is to use the DiffServ 5-tuple made up of the following fields of the IP and transport layer packet headers:
Source IP Address, Destination IP Address, Protocol, Source Port, Destination Port.

The DPDK hash provides a generic method to implement an application-specific flow classification mechanism.
Given a flow table implemented as an array, the application should create a hash object with the same number of entries as the flow table and
with the hash key size set to the number of bytes in the selected flow key.

The flow table operations on the application side are described below:

*   Add flow: Add the flow key to the hash.
    If the returned position is valid, use it to access the flow entry in the flow table for adding a new flow or
    updating the information associated with an existing flow.
    Otherwise, the flow addition failed, for example due to lack of free entries for storing new flows.

*   Delete flow: Delete the flow key from the hash. If the returned position is valid,
    use it to access the flow entry in the flow table to invalidate the information associated with the flow.

*   Lookup flow: Look up the flow key in the hash.
    If the returned position is valid (flow lookup hit), use the returned position to access the flow entry in the flow table.
    Otherwise (flow lookup miss) there is no flow registered for the current packet.

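The scheme above can be sketched end to end with a hypothetical DiffServ 5-tuple key and a toy position-returning hash (a plain linear scan here, standing in for the real hash; all names are invented for illustration). The flow table is a separate array that mirrors the hash positions:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 5-tuple flow key; packed so the key size is fixed and
 * memcmp sees no padding bytes (GCC/Clang attribute). */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
} __attribute__((packed));

struct flow_entry {
    uint8_t out_port;           /* application-specific flow data */
    uint8_t valid;
};

#define FLOW_ENTRIES 16
static struct flow_key   keys[FLOW_ENTRIES];
static uint8_t           used[FLOW_ENTRIES];
static struct flow_entry flow_table[FLOW_ENTRIES];  /* mirrors the hash */

/* Toy stand-in for the hash: returns the key's position, optionally
 * adding it first; -1 on miss (or on a full table when adding). */
static int key_pos(const struct flow_key *k, int add)
{
    for (int i = 0; i < FLOW_ENTRIES; i++)
        if (used[i] && memcmp(&keys[i], k, sizeof(*k)) == 0)
            return i;
    if (!add)
        return -1;
    for (int i = 0; i < FLOW_ENTRIES; i++)
        if (!used[i]) { keys[i] = *k; used[i] = 1; return i; }
    return -1;
}

/* Add flow: on a valid position, fill the mirrored flow entry. */
int flow_add(const struct flow_key *k, uint8_t out_port)
{
    int pos = key_pos(k, 1);
    if (pos < 0)
        return -1;               /* no free entries for new flows */
    flow_table[pos].out_port = out_port;
    flow_table[pos].valid = 1;
    return pos;
}

/* Lookup flow: valid position -> flow entry; negative -> unknown flow. */
int flow_lookup(const struct flow_key *k, uint8_t *out_port)
{
    int pos = key_pos(k, 0);
    if (pos < 0 || !flow_table[pos].valid)
        return -1;
    *out_port = flow_table[pos].out_port;
    return pos;
}
```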
References
----------

*   Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition), 1998, Addison-Wesley Professional
255