DragonFly On-Line Manual Pages
HTABLE(3) htable manual HTABLE(3)
NAME
HTABLE_HEAD, HTABLE_ENTRY, HTABLE_SIZE, HTABLE_COUNT, HTABLE_EMPTY,
HTABLE_COLLS, HTABLE_LOAD, HTABLE_INITIALIZER, HTABLE_INIT,
HTABLE_PROTOTYPE, HTABLE_GENERATE, HTABLE_INSERT, HTABLE_REMOVE,
HTABLE_LOOKUP, HTABLE_FIRST, HTABLE_NEXT, HTABLE_FOREACH,
implementation of hash tables.
SYNOPSIS
#include "htable.h"
HTABLE_HEAD(NAME, SIZE, TYPE);
HTABLE_ENTRY(TYPE);
size_t
HTABLE_SIZE(HTABLE_HEAD *head);
uint32_t
HTABLE_COUNT(HTABLE_HEAD *head);
int
HTABLE_EMPTY(HTABLE_HEAD *head);
float
HTABLE_COLLS(HTABLE_HEAD *head);
float
HTABLE_LOAD(HTABLE_HEAD *head);
HTABLE_INITIALIZER(HTABLE_HEAD *head);
HTABLE_INIT(HTABLE_HEAD *head);
HTABLE_PROTOTYPE(NAME, TYPE);
HTABLE_GENERATE(NAME, TYPE, KEY, CMP);
HTABLE_ENTRY *
HTABLE_INSERT(NAME, HTABLE_HEAD *head, HTABLE_ENTRY *elm);
HTABLE_ENTRY *
HTABLE_REMOVE(NAME, HTABLE_HEAD *head, HTABLE_ENTRY *elm);
HTABLE_ENTRY *
HTABLE_LOOKUP(NAME, HTABLE_HEAD *head, HTABLE_ENTRY *elm);
HTABLE_ENTRY *
HTABLE_FIRST(NAME, HTABLE_HEAD *head);
HTABLE_ENTRY *
HTABLE_NEXT(NAME, HTABLE_HEAD *head, HTABLE_ENTRY *elm);
HTABLE_FOREACH(HTABLE_ENTRY *elm, NAME, HTABLE_HEAD *head);
DESCRIPTION
These macros define and operate on a hash table data structure. The
following functionalities are supported:
1 Insertion of a new entry in the hash table.
2 Retrieval of an entry in the hash table.
3 Removal of an entry from the hash table.
4 Iterating over all entries found in the hash table.
5 Computing the number of entries found in the hash table.
6 Computing the collision percentage for the hash table.
7 Computing the load percentage for the hash table.
Hash tables are ideal for applications with datasets needing a lot of
adding, searching or removal, as those are normally constant-time
operations. The primary operation it supports efficiently is a lookup:
given a key (e.g. a person's name), find the corresponding value (e.g.
that person's telephone number). It works by transforming the key using
a hash function into a hash, a number that is used as an index in an
array to locate the desired location ("bucket") where the values should
be.
Hash tables support the efficient insertion of new entries, in expected
O(1) time. The time spent in searching depends on the hash function and
the load of the hash table; both insertion and search approach O(1)
time with well chosen values and hashes.
HASH TABLES
In the macro definitions, TYPE is the name tag of a user defined
structure that must contain a field of type HTABLE_ENTRY. For example,
to define a data structure looking like a phone book that will be
stored in a hash table, one could write something like:
struct phonebook_s {
char *name, *phone;
HTABLE_ENTRY (phonebook_s);
};
The argument NAME in the macro definitions is the name tag of a user
defined structure that must be declared using the macro HTABLE_HEAD()
as follows:
HTABLE_HEAD (NAME, SIZE, TYPE) hash_table;
The argument NAME has to be a unique name prefix for every hash table
that is defined. SIZE is the number of buckets the hash table will
hold. A pointer to such a hash table structure could then later be
defined as:
struct NAME *tableptr;
Once a hash table was defined, it must be initialized using the
HTABLE_INIT() macro, head being a reference to this hash table. It is
also possible to initialize it statically by using the
HTABLE_INITIALIZER() macro like this:
HTABLE_HEAD (NAME, SIZE, TYPE) htable =
HTABLE_INITIALIZER (&htable);
In order to use the functions that manipulate the hash table structure,
their prototypes need to be declared with the HTABLE_PROTOTYPE() macro,
where NAME is a unique identifier for this particular hash_table. The
TYPE argument is the type of the structure that is being managed by the
hash table.
The function bodies are generated with the HTABLE_GENERATE() macro,
which must be used only once. It takes the same two first arguments as
the HTABLE_PROTOTYPE() macro, and the two last arguments are the names
of user-defined functions used to extract key information from a hash
table entry and to compare two entries.
The function used to retrieve information related to the key given a
hash table entry must have the following prototype:
void (*key) (HTABLE_ENTRY *elm, char **key, int *len);
where elm is the given pointer to the hash table entry, key and len
must be filled in with respectively the pointer to the corresponding
key and with this key's length.
The function used to compare two hash tables entries must follow this
prototype:
int (*cmp) (HTABLE_ENTRY *elm1, HTABLE_ENTRY *elm2);
where elm1 and elm2 are the entries to compare. This function must
return an integer value, being 0 in case the keys are equal, and a
value different from 0 otherwise.
See section EXAMPLE for possible implementations of such functions.
The HTABLE_INSERT() macro inserts the element elm in the hash table
pointed at by head. A pointer to the element is returned in case it was
successfully inserted. Otherwise, NULL is returned, meaning the
insertion did not occur (e.g. the element was already stored in the
hash table).
The HTABLE_REMOVE() macro removes the element elm from the hash table
pointed at by head. The removed element is returned to the user so it
can be freed if necessary. If the element was not found, NULL is
returned.
The HTABLE_LOOKUP() macro finds the element elm in the hash table
pointed at by head. The data corresponding to the removed element is
returned to the user (NULL is returned in case the element was not
found).
The HTABLE_FIRST() and HTABLE_NEXT() macros can be used to traverse the
hash table:
for (elm = HTABLE_FIRST (NAME, &head);
elm != NULL;
elm = HTABLE_NEXT (NAME, &head, elm))
Or, for simplicity, one can use the HTABLE_FOREACH() macro:
HTABLE_FOREACH (elm, NAME, &head)
There are also some macros useful to get information about a given hash
table:
The HTABLE_SIZE() macro returns the total number of buckets contained
in the hash table pointed at by head.
The HTABLE_COUNT() returns the number of items contained in the hash
table pointed at by head.
The HTABLE_COLLS() returns a percentage indicating the collisions (e.g.
when two keys hash to the same bucket) there are in the hash table
pointed at by head.
The HTABLE_LOAD() macro returns a percentage indicating the load factor
(e.g. the number of filled buckets over the total number of buckets) of
the hash table pointed at by head.
The HTABLE_EMPTY() macro should be used to check wether a hash table is
empty.
EXAMPLES
The following example demonstrates how to declare a hash table. Values
are inserted into it, and one of them is then retrieved from the hash
table. Next, the contents of the hash table are printed, and one
element is finally removed. Last, the total number of items contained
in the hash table is displayed.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "htable.h"
#define HSIZE 100
struct book_s {
char *name, *phone;
HTABLE_ENTRY (book_s);
};
void
extract_key (struct book_s *data, char **key, int *len)
{
*key = data->name;
*len = strlen (data->name);
}
int
compare (struct book_s *data1, struct book_s *data2)
{
const int KEYLEN = strlen (data1->name);
if (strlen (data2->name) == KEYLEN
&& !memcmp (data1->name, data2->name, KEYLEN))
return 0;
else
return 1;
}
int
main ()
{
int i;
struct book_s *elm;
struct book_s entries[] = {
{"friend1", "555-1111"},
{"friend2", "555-2222"},
{"person3", "555-3333"},
{"person4", "555-4444"}
};
const int NOENTRIES = sizeof (entries) / sizeof (struct book_s);
HTABLE_HEAD (pbook, HSIZE, book_s) htable =
HTABLE_INITIALIZER (&htable);
HTABLE_GENERATE (pbook, book_s, extract_key, compare);
for (i = 0; i < NOENTRIES; i++)
HTABLE_INSERT (pbook, &htable, &entries[i]);
elm = HTABLE_LOOKUP (pbook, &htable, &entries[1]);
printf ("friend2's Phone number is: %s\n", elm->phone);
HTABLE_FOREACH (elm, pbook, &htable)
{
printf ("Entry:\n");
printf (" name: %s\n", elm->name);
printf ("phone: %s\n", elm->phone);
}
elm = HTABLE_REMOVE (pbook, &htable, &entries[2]);
printf ("Number of items in hash table: %u\n",
HTABLE_COUNT (&htable));
return EXIT_SUCCESS;
}
NOTES
If the hash table macros need to be used several times, it is advised
to build wrappers around them, as code is inlined and executable could
have its size grow needlessly. For example, to remove elements from a
hash table and free the corresponding data structure associated with
it, one could write the following function:
/*
* Wrapper around the HTABLE_REMOVE macro.
*
* A hash table was previously defined using:
* HTABLE_HEAD (my_hash, HSIZE, my_entry) htable =
* HTABLE_INITIALIZER (&htable);
*/
void
htable_free (struct my_hash *ht, struct my_entry *elm)
{
struct my_entry *removed;
removed = HTABLE_REMOVE (my_hash, ht, elm);
if (removed != NULL)
free (removed);
}
HASH FUNCTIONS
By default, Jenkin's hash function "LOOKUP" is used to transform a key
into a bucket number (reference can be found in the SEE ALSO section).
However, other hash functions are available and can be chosen at
compile time by defining the HASH_FUNCTION macro.
The following functions are available:
HASH_JEN
The default hash function, Jenkins' Lookup hash.
HASH_OAT
Jenkins' "One at a time" hash function.
For example, to specify that Jenkins' "One at a time" hash function
must be used for the "test" program, one must compile it using a
command such as:
cc -I/path/to/htable.h -DHASH_FUNCTION=HASH_OAT -o test test.c
To determine the best hash function for your key domain, you can use
the HTABLE_COLLS and HTABLE_LOAD macros to compare the collisions and
load factors obtained with the different hash functions.
SEE ALSO
Bob Jenkins' work on hash functions can be found at:
http://burtleburtle.net/bob/hash/
Those macros were greatly inspired by the implementations of spray and
red-black trees found in the *BSD kernels (see file
/usr/src/sys/sys/tree.h).
AUTHORS
Frederic Culot <frederic@culot.org>.
Version 1.2 August 19, 2010 HTABLE(3)