DragonFly On-Line Manual Pages
HASH(3) DragonFly Library Functions Manual HASH(3)
NAME
hash_initialise, hash_insert, hash_retrieve, hash_delete,
hash_iterator_initialise, hash_fetch_next, hash_iterator_deinitialise,
hash_deinitialise - hash table manipulation functions
LIBRARY
Hash Library (libhash, -lhash)
SYNOPSIS
#include <sys/types.h>
#include <hash.h>
int
hash_initialise(hash *h, u_int32_t number_of_buckets,
u_int32_t (*hash_function)(void *key, u_int32_t number_of_buckets),
int (*compare_keys)(void *key1, void *key2),
int (*duplicate_key)(void **destination, void *source),
void (*free_key)(void *key), void (*free_value)(void *value));
int
hash_insert(hash *h, void *key, void *value);
int
hash_retrieve(hash *h, void *key, void **value);
int
hash_delete(hash *h, void *key);
int
hash_iterator_initialise(hash *h, struct hash_iterator *it);
int
hash_fetch_next(hash *h, struct hash_iterator *it, void **key,
void **value);
int
hash_iterator_deinitialise(hash *h, struct hash_iterator *it);
void
hash_deinitialise(hash *h);
DESCRIPTION
These functions manipulate a hash table, sometimes known as an
associative array. A hash table is a data structure that allows you to
store "values" indexed by "key". A key can be any kind of data structure
but is commonly a string. Indexing by integer is a reasonable way to
simulate a sparse array. The data can also be any kind of data structure,
user-defined or built-in.
To allow any kind of data structure to be used as a key or a value a void
* is used to represent each. While giving flexibility, this use of void *
does force the user to supply functions to perform all manipulation and
examination of the keys - it is not possible for libhash to check for
equality, copy or delete keys on its own. Pointers to these user
nominated functions are provided to the hash_initialise() function,
called before any other operations on the hash table are allowed.
The hash_initialise() function is called to initialise the hash table
pointed to by h. It must be called before any other hash function on the
hash table h.
The parameter number_of_buckets sets the "size" of the hash table - the
number of buckets that the data is distributed over. For the best
performance the general rule is that this number should be prime and
approximately twice the size of the expected data set. Chaining is used
in case of collisions so there is no actual limit to the amount of data
that can be stored, subject to system resource limits.
The parameter hash_function is a pointer to the hash function to be used.
The hash function's job is to map keys to buckets so it must be aware of
the data structure being used for keys and must return a number between 0
and number_of_buckets - 1 inclusive. A good hash function maps the data
set evenly over all buckets. See hash_hash_int(3) for some provided hash
functions. This argument may be NULL to use a provided function that
works with strings.
The parameter compare_keys is a pointer to a function that can provide
ordering information about 2 keys. This function should return a value
less than 0 if key1 is less than key2, greater than 0 if key1 is greater
than key2 and 0 if key1 is equal to key2. This argument may be NULL to
use a provided function that works with strings.
The parameter duplicate_key is a pointer to a function that will allocate
memory and make a copy of a key, the original located at the address
source and the address of the new copy to be stored at destination. See
hash_copy_int(3) for some provided duplicate functions. This argument may
be NULL to use a provided function that works with strings.
The parameter free_key is a pointer to a function that will free the
memory used by a key allocated with duplicate_key. This argument may be
NULL if keys don't need "freeing".
The parameter free_value is a pointer to a function that will free the
memory used by a value stored in the hash table. This argument may be
NULL if values don't need "freeing". The ability to pass NULL is useful
when data added to the hash table is already part of another data
structure or memory management scheme - the hash table being used as an
index.
The hash_insert() function is used to add key/value pairs to the hash
table. h is a pointer to the hash table to which the pair is to be
added. key is the key under which the value is to be indexed under.
value is the value to be stored. The hash_insert() function will make a
copy of key but only store a pointer to value so it is the caller's
responsibility to ensure the memory used by value is not lost
prematurely.
The hash_retrieve() function is used to retieve data previously stored in
the hash table by the function hash_insert(). The parameter h is a
pointer to the hash table to retrieve from. The parameter key is a
pointer to a key considered equivalent (by the function compare_keys
provided to hash_initialise()) to the key used to store the value. The
parameter value is a pointer to a location at which to store the address
of the data.
The hash_delete() function is used to remove a key/value pair from a hash
table. The parameter h is a pointer to the hash table from which to
remove the pair. The parameter key is a pointer to a key considered
equivalent (by the function compare_keys provided to hash_initialise())
to the key used to store the pair. The functions pointed to by the
parameters free_key and free_value provided to the hash_initialise()
function are used (if not NULL) to free the memory used by the key/value
pair. It is important to note that if you are using data retrieved by
hash_retrieve() and then call hash_delete() on the same pair, the pointer
returned by hash_retrieve() is invalid.
The function hash_iterator_initialise() is used to initialise a struct
hash_iterator. A struct hash_iterator, in combination with
hash_fetch_next(), is used to allow a caller to iterate over all
key/value pairs stored in a hash table. The parameter h is a pointer to
the hash table to be iterated over. The it parameter is a pointer to the
struct hash_iterator. hash_iterator_initialise() must be called before
any other functions that use the struct hash_iterator.
The hash_fetch_next() function fetches the "next" key/value pair from a
hash table. By repeatedly calling it, this function can be used to
iterate through all key/value pairs stored in a hash table. Note that the
pairs are returned in an order which is an artifact of the internal
workings of the library and probably meaningless to a user. The parameter
h is a pointer to the hash table to iterate over (this should be the same
as the h argument provided to hash_iterator_initialise()). The parameter
it is a pointer to the struct hash_iterator previously initialised with
hash_iterator_initialise(). The parameters key and value are pointers to
locations to store the addresses of the key and value fetched. If
hash_fetch_next() is called after the last pair in the hash table has
been fetched then 0 is returned, nothing is stored at key or value, and
the struct hash_iterator is reset so the next call to hash_fetch_next()
will return the first pair in the hash table. If a pair is added to the
table whilst a struct hash_iterator is active it may or may not be
returned by hash_fetch_next() before the struct hash_iterator is reset.
The calling of hash_delete() while a struct hash_iterator is active is
safe for any pair that has already been returned by hash_fetch_next().
The function hash_iterator_deinitialise() should be called when finished
with a struct hash_iterator. The parameter h is a pointer to the hash
table originally passed to hash_iterator_initialise(). The parameter it
is the struct hash_iterator to be deinitialised. This function should be
called before the memory used by a struct hash_iterator is disposed of.
The function hash_deinitialise() is used to dispose of a hash table. The
parameter h is the hash table to dispose of. After calling
hash_deinitialise() all pointers returned by hash_retrieve() or
hash_fetch_next() are invalid.
IMPLEMENTATION NOTES
Upon collision (where 2 keys are mapped by the hash function to the same
bucket) ordered linked lists are used. On retrieval a sequential search
is used. On a hash table that is overly full (has a lot of collisions)
retrieval time is therefore O(n/(2 * number_of_buckets)) as opposed to
O(1) for a sparsely populated table. Therefore it is wise to set
number_of_buckets to be large enough and to use a hash function well
suited for your data.
RETURN VALUES
The hash_initialise(), hash_insert(), hash_retrieve(), hash_delete(),
hash_iterator_initialise(), hash_fetch_next() and
hash_iterator_deinitialise() functions return 0 on failure and set the
global variable errno to indicate the error.
The hash_deinitialise() function returns no value.
EXAMPLES
See the tests directory included in the distribution.
ERRORS
[ENOENT]
There was no value stored for the provided key or in the case of
hash_fetch_next() the end of the hash table has been reached.
[ENOMEM]
A call to malloc failed.
SEE ALSO
dbopen(3), emmao(8), libhash_convenience(3), queue(3)
HISTORY
The libhash library was written in January 2002 for emmao (a program to
kill off rogue processes under Solaris). libhash was written under
FreeBSD 4.4.
AUTHORS
Andrew Stevenson <andrew@ugh.net.au>
BUGS
Please report them to <andrew@ugh.net.au>.
UgH! January 13, 2002 UgH!