DragonFly On-Line Manual Pages

Search: Section:  


HASH(3)               DragonFly Library Functions Manual               HASH(3)

NAME

hash_initialise, hash_insert, hash_retrieve, hash_delete, hash_iterator_initialise, hash_fetch_next, hash_iterator_deinitialise, hash_deinitialise - hash table manipulation functions

LIBRARY

Hash Library (libhash, -lhash)

SYNOPSIS

#include <sys/types.h> #include <hash.h> int hash_initialise(hash *h, u_int32_t number_of_buckets, u_int32_t (*hash_function)(void *key, u_int32_t number_of_buckets), int (*compare_keys)(void *key1, void *key2), int (*duplicate_key)(void **destination, void *source), void (*free_key)(void *key), void (*free_value)(void *value)); int hash_insert(hash *h, void *key, void *value); int hash_retrieve(hash *h, void *key, void **value); int hash_delete(hash *h, void *key); int hash_iterator_initialise(hash *h, struct hash_iterator *it); int hash_fetch_next(hash *h, struct hash_iterator *it, void **key, void **value); int hash_iterator_deinitialise(hash *h, struct hash_iterator *it); void hash_deinitialise(hash *h);

DESCRIPTION

These functions manipulate a hash table, sometimes known as an associative array. A hash table is a data structure that allows you to store "values" indexed by "key". A key can be any kind of data structure but is commonly a string. Indexing by integer is a reasonable way to simulate a sparse array. The data can also be any kind of data structure, user-defined or built-in. To allow any kind of data structure to be used as a key or a value a void * is used to represent each. While giving flexibility, this use of void * does force the user to supply functions to perform all manipulation and examination of the keys - it is not possible for libhash to check for equality, copy or delete keys on its own. Pointers to these user nominated functions are provided to the hash_initialise() function, called before any other operations on the hash table are allowed. The hash_initialise() function is called to initialise the hash table pointed to by h. It must be called before any other hash function on the hash table h. The parameter number_of_buckets sets the "size" of the hash table - the number of buckets that the data is distributed over. For the best performance the general rule is that this number should be prime and approximately twice the size of the expected data set. Chaining is used in case of collisions so there is no actual limit to the amount of data that can be stored, subject to system resource limits. The parameter hash_function is a pointer to the hash function to be used. The hash function's job is to map keys to buckets so it must be aware of the data structure being used for keys and must return a number between 0 and number_of_buckets - 1 inclusive. A good hash function maps the data set evenly over all buckets. See hash_hash_int(3) for some provided hash functions. This argument may be NULL to use a provided function that works with strings. The parameter compare_keys is a pointer to a function that can provide ordering information about 2 keys. This function should return a value less than 0 if key1 is less than key2, greater than 0 if key1 is greater than key2 and 0 if key1 is equal to key2. This argument may be NULL to use a provided function that works with strings. The parameter duplicate_key is a pointer to a function that will allocate memory and make a copy of a key, the original located at the address source and the address of the new copy to be stored at destination. See hash_copy_int(3) for some provided duplicate functions. This argument may be NULL to use a provided function that works with strings. The parameter free_key is a pointer to a function that will free the memory used by a key allocated with duplicate_key. This argument may be NULL if keys don't need "freeing". The parameter free_value is a pointer to a function that will free the memory used by a value stored in the hash table. This argument may be NULL if values don't need "freeing". The ability to pass NULL is useful when data added to the hash table is already part of another data structure or memory management scheme - the hash table being used as an index. The hash_insert() function is used to add key/value pairs to the hash table. h is a pointer to the hash table to which the pair is to be added. key is the key under which the value is to be indexed under. value is the value to be stored. The hash_insert() function will make a copy of key but only store a pointer to value so it is the caller's responsibility to ensure the memory used by value is not lost prematurely. The hash_retrieve() function is used to retieve data previously stored in the hash table by the function hash_insert(). The parameter h is a pointer to the hash table to retrieve from. The parameter key is a pointer to a key considered equivalent (by the function compare_keys provided to hash_initialise()) to the key used to store the value. The parameter value is a pointer to a location at which to store the address of the data. The hash_delete() function is used to remove a key/value pair from a hash table. The parameter h is a pointer to the hash table from which to remove the pair. The parameter key is a pointer to a key considered equivalent (by the function compare_keys provided to hash_initialise()) to the key used to store the pair. The functions pointed to by the parameters free_key and free_value provided to the hash_initialise() function are used (if not NULL) to free the memory used by the key/value pair. It is important to note that if you are using data retrieved by hash_retrieve() and then call hash_delete() on the same pair, the pointer returned by hash_retrieve() is invalid. The function hash_iterator_initialise() is used to initialise a struct hash_iterator. A struct hash_iterator, in combination with hash_fetch_next(), is used to allow a caller to iterate over all key/value pairs stored in a hash table. The parameter h is a pointer to the hash table to be iterated over. The it parameter is a pointer to the struct hash_iterator. hash_iterator_initialise() must be called before any other functions that use the struct hash_iterator. The hash_fetch_next() function fetches the "next" key/value pair from a hash table. By repeatedly calling it, this function can be used to iterate through all key/value pairs stored in a hash table. Note that the pairs are returned in an order which is an artifact of the internal workings of the library and probably meaningless to a user. The parameter h is a pointer to the hash table to iterate over (this should be the same as the h argument provided to hash_iterator_initialise()). The parameter it is a pointer to the struct hash_iterator previously initialised with hash_iterator_initialise(). The parameters key and value are pointers to locations to store the addresses of the key and value fetched. If hash_fetch_next() is called after the last pair in the hash table has been fetched then 0 is returned, nothing is stored at key or value, and the struct hash_iterator is reset so the next call to hash_fetch_next() will return the first pair in the hash table. If a pair is added to the table whilst a struct hash_iterator is active it may or may not be returned by hash_fetch_next() before the struct hash_iterator is reset. The calling of hash_delete() while a struct hash_iterator is active is safe for any pair that has already been returned by hash_fetch_next(). The function hash_iterator_deinitialise() should be called when finished with a struct hash_iterator. The parameter h is a pointer to the hash table originally passed to hash_iterator_initialise(). The parameter it is the struct hash_iterator to be deinitialised. This function should be called before the memory used by a struct hash_iterator is disposed of. The function hash_deinitialise() is used to dispose of a hash table. The parameter h is the hash table to dispose of. After calling hash_deinitialise() all pointers returned by hash_retrieve() or hash_fetch_next() are invalid.

IMPLEMENTATION NOTES

Upon collision (where 2 keys are mapped by the hash function to the same bucket) ordered linked lists are used. On retrieval a sequential search is used. On a hash table that is overly full (has a lot of collisions) retrieval time is therefore O(n/(2 * number_of_buckets)) as opposed to O(1) for a sparsely populated table. Therefore it is wise to set number_of_buckets to be large enough and to use a hash function well suited for your data.

RETURN VALUES

The hash_initialise(), hash_insert(), hash_retrieve(), hash_delete(), hash_iterator_initialise(), hash_fetch_next() and hash_iterator_deinitialise() functions return 0 on failure and set the global variable errno to indicate the error. The hash_deinitialise() function returns no value.

EXAMPLES

See the tests directory included in the distribution.

ERRORS

[ENOENT] There was no value stored for the provided key or in the case of hash_fetch_next() the end of the hash table has been reached. [ENOMEM] A call to malloc failed.

SEE ALSO

dbopen(3), emmao(8), libhash_convenience(3), queue(3)

HISTORY

The libhash library was written in January 2002 for emmao (a program to kill off rogue processes under Solaris). libhash was written under FreeBSD 4.4.

AUTHORS

Andrew Stevenson <andrew@ugh.net.au>

BUGS

Please report them to <andrew@ugh.net.au>. UgH! January 13, 2002 UgH!

Search: Section: