Documentation / technical / api-hashmap.txton commit Merge branch 'mh/notes-allow-reading-treeish' (b4e8e0e)
   1hashmap API
   2===========
   3
   4The hashmap API is a generic implementation of hash-based key-value mappings.
   5
   6Data Structures
   7---------------
   8
   9`struct hashmap`::
  10
  11        The hash table structure. Members can be used as follows, but should
  12        not be modified directly:
  13+
  14The `size` member keeps track of the total number of entries (0 means the
  15hashmap is empty).
  16+
  17`tablesize` is the allocated size of the hash table. A non-0 value indicates
  18that the hashmap is initialized. It may also be useful for statistical purposes
  19(i.e. `size / tablesize` is the current load factor).
  20+
  21`cmpfn` stores the comparison function specified in `hashmap_init()`. In
  22advanced scenarios, it may be useful to change this, e.g. to switch between
  23case-sensitive and case-insensitive lookup.
  24
  25`struct hashmap_entry`::
  26
  27        An opaque structure representing an entry in the hash table, which must
  28        be used as first member of user data structures. Ideally it should be
  29        followed by an int-sized member to prevent unused memory on 64-bit
  30        systems due to alignment.
  31+
  32The `hash` member is the entry's hash code and the `next` member points to the
  33next entry in case of collisions (i.e. if multiple entries map to the same
  34bucket).
  35
  36`struct hashmap_iter`::
  37
  38        An iterator structure, to be used with hashmap_iter_* functions.
  39
  40Types
  41-----
  42
  43`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
  44
  45        User-supplied function to test two hashmap entries for equality. Shall
  46        return 0 if the entries are equal.
  47+
  48This function is always called with non-NULL `entry` / `entry_or_key`
  49parameters that have the same hash code. When looking up an entry, the `key`
  50and `keydata` parameters to hashmap_get and hashmap_remove are always passed
  51as second and third argument, respectively. Otherwise, `keydata` is NULL.
  52
  53Functions
  54---------
  55
  56`unsigned int strhash(const char *buf)`::
  57`unsigned int strihash(const char *buf)`::
  58`unsigned int memhash(const void *buf, size_t len)`::
  59`unsigned int memihash(const void *buf, size_t len)`::
  60
  61        Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
  62        http://www.isthe.com/chongo/tech/comp/fnv).
  63+
  64`strhash` and `strihash` take 0-terminated strings, while `memhash` and
  65`memihash` operate on arbitrary-length memory.
  66+
  67`strihash` and `memihash` are case insensitive versions.
  68
  69`unsigned int sha1hash(const unsigned char *sha1)`::
  70
  71        Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
  72        for use in hash tables. Cryptographic hashes are supposed to have
  73        uniform distribution, so in contrast to `memhash()`, this just copies
  74        the first `sizeof(int)` bytes without shuffling any bits. Note that
  75        the results will be different on big-endian and little-endian
  76        platforms, so they should not be stored or transferred over the net.
  77
  78`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
  79
  80        Initializes a hashmap structure.
  81+
  82`map` is the hashmap to initialize.
  83+
  84The `equals_function` can be specified to compare two entries for equality.
  85If NULL, entries are considered equal if their hash codes are equal.
  86+
  87If the total number of entries is known in advance, the `initial_size`
  88parameter may be used to preallocate a sufficiently large table and thus
  89prevent expensive resizing. If 0, the table is dynamically resized.
  90
  91`void hashmap_free(struct hashmap *map, int free_entries)`::
  92
  93        Frees a hashmap structure and allocated memory.
  94+
  95`map` is the hashmap to free.
  96+
  97If `free_entries` is true, each hashmap_entry in the map is freed as well
  98(using stdlib's free()).
  99
 100`void hashmap_entry_init(void *entry, unsigned int hash)`::
 101
 102        Initializes a hashmap_entry structure.
 103+
 104`entry` points to the entry to initialize.
 105+
 106`hash` is the hash code of the entry.
 107
 108`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
 109
 110        Returns the hashmap entry for the specified key, or NULL if not found.
 111+
 112`map` is the hashmap structure.
 113+
 114`key` is a hashmap_entry structure (or user data structure that starts with
 115hashmap_entry) that has at least been initialized with the proper hash code
 116(via `hashmap_entry_init`).
 117+
 118If an entry with matching hash code is found, `key` and `keydata` are passed
 119to `hashmap_cmp_fn` to decide whether the entry matches the key.
 120
 121`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
 122
 123        Returns the hashmap entry for the specified hash code and key data,
 124        or NULL if not found.
 125+
 126`map` is the hashmap structure.
 127+
 128`hash` is the hash code of the entry to look up.
 129+
 130If an entry with matching hash code is found, `keydata` is passed to
 131`hashmap_cmp_fn` to decide whether the entry matches the key. The
 132`entry_or_key` parameter points to a bogus hashmap_entry structure that
 133should not be used in the comparison.
 134
 135`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
 136
 137        Returns the next equal hashmap entry, or NULL if not found. This can be
 138        used to iterate over duplicate entries (see `hashmap_add`).
 139+
 140`map` is the hashmap structure.
 141+
 142`entry` is the hashmap_entry to start the search from, obtained via a previous
 143call to `hashmap_get` or `hashmap_get_next`.
 144
 145`void hashmap_add(struct hashmap *map, void *entry)`::
 146
 147        Adds a hashmap entry. This allows to add duplicate entries (i.e.
 148        separate values with the same key according to hashmap_cmp_fn).
 149+
 150`map` is the hashmap structure.
 151+
 152`entry` is the entry to add.
 153
 154`void *hashmap_put(struct hashmap *map, void *entry)`::
 155
 156        Adds or replaces a hashmap entry. If the hashmap contains duplicate
 157        entries equal to the specified entry, only one of them will be replaced.
 158+
 159`map` is the hashmap structure.
 160+
 161`entry` is the entry to add or replace.
 162+
 163Returns the replaced entry, or NULL if not found (i.e. the entry was added).
 164
 165`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
 166
 167        Removes a hashmap entry matching the specified key. If the hashmap
 168        contains duplicate entries equal to the specified key, only one of
 169        them will be removed.
 170+
 171`map` is the hashmap structure.
 172+
 173`key` is a hashmap_entry structure (or user data structure that starts with
 174hashmap_entry) that has at least been initialized with the proper hash code
 175(via `hashmap_entry_init`).
 176+
 177If an entry with matching hash code is found, `key` and `keydata` are
 178passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
 179+
 180Returns the removed entry, or NULL if not found.
 181
 182`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
 183`void *hashmap_iter_next(struct hashmap_iter *iter)`::
 184`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
 185
 186        Used to iterate over all entries of a hashmap.
 187+
 188`hashmap_iter_init` initializes a `hashmap_iter` structure.
 189+
 190`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
 191more entries.
 192+
 193`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
 194and returns the first entry, if any).
 195
 196`const char *strintern(const char *string)`::
 197`const void *memintern(const void *data, size_t len)`::
 198
 199        Returns the unique, interned version of the specified string or data,
 200        similar to the `String.intern` API in Java and .NET, respectively.
 201        Interned strings remain valid for the entire lifetime of the process.
 202+
 203Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
 204strings / data must not be modified or freed.
 205+
 206Interned strings are best used for short strings with high probability of
 207duplicates.
 208+
 209Uses a hashmap to store the pool of interned strings.
 210
 211Usage example
 212-------------
 213
 214Here's a simple usage example that maps long keys to double values.
 215------------
 216struct hashmap map;
 217
 218struct long2double {
 219        struct hashmap_entry ent; /* must be the first member! */
 220        long key;
 221        double value;
 222};
 223
 224static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
 225{
 226        return !(e1->key == e2->key);
 227}
 228
 229void long2double_init(void)
 230{
 231        hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
 232}
 233
 234void long2double_free(void)
 235{
 236        hashmap_free(&map, 1);
 237}
 238
 239static struct long2double *find_entry(long key)
 240{
 241        struct long2double k;
 242        hashmap_entry_init(&k, memhash(&key, sizeof(long)));
 243        k.key = key;
 244        return hashmap_get(&map, &k, NULL);
 245}
 246
 247double get_value(long key)
 248{
 249        struct long2double *e = find_entry(key);
 250        return e ? e->value : 0;
 251}
 252
 253void set_value(long key, double value)
 254{
 255        struct long2double *e = find_entry(key);
 256        if (!e) {
 257                e = malloc(sizeof(struct long2double));
 258                hashmap_entry_init(e, memhash(&key, sizeof(long)));
 259                e->key = key;
 260                hashmap_add(&map, e);
 261        }
 262        e->value = value;
 263}
 264------------
 265
 266Using variable-sized keys
 267-------------------------
 268
 269The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
 270`hashmap_entry` structure as key to find the correct entry. If the key data is
 271variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
 272to create a full-fledged entry structure on the heap and copy all the key data
 273into the structure.
 274
 275In this case, the `keydata` parameter can be used to pass
 276variable-sized key data directly to the comparison function, and the `key`
 277parameter can be a stripped-down, fixed size entry structure allocated on the
 278stack.
 279
 280See test-hashmap.c for an example using arbitrary-length strings as keys.