hashmap API
===========

The hashmap API is a generic implementation of hash-based key-value mappings.

Data Structures
---------------

`struct hashmap`::

        The hash table structure. Members can be used as follows, but should
        not be modified directly:
+
The `size` member keeps track of the total number of entries (0 means the
hashmap is empty).
+
`tablesize` is the allocated size of the hash table. A non-0 value indicates
that the hashmap is initialized. It may also be useful for statistical purposes
(i.e. `size / tablesize` is the current load factor).
+
`cmpfn` stores the comparison function specified in `hashmap_init()`. In
advanced scenarios, it may be useful to change this, e.g. to switch between
case-sensitive and case-insensitive lookup.
+
When `disallow_rehash` is set, automatic rehashes are prevented during inserts
and deletes.

`struct hashmap_entry`::

        An opaque structure representing an entry in the hash table, which must
        be used as the first member of user data structures. Ideally it should
        be followed by an int-sized member to prevent unused memory on 64-bit
        systems due to alignment.
+
The `hash` member is the entry's hash code and the `next` member points to the
next entry in case of collisions (i.e. if multiple entries map to the same
bucket).

`struct hashmap_iter`::

        An iterator structure, to be used with hashmap_iter_* functions.

Types
-----

`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::

        User-supplied function to test two hashmap entries for equality. Must
        return 0 if the entries are equal, and a non-zero value otherwise.
+
This function is always called with non-NULL `entry` / `entry_or_key`
parameters that have the same hash code. When looking up an entry, the `key`
and `keydata` parameters to `hashmap_get` and `hashmap_remove` are always
passed as second and third argument, respectively. Otherwise, `keydata` is
NULL.

Functions
---------

`unsigned int strhash(const char *buf)`::
`unsigned int strihash(const char *buf)`::
`unsigned int memhash(const void *buf, size_t len)`::
`unsigned int memihash(const void *buf, size_t len)`::
`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::

        Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
        http://www.isthe.com/chongo/tech/comp/fnv).
+
`strhash` and `strihash` take 0-terminated strings, while `memhash` and
`memihash` operate on arbitrary-length memory.
+
`strihash` and `memihash` are case-insensitive versions.
+
`memihash_cont` is a variant of `memihash` that allows a computation to be
continued with another chunk of data.

`unsigned int sha1hash(const unsigned char *sha1)`::

        Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
        for use in hash tables. Cryptographic hashes are supposed to have
        uniform distribution, so in contrast to `memhash()`, this just copies
        the first `sizeof(int)` bytes without shuffling any bits. Note that
        the results will be different on big-endian and little-endian
        platforms, so they should not be stored or transferred over the net.

`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::

        Initializes a hashmap structure.
+
`map` is the hashmap to initialize.
+
The `equals_function` can be specified to compare two entries for equality.
If NULL, entries are considered equal if their hash codes are equal.
+
If the total number of entries is known in advance, the `initial_size`
parameter may be used to preallocate a sufficiently large table and thus
prevent expensive resizing. If 0, the table is dynamically resized.

`void hashmap_free(struct hashmap *map, int free_entries)`::

        Frees a hashmap structure and allocated memory.
+
`map` is the hashmap to free.
+
If `free_entries` is true, each hashmap_entry in the map is freed as well
(using stdlib's free()).

`void hashmap_entry_init(void *entry, unsigned int hash)`::

        Initializes a hashmap_entry structure.
+
`entry` points to the entry to initialize.
+
`hash` is the hash code of the entry.
+
The hashmap_entry structure does not hold references to external resources,
and it is safe to just discard it once you are done with it (i.e. if
your structure was allocated with xmalloc(), you can just free(3) it,
and if it is on the stack, you can just let it go out of scope).

`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::

        Returns the hashmap entry for the specified key, or NULL if not found.
+
`map` is the hashmap structure.
+
`key` is a hashmap_entry structure (or user data structure that starts with
hashmap_entry) that has at least been initialized with the proper hash code
(via `hashmap_entry_init`).
+
If an entry with matching hash code is found, `key` and `keydata` are passed
to `hashmap_cmp_fn` to decide whether the entry matches the key.

`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::

        Returns the hashmap entry for the specified hash code and key data,
        or NULL if not found.
+
`map` is the hashmap structure.
+
`hash` is the hash code of the entry to look up.
+
If an entry with matching hash code is found, `keydata` is passed to
`hashmap_cmp_fn` to decide whether the entry matches the key. The
`entry_or_key` parameter points to a bogus hashmap_entry structure that
should not be used in the comparison.

`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::

        Returns the next equal hashmap entry, or NULL if not found. This can be
        used to iterate over duplicate entries (see `hashmap_add`).
+
`map` is the hashmap structure.
+
`entry` is the hashmap_entry to start the search from, obtained via a previous
call to `hashmap_get` or `hashmap_get_next`.

`void hashmap_add(struct hashmap *map, void *entry)`::

        Adds a hashmap entry. This allows adding duplicate entries (i.e.
        separate values with the same key according to hashmap_cmp_fn).
+
`map` is the hashmap structure.
+
`entry` is the entry to add.

`void *hashmap_put(struct hashmap *map, void *entry)`::

        Adds or replaces a hashmap entry. If the hashmap contains duplicate
        entries equal to the specified entry, only one of them will be replaced.
+
`map` is the hashmap structure.
+
`entry` is the entry to add or replace.
+
Returns the replaced entry, or NULL if not found (i.e. the entry was added).

`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::

        Removes a hashmap entry matching the specified key. If the hashmap
        contains duplicate entries equal to the specified key, only one of
        them will be removed.
+
`map` is the hashmap structure.
+
`key` is a hashmap_entry structure (or user data structure that starts with
hashmap_entry) that has at least been initialized with the proper hash code
(via `hashmap_entry_init`).
+
If an entry with matching hash code is found, `key` and `keydata` are
passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
Returns the removed entry, or NULL if not found.

`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::

        Disallow/allow automatic rehashing of the hashmap during inserts
        and deletes.
+
This is useful if the caller knows that the hashmap will be accessed
by multiple threads.
+
The caller is still responsible for any necessary locking; this simply
prevents unexpected rehashing.  The caller is also responsible for properly
sizing the initial hashmap to ensure good performance.
+
A call to allow rehashing does not force a rehash; that might happen
with the next insert or delete.

`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
`void *hashmap_iter_next(struct hashmap_iter *iter)`::
`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::

        Used to iterate over all entries of a hashmap. Note that it is
        not safe to add entries to or remove entries from the hashmap
        while iterating.
+
`hashmap_iter_init` initializes a `hashmap_iter` structure.
+
`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
more entries.
+
`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
and returns the first entry, if any).

`const char *strintern(const char *string)`::
`const void *memintern(const void *data, size_t len)`::

        Returns the unique, interned version of the specified string or data,
        similar to the `String.intern` API in Java and .NET, respectively.
        Interned strings remain valid for the entire lifetime of the process.
+
Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
strings / data must not be modified or freed.
+
Interned strings are best used for short strings with high probability of
duplicates.
+
Uses a hashmap to store the pool of interned strings.

Usage example
-------------

Here's a simple usage example that maps long keys to double values.
------------
#include "git-compat-util.h"
#include "hashmap.h"

struct hashmap map;

struct long2double {
        struct hashmap_entry ent; /* must be the first member! */
        long key;
        double value;
};

/* Returns 0 if both entries have the same key, as hashmap_cmp_fn requires. */
static int long2double_cmp(const void *entry, const void *entry_or_key, const void *unused)
{
        const struct long2double *e1 = entry;
        const struct long2double *e2 = entry_or_key;
        return e1->key != e2->key;
}

void long2double_init(void)
{
        hashmap_init(&map, long2double_cmp, 0);
}

void long2double_free(void)
{
        /* a true second argument also free()s every entry in the map */
        hashmap_free(&map, 1);
}

static struct long2double *find_entry(long key)
{
        /* lookup key on the stack: only the hash code and key are set */
        struct long2double k;
        hashmap_entry_init(&k, memhash(&key, sizeof(long)));
        k.key = key;
        return hashmap_get(&map, &k, NULL);
}

double get_value(long key)
{
        struct long2double *e = find_entry(key);
        return e ? e->value : 0;
}

void set_value(long key, double value)
{
        struct long2double *e = find_entry(key);
        if (!e) {
                /* xmalloc() dies on failure, so e is never NULL here */
                e = xmalloc(sizeof(*e));
                hashmap_entry_init(e, memhash(&key, sizeof(long)));
                e->key = key;
                hashmap_add(&map, e);
        }
        e->value = value;
}
------------

Using variable-sized keys
-------------------------

The `hashmap_get` and `hashmap_remove` functions expect an ordinary
`hashmap_entry` structure as key to find the correct entry. If the key data is
variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
to create a full-fledged entry structure on the heap and copy all the key data
into the structure.

In this case, the `keydata` parameter can be used to pass
variable-sized key data directly to the comparison function, and the `key`
parameter can be a stripped-down, fixed-size entry structure allocated on the
stack.

See test-hashmap.c for an example using arbitrary-length strings as keys.