Merge branch 'ds/multi-pack-index'
author Junio C Hamano <gitster@pobox.com>
Mon, 17 Sep 2018 20:53:50 +0000 (13:53 -0700)
committer Junio C Hamano <gitster@pobox.com>
Mon, 17 Sep 2018 20:53:50 +0000 (13:53 -0700)
When there are too many packfiles in a repository (which is not
recommended), looking up an object in these would require
consulting many pack .idx files; a new mechanism to have a single
file that consolidates all of these .idx files is introduced.
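
To illustrate the idea described above, here is a minimal sketch, not the
on-disk format or the code from midx.c. It assumes toy 32-bit object ids
in place of real hashes, and the names midx_entry, midx_sort and
midx_lookup are hypothetical. Entries from every pack live in one sorted
table, so a lookup is a single binary search instead of one search per
pack .idx file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified entry: real object ids are full hashes. */
struct midx_entry {
	uint32_t oid;      /* toy object id (assumption for brevity) */
	uint32_t pack_id;  /* which pack holds the object */
	uint64_t offset;   /* byte offset of the object inside that pack */
};

/* Order entries by object id so the table can be binary-searched. */
static int midx_entry_cmp(const void *a, const void *b)
{
	const struct midx_entry *x = a, *y = b;
	return (x->oid > y->oid) - (x->oid < y->oid);
}

/* Sort the consolidated table once, after gathering all packs' entries. */
static void midx_sort(struct midx_entry *tab, size_t nr)
{
	assert(tab || nr == 0);
	if (nr)
		qsort(tab, nr, sizeof(*tab), midx_entry_cmp);
}

/*
 * One binary search over the consolidated table; returns NULL when the
 * object is in none of the packs covered by the table.
 */
static const struct midx_entry *midx_lookup(const struct midx_entry *tab,
					    size_t nr, uint32_t oid)
{
	struct midx_entry key = { oid, 0, 0 };

	if (!nr)
		return NULL;
	return bsearch(&key, tab, nr, sizeof(*tab), midx_entry_cmp);
}
```

With N packs and no consolidated table, a miss costs N separate binary
searches; with the table it costs one, and the returned pack_id and offset
say where to read the object from.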

* ds/multi-pack-index: (32 commits)
pack-objects: consider packs in multi-pack-index
midx: test a few commands that use get_all_packs
treewide: use get_all_packs
packfile: add all_packs list
midx: fix bug that skips midx with alternates
midx: stop reporting garbage
midx: mark bad packed objects
multi-pack-index: store local property
multi-pack-index: provide more helpful usage info
midx: clear midx on repack
packfile: skip loading index if in multi-pack-index
midx: prevent duplicate packfile loads
midx: use midx in approximate_object_count
midx: use existing midx when writing new one
midx: use midx in abbreviation calculations
midx: read objects from multi-pack-index
config: create core.multiPackIndex setting
midx: write object offsets
midx: write object id fanout chunk
midx: write object ids in a chunk
...

Documentation/config.txt
Makefile
builtin/pack-objects.c
http-backend.c
pack-objects.c
t/helper/test-tool.h
diff --combined Documentation/config.txt
index 69a27eb688167e0c27f6f62bf14c5d5218cc20a8,8283443c979ec4543ebfe4539cf21c2bf4a8569b..6ecd70df0a5324eed8afe206e945aa1598d6aaad
@@@ -462,20 -462,10 +462,20 @@@ core.untrackedCache:
        See linkgit:git-update-index[1]. `keep` by default.
  
  core.checkStat::
 -      Determines which stat fields to match between the index
 -      and work tree. The user can set this to 'default' or
 -      'minimal'. Default (or explicitly 'default'), is to check
 -      all fields, including the sub-second part of mtime and ctime.
 +      When missing or is set to `default`, many fields in the stat
 +      structure are checked to detect if a file has been modified
 +      since Git looked at it.  When this configuration variable is
 +      set to `minimal`, sub-second part of mtime and ctime, the
 +      uid and gid of the owner of the file, the inode number (and
 +      the device number, if Git was compiled to use it), are
 +      excluded from the check among these fields, leaving only the
 +      whole-second part of mtime (and ctime, if `core.trustCtime`
 +      is set) and the filesize to be checked.
 ++
 +There are implementations of Git that do not leave usable values in
 +some fields (e.g. JGit); by excluding these fields from the
 +comparison, the `minimal` mode may help interoperability when the
 +same repository is used by these other systems at the same time.
  
  core.quotePath::
        Commands that output paths (e.g. 'ls-files', 'diff'), will
@@@ -927,16 -917,23 +927,21 @@@ core.notesRef:
  This setting defaults to "refs/notes/commits", and it can be overridden by
  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
  
 -gc.commitGraph::
 -      If true, then gc will rewrite the commit-graph file when
 -      linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
 -      '--auto' the commit-graph will be updated if housekeeping is
 -      required. Default is false. See linkgit:git-commit-graph[1]
 -      for details.
 +core.commitGraph::
 +      If true, then git will read the commit-graph file (if it exists)
 +      to parse the graph structure of commits. Defaults to false. See
 +      linkgit:git-commit-graph[1] for more information.
  
  core.useReplaceRefs::
        If set to `false`, behave as if the `--no-replace-objects`
        option was given on the command line. See linkgit:git[1] and
        linkgit:git-replace[1] for more information.
  
+ core.multiPackIndex::
+       Use the multi-pack-index file to track multiple packfiles using a
+       single index. See link:technical/multi-pack-index.html[the
+       multi-pack-index design document].
  core.sparseCheckout::
        Enable "sparse checkout" feature. See section "Sparse checkout" in
        linkgit:git-read-tree[1] for more information.
@@@ -1052,12 -1049,6 +1057,12 @@@ branch.autoSetupRebase:
        branch to track another branch.
        This option defaults to never.
  
 +branch.sort::
 +      This variable controls the sort ordering of branches when displayed by
 +      linkgit:git-branch[1]. Without the "--sort=<value>" option provided, the
 +      value of this variable will be used as the default.
 +      See linkgit:git-for-each-ref[1] field names for valid values.
 +
  branch.<name>.remote::
        When on branch <name>, it tells 'git fetch' and 'git push'
        which remote to fetch from/push to.  The remote to push to
@@@ -1154,14 -1145,6 +1159,14 @@@ and by linkgit:git-worktree[1] when 'gi
  remote branch. This setting might be used for other checkout-like
  commands or functionality in the future.
  
 +checkout.optimizeNewBranch
 +      Optimizes the performance of "git checkout -b <new_branch>" when
 +      using sparse-checkout.  When set to true, git will not update the
 +      repo based on the current sparse-checkout settings.  This means it
 +      will not update the skip-worktree bit in the index nor add/remove
 +      files in the working directory to reflect the current sparse checkout
 +      settings nor will it show the local changes.
 +
  clean.requireForce::
        A boolean to make git-clean do nothing unless given -f,
        -i or -n.   Defaults to true.
@@@ -1225,6 -1208,18 +1230,6 @@@ This does not affect linkgit:git-format
  'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
  command line with the `--color[=<when>]` option.
  
 -diff.colorMoved::
 -      If set to either a valid `<mode>` or a true value, moved lines
 -      in a diff are colored differently, for details of valid modes
 -      see '--color-moved' in linkgit:git-diff[1]. If simply set to
 -      true the default color mode will be used. When set to false,
 -      moved lines are not colored.
 -
 -diff.colorMovedWS::
 -      When moved lines are colored using e.g. the `diff.colorMoved` setting,
 -      this option controls the `<mode>` how spaces are treated
 -      for details of valid modes see '--color-moved-ws' in linkgit:git-diff[1].
 -
  color.diff.<slot>::
        Use customized color for diff colorization.  `<slot>` specifies
        which part of the patch to use the specified color, and is one
@@@ -1773,13 -1768,6 +1778,13 @@@ this configuration variable is ignored
  will be repacked. After this the number of packs should go below
  gc.autoPackLimit and gc.bigPackThreshold should be respected again.
  
 +gc.writeCommitGraph::
 +      If true, then gc will rewrite the commit-graph file when
 +      linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
 +      '--auto' the commit-graph will be updated if housekeeping is
 +      required. Default is false. See linkgit:git-commit-graph[1]
 +      for details.
 +
  gc.logExpiry::
        If the file gc.log exists, then `git gc --auto` won't run
        unless that file is more than 'gc.logExpiry' old.  Default is
diff --combined Makefile
index 5a969f5830a4105d3e3e6236eaa51e19880cc873,377379fcc05c1dbb04cb7849ee610e235c3ef4ab..bd83683a87aa9797d4519bf7c20591df9d5c8de8
+++ b/Makefile
@@@ -723,6 -723,7 +723,7 @@@ TEST_BUILTINS_OBJS += test-online-cpus.
  TEST_BUILTINS_OBJS += test-path-utils.o
  TEST_BUILTINS_OBJS += test-prio-queue.o
  TEST_BUILTINS_OBJS += test-read-cache.o
+ TEST_BUILTINS_OBJS += test-read-midx.o
  TEST_BUILTINS_OBJS += test-ref-store.o
  TEST_BUILTINS_OBJS += test-regex.o
  TEST_BUILTINS_OBJS += test-repository.o
@@@ -900,6 -901,7 +901,7 @@@ LIB_OBJS += merge.
  LIB_OBJS += merge-blobs.o
  LIB_OBJS += merge-recursive.o
  LIB_OBJS += mergesort.o
+ LIB_OBJS += midx.o
  LIB_OBJS += name-hash.o
  LIB_OBJS += negotiator/default.o
  LIB_OBJS += negotiator/skipping.o
@@@ -1060,6 -1062,7 +1062,7 @@@ BUILTIN_OBJS += builtin/merge-recursive
  BUILTIN_OBJS += builtin/merge-tree.o
  BUILTIN_OBJS += builtin/mktag.o
  BUILTIN_OBJS += builtin/mktree.o
+ BUILTIN_OBJS += builtin/multi-pack-index.o
  BUILTIN_OBJS += builtin/mv.o
  BUILTIN_OBJS += builtin/name-rev.o
  BUILTIN_OBJS += builtin/notes.o
@@@ -2047,7 -2050,7 +2050,7 @@@ $(BUILT_INS): git$
  
  command-list.h: generate-cmdlist.sh command-list.txt
  
 -command-list.h: $(wildcard Documentation/git*.txt) Documentation/config.txt
 +command-list.h: $(wildcard Documentation/git*.txt) Documentation/*config.txt
        $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh command-list.txt >$@+ && mv $@+ $@
  
  SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
diff --combined builtin/pack-objects.c
index d1144a8f7ef79f7efa5bf64141a9133cfeee66d1,807f0343653ada7dce87c11a376c6984ed838332..caa4cd0211c40e4ac0acb4178a4df260db651b88
@@@ -31,6 -31,7 +31,7 @@@
  #include "packfile.h"
  #include "object-store.h"
  #include "dir.h"
+ #include "midx.h"
  
  #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
  #define SIZE(obj) oe_size(&to_pack, obj)
@@@ -1040,6 -1041,7 +1041,7 @@@ static int want_object_in_pack(const st
  {
        int want;
        struct list_head *pos;
+       struct multi_pack_index *m;
  
        if (!exclude && local && has_loose_object_nonlocal(oid))
                return 0;
                if (want != -1)
                        return want;
        }
+       for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+               struct pack_entry e;
+               if (fill_midx_entry(oid, &e, m)) {
+                       struct packed_git *p = e.p;
+                       off_t offset;
+                       if (p == *found_pack)
+                               offset = *found_offset;
+                       else
+                               offset = find_pack_entry_one(oid->hash, p);
+                       if (offset) {
+                               if (!*found_pack) {
+                                       if (!is_pack_valid(p))
+                                               continue;
+                                       *found_offset = offset;
+                                       *found_pack = p;
+                               }
+                               want = want_found_object(exclude, p);
+                               if (want != -1)
+                                       return want;
+                       }
+               }
+       }
        list_for_each(pos, get_packed_git_mru(the_repository)) {
                struct packed_git *p = list_entry(pos, struct packed_git, mru);
                off_t offset;
@@@ -2041,6 -2069,10 +2069,6 @@@ static int try_delta(struct unpacked *t
        delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
        if (!delta_buf)
                return 0;
 -      if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
 -              free(delta_buf);
 -              return 0;
 -      }
  
        if (DELTA(trg_entry)) {
                /* Prefer only shallower same-sized deltas. */
@@@ -2299,7 -2331,6 +2327,7 @@@ static void init_threaded_search(void
        pthread_mutex_init(&cache_mutex, NULL);
        pthread_mutex_init(&progress_mutex, NULL);
        pthread_cond_init(&progress_cond, NULL);
 +      pthread_mutex_init(&to_pack.lock, NULL);
        old_try_to_free_routine = set_try_to_free_routine(try_to_free_from_threads);
  }
  
@@@ -2806,7 -2837,7 +2834,7 @@@ static void add_objects_in_unpacked_pac
  
        memset(&in_pack, 0, sizeof(in_pack));
  
-       for (p = get_packed_git(the_repository); p; p = p->next) {
+       for (p = get_all_packs(the_repository); p; p = p->next) {
                struct object_id oid;
                struct object *o;
  
@@@ -2870,7 -2901,7 +2898,7 @@@ static int has_sha1_pack_kept_or_nonloc
        struct packed_git *p;
  
        p = (last_found != (void *)1) ? last_found :
-                                       get_packed_git(the_repository);
+                                       get_all_packs(the_repository);
  
        while (p) {
                if ((!p->pack_local || p->pack_keep ||
                        return 1;
                }
                if (p == last_found)
-                       p = get_packed_git(the_repository);
+                       p = get_all_packs(the_repository);
                else
                        p = p->next;
                if (p == last_found)
@@@ -2916,7 -2947,7 +2944,7 @@@ static void loosen_unused_packed_object
        uint32_t i;
        struct object_id oid;
  
-       for (p = get_packed_git(the_repository); p; p = p->next) {
+       for (p = get_all_packs(the_repository); p; p = p->next) {
                if (!p->pack_local || p->pack_keep || p->pack_keep_in_core)
                        continue;
  
@@@ -3063,7 -3094,7 +3091,7 @@@ static void add_extra_kept_packs(const 
        if (!names->nr)
                return;
  
-       for (p = get_packed_git(the_repository); p; p = p->next) {
+       for (p = get_all_packs(the_repository); p; p = p->next) {
                const char *name = basename(p->pack_name);
                int i;
  
@@@ -3336,7 -3367,7 +3364,7 @@@ int cmd_pack_objects(int argc, const ch
        add_extra_kept_packs(&keep_pack_list);
        if (ignore_packed_keep_on_disk) {
                struct packed_git *p;
-               for (p = get_packed_git(the_repository); p; p = p->next)
+               for (p = get_all_packs(the_repository); p; p = p->next)
                        if (p->pack_local && p->pack_keep)
                                break;
                if (!p) /* no keep-able packs found */
                 * it also covers non-local objects
                 */
                struct packed_git *p;
-               for (p = get_packed_git(the_repository); p; p = p->next) {
+               for (p = get_all_packs(the_repository); p; p = p->next) {
                        if (!p->pack_local) {
                                have_non_local_packs = 1;
                                break;
diff --combined http-backend.c
index 458642ef72b879a2f53d6e1f8f192847ed814111,809ba7d2c49eace9e2e9c4dae348c394dac28209..9e894f197f91ee3565b0f3c618fdb4042e2f229f
@@@ -353,7 -353,7 +353,7 @@@ static ssize_t get_content_length(void
        ssize_t val = -1;
        const char *str = getenv("CONTENT_LENGTH");
  
 -      if (str && !git_parse_ssize_t(str, &val))
 +      if (str && *str && !git_parse_ssize_t(str, &val))
                die("failed to parse CONTENT_LENGTH: %s", str);
        return val;
  }
@@@ -595,13 -595,13 +595,13 @@@ static void get_info_packs(struct strbu
        size_t cnt = 0;
  
        select_getanyfile(hdr);
-       for (p = get_packed_git(the_repository); p; p = p->next) {
+       for (p = get_all_packs(the_repository); p; p = p->next) {
                if (p->pack_local)
                        cnt++;
        }
  
        strbuf_grow(&buf, cnt * 53 + 2);
-       for (p = get_packed_git(the_repository); p; p = p->next) {
+       for (p = get_all_packs(the_repository); p; p = p->next) {
                if (p->pack_local)
                        strbuf_addf(&buf, "P %s\n", p->pack_name + objdirlen + 6);
        }
diff --combined pack-objects.c
index 6ef87e5683aacdf738c86679712078988c0899fd,832dcf7462797523fea8262db31077033c1a2910..d04cfa8e9f173b3e3b31e7b634c13adf735cb479
@@@ -99,7 -99,7 +99,7 @@@ static void prepare_in_pack_by_idx(stru
         * (i.e. in_pack_idx also zero) should return NULL.
         */
        mapping[cnt++] = NULL;
-       for (p = get_packed_git(the_repository); p; p = p->next, cnt++) {
+       for (p = get_all_packs(the_repository); p; p = p->next, cnt++) {
                if (cnt == nr) {
                        free(mapping);
                        return;
@@@ -146,8 -146,6 +146,8 @@@ void prepare_packing_data(struct packin
  
        pdata->oe_size_limit = git_env_ulong("GIT_TEST_OE_SIZE",
                                             1U << OE_SIZE_BITS);
 +      pdata->oe_delta_size_limit = git_env_ulong("GIT_TEST_OE_DELTA_SIZE",
 +                                                 1UL << OE_DELTA_SIZE_BITS);
  }
  
  struct object_entry *packlist_alloc(struct packing_data *pdata,
  
                if (!pdata->in_pack_by_idx)
                        REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
 +              if (pdata->delta_size)
 +                      REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc);
        }
  
        new_entry = pdata->objects + pdata->nr_objects++;
diff --combined t/helper/test-tool.h
index e954e8c5222f77e882f577198091adde95cf2532,70fc0285e8ddf1680fc0e7754e3ee0898d4d1e18..710fb1b28625d26555f809844955b8785f5e2a4c
@@@ -1,8 -1,6 +1,8 @@@
  #ifndef __TEST_TOOL_H__
  #define __TEST_TOOL_H__
  
 +#include "git-compat-util.h"
 +
  int cmd__chmtime(int argc, const char **argv);
  int cmd__config(int argc, const char **argv);
  int cmd__ctype(int argc, const char **argv);
@@@ -24,6 -22,7 +24,7 @@@ int cmd__online_cpus(int argc, const ch
  int cmd__path_utils(int argc, const char **argv);
  int cmd__prio_queue(int argc, const char **argv);
  int cmd__read_cache(int argc, const char **argv);
+ int cmd__read_midx(int argc, const char **argv);
  int cmd__ref_store(int argc, const char **argv);
  int cmd__regex(int argc, const char **argv);
  int cmd__repository(int argc, const char **argv);