list-objects: consume sparse tree walk
author Derrick Stolee <dstolee@microsoft.com>
Wed, 16 Jan 2019 18:25:58 +0000 (10:25 -0800)
committer Junio C Hamano <gitster@pobox.com>
Thu, 17 Jan 2019 21:44:39 +0000 (13:44 -0800)
When creating a pack-file using 'git pack-objects --revs' we provide
a list of interesting and uninteresting commits. For example, a push
operation would mark the local topic branch as interesting and the
known remote refs as uninteresting. We want to discover the set of
new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such
that every commit walk starting at an interesting commit ends in a
root commit or an uninteresting commit. We then need to discover which
non-commit objects are reachable from uninteresting commits. This
commit walk does not change in this series.
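
The frontier walk described above can be sketched on a toy commit
graph. This is a simplified illustration, not git's revision walk:
struct toy_commit, the TOY_* flags and the toy_* functions are
hypothetical stand-ins, and UNINTERESTING is propagated to all
ancestors up front, which is simpler than what git actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the commit walk; not git's real types. */
#define TOY_UNINTERESTING (1u << 0)
#define TOY_SEEN          (1u << 1)
#define TOY_MAX_PARENTS   2

struct toy_commit {
	unsigned flags;
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* Mark an uninteresting tip and all of its ancestors. */
static void toy_mark_uninteresting(struct toy_commit *c)
{
	size_t i;

	/* Assuming flags are only set here, a marked commit's ancestors are marked too. */
	if (c->flags & TOY_UNINTERESTING)
		return;
	c->flags |= TOY_UNINTERESTING;
	for (i = 0; i < c->nr_parents; i++)
		toy_mark_uninteresting(c->parents[i]);
}

/*
 * Collect commits reachable from an interesting tip. Every path stops at
 * a root commit or at an UNINTERESTING commit (the frontier).
 */
static size_t toy_walk(struct toy_commit *c, struct toy_commit **out,
		       size_t n, size_t cap)
{
	size_t i;

	if (c->flags & (TOY_UNINTERESTING | TOY_SEEN))
		return n;
	c->flags |= TOY_SEEN;
	assert(n < cap);
	out[n++] = c;
	for (i = 0; i < c->nr_parents; i++)
		n = toy_walk(c->parents[i], out, n, cap);
	return n;
}

/* Example: root <- base <- topic, and base <- remote. */
static size_t toy_frontier_demo(void)
{
	struct toy_commit root = { 0 };
	struct toy_commit base = { 0, 1, { &root } };
	struct toy_commit topic = { 0, 1, { &base } };
	struct toy_commit remote = { 0, 1, { &base } };
	struct toy_commit *out[4];

	toy_mark_uninteresting(&remote);    /* ^remote */
	return toy_walk(&topic, out, 0, 4); /* walk from topic */
}
```

In the example only topic is collected: its parent base is reachable
from the uninteresting tip remote, so the walk stops there.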

The mark_edges_uninteresting() method in list-objects.c iterates over
the commit list and does the following:

* If the commit is UNINTERESTING, then mark its root tree and every
object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every
UNINTERESTING parent (and all objects that tree can reach) as
UNINTERESTING.
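
The two cases above can be modeled on toy objects. This is a minimal
sketch, assuming hypothetical toy_object/toy_commit types in place of
git's struct tree and struct commit, and a flat array in place of
revs->commits; edge reporting through show_edge() is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; not git's struct commit or struct tree. */
#define TOY_UNINTERESTING (1u << 0)
#define TOY_MAX_CHILDREN 4
#define TOY_MAX_PARENTS 2

struct toy_object {
	unsigned flags;
	size_t nr_children;              /* entries, if this is a tree */
	struct toy_object *children[TOY_MAX_CHILDREN];
};

struct toy_commit {
	unsigned flags;
	struct toy_object *tree;         /* root tree */
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* Mark a tree and everything it reaches; skip already-marked objects. */
static void toy_mark_tree(struct toy_object *t)
{
	size_t i;

	if (t->flags & TOY_UNINTERESTING)
		return;
	t->flags |= TOY_UNINTERESTING;
	for (i = 0; i < t->nr_children; i++)
		toy_mark_tree(t->children[i]);
}

/* Apply the two cases from the list above to each listed commit. */
static void toy_mark_edges(struct toy_commit **list, size_t nr)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		struct toy_commit *c = list[i];

		if (c->flags & TOY_UNINTERESTING) {
			toy_mark_tree(c->tree);
			continue;
		}
		for (j = 0; j < c->nr_parents; j++)
			if (c->parents[j]->flags & TOY_UNINTERESTING)
				toy_mark_tree(c->parents[j]->tree);
	}
}

/* Example: "topic" changes one file relative to its UNINTERESTING parent. */
static int toy_edges_demo(void)
{
	struct toy_object shared = { 0 }, old_blob = { 0 }, new_blob = { 0 };
	struct toy_object old_tree = { 0, 2, { &shared, &old_blob } };
	struct toy_object new_tree = { 0, 2, { &shared, &new_blob } };
	struct toy_commit base = { TOY_UNINTERESTING, &old_tree, 0, { NULL } };
	struct toy_commit topic = { 0, &new_tree, 1, { &base } };
	struct toy_commit *list[] = { &topic, &base };

	toy_mark_edges(list, 2);
	/* The unchanged blob is excluded; the new tree and blob are not. */
	return (shared.flags & TOY_UNINTERESTING) &&
	       !(new_blob.flags & TOY_UNINTERESTING) &&
	       !(new_tree.flags & TOY_UNINTERESTING);
}
```

Here old_tree is reached through the uninteresting parent base, so the
unchanged blob shared is excluded while new_tree and new_blob remain
candidates for the pack.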

At the very end, we repeat the process on every commit directly
given to the revision walk from stdin. This helps ensure we properly
cover shallow commits that otherwise would not be included in the
frontier.

The logic to recursively follow trees is in the
mark_tree_uninteresting() method in revision.c. The algorithm avoids
duplicate work by not recursing into trees that are already marked
UNINTERESTING.
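
To see why skipping already-marked trees saves work, here is a
hypothetical recursive marker over a toy tree type that counts how
many objects each call newly marks. It is an illustration, not
revision.c's mark_tree_uninteresting(); blobs are modeled as trees
with no entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tree; a blob is modeled as a tree with no entries. */
#define TOY_UNINTERESTING (1u << 0)
#define TOY_MAX_ENTRIES 4

struct toy_tree {
	unsigned flags;
	size_t nr_entries;
	struct toy_tree *entries[TOY_MAX_ENTRIES];
};

/*
 * Mark t and everything it reaches as UNINTERESTING. Returns the number
 * of objects newly marked, so the skipped work is visible.
 */
static size_t toy_mark_tree_uninteresting(struct toy_tree *t)
{
	size_t i, marked = 1;

	if (t->flags & TOY_UNINTERESTING)
		return 0; /* already marked: do not recurse again */
	t->flags |= TOY_UNINTERESTING;
	for (i = 0; i < t->nr_entries; i++)
		marked += toy_mark_tree_uninteresting(t->entries[i]);
	return marked;
}

/* Example: two root trees that share the subtree "sub". */
static void toy_dedup_demo(size_t *first, size_t *second)
{
	struct toy_tree a = { 0 }, b = { 0 };
	struct toy_tree sub = { 0, 2, { &a, &b } };
	struct toy_tree root1 = { 0, 1, { &sub } };
	struct toy_tree root2 = { 0, 1, { &sub } };

	*first = toy_mark_tree_uninteresting(&root1);  /* root1, sub, a, b */
	*second = toy_mark_tree_uninteresting(&root2); /* root2 only */
}
```

The second call marks only root2; without the early return it would
walk sub, a and b again.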

Add a new 'sparse' option to the mark_edges_uninteresting() method
that performs this logic in a slightly different way. As we iterate
over the commits, we add the root tree of each commit and of each of
its parents to an oidset. Then, we call
mark_trees_uninteresting_sparse() on that oidset. Note that we
include interesting trees in this process. The current implementation
of mark_trees_uninteresting_sparse() walks the same trees as the old
logic, but this will be replaced in a later change.
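
The collection step of the new sparse path can be sketched the same
way. Everything here is hypothetical: toy_set stands in for struct
oidset (it stores pointers rather than object IDs), and
toy_mark_trees_sparse() only models the placeholder behavior described
above, walking down from trees already flagged UNINTERESTING; it is not
the later sparse algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: toy_set stands in for an oidset of trees. */
#define TOY_UNINTERESTING (1u << 0)
#define TOY_MAX 8

struct toy_tree {
	unsigned flags;
	size_t nr_entries;
	struct toy_tree *entries[TOY_MAX]; /* subtrees or blobs */
};

struct toy_commit {
	unsigned flags;
	struct toy_tree *tree; /* root tree */
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX];
};

struct toy_set {
	size_t nr;
	struct toy_tree *items[TOY_MAX];
};

static void toy_set_insert(struct toy_set *s, struct toy_tree *t)
{
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (s->items[i] == t)
			return; /* a set ignores duplicates */
	assert(s->nr < TOY_MAX);
	s->items[s->nr++] = t;
}

/* Collect the root trees of each commit and its parents, flagging trees
 * of UNINTERESTING commits. Interesting trees go into the set too. */
static void toy_collect_trees(struct toy_commit **list, size_t nr,
			      struct toy_set *set)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		struct toy_commit *c = list[i];

		if (c->flags & TOY_UNINTERESTING)
			c->tree->flags |= TOY_UNINTERESTING;
		toy_set_insert(set, c->tree);
		for (j = 0; j < c->nr_parents; j++) {
			struct toy_commit *p = c->parents[j];

			toy_set_insert(set, p->tree);
			if (p->flags & TOY_UNINTERESTING)
				p->tree->flags |= TOY_UNINTERESTING;
		}
	}
}

/* Mark everything below t, skipping entries that are already marked. */
static void toy_mark_contents(struct toy_tree *t)
{
	size_t i;

	for (i = 0; i < t->nr_entries; i++) {
		if (t->entries[i]->flags & TOY_UNINTERESTING)
			continue;
		t->entries[i]->flags |= TOY_UNINTERESTING;
		toy_mark_contents(t->entries[i]);
	}
}

/* Placeholder pass: walk down only from trees already flagged; trees in
 * the set that are still interesting are skipped. */
static void toy_mark_trees_sparse(struct toy_set *set)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->items[i]->flags & TOY_UNINTERESTING)
			toy_mark_contents(set->items[i]);
}

/* Example: "topic" changes one file relative to its UNINTERESTING parent. */
static int toy_sparse_demo(void)
{
	struct toy_tree shared = { 0 }, old_blob = { 0 }, new_blob = { 0 };
	struct toy_tree old_tree = { 0, 2, { &shared, &old_blob } };
	struct toy_tree new_tree = { 0, 2, { &shared, &new_blob } };
	struct toy_commit base = { TOY_UNINTERESTING, &old_tree, 0, { NULL } };
	struct toy_commit topic = { 0, &new_tree, 1, { &base } };
	struct toy_commit *list[] = { &topic, &base };
	struct toy_set set = { 0 };

	toy_collect_trees(list, 2, &set);
	toy_mark_trees_sparse(&set);
	/* Two distinct root trees; the shared blob is excluded, the new one kept. */
	return set.nr == 2 && (shared.flags & TOY_UNINTERESTING) &&
	       !(new_blob.flags & TOY_UNINTERESTING);
}
```

Note that new_tree, an interesting tree, is in the set but is not
walked by this placeholder pass.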

Add a '--sparse' flag to 'git pack-objects' to call this new logic.
Add a new test script t/t5322-pack-objects-sparse.sh that tests this
option. The tests currently demonstrate that the resulting object
list is the same as with the old algorithm. This includes a case where
both algorithms pack an object that is not needed by a remote due to
limits on the explored set of trees. When the sparse algorithm is
changed in a later commit, we will add a test that demonstrates a
change of behavior in some cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/git-pack-objects.txt
bisect.c
builtin/pack-objects.c
builtin/rev-list.c
http-push.c
list-objects.c
list-objects.h
t/t5322-pack-objects-sparse.sh [new file with mode: 0755]
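diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 40c825c38197f4e335ebfb162415cdcc52bbdf1e..e45f3e680d3632c8122db01b2a51cc3971c27922 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt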
@@ -14,7 +14,7 @@ SYNOPSIS
        [--local] [--incremental] [--window=<n>] [--depth=<n>]
        [--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
        [--stdout [--filter=<filter-spec>] | base-name]
-       [--shallow] [--keep-true-parents] < object-list
+       [--shallow] [--keep-true-parents] [--sparse] < object-list
 
 
 DESCRIPTION
@@ -196,6 +196,15 @@ depth is 4095.
        Add --no-reuse-object if you want to force a uniform compression
        level on all data no matter the source.
 
+--sparse::
+       Use the "sparse" algorithm to determine which objects to include in
+       the pack, when combined with the "--revs" option. This algorithm
+       only walks trees that appear in paths that introduce new objects.
+       This can have significant performance benefits when computing
+       a pack to send a small change. However, it is possible that extra
+       objects are added to the pack-file if the included commits contain
+       certain types of direct renames.
+
 --thin::
        Create a "thin" pack by omitting the common objects between a
        sender and a receiver in order to reduce network transfer. This
diff --git a/bisect.c b/bisect.c
index 487675c67249a3164294aae71c1c2a650f3c4947..842f8b4b8f373a29880f7248bc8a4f5b6cb1f543 100644
--- a/bisect.c
+++ b/bisect.c
@@ -656,7 +656,7 @@ static void bisect_common(struct rev_info *revs)
        if (prepare_revision_walk(revs))
                die("revision walk setup failed");
        if (revs->tree_objects)
-               mark_edges_uninteresting(revs, NULL);
+               mark_edges_uninteresting(revs, NULL, 0);
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,
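diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 411aefd6875b2d35ee4a12d1a043ba50027021b3..7d5b0735e396985c82dfa64bdd5d43badb6eb1ce 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c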
@@ -84,6 +84,7 @@ static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int sparse;
 static int thin;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av)
 
        if (prepare_revision_walk(&revs))
                die(_("revision walk setup failed"));
-       mark_edges_uninteresting(&revs, show_edge);
+       mark_edges_uninteresting(&revs, show_edge, sparse);
 
        if (!fn_show_object)
                fn_show_object = show_object;
@@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
                { OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
                  N_("unpack unreachable objects newer than <time>"),
                  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
+               OPT_BOOL(0, "sparse", &sparse,
+                        N_("use the sparse reachability algorithm")),
                OPT_BOOL(0, "thin", &thin,
                         N_("create thin packs")),
                OPT_BOOL(0, "shallow", &shallow,
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 2880ed37e3f97193d2374657346b4de52bc44954..9663cbfae0b8edc7fc12ff993d9e368c16127689 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -543,7 +543,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
        if (prepare_revision_walk(&revs))
                die("revision walk setup failed");
        if (revs.tree_objects)
-               mark_edges_uninteresting(&revs, show_edge);
+               mark_edges_uninteresting(&revs, show_edge, 0);
 
        if (bisect_list) {
                int reaches, all;
diff --git a/http-push.c b/http-push.c
index cd485909127a79afcbb58ef18cd28977b65efb79..ea52d6f9f6481657aa0c4591d193ab1fe4864ae0 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1933,7 +1933,7 @@ int cmd_main(int argc, const char **argv)
                pushing = 0;
                if (prepare_revision_walk(&revs))
                        die("revision walk setup failed");
-               mark_edges_uninteresting(&revs, NULL);
+               mark_edges_uninteresting(&revs, NULL, 0);
                objects_to_send = get_delta(&revs, ref_lock);
                finish_all_active_slots();
 
diff --git a/list-objects.c b/list-objects.c
index c41cc80db5bc86279bcf560109f599a9783b2755..fb728f784267268594777e6a15d7f94ae4e6fb71 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -222,25 +222,73 @@ static void mark_edge_parents_uninteresting(struct commit *commit,
        }
 }
 
-void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
+static void add_edge_parents(struct commit *commit,
+                            struct rev_info *revs,
+                            show_edge_fn show_edge,
+                            struct oidset *set)
+{
+       struct commit_list *parents;
+
+       for (parents = commit->parents; parents; parents = parents->next) {
+               struct commit *parent = parents->item;
+               struct tree *tree = get_commit_tree(parent);
+
+               if (!tree)
+                       continue;
+
+               oidset_insert(set, &tree->object.oid);
+
+               if (!(parent->object.flags & UNINTERESTING))
+                       continue;
+               tree->object.flags |= UNINTERESTING;
+
+               if (revs->edge_hint && !(parent->object.flags & SHOWN)) {
+                       parent->object.flags |= SHOWN;
+                       show_edge(parent);
+               }
+       }
+}
+
+void mark_edges_uninteresting(struct rev_info *revs,
+                             show_edge_fn show_edge,
+                             int sparse)
 {
        struct commit_list *list;
        int i;
 
-       for (list = revs->commits; list; list = list->next) {
-               struct commit *commit = list->item;
+       if (sparse) {
+               struct oidset set;
+               oidset_init(&set, 16);
 
-               if (commit->object.flags & UNINTERESTING) {
-                       mark_tree_uninteresting(revs->repo,
-                                               get_commit_tree(commit));
-                       if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
-                               commit->object.flags |= SHOWN;
-                               show_edge(commit);
+               for (list = revs->commits; list; list = list->next) {
+                       struct commit *commit = list->item;
+                       struct tree *tree = get_commit_tree(commit);
+
+                       if (commit->object.flags & UNINTERESTING)
+                               tree->object.flags |= UNINTERESTING;
+
+                       oidset_insert(&set, &tree->object.oid);
+                       add_edge_parents(commit, revs, show_edge, &set);
+               }
+
+               mark_trees_uninteresting_sparse(revs->repo, &set);
+               oidset_clear(&set);
+       } else {
+               for (list = revs->commits; list; list = list->next) {
+                       struct commit *commit = list->item;
+                       if (commit->object.flags & UNINTERESTING) {
+                               mark_tree_uninteresting(revs->repo,
+                                                       get_commit_tree(commit));
+                               if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {
+                                       commit->object.flags |= SHOWN;
+                                       show_edge(commit);
+                               }
+                               continue;
                        }
-                       continue;
+                       mark_edge_parents_uninteresting(commit, revs, show_edge);
                }
-               mark_edge_parents_uninteresting(commit, revs, show_edge);
        }
+
        if (revs->edge_hint_aggressive) {
                for (i = 0; i < revs->cmdline.nr; i++) {
                        struct object *obj = revs->cmdline.rev[i].item;
diff --git a/list-objects.h b/list-objects.h
index ad407629269a7e7c77953390beccf036fd2452f6..a952680e46671db2543bc4abff78a2d898cd1408 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -10,7 +10,9 @@ typedef void (*show_object_fn)(struct object *, const char *, void *);
 void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
-void mark_edges_uninteresting(struct rev_info *, show_edge_fn);
+void mark_edges_uninteresting(struct rev_info *revs,
+                             show_edge_fn show_edge,
+                             int sparse);
 
 struct oidset;
 struct list_objects_filter_options;
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
new file mode 100755
index 0000000..30aef64
--- /dev/null
+++ b/t/t5322-pack-objects-sparse.sh
@@ -0,0 +1,113 @@
+#!/bin/sh
+
+test_description='pack-objects object selection using sparse algorithm'
+. ./test-lib.sh
+
+test_expect_success 'setup repo' '
+       test_commit initial &&
+       for i in $(test_seq 1 3)
+       do
+               mkdir f$i &&
+               for j in $(test_seq 1 3)
+               do
+                       mkdir f$i/f$j &&
+                       echo $j >f$i/f$j/data.txt
+               done
+       done &&
+       git add . &&
+       git commit -m "Initialized trees" &&
+       for i in $(test_seq 1 3)
+       do
+               git checkout -b topic$i master &&
+               echo change-$i >f$i/f$i/data.txt &&
+               git commit -a -m "Changed f$i/f$i/data.txt"
+       done &&
+       cat >packinput.txt <<-EOF &&
+       topic1
+       ^topic2
+       ^topic3
+       EOF
+       git rev-parse                   \
+               topic1                  \
+               topic1^{tree}           \
+               topic1:f1               \
+               topic1:f1/f1            \
+               topic1:f1/f1/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+       git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+       git index-pack -o nonsparse.idx nonsparse.pack &&
+       git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+       test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+       git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+       git index-pack -o sparse.idx sparse.pack &&
+       git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+       test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f3 and commit to topic1' '
+       git checkout topic1 &&
+       echo change-3 >f3/f3/data.txt &&
+       git commit -a -m "Changed f3/f3/data.txt" &&
+       git rev-parse                   \
+               topic1~1                \
+               topic1~1^{tree}         \
+               topic1^{tree}           \
+               topic1                  \
+               topic1:f1               \
+               topic1:f1/f1            \
+               topic1:f1/f1/data.txt | sort >required_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+       git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+       git index-pack -o nonsparse.idx nonsparse.pack &&
+       git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+       comm -1 -2 required_objects.txt nonsparse_objects.txt >nonsparse_required_objects.txt &&
+       test_cmp required_objects.txt nonsparse_required_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+       git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+       git index-pack -o sparse.idx sparse.pack &&
+       git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+       comm -1 -2 required_objects.txt sparse_objects.txt >sparse_required_objects.txt &&
+       test_cmp required_objects.txt sparse_required_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f1 into f3' '
+       mkdir f3/f4 &&
+       cp -r f1/f1/* f3/f4 &&
+       git add f3/f4 &&
+       git commit -m "Copied f1/f1 to f3/f4" &&
+       cat >packinput.txt <<-EOF &&
+       topic1
+       ^topic1~1
+       EOF
+       git rev-parse           \
+               topic1          \
+               topic1^{tree}   \
+               topic1:f3 | sort >required_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+       git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+       git index-pack -o nonsparse.idx nonsparse.pack &&
+       git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+       comm -1 -2 required_objects.txt nonsparse_objects.txt >nonsparse_required_objects.txt &&
+       test_cmp required_objects.txt nonsparse_required_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+       git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+       git index-pack -o sparse.idx sparse.pack &&
+       git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+       comm -1 -2 required_objects.txt sparse_objects.txt >sparse_required_objects.txt &&
+       test_cmp required_objects.txt sparse_required_objects.txt
+'
+
+test_done