Merge branch 'ds/commit-graph-fsck'
author Junio C Hamano <gitster@pobox.com>
Thu, 2 Aug 2018 22:30:39 +0000 (15:30 -0700)
committer Junio C Hamano <gitster@pobox.com>
Thu, 2 Aug 2018 22:30:40 +0000 (15:30 -0700)
"git fsck" learns to make sure the optional commit-graph file is in
a sane state.

* ds/commit-graph-fsck: (23 commits)
coccinelle: update commit.cocci
commit-graph: update design document
gc: automatically write commit-graph files
commit-graph: add '--reachable' option
commit-graph: use string-list API for input
fsck: verify commit-graph
commit-graph: verify contents match checksum
commit-graph: test for corrupted octopus edge
commit-graph: verify commit date
commit-graph: verify generation number
commit-graph: verify parent list
commit-graph: verify root tree OIDs
commit-graph: verify objects exist
commit-graph: verify corrupt OID fanout and lookup
commit-graph: verify required chunks are present
commit-graph: verify catches corrupt signature
commit-graph: add 'verify' subcommand
commit-graph: load a root tree from specific graph
commit: force commit to parse from object database
commit-graph: parse commit from chosen graph
...
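
Note on the "verify generation number" step: the rule being checked is that a commit's generation is one more than the largest generation among its parents, saturating at a maximum value. Generation zero is treated as "not computed" and is skipped by the check. The C sketch below restates the rule in isolation. The names `expected_generation` and `SKETCH_GENERATION_MAX` (set here to 0x3FFFFFFF) are assumptions made for illustration and are not identifiers from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cap for illustration; the patch refers to GENERATION_NUMBER_MAX. */
#define SKETCH_GENERATION_MAX 0x3FFFFFFFu

/*
 * Return the generation a commit is expected to have, given the
 * generations of its parents: one more than the largest parent
 * generation, capped at SKETCH_GENERATION_MAX. A commit with no
 * parents gets generation 1.
 */
uint32_t expected_generation(const uint32_t *parent_gens, size_t nr)
{
	uint32_t max_generation = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (parent_gens[i] > max_generation)
			max_generation = parent_gens[i];

	/* A parent already at the cap keeps the child at the cap. */
	if (max_generation == SKETCH_GENERATION_MAX)
		max_generation--;

	return max_generation + 1;
}
```

A verifier along these lines would compare the stored generation with this value and report a mismatch, which is the shape of the final check in verify_commit_graph().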

14 files changed:
Documentation/config.txt
Documentation/git-commit-graph.txt
Documentation/git-fsck.txt
Documentation/git-gc.txt
Documentation/technical/commit-graph.txt
builtin/commit-graph.c
builtin/fsck.c
builtin/gc.c
commit-graph.c
commit-graph.h
commit.c
commit.h
contrib/coccinelle/commit.cocci
t/t5318-commit-graph.sh
index 43b2de7b5fe21bffb623ccf8aba286c346440d51..8c4831b82ff101391435aff499aaff1fb16e0ef3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -907,9 +907,12 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
-core.commitGraph::
-       Enable git commit graph feature. Allows reading from the
-       commit-graph file.
+gc.writeCommitGraph::
+       If true, then gc will rewrite the commit-graph file when
+       linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+       '--auto' the commit-graph will be updated if housekeeping is
+       required. Default is false. See linkgit:git-commit-graph[1]
+       for details.
 
 core.sparseCheckout::
        Enable "sparse checkout" feature. See section "Sparse checkout" in
index 4c97b555cca5c1daa66c1f44c097ef6a640e0914..dececb79d772447e5ca0d3ea8253f20ea288ddf6 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -37,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
@@ -52,6 +57,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 --------
index b9f060e3b207f981932957d8148dfcfcd912e33c..ab9a93fb9b8fb7291362cc419a7aca8b464d3d76 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 ---------------------
 
index 24b2dd44fe445a66121fa957f0af8e2209a85676..f5bc98ccb3673079fa2e1205b57bc87acb3c1e90 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.writeCommitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
index e1a883eb462cd9bddbc192a315464282146ac433..c664acbd765d06f000b2feeef1d7e98f1616ff62 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
     - 'log --topo-order'
     - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
index 37420ae0fde81ea2e957bedd4949a2eba0cd6ff9..c7d0db5ab4b668618720eb30cffad904f0c57b14 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,12 +3,19 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
        N_("git commit-graph [--object-dir <objdir>]"),
        N_("git commit-graph read [--object-dir <objdir>]"),
-       N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+       N_("git commit-graph verify [--object-dir <objdir>]"),
+       N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+       NULL
+};
+
+static const char * const builtin_commit_graph_verify_usage[] = {
+       N_("git commit-graph verify [--object-dir <objdir>]"),
        NULL
 };
 
@@ -18,17 +25,48 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-       N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+       N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
        NULL
 };
 
 static struct opts_commit_graph {
        const char *obj_dir;
+       int reachable;
        int stdin_packs;
        int stdin_commits;
        int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+       struct commit_graph *graph = NULL;
+       char *graph_name;
+
+       static struct option builtin_commit_graph_verify_options[] = {
+               OPT_STRING(0, "object-dir", &opts.obj_dir,
+                          N_("dir"),
+                          N_("The object directory to store the graph")),
+               OPT_END(),
+       };
+
+       argc = parse_options(argc, argv, NULL,
+                            builtin_commit_graph_verify_options,
+                            builtin_commit_graph_verify_usage, 0);
+
+       if (!opts.obj_dir)
+               opts.obj_dir = get_object_directory();
+
+       graph_name = get_commit_graph_filename(opts.obj_dir);
+       graph = load_commit_graph_one(graph_name);
+       FREE_AND_NULL(graph_name);
+
+       if (!graph)
+               return 0;
+
+       return verify_commit_graph(the_repository, graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
        struct commit_graph *graph = NULL;
@@ -51,8 +89,11 @@ static int graph_read(int argc, const char **argv)
        graph_name = get_commit_graph_filename(opts.obj_dir);
        graph = load_commit_graph_one(graph_name);
 
-       if (!graph)
+       if (!graph) {
+               UNLEAK(graph_name);
                die("graph file %s does not exist", graph_name);
+       }
+
        FREE_AND_NULL(graph_name);
 
        printf("header: %08x %d %d %d %d\n",
@@ -79,18 +120,16 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-       const char **pack_indexes = NULL;
-       int packs_nr = 0;
-       const char **commit_hex = NULL;
-       int commits_nr = 0;
-       const char **lines = NULL;
-       int lines_nr = 0;
-       int lines_alloc = 0;
+       struct string_list *pack_indexes = NULL;
+       struct string_list *commit_hex = NULL;
+       struct string_list lines;
 
        static struct option builtin_commit_graph_write_options[] = {
                OPT_STRING(0, "object-dir", &opts.obj_dir,
                        N_("dir"),
                        N_("The object directory to store the graph")),
+               OPT_BOOL(0, "reachable", &opts.reachable,
+                       N_("start walk at all refs")),
                OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
                        N_("scan pack-indexes listed by stdin for commits")),
                OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -104,39 +143,35 @@ static int graph_write(int argc, const char **argv)
                             builtin_commit_graph_write_options,
                             builtin_commit_graph_write_usage, 0);
 
-       if (opts.stdin_packs && opts.stdin_commits)
-               die(_("cannot use both --stdin-commits and --stdin-packs"));
+       if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+               die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
        if (!opts.obj_dir)
                opts.obj_dir = get_object_directory();
 
+       if (opts.reachable) {
+               write_commit_graph_reachable(opts.obj_dir, opts.append);
+               return 0;
+       }
+
+       string_list_init(&lines, 0);
        if (opts.stdin_packs || opts.stdin_commits) {
                struct strbuf buf = STRBUF_INIT;
-               lines_nr = 0;
-               lines_alloc = 128;
-               ALLOC_ARRAY(lines, lines_alloc);
-
-               while (strbuf_getline(&buf, stdin) != EOF) {
-                       ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-                       lines[lines_nr++] = strbuf_detach(&buf, NULL);
-               }
-
-               if (opts.stdin_packs) {
-                       pack_indexes = lines;
-                       packs_nr = lines_nr;
-               }
-               if (opts.stdin_commits) {
-                       commit_hex = lines;
-                       commits_nr = lines_nr;
-               }
+
+               while (strbuf_getline(&buf, stdin) != EOF)
+                       string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+               if (opts.stdin_packs)
+                       pack_indexes = &lines;
+               if (opts.stdin_commits)
+                       commit_hex = &lines;
        }
 
        write_commit_graph(opts.obj_dir,
                           pack_indexes,
-                          packs_nr,
                           commit_hex,
-                          commits_nr,
                           opts.append);
 
+       string_list_clear(&lines, 0);
        return 0;
 }
 
@@ -162,6 +197,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
        if (argc > 0) {
                if (!strcmp(argv[0], "read"))
                        return graph_read(argc, argv);
+               if (!strcmp(argv[0], "verify"))
+                       return graph_verify(argc, argv);
                if (!strcmp(argv[0], "write"))
                        return graph_write(argc, argv);
        }
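
Note on the option handling in graph_write(): the three input modes are mutually exclusive, and the patch enforces this by summing the flags and rejecting a total above one. A minimal standalone sketch of that check follows. The struct `write_modes` and the function `write_modes_valid` are hypothetical names introduced here, and the `!!` normalization is an extra precaution that the patch itself does not need because OPT_BOOL stores 0 or 1.

```c
/* Hypothetical mirror of the three mutually exclusive write modes. */
struct write_modes {
	int reachable;
	int stdin_packs;
	int stdin_commits;
};

/* Return 1 when at most one mode is selected, 0 otherwise. */
int write_modes_valid(const struct write_modes *m)
{
	/* Normalize to 0/1 so a counted flag cannot inflate the sum. */
	return !!m->reachable + !!m->stdin_packs + !!m->stdin_commits <= 1;
}
```

Summing the flags scales to any number of exclusive options with a single comparison, whereas pairwise tests such as the removed `opts.stdin_packs && opts.stdin_commits` grow quadratically as modes are added.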
index 3ad4f160f9959a30262c81d5c3f85c36649d2895..eca7900ee08841ec3efdaf24555a76e426fbbf28 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
        }
 
        check_connectivity();
+
+       if (core_commit_graph) {
+               struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+               const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
+
+               commit_graph_verify.argv = verify_argv;
+               commit_graph_verify.git_cmd = 1;
+               if (run_command(&commit_graph_verify))
+                       errors_found |= ERROR_COMMIT_GRAPH;
+
+               prepare_alt_odb(the_repository);
+               for (alt =  the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+                       verify_argv[2] = "--object-dir";
+                       verify_argv[3] = alt->path;
+                       if (run_command(&commit_graph_verify))
+                               errors_found |= ERROR_COMMIT_GRAPH;
+               }
+       }
+
        return errors_found;
 }
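
Note on the new error class: cmd_fsck() returns `errors_found`, a bitmask built from the octal ERROR_* values, so a commit-graph failure shows up as bit 020 in the result. The sketch below decodes such a mask using only the four values visible in the hunk above. The function `describe_fsck_status` and the short class names it prints are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit values copied from the hunk above (octal literals). */
#define ERROR_REACHABLE    02
#define ERROR_PACK         04
#define ERROR_REFS         010
#define ERROR_COMMIT_GRAPH 020

/*
 * Write a comma-separated list of the error classes set in `status`
 * into `buf` (truncating if it does not fit) and return the number
 * of classes found.
 */
int describe_fsck_status(int status, char *buf, size_t len)
{
	static const struct {
		int bit;
		const char *name;
	} classes[] = {
		{ ERROR_REACHABLE, "reachable" },
		{ ERROR_PACK, "pack" },
		{ ERROR_REFS, "refs" },
		{ ERROR_COMMIT_GRAPH, "commit-graph" },
	};
	size_t used = 0;
	size_t i;
	int found = 0;

	if (len)
		buf[0] = '\0';

	for (i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		int n;

		if (!(status & classes[i].bit))
			continue;
		found++;
		if (used >= len)
			continue; /* no room left; keep counting only */
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", classes[i].name);
		if (n > 0)
			used += (size_t)n;
	}
	return found;
}
```

Because each class has its own bit, a caller can tell a commit-graph problem apart from, say, a refs problem even when both occur in the same run.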
index ccfb1ceaeb3eb9c6a8cbe9297bceac94fa54bcac..e103f0f85d7995999fbd7e52057e563005ae819e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_write_commit_graph;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
        git_config_get_int("gc.aggressivedepth", &aggressive_depth);
        git_config_get_int("gc.auto", &gc_auto_threshold);
        git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+       git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
        git_config_get_bool("gc.autodetach", &detach_auto);
        git_config_get_expiry("gc.pruneexpire", &prune_expire);
        git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
        if (pack_garbage.nr > 0)
                clean_pack_garbage();
 
+       if (gc_write_commit_graph)
+               write_commit_graph_reachable(get_object_directory(), 0);
+
        if (auto_gc && too_many_loose_objects())
                warning(_("There are too many unreachable loose objects; "
                        "run 'git prune' to remove them."));
index b63a1fc85eaded844fcb1a2634067f9509b5937c..212232e752a812c7707d174fcf19db5f0eb02010 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,10 +7,12 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 
 #define GRAPH_LAST_EDGE 0x80000000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-                       GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+                       + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
@@ -241,6 +244,10 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 {
        struct commit *c;
        struct object_id oid;
+
+       if (pos >= g->num_commits)
+               die("invalid parent position %"PRIu64, pos);
+
        hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
        c = lookup_commit(&oid);
        if (!c)
@@ -313,7 +320,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
        }
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
        uint32_t pos;
 
@@ -321,9 +328,21 @@ int parse_commit_in_graph(struct commit *item)
                return 0;
        if (item->object.parsed)
                return 1;
+
+       if (find_commit_in_graph(item, g, &pos))
+               return fill_commit_in_graph(item, g, pos);
+
+       return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+       if (!core_commit_graph)
+               return 0;
+
        prepare_commit_graph();
-       if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
-               return fill_commit_in_graph(item, commit_graph, pos);
+       if (commit_graph)
+               return parse_commit_in_graph_one(commit_graph, item);
        return 0;
 }
 
@@ -349,14 +368,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *
        return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+                                                const struct commit *c)
 {
        if (c->maybe_tree)
                return c->maybe_tree;
        if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-               BUG("get_commit_tree_in_graph called from non-commit-graph commit");
+               BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
+
+       return load_tree_for_commit(g, (struct commit *)c);
+}
 
-       return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+       return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
@@ -632,11 +657,28 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
        }
 }
 
+static int add_ref_to_list(const char *refname,
+                          const struct object_id *oid,
+                          int flags, void *cb_data)
+{
+       struct string_list *list = (struct string_list *)cb_data;
+
+       string_list_append(list, oid_to_hex(oid));
+       return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+       struct string_list list;
+
+       string_list_init(&list, 1);
+       for_each_ref(add_ref_to_list, &list);
+       write_commit_graph(obj_dir, NULL, &list, append);
+}
+
 void write_commit_graph(const char *obj_dir,
-                       const char **pack_indexes,
-                       int nr_packs,
-                       const char **commit_hex,
-                       int nr_commits,
+                       struct string_list *pack_indexes,
+                       struct string_list *commit_hex,
                        int append)
 {
        struct packed_oid_list oids;
@@ -677,10 +719,10 @@ void write_commit_graph(const char *obj_dir,
                int dirlen;
                strbuf_addf(&packname, "%s/pack/", obj_dir);
                dirlen = packname.len;
-               for (i = 0; i < nr_packs; i++) {
+               for (i = 0; i < pack_indexes->nr; i++) {
                        struct packed_git *p;
                        strbuf_setlen(&packname, dirlen);
-                       strbuf_addstr(&packname, pack_indexes[i]);
+                       strbuf_addstr(&packname, pack_indexes->items[i].string);
                        p = add_packed_git(packname.buf, packname.len, 1);
                        if (!p)
                                die("error adding pack %s", packname.buf);
@@ -693,12 +735,13 @@ void write_commit_graph(const char *obj_dir,
        }
 
        if (commit_hex) {
-               for (i = 0; i < nr_commits; i++) {
+               for (i = 0; i < commit_hex->nr; i++) {
                        const char *end;
                        struct object_id oid;
                        struct commit *result;
 
-                       if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+                       if (commit_hex->items[i].string &&
+                           parse_oid_hex(commit_hex->items[i].string, &oid, &end))
                                continue;
 
                        result = lookup_commit_reference_gently(&oid, 1);
@@ -808,3 +851,179 @@ void write_commit_graph(const char *obj_dir,
        oids.alloc = 0;
        oids.nr = 0;
 }
+
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+       va_list ap;
+
+       verify_commit_graph_error = 1;
+       va_start(ap, fmt);
+       vfprintf(stderr, fmt, ap);
+       fprintf(stderr, "\n");
+       va_end(ap);
+}
+
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
+{
+       uint32_t i, cur_fanout_pos = 0;
+       struct object_id prev_oid, cur_oid, checksum;
+       int generation_zero = 0;
+       struct hashfile *f;
+       int devnull;
+
+       if (!g) {
+               graph_report("no commit-graph file loaded");
+               return 1;
+       }
+
+       verify_commit_graph_error = 0;
+
+       if (!g->chunk_oid_fanout)
+               graph_report("commit-graph is missing the OID Fanout chunk");
+       if (!g->chunk_oid_lookup)
+               graph_report("commit-graph is missing the OID Lookup chunk");
+       if (!g->chunk_commit_data)
+               graph_report("commit-graph is missing the Commit Data chunk");
+
+       if (verify_commit_graph_error)
+               return verify_commit_graph_error;
+
+       devnull = open("/dev/null", O_WRONLY);
+       f = hashfd(devnull, NULL);
+       hashwrite(f, g->data, g->data_len - g->hash_len);
+       finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+       if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+               graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
+               verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+       }
+
+       for (i = 0; i < g->num_commits; i++) {
+               struct commit *graph_commit;
+
+               hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+               if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
+                       graph_report("commit-graph has incorrect OID order: %s then %s",
+                                    oid_to_hex(&prev_oid),
+                                    oid_to_hex(&cur_oid));
+
+               oidcpy(&prev_oid, &cur_oid);
+
+               while (cur_oid.hash[0] > cur_fanout_pos) {
+                       uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+
+                       if (i != fanout_value)
+                               graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+                                            cur_fanout_pos, fanout_value, i);
+                       cur_fanout_pos++;
+               }
+
+               graph_commit = lookup_commit(&cur_oid);
+               if (!parse_commit_in_graph_one(g, graph_commit))
+                       graph_report("failed to parse %s from commit-graph",
+                                    oid_to_hex(&cur_oid));
+       }
+
+       while (cur_fanout_pos < 256) {
+               uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+
+               if (g->num_commits != fanout_value)
+                       graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+                                    cur_fanout_pos, fanout_value, i);
+
+               cur_fanout_pos++;
+       }
+
+       if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
+               return verify_commit_graph_error;
+
+       for (i = 0; i < g->num_commits; i++) {
+               struct commit *graph_commit, *odb_commit;
+               struct commit_list *graph_parents, *odb_parents;
+               uint32_t max_generation = 0;
+
+               hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+               graph_commit = lookup_commit(&cur_oid);
+               odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
+               if (parse_commit_internal(odb_commit, 0, 0)) {
+                       graph_report("failed to parse %s from object database",
+                                    oid_to_hex(&cur_oid));
+                       continue;
+               }
+
+               if (oidcmp(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid,
+                          get_commit_tree_oid(odb_commit)))
+                       graph_report("root tree OID for commit %s in commit-graph is %s != %s",
+                                    oid_to_hex(&cur_oid),
+                                    oid_to_hex(get_commit_tree_oid(graph_commit)),
+                                    oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+               graph_parents = graph_commit->parents;
+               odb_parents = odb_commit->parents;
+
+               while (graph_parents) {
+                       if (odb_parents == NULL) {
+                               graph_report("commit-graph parent list for commit %s is too long",
+                                            oid_to_hex(&cur_oid));
+                               break;
+                       }
+
+                       if (oidcmp(&graph_parents->item->object.oid, &odb_parents->item->object.oid))
+                               graph_report("commit-graph parent for %s is %s != %s",
+                                            oid_to_hex(&cur_oid),
+                                            oid_to_hex(&graph_parents->item->object.oid),
+                                            oid_to_hex(&odb_parents->item->object.oid));
+
+                       if (graph_parents->item->generation > max_generation)
+                               max_generation = graph_parents->item->generation;
+
+                       graph_parents = graph_parents->next;
+                       odb_parents = odb_parents->next;
+               }
+
+               if (odb_parents != NULL)
+                       graph_report("commit-graph parent list for commit %s terminates early",
+                                    oid_to_hex(&cur_oid));
+
+               if (!graph_commit->generation) {
+                       if (generation_zero == GENERATION_NUMBER_EXISTS)
+                               graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere",
+                                            oid_to_hex(&cur_oid));
+                       generation_zero = GENERATION_ZERO_EXISTS;
+               } else if (generation_zero == GENERATION_ZERO_EXISTS)
+                       graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere",
+                                    oid_to_hex(&cur_oid));
+
+               if (generation_zero == GENERATION_ZERO_EXISTS)
+                       continue;
+
+               /*
+                * If one of our parents has generation GENERATION_NUMBER_MAX, then
+                * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
+                * extra logic in the following condition.
+                */
+               if (max_generation == GENERATION_NUMBER_MAX)
+                       max_generation--;
+
+               if (graph_commit->generation != max_generation + 1)
+                       graph_report("commit-graph generation for commit %s is %u != %u",
+                                    oid_to_hex(&cur_oid),
+                                    graph_commit->generation,
+                                    max_generation + 1);
+
+               if (graph_commit->date != odb_commit->date)
+                       graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime,
+                                    oid_to_hex(&cur_oid),
+                                    graph_commit->date,
+                                    odb_commit->date);
+       }
+
+       return verify_commit_graph_error;
+}
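
Note on the fanout check in verify_commit_graph(): entry `b` of the fanout table must equal the number of OIDs whose first byte is at most `b`, and the first loop verifies this while walking the sorted OID list once. The sketch below isolates that logic. It assumes the fanout values are already in host byte order (the patch reads them with get_be32()) and takes only the first byte of each OID as input. The function name `count_fanout_mismatches` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * first_bytes[i] is the first byte of the i-th OID, with OIDs sorted
 * in ascending order. fanout[b] must equal the number of OIDs whose
 * first byte is <= b. Returns the number of fanout entries that
 * disagree with the OID list.
 */
int count_fanout_mismatches(const uint8_t *first_bytes, uint32_t nr,
			    const uint32_t fanout[256])
{
	uint32_t i;
	unsigned int pos = 0;
	int mismatches = 0;

	for (i = 0; i < nr; i++) {
		/* Every bucket below this OID's first byte closes at i. */
		while (first_bytes[i] > pos) {
			if (fanout[pos] != i)
				mismatches++;
			pos++;
		}
	}

	/* The remaining buckets must all report the total count. */
	while (pos < 256) {
		if (fanout[pos] != nr)
			mismatches++;
		pos++;
	}

	return mismatches;
}
```

The single pass works because the OIDs are sorted: once an OID with a larger first byte appears, every smaller bucket is known to be complete.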
index 96cccb10f3d53a6da77f4ad06a971dbaa152f70c..506cb45fb1178256b2d6c9752c28e300fc0791b2 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,8 @@
 #define COMMIT_GRAPH_H
 
 #include "git-compat-util.h"
+#include "repository.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -46,11 +48,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
-                       const char **pack_indexes,
-                       int nr_packs,
-                       const char **commit_hex,
-                       int nr_commits,
+                       struct string_list *pack_indexes,
+                       struct string_list *commit_hex,
                        int append);
 
+int verify_commit_graph(struct repository *r, struct commit_graph *g);
+
 #endif
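
Note on verify_commit_graph(), declared above: for each commit it walks the parent list recorded in the commit-graph alongside the list parsed from the object database, reporting a differing parent, a graph list that is too long, or one that terminates early. The sketch below models that walk with integer ids in place of object IDs. The types `id_list` and `parent_cmp` and the function `compare_parents` are hypothetical, and unlike the patch the sketch returns a single summary result instead of reporting every mismatch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a parent list; ids replace object IDs. */
struct id_list {
	int id;
	const struct id_list *next;
};

enum parent_cmp {
	PARENTS_MATCH = 0,
	PARENTS_DIFFER,      /* same position, different id */
	PARENTS_GRAPH_LONG,  /* graph lists more parents than the ODB */
	PARENTS_GRAPH_SHORT  /* graph list ends before the ODB list */
};

/* Walk both lists in lockstep and classify the first length problem. */
enum parent_cmp compare_parents(const struct id_list *graph,
				const struct id_list *odb)
{
	enum parent_cmp result = PARENTS_MATCH;

	while (graph) {
		if (!odb)
			return PARENTS_GRAPH_LONG;
		if (graph->id != odb->id)
			result = PARENTS_DIFFER;
		graph = graph->next;
		odb = odb->next;
	}

	if (odb)
		return PARENTS_GRAPH_SHORT;
	return result;
}
```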
index bbca413ba75b20d80ab24dd2401c865def0d90cd..8985c9c049bfcfb96a6d02be4496adc1ce1bb0ba 100644
--- a/commit.c
+++ b/commit.c
@@ -423,7 +423,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
        return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph)
 {
        enum object_type type;
        void *buffer;
@@ -434,7 +434,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
                return -1;
        if (item->object.parsed)
                return 0;
-       if (parse_commit_in_graph(item))
+       if (use_commit_graph && parse_commit_in_graph(item))
                return 0;
        buffer = read_object_file(&item->object.oid, &type, &size);
        if (!buffer)
@@ -446,6 +446,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
                return error("Object %s not a commit",
                             oid_to_hex(&item->object.oid));
        }
+
        ret = parse_commit_buffer(item, buffer, size, 0);
        if (save_commit_buffer && !ret) {
                set_commit_buffer(item, buffer, size);
@@ -455,6 +456,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
        return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+       return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
        if (parse_commit(item))
index 01b8b1d6896b9ce8e532cd49474cc384704fecfc..f089f547ed8e6244d372d23f2305434d1dcec894 100644 (file)
--- a/commit.h
+++ b/commit.h
@@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
index a7e9215ffc370d945ba2ac28b7eff9e8cb1d0679..aec3345adb4f0fb83b511335e0727f1097f97e29 100644 (file)
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -12,7 +12,7 @@ expression c;
 
 // These excluded functions must access c->maybe_tree directly.
 @@
-identifier f !~ "^(get_commit_tree|get_commit_tree_in_graph|load_tree_for_commit)$";
+identifier f !~ "^(get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
 expression c;
 @@
   f(...) {...
index 77d85aefe7da18a0d7cc2db2baaedacead6b2e97..5947de3d2438ea301e33f2fc3430ee703d1436b9 100755 (executable)
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -11,6 +11,11 @@ test_expect_success 'setup full repo' '
        objdir=".git/objects"
 '
 
+test_expect_success 'verify graph with no graph file' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit-graph verify
+'
+
 test_expect_success 'write graph with no packs' '
        cd "$TRASH_DIRECTORY/full" &&
        git commit-graph write --object-dir . &&
@@ -28,8 +33,8 @@ test_expect_success 'create commits and repack' '
 '
 
 graph_git_two_modes() {
-       git -c core.graph=true $1 >output
-       git -c core.graph=false $1 >expect
+       git -c core.commitGraph=true $1 >output
+       git -c core.commitGraph=false $1 >expect
        test_cmp output expect
 }
 
@@ -200,6 +205,16 @@ test_expect_success 'build graph from commits with append' '
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph using --reachable' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit-graph write --reachable &&
+       test_path_is_file $objdir/info/commit-graph &&
+       graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'reachable graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'reachable graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
        cd "$TRASH_DIRECTORY" &&
        git clone --bare --no-local full bare &&
@@ -230,4 +245,190 @@ test_expect_success 'perform fast-forward merge in full repo' '
        test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git commit --allow-empty -m "blank" &&
+       git commit-graph write --reachable &&
+       cp $objdir/info/commit-graph commit-graph-before-gc &&
+       git reset --hard HEAD~1 &&
+       git config gc.writeCommitGraph true &&
+       git gc &&
+       cp $objdir/info/commit-graph commit-graph-after-gc &&
+       ! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+       git commit-graph write --reachable &&
+       test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
+test_expect_success 'git commit-graph verify' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git rev-parse commits/8 | git commit-graph write --stdin-commits &&
+       git commit-graph verify >output
+'
+
+NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
+HASH_LEN=20
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+                           1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+                            2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+                      $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+                            $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
+
+# usage: corrupt_graph_and_verify <position> <data> <string>
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+       pos=$1
+       data="${2:-\0}"
+       grepstr=$3
+       cd "$TRASH_DIRECTORY/full" &&
+       test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+       cp $objdir/info/commit-graph commit-graph-backup &&
+       printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
+       test_must_fail git commit-graph verify 2>test_err &&
+       grep -v "^+" test_err >err &&
+       test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+       corrupt_graph_and_verify 0 "\0" \
+               "graph signature"
+'
+
+test_expect_success 'detect bad version' '
+       corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+               "graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+       corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+               "hash version"
+'
+
+test_expect_success 'detect low chunk count' '
+       corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+               "missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+       corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+               "missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+       corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+               "missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+               "missing the Commit Data chunk"
+'
+
+test_expect_success 'detect incorrect fanout' '
+       corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+               "fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+       corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+               "fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+       corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+               "incorrect OID order"
+'
+
+test_expect_success 'detect OID not in object database' '
+       corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+               "from object database"
+'
+
+test_expect_success 'detect incorrect tree OID' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+               "root tree OID for commit"
+'
+
+test_expect_success 'detect incorrect parent int-id' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+               "invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+               "is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+               "commit-graph parent for"
+'
+
+test_expect_success 'detect incorrect generation number' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
+               "generation for commit"
+'
+
+test_expect_success 'detect unexpected non-zero generation number' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
+               "non-zero generation number"
+'
+
+test_expect_success 'detect incorrect commit date' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+               "commit date"
+'
+
+test_expect_success 'detect incorrect parent for octopus merge' '
+       corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+               "invalid parent"
+'
+
+test_expect_success 'detect invalid checksum hash' '
+       corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+               "incorrect checksum"
+'
+
+test_expect_success 'git fsck (checks commit-graph)' '
+       cd "$TRASH_DIRECTORY/full" &&
+       git fsck &&
+       corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+               "incorrect checksum" &&
+       test_must_fail git fsck
+'
+
 test_done