all existing objects. You can force recompression by passing the -F option
to linkgit:git-repack[1].
+pack.island::
+ An extended regular expression configuring a set of delta
+ islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
+ for details.
+
+pack.islandCore::
+ Specify an island name which gets to have its objects be
+ packed first. This creates a kind of pseudo-pack at the front
+ of one pack, so that the objects from the specified island are
+ hopefully faster to copy into any pack that should be served
+ to a user requesting these objects. In practice this means
+ that the island specified should likely correspond to what is
+ the most commonly cloned in the repo. See also "DELTA ISLANDS"
+ in linkgit:git-pack-objects[1].
+
pack.deltaCacheSize::
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1] before writing them out to a pack.
--unpack-unreachable::
Keep unreachable objects in loose form. This implies `--revs`.
+--delta-islands::
+ Restrict delta matches based on "islands". See DELTA ISLANDS
+ below.
+
+
+DELTA ISLANDS
+-------------
+
+When possible, `pack-objects` tries to reuse existing on-disk deltas to
+avoid having to search for new ones on the fly. This is an important
+optimization for serving fetches, because it means the server can avoid
+inflating most objects at all and just send the bytes directly from
+disk. This optimization can't work when an object is stored as a delta
+against a base which the receiver does not have (and which we are not
+already sending). In that case the server "breaks" the delta and has to
+find a new one, which has a high CPU cost. Therefore it's important for
+performance that the set of objects in on-disk delta relationships match
+what a client would fetch.
+
+In a normal repository, this tends to work automatically. The objects
+are mostly reachable from the branches and tags, and that's what clients
+fetch. Any deltas we find on the server are likely to be between objects
+the client has or will have.
+
+But in some repository setups, you may have several related but separate
+groups of ref tips, with clients tending to fetch those groups
+independently. For example, imagine that you are hosting several "forks"
+of a repository in a single shared object store, and letting clients
+view them as separate repositories through `GIT_NAMESPACE` or separate
+repos using the alternates mechanism. A naive repack may find that the
+optimal delta for an object is against a base that is only found in
+another fork. But when a client fetches, they will not have the base
+object, and we'll have to find a new delta on the fly.
+
+A similar situation may exist if you have many refs outside of
+`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
+`refs/pull` or `refs/changes` used by some hosting providers). By
+default, clients fetch only heads and tags, and deltas against objects
+found only in those other groups cannot be sent as-is.
+
+Delta islands solve this problem by allowing you to group your refs into
+distinct "islands". Pack-objects computes which objects are reachable
+from which islands, and refuses to make a delta from an object `A`
+against a base which is not present in all of `A`'s islands. This
+results in slightly larger packs (because we miss some delta
+opportunities), but guarantees that a fetch of one island will not have
+to recompute deltas on the fly due to crossing island boundaries.
+
+When repacking with delta islands the delta window tends to get
+clogged with candidates that are forbidden by the config. Repacking
+with a big --window helps (and doesn't take as long as it otherwise
+might because we can reject some object pairs based on islands before
+doing any computation on the content).
+
+Islands are configured via the `pack.island` option, which can be
+specified multiple times. Each value is a left-anchored regular
+expressions matching refnames. For example:
+
+-------------------------------------------
+[pack]
+island = refs/heads/
+island = refs/tags/
+-------------------------------------------
+
+puts heads and tags into an island (whose name is the empty string; see
+below for more on naming). Any refs which do not match those regular
+expressions (e.g., `refs/pull/123`) is not in any island. Any object
+which is reachable only from `refs/pull/` (but not heads or tags) is
+therefore not a candidate to be used as a base for `refs/heads/`.
+
+Refs are grouped into islands based on their "names", and two regexes
+that produce the same name are considered to be in the same
+island. The names are computed from the regexes by concatenating any
+capture groups from the regex, with a '-' dash in between. (And if
+there are no capture groups, then the name is the empty string, as in
+the above example.) This allows you to create arbitrary numbers of
+islands. Only up to 14 such capture groups are supported though.
+
+For example, imagine you store the refs for each fork in
+`refs/virtual/ID`, where `ID` is a numeric identifier. You might then
+configure:
+
+-------------------------------------------
+[pack]
+island = refs/virtual/([0-9]+)/heads/
+island = refs/virtual/([0-9]+)/tags/
+island = refs/virtual/([0-9]+)/(pull)/
+-------------------------------------------
+
+That puts the heads and tags for each fork in their own island (named
+"1234" or similar), and the pull refs for each go into their own
+"1234-pull".
+
+Note that we pick a single island for each regex to go into, using "last
+one wins" ordering (which allows repo-specific config to take precedence
+over user-wide config, and so forth).
+
SEE ALSO
--------
linkgit:git-rev-list[1]
#include "streaming.h"
#include "thread-utils.h"
#include "pack-bitmap.h"
+#include "delta-islands.h"
#include "reachable.h"
#include "sha1-array.h"
#include "argv-array.h"
static struct pack_idx_entry **written_list;
static uint32_t nr_result, nr_written, nr_seen;
+static uint32_t write_layer;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
static int exclude_promisor_objects;
+static int use_delta_islands;
+
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
static unsigned long cache_max_small_delta_size = 1000;
unsigned int *endp,
struct object_entry *e)
{
- if (e->filled)
+ if (e->filled || e->layer != write_layer)
return;
wo[(*endp)++] = e;
e->filled = 1;
* Finally all the rest in really tight order
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
- if (!objects[i].filled)
+ if (!objects[i].filled && objects[i].layer == write_layer)
add_family_to_write_order(wo, wo_end, &objects[i]);
}
}
static struct object_entry **compute_write_order(void)
{
+ uint32_t max_layers = 1;
unsigned int i, wo_end;
struct object_entry **wo;
*/
for_each_tag_ref(mark_tagged, NULL);
- /*
- * Give the objects in the original recency order until
- * we see a tagged tip.
- */
+ if (use_delta_islands)
+ max_layers = compute_pack_layers(&to_pack);
+
ALLOC_ARRAY(wo, to_pack.nr_objects);
wo_end = 0;
- compute_layer_order(wo, &wo_end);
+ for (; write_layer < max_layers; ++write_layer)
+ compute_layer_order(wo, &wo_end);
if (wo_end != to_pack.nr_objects)
die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects);
break;
}
- if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
+ if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL)) &&
+ in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {
/*
* If base_ref was set above that means we wish to
* reuse delta data, and we even found that base
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
+ if (use_delta_islands) {
+ int island_cmp = island_delta_cmp(&a->idx.oid, &b->idx.oid);
+ if (island_cmp)
+ return island_cmp;
+ }
if (a_size > b_size)
return -1;
if (a_size < b_size)
if (trg_size < src_size / 32)
return 0;
+ if (!in_same_island(&trg->entry->idx.oid, &src->entry->idx.oid))
+ return 0;
+
/* Load data if not already done */
if (!trg->data) {
read_lock();
uint32_t i, nr_deltas;
unsigned n;
+ if (use_delta_islands)
+ resolve_tree_islands(progress, &to_pack);
+
get_object_details();
/*
if (write_bitmap_index)
index_commit_for_bitmap(commit);
+
+ if (use_delta_islands)
+ propagate_island_marks(commit);
}
static void show_object(struct object *obj, const char *name, void *data)
add_preferred_base_object(name);
add_object_entry(&obj->oid, obj->type, name, 0);
obj->flags |= OBJECT_ADDED;
+
+ if (use_delta_islands) {
+ const char *p;
+ unsigned depth = 0;
+ struct object_entry *ent;
+
+ for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
+ depth++;
+
+ ent = packlist_find(&to_pack, obj->oid.hash, NULL);
+ if (ent && depth > ent->tree_depth)
+ ent->tree_depth = depth;
+ }
}
static void show_object__ma_allow_any(struct object *obj, const char *name, void *data)
if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
return;
+ if (use_delta_islands)
+ load_delta_islands();
+
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
option_parse_missing_action },
OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
N_("do not pack objects in promisor packfiles")),
+ OPT_BOOL(0, "delta-islands", &use_delta_islands,
+ N_("respect islands during delta compression")),
OPT_END(),
};
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;
+ if (use_delta_islands)
+ argv_array_push(&rp, "--topo-order");
+
if (progress && all_progress_implied)
progress = 2;