fetch-pack: avoid quadratic behavior in rev_list_push
When we call find_common to start finding common ancestors
with the remote side of a fetch, the first thing we do is
insert the tip of each ref into our rev_list linked list. We
keep the list sorted the whole time with
commit_list_insert_by_date, which means our insertion ends
up doing O(n^2) timestamp comparisons.
We could teach rev_list_push to use an unsorted list, and
then sort it once after we have added each ref. However, in
get_rev, we process the list by popping commits off the
front and adding parents back in timestamp-sorted order. So
that procedure would still operate on the large list.
Instead, we can replace the linked list with a heap-based
priority queue, which can do O(log n) insertion, making the
whole insertion procedure O(n log n).
As a result of switching to the prio_queue struct, we fix
two minor bugs:
1. When we "pop" a commit in get_rev, and when we clear
the rev_list in find_common, we do not take care to
free the "struct commit_list", and just leak its
memory. With the prio_queue implementation, the memory
management is handled for us.
2. In get_rev, we look at the head commit of the list,
possibly push its parents onto the list, and then "pop"
the front of the list off, assuming it is the same
element that we just peeked at. This is typically going
to be the case, but would not be in the face of clock
skew: the parents are inserted by date, and could
potentially be inserted at the head of the list if they
have a timestamp newer than their descendent. In this
case, we would accidentally pop the parent, and never
process it at all.
The new implementation pulls the commit off of the
queue as we examine it, and so does not suffer from
this problem.
With this patch, a fetch of a single commit into a
repository with 50,000 refs went from:
real 0m7.984s
user 0m7.852s
sys 0m0.120s
to:
real 0m2.017s
user 0m1.884s
sys 0m0.124s
Before this patch, a larger case with 370K refs still had
not completed after tens of minutes; with this patch, it
completes in about 12 seconds.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit.c: make compare_commits_by_commit_date global
This helper function was introduced as a prio_queue
comparator to help topological sorting. However, other users
of prio_queue who want to replace commit_list_insert_by_date
will want to use it, too. So let's make it public.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch-pack: avoid quadratic list insertion in mark_complete
We insert the commit pointed to by each ref one-by-one into
the "complete" commit_list using insert_by_date. Because
each insertion is O(n), we end up with O(n^2) behavior.
This typically doesn't matter, because the number of refs is
reasonably small. And even if there are a lot of refs, they
often point to a smaller set of objects (in which case the
optimization in commit ea5f220 keeps our "n" small).
However, in pathological repositories (hundreds of thousands
of refs, each pointing to a unique commit), this quadratic
behavior can make a difference. Since we do not care about
the list order until we have finished building it, we can
simply keep it unsorted during the insertion phase, then
sort it afterwards.
On a repository like the one described above, this dropped
the time to do a no-op fetch from 2.0s to 1.7s. On normal
repositories, it probably does not matter at all, but it
does not hurt to protect ourselves from pathological cases.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log" learned the "--author-date-order" option, with which the
output is topologically sorted and commits in parallel histories
are shown intermixed together based on the author timestamp.
* jc/topo-author-date-sort:
t6003: add --author-date-order test
topology tests: teach a helper to set author dates as well
t6003: add --date-order test
topology tests: teach a helper to take abbreviated timestamps
t/lib-t6000: style fixes
log: --author-date-order
sort-in-topological-order: use prio-queue
prio-queue: priority queue of pointers to structs
toposort: rename "lifo" field
Allow adding custom information to commit objects in order to
represent unbound number of flag bits etc.
* jk/commit-info-slab:
commit-slab: introduce a macro to define a slab for new type
commit-slab: avoid large realloc
commit: allow associating auxiliary info on-demand
* maint:
Start preparing for 1.8.3.3
check-ignore doc: fix broken link to ls-files page
test: spell 'ls-files --delete' option correctly in test descriptions
"git pack-refs" that races with new ref creation or deletion have
been susceptible to lossage of refs under right conditions, which
has been tightened up.
* mh/ref-races:
for_each_ref: load all loose refs before packed refs
get_packed_ref_cache: reload packed-refs file when it changes
add a stat_validity struct
Extract a struct stat_data from cache_entry
packed_ref_cache: increment refcount when locked
do_for_each_entry(): increment the packed refs cache refcount
refs: manage lifetime of packed refs cache via reference counting
refs: implement simple transactions for the packed-refs file
refs: wrap the packed refs cache in a level of indirection
pack_refs(): split creation of packed refs and entry writing
repack_without_ref(): split list curation and entry writing
"git diff" learned a mode that ignores hunks whose change consists
only of additions and removals of blank lines, which is the same as
"diff -B" (ignore blank lines) of GNU diff.
We read loose and packed rerferences in two steps, but after
deciding to read a loose ref but before actually opening it to read
it, another process racing with us can unlink it, which would cause
us to barf. Update the codepath to retry when such a race is
detected.
* mh/loose-refs-race-with-pack-ref:
resolve_ref_unsafe(): close race condition reading loose refs
resolve_ref_unsafe(): handle the case of an SHA-1 within loop
resolve_ref_unsafe(): extract function handle_missing_loose_ref()
Allow various subcommands of "git submodule" to be run not from the
top of the working tree of the superproject.
* jk/submodule-subdirectory-ok:
submodule: drop the top-level requirement
rev-parse: add --prefix option
submodule: show full path in error message
t7403: add missing && chaining
t7403: modernize style
t7401: make indentation consistent
Newer MacOS X encourages the programs to compile and link with their
CommonCrypto, not with OpenSSL.
* da/darwin:
imap-send: eliminate HMAC deprecation warnings on Mac OS X
cache.h: eliminate SHA-1 deprecation warnings on Mac OS X
Makefile: add support for Apple CommonCrypto facility
Makefile: fix default regex settings on Darwin
Merge branch 'nd/clone-connectivity-shortcut' (early part) into maint
Cloning with "git clone --depth N" while fetch.fsckobjects (or
transfer.fsckobjects) is set to true did not tell the cut-off points
of the shallow history to the process that validates the objects and
the history received, causing the validation to fail.
* 'nd/clone-connectivity-shortcut' (early part):
fetch-pack: prepare updated shallow file before fetching the pack
clone: let the user know when check_everything_connected is run
* rr/push-head:
push: make push.default = current use resolved HEAD
push: fail early with detached HEAD and current
push: factor out the detached HEAD error message
Merge branch 'jh/checkout-auto-tracking' into maint
* jh/checkout-auto-tracking:
glossary: Update and rephrase the definition of a remote-tracking branch
branch.c: Validate tracking branches with refspecs instead of refs/remotes/*
t9114.2: Don't use --track option against "svn-remote"-tracking branches
t7201.24: Add refspec to keep --track working
t3200.39: tracking setup should fail if there is no matching refspec.
checkout: Use remote refspecs when DWIMming tracking branches
t2024: Show failure to use refspec when DWIMming remote branch names
t2024: Add tests verifying current DWIM behavior of 'git checkout <branch>'
* bc/checkout-tracking-name-plug-leak:
t/t9802: explicitly name the upstream branch to use as a base
builtin/checkout.c: don't leak memory in check_tracking_name
Finishing touches for the "git rebase --autostash" feature
introduced earlier.
* rr/rebase-stash-store:
rebase: use 'git stash store' to simplify logic
stash: introduce 'git stash store'
stash: simplify option parser for create
stash doc: document short form -p in synopsis
stash doc: add a warning about using create
Fix for the codepath to parse patches that add new files, generated
by programs other than Git. THis is an old breakage in v1.7.11 and
will need to be merged down to the maintanance tracks.
* tr/maint-apply-non-git-patch-parsefix:
apply: carefully strdup a possibly-NULL name
Recent "rebase --autostash" update made it impossible to recover
with "git am --abort" from a repository where "git am" without mbox
was run by mistake and then was killed with "^C".
* rr/am-quit-empty-then-abort-fix:
t/am: use test_path_is_missing() where appropriate
am: handle stray $dotest directory
Allow various commit objects to be given to "git rebase" by ':/look
for this string' syntax, e.g. "git rebase --onto ':/there'".
* rr/rebase-sha1-by-string-query:
rebase: use peel_committish() where appropriate
sh-setup: add new peel_committish() helper
t/rebase: add failing tests for a peculiar revision
Make it possible to call into copy-notes API from the sequencer code.
* jh/libify-note-handling:
Move create_notes_commit() from notes-merge.c into notes-utils.c
Move copy_note_for_rewrite + friends from builtin/notes.c to notes-utils.c
finish_copy_notes_for_rewrite(): Let caller provide commit message
* mz/rebase-tests:
rebase topology tests: fix commit names on case-insensitive file systems
tests: move test for rebase messages from t3400 to t3406
t3406: modernize style
add tests for rebasing merged history
add tests for rebasing root
add tests for rebasing of empty commits
add tests for rebasing with patch-equivalence present
add simple tests of consistency across rebase types
* jk/apache-test-for-2.4:
lib-httpd/apache.conf: check version only after mod_version loads
t/lib-httpd/apache.conf: configure an MPM module for apache 2.4
t/lib-httpd/apache.conf: load compat access module in apache 2.4
t/lib-httpd/apache.conf: load extra auth modules in apache 2.4
t/lib-httpd/apache.conf: do not use LockFile in apache >= 2.4
* cm/remote-mediawiki-perlcritique: (31 commits)
git-remote-mediawiki: make error message more precise
git-remote-mediawiki: add a perlcritic rule in Makefile
git-remote-mediawiki: add a .perlcriticrc file
git-remote-mediawiki: clearly rewrite double dereference
git-remote-mediawiki: fix a typo ("mediwiki" instead of "mediawiki")
git-remote-mediawiki: put non-trivial numeric values in constants.
git-remote-mediawiki: don't use quotes for empty strings
git-remote-mediawiki: replace "unless" statements with negated "if" statements
git-remote-mediawiki: brace file handles for print for more clarity
git-remote-mediawiki: modify strings for a better coding-style
git-remote-mediawiki: put long code into a subroutine
git-remote-mediawiki: remove import of unused open2
git-remote-mediawiki: check return value of open
git-remote-mediawiki: assign a variable as undef and make proper indentation
git-remote-mediawiki: rename a variable ($last) which has the name of a keyword
git-remote-mediawiki: remove unused variable $entry
git-remote-mediawiki: turn double-negated expressions into simple expressions
git-remote-mediawiki: change the name of a variable
git-remote-mediawiki: add newline in the end of die() error messages
git-remote-mediawiki: change style in a regexp
...
* rr/rebase-autostash:
rebase: finish_rebase() in noop rebase
rebase: finish_rebase() in fast-forward rebase
rebase: guard against missing files in read_basic_state()
"git status" learned status.branch and status.short configuration
variables to use --branch and --short options by default (override
with --no-branch and --no-short options from the command line).
* jg/status-config:
status: introduce status.branch to enable --branch by default
status: introduce status.short to enable --short by default
add -i: add extra options at the right place in "diff" command line
Appending "--diff-algorithm=histogram" at the end of canned command
line for various modes of "diff" is correct for most of them but not
for "stash" that has a non-option already wired in, like so:
'stash' => {
DIFF => 'diff-index -p HEAD',
Appending an extra option after non-option may happen to work due to
overly lax command line parser, but that is not something we should
rely on. Instead, splice in the extra argument immediately after the
command name (i.e. 'diff-index', 'diff-files', etc.).
user-manual: Update download size for Git and the kernel
They've grown since d19fbc3 (Documentation: add git user's manual,
2007-01-07) when the stats were initially added. I've rounded
download sizes up to the nearest multiple of ten MiB to decrease the
precision and give a bit of growing room. Exact sizes:
lib-httpd/apache.conf: check version only after mod_version loads
Commit 0442743 introduced an <IfVersion> directive near the
top of the apache config file. However, at that point we
have not yet checked for and loaded the mod_version module.
This means that the directive will behave oddly if
mod_version is dynamically loaded, failing to match when it
should.
We can fix this by moving the whole block below the
LoadModule directive for mod_version.
Reported-by: Brian Gernhardt <brian@gernhardtsoftware.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tweak the --topo/date-order test vector a bit and mark the author
dates of two commits (a2 and a3) earlier than their own committer
dates, making them much older than other commits that are made on
parallel branches to simulate the case where a long running topic
was rebased recently.
They will show up as recent in the --date-order output due to their
timestamps, but they appear a lot later in the --author-date-order
output, even though their committer timestamp says otherwise.
The "--date-order" output is a slight twist of "--topo-order" in
that commits from parallel histories are shown in their committer
date order without an attempt to clump commits from a single line
of history together like --topo-order does.
topology tests: teach a helper to take abbreviated timestamps
The on_committer_date helper in t/lib-t6000 is used in t6002 and
t6003 with timestamps on a single day within a single minute
(i.e. 1971-08-16 00:00) and the tests repeat this over and over.
The actual value of the timestamp, however, does not matter very
much; only their relative ordering does.
Introduce another helper to expand only the suffix of the timestamp
to a full timestamp to make the lines shorter, and use it in this
helper. Also, because all the commits in the test are made with
specific GIT_COMMITTER_DATE, stop unsetting it at the end of the
helper.
We'll be specifying the author timestamp to these test commits in a
later patch, which will be helped with this change.
Also remove a test that was commented-out from t6003; it used to
test a commit with the same parent listed twice, which was allowed
by mistake but was fixed long time ago.
While both GUI and console Cygwin browsers do exist, anecdotal evidence
suggests most users rely on their native Windows browser. cygstart,
which is a long-standing part of the base Cygwin installation, will
cause the page to be opened in the default Windows browser (the one
registered to open .html files).
Signed-off-by: Yaakov Selkowitz <yselkowitz@users.sourceforge.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
transport-helper: be quiet on read errors from helpers
Prior to commit 81d340d4, we did not print any error message
if a remote transport helper died unexpectedly. If a helper
did not print any error message (e.g., because it crashed),
the user could be left confused. That commit tried to
rectify the situation by printing a note that the helper
exited unexpectedly.
However, this makes a much more common case worse: when a
helper does die with a useful message, we print the extra
"Reading from 'git-remote-foo failed" message. This can also
end up confusing users, as they may not even know what
remote helpers are (e.g., the fact that http support comes
through git-remote-https is purely an implementation detail
that most users do not know or care about).
Since we do not have a good way of knowing whether the
helper printed a useful error, and since the common failure
mode is for it to do so, let's default to remaining quiet.
Debuggers can dig further by setting GIT_TRANSPORT_HELPER_DEBUG.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2901bbe (apply: free patch->{def,old,new}_name fields, 2012-03-21)
cleaned up the memory management of filenames in the patches, but
forgot that find_name_traditional() can return NULL as a way of saying
"I couldn't find a name".
That NULL unfortunately gets passed into xstrdup() next, resulting in
a segfault. Use null_strdup() so as to safely propagate the null,
which will let us emit the correct error message.
Reported-by: DevHC on #git Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* fc/makefile:
Makefile: use $^ to avoid listing prerequisites on the command line
build: do not install git-remote-testgit
build: generate and clean test scripts