git clone now reports its progress to standard error, which throws off
t5570. Using test_i18ngrep instead of test_cmp allows the test to be
more flexible by only looking for the expected error and ignoring any
other output from the program.
Signed-off-by: Brian Gernhardt <brian@gernhardtsoftware.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jk/clone-progress-to-stderr' into maint
"git clone" gave some progress messages to the standard output, not to
the standard error, and did not allow suppressing them with the
"--no-progress" option.
* jk/clone-progress-to-stderr:
clone: always set transport options
clone: treat "checking connectivity" like other progress
clone: send diagnostic messages to stderr
sha1_file: move comment about return value where it belongs
Commit 5b0864070 (sha1_object_info_extended: make type calculation
optional, Jul 12 2013) changed the return value of the
sha1_object_info_extended function to 0/-1 for success/error.
Previously this function returned the object type for success or
-1 for error. But unfortunately the above commit forgot to change
or move the comment above this function that says "returns enum
object_type or negative".
To fix this inconsistency, let's move the comment above the
sha1_object_info function where it is still true.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jc/ls-files-killed-optim' into maint
"git ls-files -k" needs to crawl only the part of the working tree
that may overlap the paths in the index to find killed files, but
shared code with the logic to find all the untracked files, which
made it unnecessarily inefficient.
* jc/ls-files-killed-optim:
dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
t3010: update to demonstrate "ls-files -k" optimization pitfalls
ls-files -k: a directory only can be killed if the index has a non-directory
dir.c: use the cache_* macro to access the current index
Merge branch 'jh/checkout-auto-tracking' into maint
"git branch --track" had a minor regression in v1.8.3.2 and later
that made it impossible to base your local work on anything but a
local branch of the upstream repository you are tracking from.
* jh/checkout-auto-tracking:
t3200: fix failure on case-insensitive filesystems
branch.c: Relax unnecessary requirement on upstream's remote ref name
t3200: Add test demonstrating minor regression in 41c21f2
Refer to branch.<name>.remote/merge when documenting --track
t3200: Minor fix when preparing for tracking failure
t2024: Fix &&-chaining and a couple of typos
When there is no sufficient overlap between old and new history
during a "git fetch" into a shallow repository, objects that the
sending side knows the receiving end has were unnecessarily sent.
* nd/fetch-into-shallow:
Add testcase for needless objects during a shallow fetch
list-objects: mark more commits as edges in mark_edges_uninteresting
list-objects: reduce one argument in mark_edges_uninteresting
upload-pack: delegate rev walking in shallow fetch to pack-objects
shallow: add setup_temporary_shallow()
shallow: only add shallow graft points to new shallow file
move setup_alternate_shallow and write_shallow_commits to shallow.c
* es/rebase-i-no-abbrev:
rebase -i: fix short SHA-1 collision
t3404: rebase -i: demonstrate short SHA-1 collision
t3404: make tests more self-contained
A range notation "A..B" means exactly the same thing as what "^A B"
means, i.e. the set of commits that are reachable from B but not
from A. But the internal representation after the revision parser
parsed these two notations are subtly different.
- "rev-list ^A B" leaves A and B in the revs->pending.objects[]
array, with the former marked as UNINTERESTING and the revision
traversal machinery propagates the mark to underlying commit
objects A^0 and B^0.
- "rev-list A..B" peels tags and leaves A^0 (marked as
UNINTERESTING) and B^0 in revs->pending.objects[] array before
the traversal machinery kicks in.
This difference usually does not matter, but starts to matter when
the --objects option is used. For example, we see this:
With the former invocation, the revision traversal machinery never
hears about the tag v1.8.4 (it only sees the result of peeling it,
i.e. the commit v1.8.4^0), and the tag itself does not appear in the
output. The latter does send the tag object itself to the output.
Make the range notation keep the unpeeled objects and feed them to
the traversal machinery to fix this inconsistency.
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
clone --branch: refuse to clone if upstream repo is empty
Since 920b691 (clone: refuse to clone if --branch
points to bogus ref) we refuse to clone with option
"-b" if the specified branch does not exist in the
(non-empty) upstream. If the upstream repository is empty,
the branch doesn't exist, either. So refuse the clone too.
Reported-by: Robert Mitwicki <robert.mitwicki@opensoftware.pl> Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
merge-recursive: fix parsing of "diff-algorithm" option
The "diff-algorithm" option to the recursive merge strategy takes the
name of the algorithm as an option, but it uses strcmp on the option
string to check if it starts with "diff-algorithm=", meaning that this
options cannot actually be used.
Fix this by switching to prefixcmp. At the same time, clarify the
following line by using strlen instead of a hard-coded length, which
also makes it consistent with nearby code.
Reported-by: Luke Noel-Storr <luke.noel-storr@integrate.co.uk> Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
* km/svn-1.8-serf-only:
Git.pm: revert _temp_cache use of temp_is_locked
git-svn: allow git-svn fetching to work using serf
Git.pm: add new temp_is_locked function
git-remote-mediawiki: bugfix for pages w/ >500 revisions
Mediawiki introduces a new API for queries w/ more than 500 results in
version 1.21. That change triggered an infinite loop while cloning a
mediawiki with such a page.
The latest API renamed and moved the "continuing" information in the
response, necessary to build the next query. The code failed to retrieve
that information but still detected that it was in a "continuing
query". As a result, it launched the same query over and over again.
If a "continuing" information is detected in the response (old or new),
the next query is updated accordingly. If not, we quit assuming it's not
a continuing query.
Reported-by: Benjamin Cathey Signed-off-by: Benoit Person <benoit.person@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
format-patch: print in-body "From" only when needed
Commit a908047 taught format-patch the "--from" option,
which places the author ident into an in-body from header,
and uses the committer ident in the rfc822 from header. The
documentation claims that it will omit the in-body header
when it is the same as the rfc822 header, but the code never
implemented that behavior.
This patch completes the feature by comparing the two idents
and doing nothing when they are the same (this is the same
as simply omitting the in-body header, as the two are by
definition indistinguishable in this case). This makes it
reasonable to turn on "--from" all the time (if it matches
your particular workflow), rather than only using it when
exporting other people's patches.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of git's traversals are robust against minor breakages
in commit data. For example, "git log" will still output an
entry for a commit that has a broken encoding or missing
author, and will not abort the whole operation.
Shortlog, on the other hand, will die as soon as it sees a
commit without an author, meaning that a repository with
a broken commit cannot get any shortlog output at all.
Let's downgrade this fatal error to a warning, and continue
the operation.
We simply ignore the commit and do not count it in the total
(since we do not have any author under which to file it).
Alternatively, we could output some kind of "<empty>" record
to collect these bogus commits. It is probably not worth it,
though; we have already warned to stderr, so the user is
aware that such bogosities exist, and any placeholder we
came up with would either be syntactically invalid, or would
potentially conflict with real data.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A clone will always create a transport struct, whether we
are cloning locally or using an actual protocol. In the
local case, we only use the transport to get the list of
refs, and then transfer the objects out-of-band.
However, there are many options that we do not bother
setting up in the local case. For the most part, these are
noops, because they only affect the object-fetching stage
(e.g., the --depth option). However, some options do have a
visible impact. For example, giving the path to upload-pack
via "-u" does not currently work for a local clone, even
though we need upload-pack to get the ref list.
We can just drop the conditional entirely and set these
options for both local and non-local clones. Rather than
keep track of which options impact the object versus the ref
fetching stage, we can simply let the noops be noops (and
the cost of setting the options in the first place is not
high).
The one exception is that we also check that the transport
provides both a "get_refs_list" and a "fetch" method. We
will now be checking the former for both cases (which is
good, since a transport that cannot fetch refs would not
work for a local clone), and we tweak the conditional to
check for a "fetch" only when we are non-local.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
clone: treat "checking connectivity" like other progress
When stderr does not point to a tty, we typically suppress
"we are now in this phase" progress reporting (e.g., we ask
the server not to send us "counting objects" and the like).
The new "checking connectivity" message is in the same vein,
and should be suppressed. Since clone relies on the
transport code to make the decision, we can simply sneak a
peek at the "progress" field of the transport struct. That
properly takes into account both the verbosity and progress
options we were given, as well as the result of isatty().
Note that we do not set up that progress flag for a local
clone, as we do not fetch using the transport at all. That's
acceptable here, though, because we also do not perform a
connectivity check in that case.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Putting messages like "Cloning into.." and "done" on stdout
is un-Unix and uselessly clutters the stdout channel. Send
them to stderr.
We have to tweak two tests to accommodate this:
1. t5601 checks for doubled output due to forking, and
doesn't actually care where the output goes; adjust it
to check stderr.
2. t5702 is trying to test whether progress output was
sent to stderr, but naively does so by checking
whether stderr produced any output. Instead, have it
look for "%", a token found in progress output but not
elsewhere (and which lets us avoid hard-coding the
progress text in the test).
This should not regress any scripts that try to parse the
current output, as the output is already internationalized
and therefore unstable.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'bc/completion-for-bash-3.0' into maint
Some people still use rather old versions of bash, which cannot grok
some constructs like 'printf -v varname' the prompt and completion
code started to use recently.
* bc/completion-for-bash-3.0:
contrib/git-prompt.sh: handle missing 'printf -v' more gracefully
t9902-completion.sh: old Bash still does not support array+=('') notation
git-completion.bash: use correct Bash/Zsh array length syntax
Merge branch 'mm/no-shell-escape-in-die-message' into maint
Fixes a minor bug in "git rebase -i" (there could be others, as the
root cause is pretty generic) where the code feeds a random, data
dependeant string to 'echo' and expects it to come out literally.
* mm/no-shell-escape-in-die-message:
die_with_status: use "printf '%s\n'", not "echo"
Merge branch 'tr/log-full-diff-keep-true-parents' into maint
Output from "git log --full-diff -- <pathspec>" looked strange,
because comparison was done with the previous ancestor that touched
the specified <pathspec>, causing the patches for paths outside the
pathspec to show more than the single commit has changed.
* tr/log-full-diff-keep-true-parents:
log: use true parents for diff when walking reflogs
log: use true parents for diff even when rewriting
Merge branch 'jc/transport-do-not-use-connect-twice-in-fetch' into maint
The auto-tag-following code in "git fetch" tries to reuse the same
transport twice when the serving end does not cooperate and does
not give tags that point to commits that are asked for as part of
the primary transfer. Unfortunately, Git-aware transport helper
interface is not designed to be used more than once, hence this
does not work over smart-http transfer.
* jc/transport-do-not-use-connect-twice-in-fetch:
builtin/fetch.c: Fix a sparse warning
fetch: work around "transport-take-over" hack
fetch: refactor code that fetches leftover tags
fetch: refactor code that prepares a transport
fetch: rename file-scope global "transport" to "gtransport"
t5802: add test for connect helper
Merge branch 'sp/clip-read-write-to-8mb' into maint
Send a large request to read(2)/write(2) as a smaller but still
reasonably large chunks, which would improve the latency when the
operation needs to be killed and incidentally works around broken
64-bit systems that cannot take a 2GB write or read in one go.
* sp/clip-read-write-to-8mb:
Revert "compat/clipped-write.c: large write(2) fails on Mac OS X/XNU"
xread, xwrite: limit size of IO to 8MB
t3200: fix failure on case-insensitive filesystems
62d94a3a (t3200: Add test demonstrating minor regression in 41c21f2;
2013-09-08) introduced a test which creates a directory named 'a',
however, on case-insensitive filesystems, this action fails with a
"fatal: cannot mkdir a: File exists" error due to a file named 'A' left
over from earlier tests. Resolve this problem.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The HTTP 1.1 standard requires an Allow header for 405 Method Not Allowed:
The response MUST include an Allow header containing a list of valid methods
for the requested resource.
So provide such a header when we return a 405 to the user agent.
Signed-off-by: Brian M. Carlson <sandals@crustytoothpaste.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When determining the file mode from either ls-tree or diff-tree
output, we used to grab these octal mode string (typically 100644 or
100755) and then did
$git_perms .= "r" if ( $mode & 4 );
$git_perms .= "w" if ( $mode & 2 );
$git_perms .= "x" if ( $mode & 1 );
which was already wrong, as (100644 & 4) is very different from
oct("100644") & 4. An earlier refactoring 2c3af7e7 (cvsserver:
factor out git-log parsing logic, 2012-10-13) further changed it to
pick the third octal digit (10*0*644 or 10*0*755) from the left and
then do the above conversion, which does not make sense, either.
Let's use the third digit from the last of the octal mode string to
make sure we get the executable and read bits right.
Signed-off-by: Junio C Hamano <gitster@pobox.com> Tested-by: Michael Cronenworth <mike@cchtml.com>
send-email: don't call methods on undefined values
If SSL verification is enabled in git send-email, we could attempt to call a
method on an undefined value if the verification failed, since $smtp would end
up being undef. Look up the error string in a way that will produce a helpful
error message and not cause further errors.
Signed-off-by: Brian M. Carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason not to turn on keepalives by default.
They take very little bandwidth, and significantly less than
the progress reporting they are replacing. And in the case
that progress reporting is on, we should never need to send
a keepalive anyway, as we will constantly be showing
progress and resetting the keepalive timer.
We do not necessarily know what the client's idea of a
reasonable timeout is, so let's keep this on the low side of
5 seconds. That is high enough that we will always prefer
our normal 1-second progress reports to sending a keepalive
packet, but low enough that no sane client should consider
the connection hung.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: send keepalive packets during pack computation
When upload-pack has started pack-objects, there may be a quiet
period while pack-objects prepares the pack (i.e., counting objects
and delta compression). Normally we would see (and send to the
client) progress information, but if "--quiet" is in effect,
pack-objects will produce nothing at all until the pack data is
ready. On a large repository, this can take tens of seconds (or even
minutes if the system is loaded or the repository is badly packed).
Clients or intermediate proxies can sometimes give up in this
situation, assuming that the server or connection has hung.
This patch introduces a "keepalive" option; if upload-pack sees no
data from pack-objects for a certain number of seconds, it will send
an empty sideband data packet to let the other side know that we are
still working on it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
branch.c: Relax unnecessary requirement on upstream's remote ref name
When creating an upstream relationship, we use the configured remotes and
their refspecs to determine the upstream configuration settings
branch.<name>.remote and branch.<name>.merge. However, if the matching
refspec does not have refs/heads/<something> on the remote side, we end
up rejecting the match, and failing the upstream configuration.
It could be argued that when we set up an branch's upstream, we want that
upstream to also be a proper branch in the remote repo. Although this is
typically the common case, there are cases (as demonstrated by the previous
patch in this series) where this requirement prevents a useful upstream
relationship from being formed. Furthermore:
- We have fundamentally no say in how the remote repo have organized its
branches. The remote repo may put branches (or branch-like constructs
that are insteresting for downstreams to track) outside refs/heads/*.
- The user may intentionally want to track a non-branch from a remote
repo, by using a branch and configured upstream in the local repo.
Relaxing the checking to only require a matching remote/refspec allows the
testcase introduced in the previous patch to succeed, and has no negative
effect on the rest of the test suite.
This patch fixes a behavior (arguably a regression) first introduced in 41c21f2 (branch.c: Validate tracking branches with refspecs instead of
refs/remotes/*) on 2013-04-21 (released in >= v1.8.3.2).
Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t3200: Add test demonstrating minor regression in 41c21f2
In 41c21f2 (branch.c: Validate tracking branches with refspecs instead of
refs/remotes/*), we changed the rules for what is considered a valid tracking
branch (a.k.a. upstream branch). We now use the configured remotes and their
refspecs to determine whether a proposed tracking branch is in fact within
the domain of a remote, and we then use that information to deduce the
upstream configuration (branch.<name>.remote and branch.<name>.merge).
However, with that change, we also check that - in addition to a matching
refspec - the result of mapping the tracking branch through that refspec
(i.e. the corresponding ref name in the remote repo) happens to start with
"refs/heads/". In other words, we require that a tracking branch refers to
a _branch_ in the remote repo.
Now, consider that you are e.g. setting up an automated building/testing
infrastructure for a group of similar "source" repositories. The build/test
infrastructure consists of a central scheduler, and a number of build/test
"slave" machines that perform the actual build/test work. The scheduler
monitors the group of similar repos for changes (e.g. with a periodic
"git fetch"), and triggers builds/tests to be run on one or more slaves.
Graphically the changes flow between the repos like this:
The scheduler maintains a single Git repo with each of the source repos set
up as distinct remotes. The slaves also need access to all the changes from
all of the source repos, so they pull from the scheduler repo, but using the
following custom refspec:
Git now looks at the configured remotes (in this case there is only "origin",
pointing to the scheduler's repo) and sees refs/remotes/source-1/some_branch
matching origin's refspec. Mapping through that refspec we find that the
corresponding remote ref name is "refs/remotes/source-1/some_branch".
However, since this remote ref name does not start with "refs/heads/", we
discard it as a suitable upstream, and the whole command fails.
This patch adds a testcase demonstrating this failure by creating two
source repos ("a" and "b") that are forwarded through a scheduler ("c")
to a slave repo ("d"), that then tries create a local branch with an
upstream. See the next patch in this series for the exciting conclusion
to this story...
Reported-by: Per Cederqvist <cederp@opera.com> Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t3200: Minor fix when preparing for tracking failure
We're testing that trying to --track a ref that is not covered by any remote
refspec should fail. For that, we want to have refs/remotes/local/master
present, but we also want the remote.local.fetch refspec to NOT match
refs/remotes/local/master (so that the tracking setup will fail, as intended).
However, when doing "git fetch local" to ensure the existence of
refs/remotes/local/master, we must not already have changed remote.local.fetch
so as to cause refs/remotes/local/master not to be fetched. Therefore, set
remote.local.fetch to refs/heads/*:refs/remotes/local/* BEFORE we fetch, and
then reset it to refs/heads/s:refs/remotes/local/s AFTER we have fetched
(but before we test --track).
Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase: fix run_specific_rebase's use of "return" on FreeBSD
Since a1549e10, git-rebase--am.sh uses the shell's "return" statement, to
mean "return from the current file inclusion", which is POSIXly correct,
but badly interpreted on FreeBSD, which returns from the current
function, hence skips the finish_rebase statement that follows the file
inclusion.
Make the use of "return" portable by using the file inclusion as the last
statement of a function.
Reported-by: Christoph Mallon <christoph.mallon@gmx.de> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
add--interactive: fix external command invocation on Windows
Back in 21e9757e (Hack git-add--interactive to make it work with
ActiveState Perl, 2007-08-01), the invocation of external commands was
changed to use qx{} on Windows. The rationale was that the command
interpreter on Windows is not a POSIX shell, but rather Windows's CMD.
That patch was wrong to include 'msys' in the check whether to use qx{}
or not: 'msys' identifies MSYS perl as shipped with Git for Windows,
which does not need the special treatment; qx{} should be used only with
ActiveState perl, which is identified by 'MSWin32'.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make setup_git_env() resolve .git file when $GIT_DIR is not specified
This makes reinitializing on a .git file repository work.
This is probably the only case that setup_git_env() (via
set_git_dir()) is called on a .git file. Other cases in
setup_git_dir_gently() and enter_repo() both cover .git file case
explicitly because they need to verify the target repo is valid.
Reported-by: Ximin Luo <infinity0@gmx.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
has_sha1_file: re-check pack directory before giving up
When we read a sha1 file, we first look for a packed
version, then a loose version, and then re-check the pack
directory again before concluding that we cannot find it.
This lets us handle a process that is writing to the
repository simultaneously (e.g., receive-pack writing a new
pack followed by a ref update, or git-repack packing
existing loose objects into a new pack).
However, we do not do the same trick with has_sha1_file; we
only check the packed objects once, followed by loose
objects. This means that we might incorrectly report that we
do not have an object, even though we could find it if we
simply re-checked the pack directory.
By itself, this is usually not a big deal. The other process
is running simultaneously, so we may run has_sha1_file
before it writes, anyway. It is a race whether we see the
object or not. However, we may also see other things
the writing process has done (like updating refs); and in
that case, we must be able to also see the new objects.
For example, imagine we are doing a for_each_ref iteration,
and somebody simultaneously pushes. Receive-pack may write
the pack and update a ref after we have examined the
objects/pack directory, but before the iteration gets to the
updated ref. When we do finally see the updated ref,
for_each_ref will call has_sha1_file to check whether the
ref is broken. If has_sha1_file returns the wrong answer, we
erroneously will think that the ref is broken.
For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN,
this means that the caller does not see the ref at all
(neither the old nor the new value). So not only will we
fail to see the new value of the ref (which is acceptable,
since we are running simultaneously with the writer, and we
might well read the ref before the writer commits its
write), but we will not see the old value either. For
programs that act on reachability like pack-objects or
prune, this can cause data loss, as we may see the objects
referenced by the original ref value as dangling (and either
omit them from the pack, or delete them via prune).
There's no test included here, because the success case is
two processes running simultaneously forever. But you can
replicate the issue with:
# base.sh
# run this in one terminal; it creates and pushes
# repeatedly to a repository
git init parent &&
(cd parent &&
# create a base commit that will trigger us looking at
# the objects/pack directory before we hit the updated ref
echo content >file &&
git add file &&
git commit -m base &&
# set the unpack limit abnormally low, which
# lets us simulate full-size pushes using tiny ones
git config receive.unpackLimit 1
) &&
git clone parent child &&
cd child &&
n=0 &&
while true; do
echo $n >file && git add file && git commit -m $n &&
git push origin HEAD:refs/remotes/child/master &&
n=$(($n + 1))
done
# fsck.sh
# now run this simultaneously in another terminal; it
# repeatedly fscks, looking for us to consider the
# newly-pushed ref broken. We cannot use for-each-ref
# here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which
# skips the has_sha1_file check (and if it wants
# more information on the object, it will actually read
# the object, which does the proper two-step lookup)
cd parent &&
while true; do
broken=`git fsck 2>&1 | grep remotes/child`
if test -n "$broken"; then
echo $broken
exit 1
fi
done
Without this patch, the fsck loop fails within a few seconds
(and almost instantly if the test repository actually has a
large number of refs). With it, the two can run
indefinitely.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sparse issues an "'prepare_transport' was not declared. Should it
be static?" warning. In order to suppress the warning, since this
symbol only requires file scope, we simply add the static modifier
to it's declaration.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
mailmap: handle mailmap blobs without trailing newlines
The read_mailmap_buf function reads each line of the mailmap
using strchrnul, like:
const char *end = strchrnul(buf, '\n');
unsigned long linelen = end - buf + 1;
But that's off-by-one when we actually hit the NUL byte; our
line does not have a terminator, and so is only "end - buf"
bytes long. As a result, when we subtract the linelen from
the total len, we end up with (unsigned long)-1 bytes left
in the buffer, and we start reading random junk from memory.
We could fix it with:
unsigned long linelen = end - buf + !!*end;
but let's take a step back for a moment. It's questionable
in the first place for a function that takes a buffer and
length to be using strchrnul. But it works because we only
have one caller (and are only likely to ever have this one),
which is handing us data from read_sha1_file. Which means
that it's always NUL-terminated.
Instead of tightening the assumptions to make the
buffer/length pair work for a caller that doesn't actually
exist, let's let loosen the assumptions to what the real
caller has: a modifiable, NUL-terminated string.
This makes the code simpler and shorter (because we don't
have to correlate strchrnul with the length calculation),
correct (because the code with the off-by-one just goes
away), and more efficient (we can drop the extra allocation
we needed to create NUL-terminated strings for each line,
and just terminate in place).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add testcase for needless objects during a shallow fetch
This is a testcase that checks for a problem where, during a specific
shallow fetch where the client does not have any commits that are a
successor of the new shallow root (i.e., the fetch creates a new
detached piece of history), the server would simply send over _all_
objects, instead of taking into account the objects already present in
the client.
The actual problem was fixed by a recent patch series by Nguyễn Thái
Ngọc Duy already.
Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Junio C Hamano <gitster@pobox.com>
list-objects: mark more commits as edges in mark_edges_uninteresting
The purpose of edge commits is to let pack-objects know what objects
it can use as base, but does not need to include in the thin pack
because the other side is supposed to already have them. So far we
mark uninteresting parents of interesting commits as edges. But even
an unrelated uninteresting commit (that the other side has) may
become a good base for pack-objects and help produce more efficient
packs.
This is especially true for shallow clone, when the client issues a
fetch with a depth smaller or equal to the number of commits the
server is ahead of the client. For example, in this commit history
the client has up to "A" and the server has up to "B":
-------A---B
have--^ ^
/
want--+
If depth 1 is requested, the commit list to send to the client
includes only B. The way m_e_u is working, it checks if parent
commits of B are uninteresting, if so mark them as edges. Due to
shallow effect, commit B is grafted to have no parents and the
revision walker never sees A as the parent of B. In fact it marks no
edges at all in this simple case and sends everything B has to the
client even if it could have excluded what A and also the client
already have.
In a slightly different case where A is not a direct parent of B
(iow there are commits in between A and B), marking A as an edge can
still save some because B may still have stuff from the far ancestor
A.
There is another case from the earlier patch, when we deepen a ref
from C->E to A->E:
In this case we need to send A and B to the client, and C (i.e. the
current shallow point that the client informs the server) is a very
good base because it's closet to A and B. Normal m_e_u won't recognize
C as an edge because it only looks back to parents (i.e. A<-B) not the
opposite way B->C even if C is already marked as uninteresting commit
by the previous patch.
This patch includes all uninteresting commits from command line as
edges and lets pack-objects decide what's best to do. The upside is we
have better chance of producing better packs in certain cases. The
downside is we may need to process some extra objects on the server
side.
For the shallow case on git.git, when the client is 5 commits behind
and does "fetch --depth=3", the result pack is 99.26 KiB instead of
4.92 MiB.
Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the first argument and let mark_edges_uninteresting figure that
out by itself. It helps answer the question "are this commit list and
revs related in any way?" when looking at mark_edges_uninteresting
implementation.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: delegate rev walking in shallow fetch to pack-objects
upload-pack has a special revision walking code for shallow
recipients. It works almost like the similar code in pack-objects
except:
1. in upload-pack, graft points could be added for deepening;
2. also when the repository is deepened, the shallow point will be
moved further away from the tip, but the old shallow point will be
marked as edge to produce more efficient packs. See 6523078 (make
shallow repository deepening more network efficient - 2009-09-03).
Pass the file to pack-objects via --shallow-file. This will override
$GIT_DIR/shallow and give pack-objects the exact repository shape
that upload-pack has.
mark edge commits by revision command arguments. Even if old shallow
points are passed as "--not" revisions as in this patch, they will not
be picked up by mark_edges_uninteresting() because this function looks
up to parents for edges, while in this case the edge is the children,
in the opposite direction. This will be fixed in an later patch when
all given uninteresting commits are marked as edges.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is like setup_alternate_shallow() except that it does
not lock $GIT_DIR/shallow. It is supposed to be used when a program
generates temporary shallow for use by another program, then throw
the shallow file away.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shallow: only add shallow graft points to new shallow file
for_each_commit_graft() goes through all graft points, and shallow
boundaries are just one special kind of grafting.
If $GIT_DIR/shallow and $GIT_DIR/info/grafts are both present,
write_shallow_commits() may catch both sets, accidentally turning
some graft points to shallow boundaries. Don't do that.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
config: do not use C function names as struct members
According to C99, section 7.1.4:
Any function declared in a header may be additionally
implemented as a function-like macro defined in the
header.
Therefore calling our struct member function pointer "fgetc"
may run afoul of unwanted macro expansion when we call:
char c = cf->fgetc(cf);
This turned out to be a problem on uclibc, which defines
fgetc as a macro and causes compilation failure.
The standard suggests fixing this in a few ways:
1. Using extra parentheses to inhibit the function-like
macro expansion. E.g., "(cf->fgetc)(cf)". This is
undesirable as it's ugly, and each call site needs to
remember to use it (and on systems without the macro,
forgetting will compile just fine).
2. Using #undef (because a conforming implementation must
also be providing fgetc as a function). This is
undesirable because presumably the implementation was
using the macro for a performance benefit, and we are
dropping that optimization.
Instead, we can simply use non-colliding names.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'todo' sheet for interactive rebase shows abbreviated SHA-1's and
then performs its operations upon those shortened values. This can lead
to an abort if the SHA-1 of a reworded or edited commit is no longer
unique within the abbreviated SHA-1 space and a subsequent SHA-1 in the
todo list has the same abbreviated value.
If, after editing, the new SHA-1 of "first" also has prefix badbeef,
then the subsequent 'pick badbeef second' will fail since badbeef is no
longer a unique SHA-1 abbreviation:
error: short SHA1 badbeef is ambiguous.
fatal: Needed a single revision
Invalid commit name: badbeef
Fix this problem by expanding the SHA-1's in the todo list before
performing the operations.
[es: also collapse & expand SHA-1's for --edit-todo; respect
core.commentchar in transform_todo_ids(); compose commit message]
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t3404: rebase -i: demonstrate short SHA-1 collision
The 'todo' sheet for interactive rebase shows abbreviated SHA-1's and
then performs its operations upon those shortened values. This can lead
to an abort if the SHA-1 of a reworded or edited commit is no longer
unique within the abbreviated SHA-1 space and a subsequent SHA-1 in the
todo list has the same abbreviated value.
If, after editing, the new SHA-1 of "first" also has prefix badbeef,
then the subsequent 'pick badbeef second' will fail since badbeef is no
longer a unique SHA-1 abbreviation:
error: short SHA1 badbeef is ambiguous.
fatal: Needed a single revision
Invalid commit name: badbeef
Demonstrate this problem with a couple of specially crafted commits
which initially have distinct abbreviated SHA-1's, but for which the
abbreviated SHA-1's collide after a simple rewording of the first
commit's message.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As its very first action, t3404 installs (via set_fake_editor) a
specialized $EDITOR which simplifies automated 'rebase -i' testing. Many
tests rely upon this setting, thus tests which need a different editor
must take extra care upon completion to restore $EDITOR in order to
avoid breaking following tests. This places extra burden upon such tests
and requires that they undesirably have extra knowledge about
surrounding tests. Ease this burden by having each test install the
$EDITOR it requires, rather than relying upon a global setting.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch-pack: do not remove .git/shallow file when --depth is not specified
fetch_pack() can remove .git/shallow file when a shallow repository
becomes a full one again. This behavior is triggered incorrectly when
tags are also fetched because fetch_pack() will be called twice. At
the first fetch_pack() call:
- shallow_lock is set up
- alternate_shallow_file points to shallow_lock.filename, which is
"shallow.lock"
- commit_lock_file is called, which sets shallow_lock.filename to "".
alternate_shallow_file also becomes "" because it points to the
same memory.
At the second call, setup_alternate_shallow() is not called and
alternate_shallow_file remains "". It's mistaken as unshallow case and
.git/shallow is removed. The end result is a broken repository.
Fix this by always initializing alternate_shallow_file when
fetch_pack() is called. As an extra measure, check if args->depth > 0
before commit/rollback shallow file.
Reported-by: Kacper Kornet <kornet@camk.edu.pl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git commit --author=$name" sets the author to one whose name
matches the given string from existing commits, when $name is not in
the "Name <e-mail>" format. However, it does not honor the mailmap
to use the canonical name for the author found this way.
Fix it by telling the logic to find a matching existing author to
honor the mailmap, and use the name and email after applying the
mailmap.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
directory_exists_in_index() takes pathname and its length, but its
helper function directory_exists_in_index_icase() reads one byte
beyond the end of the pathname and expects there to be a '/'.
This needs to be fixed, as that one-byte-beyond-the-end location may
not even be readable, possibly by not registering directories to
name hashes with trailing slashes. In the meantime, update the new
caller added recently to treat_one_path() to make sure that the path
buffer it gives the function is one byte longer than the path it is
asking the function about by appending a slash to it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
contrib/git-prompt.sh: handle missing 'printf -v' more gracefully
Old Bash (3.0) which is distributed with RHEL 4.X and other ancient
platforms that are still in wide use, do not have a printf that
supports -v. Neither does Zsh (which is already handled in the code).
As suggested by Junio, let's test whether printf supports the -v
option and store the result. Then later, we can use it to
determine whether 'printf -v' can be used, or whether printf
must be called in a subshell.
Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t9902-completion.sh: old Bash still does not support array+=('') notation
Old Bash (3.0) which is distributed with RHEL 4.X and other ancient
platforms that are still in wide use, does not understand the
array+=() notation. Let's use an explicit assignment to the new array
element which works everywhere, like:
array[${#array[@]}+1]=''
The right-hand side '' is not strictly necessary, but in this case
I think it is more clear.
Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>