The guess_remote_head function tries to figure out where a
remote's HEAD is pointing by comparing the sha1 of the
remote's HEAD with the sha1 of various refs found on the
remote. However, we were too liberal in matching refs, and
would match tags or remote tracking branches, even though
these things could not possibly be referenced by the HEAD
symbolic ref (since git will detach when checking them out).
As a result, a clone of a remote repository with a detached
HEAD might write "refs/tags/*" into our local HEAD, which is
bogus. The resulting HEAD should be detached.
The other related code path is remote.c's get_head_names()
(which is used for, among other things, "set-head -a"). This was
not affected, however, as that function feeds only refs from
refs/heads to guess_remote_head.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t: add tests for cloning remotes with detached HEAD
We didn't test this setup at all, and doing so reveals a few
bugs:
1. Cloning a repository with an orphaned detached HEAD
(i.e., one that points to history that is not
referenced by any ref) will fail.
2. Cloning a repository with a detached HEAD that points
to a tag will cause us to write a bogus "refs/tags/..."
ref into the HEAD symbolic ref. We should probably
detach instead.
3. Cloning a repository with a detached HEAD that points
to a branch will cause us to checkout that branch. This
is a known limitation of the git protocol (we have to
guess at HEAD's destination, since the symref contents
aren't shown to us). This test serves to document the
desired behavior, which can only be achieved once the
git protocol learns to share symref information.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jk/git-connection-deadlock-fix' into maint-1.7.4
* jk/git-connection-deadlock-fix:
test core.gitproxy configuration
send-pack: avoid deadlock on git:// push with failed pack-objects
connect: let callers know if connection is a socket
connect: treat generic proxy processes like ssh processes
Merge branch 'js/maint-send-pack-stateless-rpc-deadlock-fix' into maint-1.7.4
* js/maint-send-pack-stateless-rpc-deadlock-fix:
sideband_demux(): fix decl-after-stmt
send-pack: unbreak push over stateless rpc
send-pack: avoid deadlock when pack-object dies early
send-pack: avoid deadlock on git:// push with failed pack-objects
Commit 09c9957c fixes a deadlock in which pack-objects
fails, the remote end is still waiting for pack data, and we
are still waiting for the remote end to say something (see
that commit for a much more in-depth explanation).
We solved the problem there by making sure the output pipe
is closed on error; thus the remote sees EOF, and proceeds
to complain and close its end of the connection.
However, in the special case of push over git://, we don't
have a pipe, but rather a full-duplex socket, with another
dup()-ed descriptor in place of the second half of the pipe.
In this case, closing the second descriptor signals nothing
to the remote end, and we still deadlock.
This patch calls shutdown() explicitly to signal EOF to the
other side.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
connect: let callers know if connection is a socket
They might care because they want to do a half-duplex close.
With pipes, that means simply closing the output descriptor;
with a socket, you must actually call shutdown.
Instead of exposing the magic no_fork child_process struct,
let's encapsulate the test in a function.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
connect: treat generic proxy processes like ssh processes
The git_connect function returns two ends of a pipe for
talking with a remote, plus a struct child_process
representing the other end of the pipe. If we have a direct
socket connection, then this points to a special "no_fork"
child process.
The code path for doing git-over-pipes or git-over-ssh sets
up this child process to point to the child git command or
the ssh process. When we call finish_connect eventually, we
check wait() on the command and report its return value.
The code path for git://, on the other hand, always sets it
to no_fork. In the case of a direct TCP connection, this
makes sense; we have no child process. But in the case of a
proxy command (configured by core.gitproxy), we do have a
child process, but we throw away its pid, and therefore
ignore its return code.
Instead, let's keep that information in the proxy case, and
respect its return code, which can help catch some errors
(though depending on your proxy command, it will be errors
reported by the proxy command itself, and not propagated
from git commands. Still, it is probably better to propagate
such errors than to ignore them).
It also means that the child_process field can reliably be
used to determine whether the returned descriptors are
actually a full-duplex socket, which means we should be
using shutdown() instead of a simple close.
Signed-off-by: Jeff King <peff@peff.net> Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 09c9957 (send-pack: avoid deadlock when pack-object
dies early, 2011-04-25) attempted to fix a hang in the
stateless rpc case by closing a file descriptor early, but
we still need that descriptor.
Basically the deadlock can happen when pack-objects fails,
and the descriptor to upstream is left open. We never send
the pack, so the upstream is left waiting for us to say
something, and we are left waiting for upstream to close the
connection.
In the non-rpc case, our descriptor points straight to the
upstream. We hand it off to run-command, which takes
ownership and closes the descriptor after pack-objects
finishes (whether it succeeds or not).
Commit 09c9957 tried to emulate that in the rpc case. That
isn't right, though. We actually have a descriptor going
back to the remote-helper, and we need to keep using it
after pack-objects is finished. Closing it early completely
breaks pushing via smart-http.
We still need to do something on error to signal the
remote-helper that we won't be sending any pack data
(otherwise we get the deadlock). In an ideal world, we
would send a special packet back that says "Sorry, there was
an error". But the remote-helper doesn't understand any such
packet, so the best we can do is close the descriptor and
let it report that we hung up unexpectedly.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'js/maint-1.6.6-send-pack-stateless-rpc-deadlock-fix' into js/maint-send-pack-stateless-rpc-deadlock-fix
* js/maint-1.6.6-send-pack-stateless-rpc-deadlock-fix:
send-pack: avoid deadlock when pack-object dies early
Evil merge to adjust the way the use of pthreads in sideband-demultiplexor
was decided (earlier it was "if we are not on Windows", now it is "if we
are not using pthreads").
send-pack: avoid deadlock when pack-object dies early
Send-pack deadlocks in two ways when pack-object dies early (for example,
because there is some repo corruption).
The first deadlock happens with the smart push protocol (--stateless-rpc).
After the initial rev-exchange, the remote is waiting for the pack data
to arrive, and the sideband demuxer at the local side continues trying to
stream data from the remote repository until it gets EOF. Meanwhile,
send-pack (in function pack_objects()) has noticed that pack-objects did
not produce output and died. Back in send_pack(), it now tries to clean
up the sideband demuxer using finish_async(). The demuxer, however, waits
for the remote end to close down, the remote waits for pack data, and
the reason that it still waits is that send-pack forgot to close the
outgoing channel. Add the missing close() in pack_objects().
The second deadlock happens in a similar constellation when the sideband
demuxer runs in a forked process (rather than in a thread). Again, the
remote end waits for pack data to arrive, the sideband demuxer waits for
the remote to shut down, and send-pack (in the regular clean-up) waits for
the demuxer to terminate. This time, the send-pack parent process closes
the writable end of the outgoing channel (in start_command() that spawned
pack-objects) so that after the death of the pack-objects process all
writable ends should have been closed and the remote repo should see EOF.
This does not happen, however, because when the sideband demuxer was forked
earlier, it also inherited a writable end; it remains open and keeps the
remote repo from seeing EOF. To break this deadlock, close the writable end
in the demuxer.
Analyzed-by: Jeff King <peff@peff.net> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
archive: document limitation of tar.umask config setting
The local value of the config variable tar.umask is not passed to the
other side with --remote. We may want to change that, but for now just
document this fact.
Reported-by: Jacek Masiulaniec <jacek.masiulaniec@gmail.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
On systems where the local time and file modification time may be out of
sync (e.g. test directory on NFS) t3306 and t5305 can fail because prune
compares times such as "now" (client time) with file modification times
(server times for remote file systems). I.e., these are spurious test
failures.
Avoid this by setting the relevant modification times to the local time.
Noticed on a system with as little as 2s time skew.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a spurious empty line which prevented asciidoc from recognizing a
list continuation mark ('+'), so that it does not get output literally any
more.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: start pack-objects before async rev-list
In a pthread-enabled version of upload-pack, there's a race condition
that can cause a deadlock on the fflush(NULL) we call from run-command.
What happens is this:
1. Upload-pack is informed we are doing a shallow clone.
2. We call start_async() to spawn a thread that will generate rev-list
results to feed to pack-objects. It gets a file descriptor to a
pipe which will eventually hook to pack-objects.
3. The rev-list thread uses fdopen to create a new output stream
around the fd we gave it, called pack_pipe.
4. The thread writes results to pack_pipe. Outside of our control,
libc is doing locking on the stream. We keep writing until the OS
pipe buffer is full, and then we block in write(), still holding
the lock.
5. The main thread now uses start_command to spawn pack-objects.
Before forking, it calls fflush(NULL) to flush every stdio output
buffer. It blocks trying to get the lock on pack_pipe.
And we have a deadlock. The thread will block until somebody starts
reading from the pipe. But nobody will read from the pipe until we
finish flushing to the pipe.
To fix this, we swap the start order: we start the
pack-objects reader first, and then the rev-list writer
after. Thus the problematic fflush(NULL) happens before we
even open the new file descriptor (and even if it didn't,
flushing should no longer block, as the reader at the end of
the pipe is now active).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'mg/rev-list-n-reverse-doc' into maint
* mg/rev-list-n-reverse-doc:
git-log.txt,rev-list-options.txt: put option blocks in proper order
git-log.txt,rev-list-options.txt: -n/--max-count is commit limiting
gitweb: Fix parsing of negative fractional timezones in JavaScript
Extract converting numerical timezone in the form of '(+|-)HHMM' to
timezoneOffset function, and fix parsing of negative fractional
timezones.
This is used to format timestamps in 'blame_incremental' view; this
complements commit 2b1e172 (gitweb: Fix handling of fractional
timezones in parse_date, 2011-03-25).
pull: do not clobber untracked files on initial pull
For a pull into an unborn branch, we do not use "git merge"
at all. Instead, we call read-tree directly. However, we
used the --reset parameter instead of "-m", which turns off
the safety features.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jk/format-patch-multiline-header' into maint
* jk/format-patch-multiline-header:
format-patch: rfc2047-encode newlines in headers
format-patch: wrap long header lines
strbuf: add fixed-length version of add_wrapped_text
Starting with commit c793430 (Limit file descriptors used by packs,
2011-02-28), git uses getrlimit to tell how many file descriptors it
can use. Unfortunately it does not include the header declaring that
function, resulting in compilation errors:
sha1_file.c: In function 'open_packed_git_1':
sha1_file.c:718: error: storage size of 'lim' isn't known
sha1_file.c:721: warning: implicit declaration of function 'getrlimit'
sha1_file.c:721: error: 'RLIMIT_NOFILE' undeclared (first use in this function)
sha1_file.c:718: warning: unused variable 'lim'
The standard header to include for this is <sys/resource.h> (which on
some systems itself requires declarations from <sys/types.h> or
<sys/time.h>). Probably the problem was missed until now because in
current glibc sys/resource.h happens to be included by sys/wait.h.
MinGW does not provide sys/resource.h (and compat/mingw takes care of
providing getrlimit some other way), so add the missing #include to
the "#ifndef __MINGW32__" block in git-compat-util.h.
Reported-by: Stefan Sperling <stsp@stsp.name> Tested-by: Stefan Sperling <stsp@stsp.name> [on OpenBSD] Tested-by: Arnaud Lacombe <lacombar@gmail.com> [on FreeBSD 8] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Doc: mention --delta-base-offset is the default for Porcelain commands
The underlying pack-objects plumbing command still needs an explicit
option from the command line, but these days Porcelain passes the
option, so there is no need for end users to worry about it anymore.
Earlier f98fd43 (git-log.txt,rev-list-options.txt: put option blocks in
proper order, 2011-03-08) moved the text around in the documentation for
options in the rev-list family of commands such as "log". Consequently,
the description of the --cherry-pick option appears way above the
description of the --left-right option now.
But the description of the --cherry-pick option still refers to the
example for the --left-right option, like this:
... with --left-right, like the example ABOVE in the description of
that option.
Rephrase it to clarify that we are making a forward reference.
docs: fix filter-branch subdir example for exotic repo names
The GIT_INDEX_FILE variable we get from git has the full
path to the repo, which may contain spaces. When we use it
in our shell snippet, it needs to be quoted.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule: process conflicting submodules only once
During a merge module_list returns conflicting submodules several times
(stage 1,2,3) which caused the submodules to be used multiple times in
git submodule init, sync, update and status command.
There are 5 callers of module_list; they all read (mode, sha1, stage,
path) tuple, and most of them care only about path. As a first level
approximation, it should be Ok (in the sense that it does not make things
worse than it currently is) to filter the duplicate paths from module_list
output, but some callers should change their behaviour when the merge in
the superproject still has conflicts.
Notice the higher-stage entries, and emit only one record from
module_list, but while doing so, mark the entry with "U" (not [0-3]) in
the $stage field and null out the SHA-1 part, as the object name for the
lowest stage does not give any useful information to the caller, and this
way any caller that uses the object name would hopefully barf. Then
update the codepaths for each subcommands this way:
- "update" should not touch the submodule repository, because we do not
know what commit should be checked out yet.
- "status" reports the conflicting submodules as 'U000...000' and does
not recurse into them (we might later want to make it recurse).
- The command called by "foreach" may want to do whatever it wants to do
by noticing the merged status in the superproject itself, so feed the
path to it from module_list as before, but only once per submodule.
- "init" and "sync" are unlikely things to do while the superproject is
still not merged, but as long as a submodule is there in $path, there
is no point skipping it. It might however want to take the merged
status of .gitmodules into account, but that is outside of the scope of
this topic.
Acked-by: Jens Lehmann <Jens.Lehmann@web.de>
Thanks-to: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nicolas Morey-Chaisemartin <nicolas@morey-chaisemartin.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
HOME must be set before calling git-init when creating test repositories
Otherwise the created test repositories will be affected by users ~/.gitconfig.
For example, setting core.logAllrefupdates in users config will make all
calls to "git config --unset core.logAllrefupdates" fail which will break
the first test which uses the statement and expects it to succeed.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb: Fix handling of fractional timezones in parse_date
Fractional timezones, like -0330 (NST used in Canada) or +0430
(Afghanistan, Iran DST), were not handled properly in parse_date; this
means values such as 'minute_local' and 'iso-tz' were not generated
correctly.
This was caused by two mistakes:
* sign of timezone was applied only to hour part of offset, and not
as it should be also to minutes part (this affected only negative
fractional timezones).
* 'int $h + $m/60' is 'int($h + $m/60)' and not 'int($h) + $m/60',
so fractional part was discarded altogether ($h is hours, $m is
minutes, which is always less than 60).
Note that positive fractional timezones +0430, +0530 and +1030 can be
found as authortime in git.git repository itself.
For example http://repo.or.cz/w/git.git/commit/88d50e7 had authortime
of "Fri, 8 Jan 2010 18:48:07 +0000 (23:48 +0530)", which is not marked
with 'atnight', when "git show 88d50e7" gives correct author date of
"Sat Jan 9 00:18:07 2010 +0530".
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: technical details about the index file format
* Clarify "string of unsigned bytes";
* Blob has two variants (regular file vs symlink), not (blob vs symlink);
* Clarify permission mode bits;
* Clarify ce_namelen() "too long to fit in the length field" case;
* Clarify "." etc are forbidden as path components;
* Match the description with the internal wording "cache-tree";
* All types of extension begin with signature and length as explained in
the first part. Don't repeat the "length" part in the description of
each extension (can be mistaken as if there is a separate 32-bit size
field inside the extension), but state what the signature for each
extension is.
* Don't say "Extension tag", as we have said "Extension signature" in the
first part---be consistent;
* Clarify the invalidation of cache-tree entries;
* Correct description on subtree_nr field in the cache-tree;
git-am.txt: advertise 'git am --abort' instead of 'rm .git/rebase-apply'
'git am --abort' is around for quite a long time now, and users should
normally not poke around inside the .git directory, yet the
documentation of 'git am' still recommends the following:
... if you decide to start over from scratch,
run `rm -f -r .git/rebase-apply` ...
Suggest 'git am --abort' instead.
It's not quite the same as the original, because 'git am --abort' will
restore the original branch, while simply removing '.git/rebase-apply'
won't, but that's rather a thinko in the original wording, because
that won't actually "start over _from scratch_".
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
update $GIT_INDEX_FILE when there are racily clean entries
Traditional "opportunistic index update" done by read-only "diff" and
"status" was about updating cached lstat(2) information in the index for
the next round. We missed another obvious optimization opportunity: when
there are racily clean entries that will cease to be racily clean by
updating $GIT_INDEX_FILE. Detect that case and write $GIT_INDEX_FILE out
to give it a newer timestamp.
Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and
counting the number of open(2) calls.
When we had to refresh the index internally before running diff or status,
we opportunistically updated the $GIT_INDEX_FILE so that later invocation
of git can use the lstat(2) we already did in this invocation.
bisect: visualize with git-log if gitk is unavailable
If gitk is not available in the PATH, bisect ends up
exiting with the shell's 127 error code, confusing the git
wrapper into thinking that bisect is not a git command.
We already fallback to git-log if there doesn't seem to be a
graphical display available. We should do the same if gitk
is not available in our PATH at all. This not only fixes the
ugly error message, but is a much more sensible default than
failing to show the user anything.
Reported by Maxin John.
Tested-by: Maxin B. John <maxin@maxinbjohn.info> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-fd-limit:
sha1_file.c: Don't retain open fds on small packs
mingw: add minimum getrlimit() compatibility stub
Limit file descriptors used by packs
Merge branch 'so/submodule-no-update-first-time' into maint
* so/submodule-no-update-first-time:
t7406: "git submodule update {--merge|--rebase]" with new submodules
submodule: no [--merge|--rebase] when newly cloned
The test setup in t8006-blame-textconv.sh uses "ln -sf" to
overwrite an existing symlink. Unfortunately, both /usr/bin/ln
and /usr/xpg4/bin/ln on solaris 9 don't properly handle -f and -s
used at the same time. This caused the test setup and subsequent
checks to fail.
Instead, remove the symlink and then create a new one in the
setup code.
The upstream Solaris bug (fixed in 10, but not 9) is documented
here:
stash: copy the index using --index-output instead of cp -p
'git stash create' must operate with a temporary index. For this purpose,
it used 'cp -p' to create a copy. -p is needed to preserve the timestamp
of the index file. Now Jakob Pfender reported a certain combination of
a Linux NFS client, OpenBSD NFS server, and cp implementation where this
operation failed.
Luckily, the first operation in git-stash after copying the index is to
call 'git read-tree'. Therefore, use --index-output instead of 'cp -p'
to write the copy of the index.
--index-output requires that the specified file is on the same volume as
the source index, so that the lock file can be rename()d. For this reason,
the name of the temporary index is constructed in a way different from the
other temporary files. The code path of 'stash -p' also needs a temporary
index, but we do not use the new name because it does not depend on the
same precondition as --index-output.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
stash: fix incorrect quoting in cleanup of temporary files
The * was inside the quotes, so that the pattern was never expanded and the
temporary files were never removed. As a consequence, 'stash -p' left a
.git-stash-*-patch file in $GIT_DIR. Other code paths did not leave files
behind because they removed the temporary file themselves, at least in
non-error paths.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'lt/rename-no-extra-copy-detection' into maint
* lt/rename-no-extra-copy-detection:
diffcore-rename: improve estimate_similarity() heuristics
diffcore-rename: properly honor the difference between -M and -C
for_each_hash: allow passing a 'void *data' pointer to callback
Merge branch 'mg/placeholders-are-lowercase' into maint
* mg/placeholders-are-lowercase:
Make <identifier> lowercase in Documentation
Make <identifier> lowercase as per CodingGuidelines
Make <identifier> lowercase as per CodingGuidelines
Make <identifier> lowercase as per CodingGuidelines
CodingGuidelines: downcase placeholders in usage messages