- we don't allow "." and ".." as components of a refname. Thus get_sha1()
will not accept "./refname" as being the same as "refname" any more.
- git-rev-parse stops doing revision translation after seeing a pathname,
to match the brhaviour of all the tools (once we see a pathname,
everything else will also be parsed as a pathname).
Basically, if you did
git log *
and "gitk" was somewhere in the "*", we don't want to replace the filename
"gitk" with the SHA1 of the branch with the same name.
Of course, if there is any change of ambiguity, you should always use "--"
to make it explicit what are filenames and what are revisions, but this
makes the normal cases sane. The refname rule also means that instead of
the "--", you can do the same thing we're used to doing with filenames
that start with a slash: use "./filename" instead, and now it's a
filename, not an option (and not a revision).
So "git log ./*.c" is now actually a perfectly valid thing to do, even if
the first C-file might have the same name as a branch.
where the "./gitk" isn't seen as a revision, and the second "gitk" is a
filename simply because we've seen filenames already, and thus stopped
doing revision parsing.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The git philosophy when it comes to disk accesses is "Laugh in the face of
danger".
Notably, since we never modify an existing object, we don't really care
that deeply about flushing things to disk, since even if the machine
crashes in the middle of a git operation, you can never really have lost
any old work. At most, you'd need to figure out the proper heads (which
git-fsck-objects can do for you) and re-do the operation.
However, there's two exceptions to this: pruning and repacking. Those
operations will actually _delete_ old objects that they know about in
other ways (ie that they just repacked, or that they have found in other
places).
However, since they actually modify old state, we should thus be a bit
more careful about them. If the machine crashes and the duplicate new
objects haven't been flushed to disk, you can actually be in trouble.
This is trivially stupid about it by calling "sync" before removing the
objects. Not very smart, but we're talking about special operations than
are usually done once a week if that.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Documentation changes to recursive option for git-diff-tree
Update docs and usages regarding '-r' recursive option for git-diff-tree.
Remove '-r' from common diff options, mention it only for git-diff-tree.
Remove one extraneous use of '-r' with git-diff-files in get-merge.sh.
Sync the synopsis and usage string for git-diff-tree.
Signed-off-by: Chris Shoemaker <c.shoemaker at cox.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
Pavel Roskin wondered what the SHA1 output at the beginning of
git-diff-tree was about. The only consumer of that information
so far is this git-patch-id command, which was inadequately
documented.
This removes the unoptimization. The previous round does not mind
missing fan-out directories, but still makes sure they exist, lest
older versions choke on a repository created/packed by it.
This round does not play that nicely anymore -- empty fan-out
directories are not created by init-db, and will stay removed by
prune-packed. The prune command also removes empty fan-out directories.
[PATCH] Make "gitk" work better with dense revlists
To generate the diff for a commit, gitk used to do
git-diff-tree -p -C $p $id
(and same thing to generate filenames, except using just "-r" there) which
does actually generate the diff from the parent to the $id, exactly like
it meant to do.
However, that really sucks with --dense, where the "parent" information
has all been rewritten to point to the previous commit. The diff actually
works exactly right, but now it's the diff of the _whole_ sequence of
commits all the way to the previous commit that last changed the file(s)
that we are looking at.
And that's really not what we want 99.9% of the time, even if it may be
perfectly sensible. Not only will the diff not actually match the commit
message, but it will usually be _huge_, and all of it will be totally
uninteresting to us, since we were only interested in a particular set of
files.
It also doesn't match what we do when we write the patch to a file.
So this makes gitk just show the diff of _that_ commit.
We might even want to have some way to limit the diff to only the
filenames we're interested in, but it's often nice to see what else
changed at the same time, so that's secondary.
The merge diff handling is left alone, although I think that should also
be changed to only look at what that _particular_ merge did, not what it
did when compared to the faked-out parents.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
What happens is that the new logic decides that if it can't look up a
commit reference (ie "get_commit_reference()" returns NULL), the thing
must be a pathname.
Fair enough.
But wrong.
The thing is, it may be a perfectly fine ref that _isn't_ a commit. In
git, you have a tag that points to your PGP key, and in the kernel, I have
a tag that points to a tree (and a direct ref that points to that tree
too, for that matter).
So the rule is (as for all the other programs that mix revs and pathnames)
not that we only accept commit references, but _any_ valid object ref.
If the object then isn't a commit ref, git-rev-list will either ignore it,
or add it to the list of non-commit objects (if using "--objects").
The solution is to move the "get_sha1()" out of get_commit_reference(),
and into the callers. In fact, we already _have_ the SHA1 in the case of
the handle_all() loop, since for_each_ref() will have done it for us, so
this is the correct thing to do anyway.
This patch (on top of the original one) does exactly that.
git-rev-list: make --dense the default (and introduce "--sparse")
This actually does three things:
- make "--dense" the default for git-rev-list. Since dense is a no-op if
no filenames are given, this doesn't actually change any historical
behaviour, but it's logically the right default (if we want to prune on
filenames, do it fully. The sparse "merge-only" thing may be useful,
but it's not what you'd normally expect)
- make "git-rev-parse" show the default revision control before it shows
any pathnames.
This was a real bug, but nobody would ever have noticed, because
the default thing tends to only make sense for git-rev-list, and
git-rev-list didn't use to take pathnames.
- it changes "git-rev-list" to match the other commands that take a mix
of revisions and filenames - it no longer requires the "--" before
filenames (although you still need to do it if a filename could be
confused with a revision name, eg "gitk" in the git archive)
This all just makes for much more pleasant and obvous usage. Just doing a
gitk t/
does the obvious thing: it will show the history as it concerns the "t/"
subdirectory.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-name-rev tries to find nice symbolic names for commits. It does so by
walking the commits from the refs. When the symbolic name is ambiguous, the
following heuristic is applied: Try to avoid too many ~'s, and if two ambiguous
names have the same count of ~'s, take the one whose last number is smaller.
With "--tags", the names are derived only from tags.
With "--stdin", the stdin is parsed, and after every sha1 for which a name
could be found, the name is appended. (Try "git log | git name-rev --stdin".)
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache
directory, when a necessary pack is found. This is hopefully useful
when upload-pack (called from git-daemon) is expected to receive
requests for the same set of objects many times (e.g full cloning
request of any project, or updates from the set of heads previous day
to the latest for a slow moving project).
Currently git-pack-objects does *not* keep pack files it creates for
reusing. It might be useful to add --update-cache option to it,
which would allow it store pack files it created in the pack-cache
directory, and prune rarely used ones from it.
Fix what to do and how to detect when hardlinking fails
Recent FAT workaround caused compilation trouble on OpenBSD;
different platforms use different error codes when we try to
hardlink the temporary file to its final location. Existing
Coda hack also checks its own error code, but the thing is,
the case we care about is if link failed for a reason other
than that the final file has already existed (which would be
normal, or it could mean collision). So just check the error
code against EEXIST.
upload-pack would set create_full_pack=1 if nr_has==0, but would ask later
if nr_needs<MAX_NEEDS. If that proves true, it would ignore create_full_pack,
and arguments would be written into unreserved memory.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes sure what the other end asks for are among what we
offered to give them. Otherwise we would end up running
git-rev-list with 20-byte nonsense, only to find it either die
(because the object was not found) or waste time (because we
ended up serving that phony 'client').
Also avoid wasting needs_sha1 pool to record duplicates, and
detect cloning requests better.
[this used to be on top of Johannes fetch-pack enhancements,
which we are rewinding it for further testing for now, so
the commit is rebased.]
Work around missing hard links on FAT formatted media
FAT -- like Coda -- does not like cross-directory hard links. To be
precise, FAT does not like links at all. But links are not needed either.
So get rid of them.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
create_symref: if symlink fails, fall back to writing a "symbolic ref"
There are filesystems out there which do not understand symlinks, even if
the OS is perfectly capable of writing them. So, do not fail right away,
but try to write a symbolic ref first. If that fails, you can die().
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
I can confirm that the following patch lets the current origin
compile on OpenBSD. If you could apply this until you sort out the
rest of the namespace issue, I would be happy. Thanks.
git-fetch-pack: Implement client part of the multi_ack extension
This patch concludes the series, which makes
git-fetch-pack/git-upload-pack negotiate a potentially better set of
common revs. It should make a difference when fetching from a repository
with a few branches.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
The code used to call git-rev-list to enumerate the local revisions. A
disadvantage of that method was that git-rev-list, lacking a control apart
from the command line, would happily enumerate ancestors of acknowledged
common commits, which was just taking unnecessary bandwidth.
Therefore, do not use git-rev-list on the fetching side, but rather
construct the list on the go. Send the revisions starting from the local
heads, ignoring the revisions known to be common.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-upload-pack: Support sending multiple ACK messages
The current fetch/upload protocol works like this:
- client sends revs it wants to have via "want" messages
- client sends a flush message (message with len 0)
- client sends revs it has via "have" messages
- after one window (32 revs), a flush is sent
- after each subsequent window, a flush is sent, and an ACK/NAK is received.
(NAK means that server does not have any of the transmitted revs;
ACK sends also the sha1 of the rev server has)
- when the first ACK is received, client sends "done", and does not expect
any further messages
One special case, though:
- if no ACK is received (only NAK's), and client runs out of revs to send,
"done" is sent, and server sends just one more "NAK"
A smarter scheme, which actually has a chance to detect more than one
common rev, would be to send more than just one ACK. This patch implements
the server side of the following extension to the protocol:
- client sends at least one "want" message with "multi_ack" appended, like
until it has MAX_HAS-1 revs. In this manner, client knows when to
stop sending revs by checking for the substring "continue" (and
further knows that server understands multi_ack)
In this manner, the protocol stays backwards compatible, since both client
must send "want ... multi_ack" and server must answer with "ACK ...
continue" to enable the extension.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a very git specific restricted shell, that can be
added to /etc/shells and set to the pw_shell in the /etc/passwd
file, to give users ability to push into repositories over ssh
without giving them full interactive shell acount.
[jc: I updated Linus' patch to match what the current sq_quote()
does.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
What we list as "Ignored files" are not "ignored". Rather, it
is the list of "not listed in the to-be-ignored files, but
exists -- you may be forgetting to add them".
It supersedes git-rename by adding functionality to move multiple
files, directories or symlinks into another directory. It also
provides according documentation.
The implementation renames multiple files, using the arguments from
the command line to produce an array of sources and destinations. In
a first pass, all requested renames are checked for errors, and
overwriting of existing files is only allowed with '-f'. The actual
renaming is done in a second pass. This ensures that any error
condition is checked before anything is changed.
Signed-off-by: Josef Weidendorfer <Josef.Weidendorfer@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
Silence confusing and false-positive curl error message
git-http-fetch spits out curl 404 error message when unable to fetch an object,
but that's confusing since no error really happened and the object is usually
found in a pack it tries right after that. And if the object still cannot be
retrieved, it will say another error message anyway. OTOH other HTTP errors
(403 etc) are likely fatal and the user should be still informed about them.
Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
This is what the recent git-rev-list changes have all been gearing up for.
When we use a path filter to git-rev-list, the new "--dense" flag asks
git-rev-list to compress the history so that it _only_ contains commits
that change files in the path filter. It also rewrites the parent
information so that tools like "gitk" will see the result as a dense
history tree.
For example, on the current kernel archive:
[torvalds@g5 linux]$ git-rev-list HEAD | wc -l
9904
[torvalds@g5 linux]$ git-rev-list HEAD -- kernel | wc -l
5442
[torvalds@g5 linux]$ git-rev-list --dense HEAD -- kernel | wc -l
356
which shows that while we have almost ten thousand commits, we can prune
down the work to slightly more than half by only following the merges
that are interesting. But further, we can then compress the history to
just 356 entries that actually make changes to the kernel subdirectory.
To see this in action, try something like
gitk --dense -- gitk
to see just the history that affects gitk. Or, to show that true
parallel development still remains parallel, do
gitk --dense -- daemon.c
which shows some parallel commits in the current git tree.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Teach git-rev-list to follow just a specified set of files
This is the first cut at a git-rev-list that knows to ignore commits that
don't change a certain file (or set of files).
NOTE! For now it only prunes _merge_ commits, and follows the parent where
there are no differences in the set of files specified. In the long run,
I'd like to make it re-write the straight-line history too, but for now
the merge simplification is much more fundamentally important (the
rewriting of straight-line history is largely a separate simplification
phase, but the merge simplification needs to happen early if we want to
optimize away unnecessary commit parsing).
If all parents of a merge change some of the files, the merge is left as
is, so the end result is in no way guaranteed to be a linear history, but
it will often be a lot /more/ linear than the full tree, since it prunes
out parents that didn't matter for that set of files.
As an example from the current kernel:
[torvalds@g5 linux]$ git-rev-list HEAD | wc -l
9885
[torvalds@g5 linux]$ git-rev-list HEAD -- Makefile | wc -l
4084
[torvalds@g5 linux]$ git-rev-list HEAD -- drivers/usb | wc -l
5206
and you can also use 'gitk' to more visually see the pruning of the
history tree, with something like
gitk -- drivers/usb
showing a simplified history that tries to follow the first parent in a
merge that is the parent that fully defines drivers/usb/.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Split up tree diff functions into tree-diff.c library
This makes the tree diff functionality independent of the "git-diff-tree"
program, by splitting the core functionality up into a library file.
This will be needed for when we teach git-rev-list to only follow a
specified set of pathnames, rather than the global revision history.
Most of it is a fairly straightforward code move, but it also involves
some calling convention cleanup, and moving some of the static variables
from diff-tree.c into the options structure.
The actual tree change callback routines also become paramterized by the
diff_options structure, allowing the library functionality to do something
else than just show the diff on stdout.
Right now the only user of this functionality remains git-diff-tree
itself.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Martin Langhoff wants to use git-merge from outside git-pull and wants
to do further processing; for this, he wants git-merge no to commit
even when it cleanly merges. I think other script writers would want
something like that as well, so here it is.
Instead of the "merge commit message" parameter (which usually is made
for you by "git-pull" which calls this command), you pass an empty
string to it. Then it will not update your HEAD -- you can do whatever
you want with the resulting index file, which contains the merge results.
Later round would further improve fetch-pack not to send useless "have",
but in the meantime, increase it to help upload-pack to find more common
commits, as discussed on the list.
Update git-daemon's documentation wrt. new options
New options --timeout, --init-timeout, --export-all and whitelist support
were added to git-daemon, but noone bothered to also add the proper
documentation. This patch aims to fix that.
Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
Brief documentation for the mysterious git-am script
The git-am script is nowhere called and nowhere (including itself)
explained, and the name isn't helpful either. For those like me who will
wonder what is it about, add some documentation stub for it to the
documentation.
I probably got something wrong and I don't feel like investigating all the
options - this is just kind of "emergency" docs.
Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
If rev-parse output includes both flags and files, we should pass on any
"--" marker we see, so that the end result can also tell the difference
between a flag and a filename that begins with '-'.
[jc: merged a later one liner updates from Linus]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Use sensible domain name (the DNS one) when guessing ident information
Currently, the code would use getdomainname() call, which however returns
something usually unset and not necessarily related at all to the DNS
domain name (it seems to be mostly some scary NIS/YP thing).
This patch changes the code to actually use the DNS domain name, which is
also what tends to be used in emails, and we aim at emails with our ident
code.
Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
With the '0' timeout given to poll, it returns instantly without any
events on my system, causing git-daemon to consume all the CPU time. Use
-1 as the timeout so poll() only returns in case of EINTR or actually
events being available.
Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
Merge /pub/scm/git/git to recover lost side branch
Sorry for the mistake of rewinding something already pushed out.
This recovers the side branch lost by that mistake, specifically ea5a65a59916503d2a14369c46b1023384d51645 commit.
Signed-off-by: Junio C Hamano <junio@hera.kernel.org>
Be more careful tangling object chains while marking commits.
Also Johannes noticed we use parse_object to look up if we know that
object already -- we should just ask the in-core object registry with
lookup_object() for that.
The previous round to optimize fetch-pack has a small bug that
feeds SHA1^ ("parent commit") before making sure SHA1 is
actually a commit (or a tag that eventually dereferences to a
commit). Also it did not help culling the known-to-be-common
parents if the common one was a merge.
[PATCH] Do not send "want" lines for complete objects
It was all good and well to check if all remote refs are complete (local
refs or descendants thereof), but we can just as easily use the same
information to avoid sending "want" lines just for the complete objects in
the case that not all remote refs are complete (or their names differ).
Also, git-fetch-pack does not have to ask for descendants of remote refs
which are complete (for now, git-rev-list is told to ignore only the first
parent). That change also eliminates a code path where a popen()ed handle
was not pclose()ed.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
On top of optimization by Linus not to ask refs that already match, we
can walk our refs and not issue "want" for things that are known to be
reachable from them.
Support for HTTP transfer timeouts based on transfer speed
Add configuration settings to abort HTTP requests if the transfer rate
drops below a threshold for a specified length of time. Environment
variables override config file settings.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
It turns out that not only did git-daemon do DWIM, but git-upload-pack
does as well. This is bad; security checks have to be performed *after*
canonicalization, not before.
Additionally, the current git-daemon can be trivially DoSed by spewing
SYNs at the target port.
This patch adds a --strict option to git-upload-pack to disable all
DWIM, a --timeout option to git-daemon and git-upload-pack, and an
--init-timeout option to git-daemon (which is typically set to a much
lower value, since the initial request should come immediately from the
client.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
On top of optimization by Linus not to ask refs that already match, we
can walk our refs and not issue "want" for things that are known to be
reachable from them.
I took a look at webgit, and it looks like at least for the "projects"
page, the most common operation ends up being basically
git-rev-list --header --parents --max-count=1 HEAD
Now, the thing is, the way "git-rev-list" works, it always keeps on
popping the parents and parsing them in order to build the list of
parents, and it turns out that even though we just want a single commit,
git-rev-list will invariably look up _three_ generations of commits.
It will parse:
- the commit we want (it obviously needs this)
- it's parent(s) as part of the "pop_most_recent_commit()" logic
- it will then pop one of the parents before it notices that it doesn't
need any more
- and as part of popping the parent, it will parse the grandparent (again
due to "pop_most_recent_commit()".
Now, I've strace'd it, and it really is pretty efficient on the whole, but
if things aren't nicely cached, and with long-latency IO, doing those two
extra objects (at a minimum - if the parent is a merge it will be more) is
just wasted time, and potentially a lot of it.
So here's a quick special-case for the trivial case of "just one commit,
and no date-limits or other special rules".
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
revised^2: git-daemon extra paranoia, and path DWIM
This patch adds some extra paranoia to the git-daemon filename test. In
particular, it now rejects pathnames containing //; it also adds a
redundant test for pathname absoluteness (belts and suspenders.)
A single / at the end of the path is still permitted, however, and the
.git and /.git append DWIM stuff is now handled in an integrated manner,
which means the resulting path will always be subjected to pathname checks.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
If everything is up-to-date locally, we don't need to even ask for a
pack-file from the remote, or try to unpack it.
This is especially important for tags - since the pack-file common commit
logic is based purely on the commit history, it will never be able to find
a common tag, and will thus always end up re-fetching them.
Especially notably, if the tag points to a non-commit (eg a tagged tree),
the pack-file would be unnecessarily big, just because it cannot any most
recent common point between commits for pruning.
Short-circuiting the case where we already have that reference means that
we avoid a lot of these in the common case.
NOTE! This only matches remote ref names against the same local name,
which works well for tags, but is not as generic as it could be. If we
ever need to, we could match against _any_ local ref (if we have it, we
have it), but this "match against same name" is simpler and more
efficient, and covers the common case.
Renaming of refs is common for branch heads, but since those are always
commits, the pack-file generation can optimize that case.
In some cases we might still end up fetching pack-files unnecessarily, but
this at least avoids the re-fetching of tags over and over if you use a
regular
git fetch --tags ...
which was the main reason behind the change.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-checkout: revert specific paths to either index or a given tree-ish.
When extra paths arguments are given, git-checkout reverts only those
paths to either the version recorded in the index or the version
recorded in the given tree-ish.
Teach git-add and git-commit to handle filenames starting with '-'.
Recent '--' fixes to "git diff" by Linus made it possible to specify
filenames that start with '-'. But in order to do that, you need to
be able to add and commit such file to begin with.
Teach git-add and git-commit to honor the same '--' convention.
This fixes the default built-in exec() of "diff" to add a "--" before the
filenames, so that if a filename starts with a "-", the diff program won't
think it's an option.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Follow the "encode minimally" principle -- our tools, including
git-apply and git-status, can handle pathnames with embedded SP just
fine. The only problematic ones are TAB and LF, and we need to quote
the metacharacters introduced for quoting.