Support for HTTP transfer timeouts based on transfer speed
Add configuration settings to abort HTTP requests if the transfer rate
drops below a threshold for a specified length of time. Environment
variables override config file settings.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
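The abort rule and the override order described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch; the class, the function and the setting names used in the example (EXAMPLE_LIMIT, example.limit) are all hypothetical.

```python
class LowSpeedMonitor:
    """Flags a transfer whose rate stays under a limit for too long."""

    def __init__(self, limit_bytes_per_sec, window_sec):
        self.limit = limit_bytes_per_sec
        self.window = window_sec
        self.slow_since = None  # time at which the rate first dropped below the limit

    def update(self, now, rate):
        """Record the current rate; return True if the transfer should be aborted."""
        if rate >= self.limit:
            self.slow_since = None      # fast enough again, restart the clock
            return False
        if self.slow_since is None:
            self.slow_since = now
        return now - self.slow_since >= self.window


def resolve_setting(env, config, env_name, config_name, default=0):
    """Environment variables take precedence over config file settings."""
    if env_name in env:
        return int(env[env_name])
    if config_name in config:
        return int(config[config_name])
    return default


# Hypothetical names, used only for this example.
limit = resolve_setting({"EXAMPLE_LIMIT": "1000"}, {"example.limit": "500"},
                        "EXAMPLE_LIMIT", "example.limit")
monitor = LowSpeedMonitor(limit, window_sec=30)
```

A short burst above the limit resets the clock, so only a sustained slow period triggers the abort.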
It turns out that not only did git-daemon do DWIM, but git-upload-pack
does as well. This is bad; security checks have to be performed *after*
canonicalization, not before.
Additionally, the current git-daemon can be trivially DoSed by spewing
SYNs at the target port.
This patch adds a --strict option to git-upload-pack to disable all
DWIM, a --timeout option to git-daemon and git-upload-pack, and an
--init-timeout option to git-daemon (which is typically set to a much
lower value, since the initial request should come immediately from the
client.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
On top of the optimization by Linus not to ask for refs that already
match, we can walk our refs and not issue "want" for things that are
known to be reachable from them.
I took a look at webgit, and it looks like at least for the "projects"
page, the most common operation ends up being basically
git-rev-list --header --parents --max-count=1 HEAD
Now, the thing is, the way "git-rev-list" works, it always keeps on
popping the parents and parsing them in order to build the list of
parents, and it turns out that even though we just want a single commit,
git-rev-list will invariably look up _three_ generations of commits.
It will parse:
- the commit we want (it obviously needs this)
- its parent(s) as part of the "pop_most_recent_commit()" logic
- it will then pop one of the parents before it notices that it doesn't
need any more
- and as part of popping the parent, it will parse the grandparent (again
due to "pop_most_recent_commit()").
Now, I've strace'd it, and it really is pretty efficient on the whole, but
if things aren't nicely cached, and with long-latency IO, doing those two
extra objects (at a minimum - if the parent is a merge it will be more) is
just wasted time, and potentially a lot of it.
So here's a quick special-case for the trivial case of "just one commit,
and no date-limits or other special rules".
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
revised^2: git-daemon extra paranoia, and path DWIM
This patch adds some extra paranoia to the git-daemon filename test. In
particular, it now rejects pathnames containing //; it also adds a
redundant test for pathname absoluteness (belts and suspenders.)
A single / at the end of the path is still permitted, however, and the
.git and /.git append DWIM stuff is now handled in an integrated manner,
which means the resulting path will always be subjected to pathname checks.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
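As a rough illustration of the "check after canonicalization" idea, here is a Python sketch in which every DWIM candidate goes through the same pathname test. Only the two checks named above (absoluteness and "//") are modelled; the function names and the order in which the suffixes are tried are assumptions, not taken from the daemon.

```python
def path_ok(path):
    """Apply the pathname checks named in the message above."""
    if not path.startswith("/"):   # redundant absoluteness test
        return False
    if "//" in path:               # doubled slashes are rejected
        return False
    return True


def candidate_paths(requested):
    """Return the DWIM candidates for a request, all of them checked."""
    if not path_ok(requested):
        return []
    # A single trailing "/" is still permitted; drop it before appending.
    base = requested[:-1] if len(requested) > 1 and requested.endswith("/") else requested
    # Assumed order: the path itself, then the ".git" and "/.git" variants.
    candidates = [base, base + ".git", base + "/.git"]
    return [p for p in candidates if path_ok(p)]


print(candidate_paths("/pub/repo/"))
```

Because the suffix is appended before the final check, no unchecked path can be produced by the DWIM step.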
If everything is up-to-date locally, we don't need to even ask for a
pack-file from the remote, or try to unpack it.
This is especially important for tags - since the pack-file common commit
logic is based purely on the commit history, it will never be able to find
a common tag, and will thus always end up re-fetching them.
Especially notably, if the tag points to a non-commit (eg a tagged tree),
the pack-file would be unnecessarily big, just because it cannot find any
most recent common point between commits for pruning.
Short-circuiting the case where we already have that reference means that
we avoid a lot of these in the common case.
NOTE! This only matches remote ref names against the same local name,
which works well for tags, but is not as generic as it could be. If we
ever need to, we could match against _any_ local ref (if we have it, we
have it), but this "match against same name" is simpler and more
efficient, and covers the common case.
Renaming of refs is common for branch heads, but since those are always
commits, the pack-file generation can optimize that case.
In some cases we might still end up fetching pack-files unnecessarily, but
this at least avoids the re-fetching of tags over and over if you use a
regular
git fetch --tags ...
which was the main reason behind the change.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
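The "match against same name" short-circuit can be pictured with a small Python sketch. The function name and the dictionaries of ref name to object name are made up for the example; the real code works on the ref list exchanged with the remote.

```python
def refs_to_fetch(remote_refs, local_refs):
    """Keep only remote refs whose same-named local ref is missing or differs."""
    return {name: sha for name, sha in remote_refs.items()
            if local_refs.get(name) != sha}


# Made-up object names, for illustration only.
remote = {"refs/tags/v1.0": "aaaa", "refs/heads/master": "bbbb"}
local = {"refs/tags/v1.0": "aaaa", "refs/heads/master": "cccc"}
print(refs_to_fetch(remote, local))
```

If the result is empty, nothing needs to be asked of the remote at all, which is the case a repeated "git fetch --tags" hits most often.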
git-checkout: revert specific paths to either index or a given tree-ish.
When extra paths arguments are given, git-checkout reverts only those
paths to either the version recorded in the index or the version
recorded in the given tree-ish.
Teach git-add and git-commit to handle filenames starting with '-'.
Recent '--' fixes to "git diff" by Linus made it possible to specify
filenames that start with '-'. But in order to do that, you need to
be able to add and commit such a file to begin with.
Teach git-add and git-commit to honor the same '--' convention.
This fixes the default built-in exec() of "diff" to add a "--" before the
filenames, so that if a filename starts with a "-", the diff program won't
think it's an option.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Follow the "encode minimally" principle -- our tools, including
git-apply and git-status, can handle pathnames with embedded SP just
fine. The only problematic ones are TAB and LF, and we need to quote
the metacharacters introduced for quoting.
This makes it possible to add paths that have funny characters (TAB
and LF) in them, and makes adding many paths more efficient in
general.
New flag "--stdin" to update-index was initially added for a different
purpose, but it turns out to be a perfect match for feeding "ls-files
--others -z" output to improve "git add".
It also adds "--verbose" flag to update-index for use with "git add"
command.
Functions to quote and unquote pathnames in C-style.
Following the list discussion, define two functions, quote_c_style and
unquote_c_style, to help adopt the proposed way of quoting funny
pathname letters for GNU patch. The rule is described in:
Currently we do not support the leading '!', but we probably should
barf upon seeing it. Rule B4. is interpreted to require always 3
octal digits in \XYZ notation.
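A minimal Python sketch of this style of quoting, for illustration only: it is not the C implementation. It follows the "encode minimally" idea (a plain space never triggers quoting) and always writes exactly three octal digits; which bytes besides TAB, LF, the double quote and the backslash get escaped is an assumption of the sketch, and it works on text strings rather than raw bytes.

```python
_ESCAPES = {"\t": "\\t", "\n": "\\n", '"': '\\"', "\\": "\\\\"}
_UNESCAPES = {"t": "\t", "n": "\n", '"': '"', "\\": "\\"}


def quote_path(path):
    """Return path unchanged unless it contains a character that needs quoting."""
    out, needs_quote = [], False
    for ch in path:
        if ch in _ESCAPES:
            out.append(_ESCAPES[ch])
            needs_quote = True
        elif ord(ch) < 0x20:                 # assumed: other control characters
            out.append("\\%03o" % ord(ch))   # always three octal digits
            needs_quote = True
        else:
            out.append(ch)
    return '"' + "".join(out) + '"' if needs_quote else path


def unquote_path(text):
    """Inverse of quote_path(); unquoted input is returned unchanged."""
    if not text.startswith('"'):
        return text
    if len(text) < 2 or not text.endswith('"'):
        raise ValueError("unterminated quoted path")
    body, out, i = text[1:-1], [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        digits = body[i + 1:i + 4]
        if nxt in _UNESCAPES:
            out.append(_UNESCAPES[nxt])
            i += 2
        elif len(digits) == 3 and all(d in "01234567" for d in digits):
            out.append(chr(int(digits, 8)))
            i += 4
        else:
            raise ValueError("bad escape in quoted path")
    return "".join(out)
```

Requiring exactly three octal digits keeps the unquoting side unambiguous when a digit follows the escaped character.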
This will be removed when merging the second phase of Linus' "Create
object subdirectories on demand" change anyway, but the code to
recreate the empty .git/objects/??/ directory was confused.
Deb packaging claims we depend on patch, but I think we use git-apply
where it matters. When a patch does not apply with git-apply, using
GNU patch still is helpful sometimes. So demote it from "Depends" to
"Suggests".
This patch cleans out all sparse warnings from http-fetch.c
I'm a bit uncomfortable with adding extra #ifdefs to avoid either
'mixing declaration with code' or 'unused variable' warnings, but I
figured that since those functions are already littered with #ifdefs I
might just get away with it. Comments?
[jc: I adjusted Peter's patch to address uncomfortableness issues.]
Signed-off-by: Peter Hagervall <hager@cs.umu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
whatchanged: document -m option from git-diff-tree.
The documentation for git-whatchanged is meant to describe only
the most frequently used options from git-diff-tree. Because "why
doesn't it show merges" was asked more than once, we'd better
describe '-m' option there.
Show peeled onion from upload-pack and server-info.
This updates git-ls-remote to show SHA1 names of objects that are
referred to by tags, in the "ref^{}" notation.
This would make git-findtags (without -t flag) almost trivial.
git-peek-remote . |
sed -ne "s:^$target "'refs/tags/\(.*\)^{}$:\1:p'
Also Pasky could do:
git-ls-remote --tags $remote |
sed -ne 's:\( refs/tags/.*\)^{}$:\1:p'
to find out what object each of the remote tags refers to, and
if he has one locally, run "git-fetch $remote tag $tagname" to
automatically catch up with the upstream tags.
Existing "tagname^0" notation means "dereference tag zero or more
times until you cannot dereference it anymore, and make sure it is a
commit -- otherwise barf". But tags do not necessarily reference
commit objects.
This commit introduces a bit more generalized notation, "ref^{type}".
Existing "ref^0" is a shorthand for "ref^{commit}". If the type
is empty, it just dereferences tags until it hits a non-tag object.
With this, "git-rev-parse --verify 'junio-gpg-pub^{}'" shows the blob
object name -- there is no need to manually read the tag object and
find out the object name anymore.
"git-rev-parse --verify 'HEAD^{tree}'" can be used to find out the
tree object name of the HEAD commit.
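The peeling rule can be sketched against a toy object store. Everything here is hypothetical: the store is a Python dict from object name to (type, target), the object names are invented, and the function is only a model of the "ref^{type}" behaviour described above.

```python
def peel(store, name, want=""):
    """Dereference name until an object of type want is reached.

    With want == "", only tags are dereferenced and the first non-tag
    object is returned.
    """
    while True:
        obj_type, target = store[name]
        if want and obj_type == want:
            return name
        if obj_type == "tag":
            name = target                  # follow the tag
        elif obj_type == "commit" and want == "tree":
            name = target                  # a commit leads to its tree
        elif want == "":
            return name
        else:
            raise ValueError("%s is a %s, not a %s" % (name, obj_type, want))


# Toy store with invented names: a tag on a blob, and a tag on a commit.
store = {
    "pubkey-tag": ("tag", "blob1"),
    "blob1": ("blob", None),
    "v1": ("tag", "commit1"),
    "commit1": ("commit", "tree1"),
    "tree1": ("tree", None),
}
print(peel(store, "pubkey-tag"))       # empty type: stop at the first non-tag
print(peel(store, "v1", "tree"))
```

Asking for a commit from a tag that points at a blob fails, which mirrors the "otherwise barf" behaviour of the older "^0" notation.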
This allows the remote side (most notably, upload-pack) to show
additional information without affecting the downloader. Peek-remote
does not ignore them -- this is to make it useful for Pasky's
automatic tag following.
Refuse to create funny refs in clone-pack, git-fetch and receive-pack.
Using git-check-ref-format, make sure we do not create refs with
funny names when cloning from elsewhere (clone-pack), fast forwarding
local heads (git-fetch), or somebody pushes into us (receive-pack).
Update check_ref_format() function to reject ref names that:
* have a path component that begins with a ".", or
* contain a double dot "..", or
* contain an ASCII control character, "~", "^", ":" or SP, anywhere, or
* end with a "/".
Use it in 'git-checkout -b', 'git-branch', and 'git-tag' to make sure
that newly created refs are well-formed.
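For illustration, the rejection rules listed above translate into a few lines of Python. This is a sketch of the rules as stated, not the C function; the name ref_name_ok is made up, and rejecting the empty string is an extra assumption.

```python
def ref_name_ok(ref):
    """Return True if ref passes the four rules listed above."""
    if not ref:                      # assumption: an empty name is never valid
        return False
    if ref.endswith("/"):            # ends with a "/"
        return False
    if ".." in ref:                  # double dot anywhere
        return False
    for ch in ref:
        # ASCII control character, "~", "^", ":" or SP anywhere
        if ord(ch) < 0x20 or ch in "~^: ":
            return False
    # no path component may begin with a "."
    return not any(part.startswith(".") for part in ref.split("/"))


for name in ("refs/heads/master", "refs/heads/.hidden", "refs/tags/v1^{}"):
    print(name, ref_name_ok(name))
```

Rejecting "^" and ":" keeps ref names from colliding with the "ref^{type}" and refspec "src:dst" notations.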
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > This patch looks bigger than it really is: The code to get the
> > default handle was refactored into a function, and is called
> > instead of curl_easy_duphandle() if that does not exist.
>
> I'd like to take Nick's config file patch first, which
> unfortunately interferes with your patch. I'd hate to ask you
> this, but could you rebase it on top of Nick's patch, [...]
No need to hate it. Here comes the rebased patch, and this time, I
actually tested it a bit.
Do our own ctype.h, just to get the sane semantics: we want
locale-independence, _and_ we want the right signed behaviour. Plus we
only use a very small subset of ctype.h anyway (isspace, isalpha,
isdigit and isalnum).
git-http-fetch: Remove size limit for objects/info/{packs,alternates}
git-http-fetch received objects/info/packs into a fixed-size buffer
and started to fail when this file became larger than the buffer.
Change it to grow the buffer dynamically, and do the same thing for
objects/info/alternates. Also add missing free() calls for these
buffers.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This enhances the set of revs you can give format-patch.
Originally, format-patch took either one rev, or two revs:
format-patch rev1
format-patch rev1 rev2
The first format was a short-hand for "format-patch rev1 HEAD"
(i.e. rev2==HEAD). What this meant was to find commits that are
in branch rev2 that have not been merged to branch rev1.
The above notation is still supported, but now it takes a sequence
of "from1..to1 from2..to2 ...". In short, the second format has
become a short-hand for "format-patch rev1..rev2". Commits in
to1 but not in from1, to2 but not in from2, ... are formatted as
emailable patches.
With this, cherry-picking from another branch can be written as:
which is generally faster than traditional cherry-pick (which
always did 3-way merge) if patches apply cleanly, and still
falls back on 3-way merge if some of them do not.
This uses the new "--local" flag to git-pack-objects. It currently only
makes a difference together with "-a", since a normal incremental repack
won't pack any packed objects at all (whether local or remote).
Eventually, it might end up skipping any objects that aren't local to
the current object directory, but for now it only knows to skip packed
objects.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the "--local" flag to git-pack-objects, which acts like
"--incremental", except that instead of ignoring all packed objects, it
only ignores objects that are packed and in an alternate object tree.
As a result, it effectively only does a local re-pack: any remote-packed
objects will stay in the alternate object directories.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Lacking reliable symlinks, the instructions in the tutorial did not work
in a cygwin setup. Also, a few outputs were not correct.
This patch fixes these, and adds a test case which follows the
instructions of the tutorial (except git-clone, -fetch and -push, which I
have not done yet).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A short perl script that will walk the tag refs, tag objects, and even commit
objects in its quest to figure out whether the given SHA1 (for a commit or
tree) was ever tagged.
This version is reworked incorporating sanity, feature and style fixes from
Junio.
Object references are used in server-info.c:find_pack_info_one() to
find out which objects in the pack are heads, therefore tracking of
references cannot be disabled.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
clone-pack: new option --keep tells it not to explode the pack.
With new option --keep, or a configuration item clone.keeppack (we
need a better name, or start allowing dash,"clone.keep-pack"), the packed
data downloaded while cloning is saved as a pack in .git/objects/pack/
locally, with index generated for it with git-index-pack.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This changes how the hash that packfiles have in their names is
generated, from "hash of object names as fed to us" to "hash of object
names in the resulting pack, in the order they appear in the index
file". The new "git-index-pack" command is taught to output the
computed hash value to its standard output.
With this, we can store downloaded pack in a temporary file without
knowing its final name, run git-index-pack to generate idx for it
while finding out its final name, and then rename the pack and idx to
their final names.
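The point of hashing names "in the order they appear in the index file" is that the result no longer depends on the order in which objects arrived. A small Python sketch of that property, assuming that the index lists objects sorted by name and that the hash is a SHA-1 over the raw binary names; both are assumptions of this example, and pack_hash is a made-up helper.

```python
import hashlib


def pack_hash(object_names_hex):
    """SHA-1 over the binary object names, taken in sorted (index) order."""
    h = hashlib.sha1()
    for raw in sorted(bytes.fromhex(name) for name in object_names_hex):
        h.update(raw)
    return h.hexdigest()


# Invented object names; the order they are fed in does not matter.
names = ["ff" * 20, "00" * 20, "7a" * 20]
print(pack_hash(names) == pack_hash(list(reversed(names))))
```

With such a hash, the final "pack-<hash>" file name can be computed from the downloaded pack alone, which is what makes the store, index and rename sequence possible.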
git-index-pack builds a pack index file for an existing packed
archive. With this utility a packed archive which was transferred
without the corresponding pack index can be added to objects/pack/
without repacking.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When feeding patches from standard input, and --interactive is specified,
quit, so that the user can re-run the command, instead of infinitely
looping.
git-fetch --tags: deal with tags with spaces in them.
"git-fetch --tags" can get confused by tags with spaces in their names:
it used to use shell IFS to split the list of tags, and also used curl,
which insists that the URL be escaped. Fix it so it can work with Martin's
moodle repository http://locke.catalyst.net.nz/git/moodle.git/.
We still reserve characters like leading plus-sign '+' and colon
':' anywhere to represent refspec src-dst pair, and obviously we
cannot use LF (that terminates Pull: line in .git/remotes
files), but now you can have spaces with this patch.
curl_escape ought to do this, but we should not let it quote
slashes (nobody said refs/tags cannot have subdirectories), so
we roll our own safer version. With this, the last part of
git-clone from Martin's moodle repository that used to fail now
works, which reads:
[PATCH] cvsimport: don't pass --cvs-direct if user options contradict us
Detect if the user passed --no-cvs-direct and don't force the mode.
This allows us to support all the protocols that the standard cvs client
supports, at the snail speed you should expect.
This only affects the rlog reading stage.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
They always were meant to be case-insensitive, but I had missed one
"tolower()", making that not true.
The actual _values_ aren't case-insensitive, of course, although some uses
of them may be (ie boolean parsing uses "strcasecmp()" to match against
the strings "true" and "false").
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
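The distinction between names and values can be shown with a short Python sketch. The helper names are hypothetical; lower() stands in for "tolower()" and for the "strcasecmp()" comparison mentioned above.

```python
def normalize_name(name):
    """Variable names are case-insensitive: fold them before comparing."""
    return name.lower()


def parse_bool(value):
    """Values keep their case, but boolean parsing compares case-insensitively."""
    folded = value.lower()
    if folded == "true":
        return True
    if folded == "false":
        return False
    raise ValueError("not a boolean: %r" % value)


print(normalize_name("Core.FileMode") == normalize_name("core.filemode"))
print(parse_bool("FALSE"))
```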
Use git config file for committer name and email info
This starts using the "user.name" and "user.email" config variables if
they exist as the default name and email when committing. This means
that you don't have to use the GIT_COMMITTER_EMAIL environment variable
to override your email - you can just edit the config file instead.
The patch looks bigger than it is because it makes the default name and
email information non-static and renames it appropriately. And it moves
the common git environment variables into a new library file, so that
you can link against libgit.a and get the git environment without having
to link in zlib and libcrypt.
In short, most of it is renaming and moving; the core of the real change
is just a few new lines in "git_default_config()" that copy the user
config values to the new base.
It also changes "git-var -l" to list the config variables.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If somebody set template_dir in config.mak, then git-init-db would be
compiled with the correct location but the templates would be installed
in the default location. Fix it.
Signed-off-by: Tom Prince <tom.prince@ualberta.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With "[core] filemode = false", you can tell git to ignore working tree
file differences that are only in the executable bit.
* "git-update-index --refresh" does not say "needs update" if index
entry and working tree file differ only in executable bit.
* "git-update-index" on an existing path takes executable bit
from the existing index entry, if the path and index entry are
both regular files.
* "git-diff-files" and "git-diff-index" without --cached flag
pretend the path on the filesystem has the same executable
bit as the existing index entry, if the path and index entry
are both regular files.
If you are on a filesystem with unreliable mode bits, you may need to
force the executable bit after registering the path in the index.
* "git-update-index --chmod=+x foo" flips the executable bit of the
index file entry for path "foo" on. Use "--chmod=-x" to flip it
off.
Note that --chmod only works in index file and does not look at nor
update the working tree.
So if you are on a filesystem that does not have a working executable
bit, you would do:
1. set the appropriate .git/config option;
2. "git-update-index --add new-file.c"
3. "git-ls-files --stage new-file.c" to see if it has the desired
mode bits. If not, e.g. to drop executable bit picked up from the
filesystem, say "git-update-index --chmod=-x new-file.c".
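The comparison rule can be sketched in Python. This is a model of the behaviour described above, not the actual cache-entry code; canon_mode and mode_changed are hypothetical helpers, and the modes are plain st_mode style integers.

```python
import stat


def canon_mode(mode):
    """Reduce a mode to what the index records for it."""
    if stat.S_ISREG(mode):
        # Only the user-execute bit (0100) matters for regular files.
        return 0o100755 if mode & 0o100 else 0o100644
    return stat.S_IFMT(mode)


def mode_changed(index_mode, worktree_mode, trust_filemode=True):
    """Report whether the working tree mode differs from the index entry."""
    if (not trust_filemode
            and stat.S_ISREG(index_mode) and stat.S_ISREG(worktree_mode)):
        # Pretend the file on disk has the executable bit from the index.
        return False
    return canon_mode(index_mode) != canon_mode(worktree_mode)


print(mode_changed(0o100644, 0o100755))                        # filemode = true
print(mode_changed(0o100644, 0o100755, trust_filemode=False))  # filemode = false
```

A change of file type (for example regular file to symlink) is still reported even with filemode turned off, since the pretence only applies when both sides are regular files.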
I had meant to disallow unknown escape characters in the config file
parser, but instead an unknown escaped character would silently pass
through as itself. That's correct for some cases (notably '\' itself), but
wasn't correct in general.
This fixes it, and makes the parser write a nice error message if the
config file contains bogus escaped characters.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
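A Python sketch of the stricter behaviour, for illustration. The set of accepted escapes below is an assumption of the example (only the backslash itself is confirmed by the message above), and a trailing backslash is simply treated as an error here.

```python
# Assumed set of known escapes; anything else is rejected.
KNOWN_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}


def unescape_value(raw):
    """Expand escapes in a config value, refusing unknown ones."""
    out, i = [], 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("trailing backslash in config value")
        nxt = raw[i + 1]
        if nxt not in KNOWN_ESCAPES:
            # Before the fix, nxt would have passed through as itself.
            raise ValueError("bad escape '\\%s' in config value" % nxt)
        out.append(KNOWN_ESCAPES[nxt])
        i += 2
    return "".join(out)


print(unescape_value("C:\\\\tools\\\\diff"))
```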
show-branch: optionally use unique prefix as name.
git-show-branch acquires two new options: --sha1-name to name
commits using the unique prefix of their object names, and
--no-name to not show names at all.
This was outlined in <7vk6gpyuyr.fsf@assigned-by-dhcp.cox.net>
With this patch, it is possible to store configuration options like
NO_CURL=YesPlease or NO_OPENSSL=YesPlease into a file named
config.mak, which will be included in the Makefile.
[jc: redone with suggestion from Daniel Barkalow to just use -include]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some SVN repositories that are accessible through HTTP don't like when I
retrieve files using SVN methods ("internal server error").
Therefore, I added an option to get the contents using (persistent) HTTP
directly. This also reduces round-trip time, from two or three requests
down to one.
The http commit walker cannot use the same temporary file
creation code because it needs to use predictable temporary
filename for partial fetch continuation purposes, but the code
to move the temporary file to the final location should be
usable from the ordinary object creation codepath.
Export move_temp_to_file from sha1_file.c and use it, while
losing the custom relink_or_rename function from http-fetch.c.
Also the temporary object file creation part needs to make sure
the leading path exists, in preparation of the really lazy
fan-out directory creation.
Restore functionality to allow proxies to cache objects
The parallel request changes didn't properly implement the previous patch to
allow caching of retrieved objects by proxy servers. Restore the previous
functionality such that by default requests include the "Pragma: no-cache"
header, and this header is removed on requests for pack indexes, packs, and
objects.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
[PATCH] Don't fetch objects that exist in the local repository
Be sure not to fetch objects that already exist in the local repository.
The main process loop no longer performs this check, http-fetch now checks
prior to starting a new request queue entry and when fetch_object() is called,
and local-fetch now checks when fetch_object() is called.
As discussed in this thread: http://marc.theaimsgroup.com/?t=112854890500001
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
Set the parallel HTTP request limit via an environment variable
Use an environment variable rather than a command-line argument to set the
parallel HTTP request limit. This allows the setting to work whether
git-http-fetch is run directly or via git-fetch.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add support for parallel HTTP transfers. Prefetch populates a queue of
objects to transfer and starts feeding requests to an active request
queue for processing; fetch_object keeps the active queue moving
while the specified object is being transferred. The size of the active
queue can be restricted using -r and defaults to 5 concurrent transfers.
Requests for objects that are not prefetched are also processed via the
active queue.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
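The relationship between the prefetch queue and the bounded active queue can be modelled in a few lines of Python. This is a simulation with made-up names and no real network I/O; only the default of 5 concurrent transfers comes from the message above.

```python
from collections import deque


class RequestQueue:
    """Waiting requests are promoted to the active set as slots free up."""

    def __init__(self, max_active=5):
        self.max_active = max_active
        self.waiting = deque()
        self.active = []

    def prefetch(self, obj):
        """Queue an object for transfer and start it if a slot is free."""
        self.waiting.append(obj)
        self._fill()

    def complete(self, obj):
        """Mark a transfer as finished and keep the active queue moving."""
        self.active.remove(obj)
        self._fill()

    def _fill(self):
        while self.waiting and len(self.active) < self.max_active:
            self.active.append(self.waiting.popleft())


queue = RequestQueue(max_active=2)
for obj in ("a", "b", "c"):
    queue.prefetch(obj)
queue.complete("a")
print(queue.active, list(queue.waiting))
```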
This is a first cut at a very simple parser for a git config file.
The format of the file is a simple ini-file like thing, with simple
variable/value pairs. You can (and should) make the variables have a
simple single-level scope, ie a valid file looks something like this:
#
# This is the config file, and
# a '#' or ';' character indicates
# a comment
#
which parses into three variables: "core.filemode" is associated with the
string "false", and "diff.external" gets the appropriate quoted value.
Right now we only react to one variable: "core.filemode" is a boolean that
decides if we should care about the 0100 (user-execute) bit of the stat
information. Even that is just a parsing demonstration - this doesn't
actually implement that st_mode compare logic itself.
Different programs can react to different config options, although they
should always fall back to calling "git_default_config()" on any config
option name that they don't recognize.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
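To make the "single-level scope" idea concrete, here is a toy Python parser for a file of this shape. It is a sketch, not the C parser: the sample file and its values are invented for the example, escapes and inline comments are not handled, and a line without "=" is treated as an error, which is an assumption.

```python
SAMPLE = """\
# invented sample, for illustration only
[core]
    ; a ';' also starts a comment
    filemode = false
[diff]
    external = "example-diff -u"
"""


def parse_config(text):
    """Parse ini-like text into {"section.name": value} with lowercased names."""
    variables, section = {}, None
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                                   # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        name, sep, value = line.partition("=")
        if not sep or section is None:
            raise ValueError("bad config file line %d" % lineno)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]                        # drop surrounding quotes
        variables[section + "." + name.strip().lower()] = value
    return variables


print(parse_config(SAMPLE))
```

A program would look up the names it cares about in the result and hand everything else to a shared default handler, in the spirit of "git_default_config()".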