upload-pack: drop lookup-before-parse optimization
When we receive a "have" line from the client, we want to
load the object pointed to by the sha1. However, we are
careful to do:
	o = lookup_object(sha1);
	if (!o || !o->parsed)
		o = parse_object(sha1);
to avoid loading the object from disk if we have already
seen it. However, since ccdc603 (parse_object: try internal
cache before reading object db), parse_object already does
this optimization internally. We can just call parse_object
directly.
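
The effect can be sketched with a toy object table. This is only an
illustration under stated assumptions: toy_lookup, toy_parse and the
disk_reads counter are hypothetical stand-ins, not git's actual
lookup_object and parse_object. It shows that once the parse function
consults the cache itself, a caller-side check saves nothing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an in-memory object table. */
struct toy_object {
	char id[41];   /* hex object name */
	int parsed;    /* nonzero once the contents were loaded */
};

static struct toy_object table[16];
static size_t nr_objects;
static int disk_reads;   /* counts simulated reads from the object db */

/* Return the cached entry for id, or NULL if we have never seen it. */
static struct toy_object *toy_lookup(const char *id)
{
	size_t i;
	for (i = 0; i < nr_objects; i++)
		if (!strcmp(table[i].id, id))
			return &table[i];
	return NULL;
}

/*
 * Consult the cache first and only "read from disk" when the object
 * has not been parsed yet, so callers need no check of their own.
 */
static struct toy_object *toy_parse(const char *id)
{
	struct toy_object *o = toy_lookup(id);
	if (o && o->parsed)
		return o;
	if (!o) {
		if (nr_objects == sizeof(table) / sizeof(table[0]))
			return NULL; /* toy table is full */
		o = &table[nr_objects++];
		strncpy(o->id, id, sizeof(o->id) - 1);
		o->id[sizeof(o->id) - 1] = '\0';
	}
	disk_reads++;
	o->parsed = 1;
	return o;
}
```

Calling toy_parse twice with the same name bumps disk_reads only once,
which is the property the removed caller-side check was guarding.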
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack advertises refs, we attempt to peel tags
and advertise the peeled version. We currently hand-roll the
tag dereferencing, and use as many optimizations as we can
to avoid loading non-tag objects into memory.
Not only has peel_ref recently learned these optimizations,
too, but it also contains an even more important one: it
has access to the "peeled" data from the pack-refs file.
That means we can avoid not only loading annotated tags
entirely, but also avoid doing any kind of object lookup at
all.
This cut the CPU time to advertise refs by 50% in the
linux-2.6 repo, as measured by:
echo 0000 | git-upload-pack . >/dev/null
best-of-five, warm cache, objects and refs fully packed:
  [before]             [after]
  real    0m0.026s     real    0m0.013s
  user    0m0.024s     user    0m0.008s
  sys     0m0.000s     sys     0m0.000s
Those numbers are irrelevantly small compared to an actual
fetch. Here's a larger repo (400K refs, of which 12K are
unique, and of which only 107 are unique annotated tags):
  [before]             [after]
  real    0m0.704s     real    0m0.596s
  user    0m0.600s     user    0m0.496s
  sys     0m0.096s     sys     0m0.092s
This shows only a 15% speedup (mostly because it has fewer
actual tags to parse), but a larger absolute value (100ms,
which isn't a lot compared to a real fetch, but this
advertisement happens on every fetch, even if the client is
just finding out they are completely up to date).
In truly pathological cases, where you have a large number
of unique annotated tags, it can make an even bigger
difference. Here are the numbers for a linux-2.6 repository
that has had every seventh commit tagged (so about 50K
tags):
  [before]             [after]
  real    0m0.443s     real    0m0.097s
  user    0m0.416s     user    0m0.080s
  sys     0m0.024s     sys     0m0.012s
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of peel_ref is to dereference tags; if the base
object is not a tag, then we can return early without even
loading the object into memory.
This patch accomplishes that by checking sha1_object_info
for the type. For a packed object, we can get away with just
looking in the pack index. For a loose object, we only need
to inflate the first couple of header bytes.
This is a bit of a gamble; if we do find a tag object, then
we will end up loading the content anyway, and the extra
lookup will have been wasteful. However, if it is not a tag
object, then we save loading the object entirely. Depending
on the ratio of non-tags to tags in the input, this can be a
minor win or minor loss.
However, it does give us one potential major win: if a ref
points to a large blob (e.g., via an unannotated tag), then
we can avoid looking at it entirely.
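
A rough sketch of the early return, assuming a toy object database
held in an array. The names toy_type_of, toy_load_target and
full_loads are hypothetical; the real patch asks sha1_object_info for
the type, which this sketch replaces with a plain array read.

```c
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_COMMIT, TOY_TAG };

struct toy_entry {
	enum toy_type type;  /* cheap to learn: index or header bytes */
	int target;          /* for tags: index of the tagged entry, else -1 */
};

static int full_loads;  /* counts expensive whole-object loads */

/* Cheap query, standing in for a type-only lookup. */
static enum toy_type toy_type_of(const struct toy_entry *db, int idx)
{
	return db[idx].type;
}

/* Expensive query: pretend we inflate the whole object to read it. */
static int toy_load_target(const struct toy_entry *db, int idx)
{
	full_loads++;
	return db[idx].target;
}

/*
 * Peel idx down to a non-tag. Returns the peeled index, or -1 when
 * idx is not a tag and there is nothing to peel. Non-tags never
 * trigger a full load.
 */
static int toy_peel(const struct toy_entry *db, int idx)
{
	if (toy_type_of(db, idx) != TOY_TAG)
		return -1;
	while (toy_type_of(db, idx) == TOY_TAG)
		idx = toy_load_target(db, idx);
	return idx;
}
```

For a blob or commit entry, toy_peel returns without touching
full_loads; for a tag it pays one type check on top of the loads it
would have done anyway, which is the gamble described above.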
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of the peel_ref function is to dereference tag
objects recursively until we hit a non-tag, and return the
sha1. Conceptually, it should return 0 if it is successful
(and fill in the sha1), or -1 if there was nothing to peel.
However, the current behavior is much more confusing. For a
regular loose ref, the behavior is as described above. But
there is an optimization to reuse the peeled-ref value for a
ref that came from a packed-refs file. If we have such a
ref, we return its peeled value, even if that peeled value
is null (indicating that we know the ref definitely does
_not_ peel).
It might seem like such information is useful to the caller,
who would then know not to bother loading and trying to peel
the object. Except that they should not bother loading and
trying to peel the object _anyway_, because that fallback is
already handled by peel_ref. In other words, the whole point
of calling this function is that it handles those details
internally, and you either get a sha1, or you know that it
is not peel-able.
This patch catches the null sha1 case internally and
converts it into a -1 return value (i.e., there is nothing
to peel). This simplifies callers, which do not need to
bother checking themselves.
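
The intended contract can be sketched like this. It is a simplified
illustration, not the actual refs code: struct toy_ref, toy_is_null
and toy_peel_ref are hypothetical names, and the fallback that peels
the object itself is left out.

```c
#include <string.h>

#define TOY_RAWSZ 20

struct toy_ref {
	unsigned char peeled[TOY_RAWSZ]; /* all zero: known not to peel */
	int has_peeled_cache;            /* set for refs with cached peel data */
};

static int toy_is_null(const unsigned char *sha1)
{
	static const unsigned char zero[TOY_RAWSZ];
	return !memcmp(sha1, zero, TOY_RAWSZ);
}

/*
 * Returns 0 and fills out on success, -1 when there is nothing to
 * peel. A cached null value is folded into the -1 case so callers
 * never see a "successful" null result.
 */
static int toy_peel_ref(const struct toy_ref *ref, unsigned char *out)
{
	if (ref->has_peeled_cache) {
		if (toy_is_null(ref->peeled))
			return -1;
		memcpy(out, ref->peeled, TOY_RAWSZ);
		return 0;
	}
	/* A real implementation would fall back to peeling the object here. */
	return -1;
}
```

With this shape a caller only has to test the return value; it never
needs a second check for a null result.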
Two callers are worth noting:
- in pack-objects, a comment indicates that there is a
difference between non-peelable tags and unannotated
tags. But that is not the case (before or after this
patch). Whether you get a null sha1 has to do with
internal details of how peel_ref operated.
- in show-ref, if peel_ref returns a failure, the caller
tries to decide whether to try peeling manually based on
whether the REF_ISPACKED flag is set. But this doesn't
make any sense. If the flag is set, that does not
necessarily mean the ref came from a packed-refs file
with the "peeled" extension. But it doesn't matter,
because even if it didn't, there's no point in trying to
peel it ourselves, as peel_ref would already have done
so. In other words, the fallback peeling is guaranteed
to fail.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are asked to peel a ref to a sha1, we internally call
deref_tag, which will recursively parse each tagged object
until we reach a non-tag. This has the benefit that we will
verify our ability to load and parse the pointed-to object.
However, there is a performance downside: we may not need to
load that object at all (e.g., if we are simply listing
peeled refs), or it may be a large object
that should follow a streaming code path (e.g., an annotated
tag of a large blob).
It makes more sense for peel_ref to choose the fast thing
rather than performing the extra check, for three reasons:
1. We will already sometimes short-circuit the tag parsing
in favor of a peeled entry from a packed-refs file. So
we are already favoring speed in some cases, and it is
not wise for a caller to rely on peel_ref to detect
corruption.
2. We already silently ignore much larger corruptions,
like a ref that points to a non-existent object, or a
tag object that exists but is corrupted.
3. peel_ref is not the right place to check for such a
database corruption. It is returning only the sha1
anyway, not the actual object. Any callers which use
that sha1 to load an object will soon discover the
corruption anyway, so we are really just pushing back
the discovery to later in the program.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'rr/maint-submodule-unknown-cmd' into maint
"git submodule frotz" was not diagnosed as "frotz" being an unknown
subcommand to "git submodule"; the user instead got a complaint that
"git submodule status" was run with an unknown path "frotz".
* rr/maint-submodule-unknown-cmd:
submodule: if $command was not matched, don't parse other args
Merge branch 'sp/maint-http-enable-gzip' into maint
"git fetch" over http advertised that it supports "deflate", which
is much less common, and did not advertise more common "gzip" on its
Accept-Encoding header.
* sp/maint-http-enable-gzip:
Enable info/refs gzip decompression in HTTP client
Merge branch 'sp/maint-http-info-refs-no-retry' into maint
"git fetch" over http had an old workaround for an unlikely server
misconfiguration; it turns out that this hurts debuggability of the
configuration in general, and has been reverted.
* sp/maint-http-info-refs-no-retry:
Revert "retry request without query when info/refs?query fails"
Merge branch 'maint' of git://github.com/git-l10n/git-po into maint
Update German and Simplified Chinese translations.
* 'maint' of git://github.com/git-l10n/git-po:
l10n: de.po: correct translation of a 'rebase' message
l10n: Improve many translation for zh_CN
l10n: Unify the translation for '(un)expected'
Merge branch 'jc/maint-log-grep-all-match-1' into maint
* jc/maint-log-grep-all-match-1:
grep.c: make two symbols really file-scope static this time
t7810-grep: test --all-match with multiple --grep and --author options
t7810-grep: test interaction of multiple --grep and --author options
t7810-grep: test multiple --author with --all-match
t7810-grep: test multiple --grep with and without --all-match
t7810-grep: bring log --grep tests in common form
grep.c: mark private file-scope symbols as static
log: document use of multiple commit limiting options
log --grep/--author: honor --all-match honored for multiple --grep patterns
grep: show --debug output only once
grep: teach --debug option to dump the parse tree
submodule: if $command was not matched, don't parse other args
"git submodule" command DWIMs the command line and assumes a
missing action word means the 'status' action. This is a UI mistake
that leads to confusing behaviour. A mistyped command name is
instead treated as a request for 'status' of the submodule with that
name, e.g.
$ git submodule show
error: pathspec 'show' did not match any file(s) known to git.
Did you forget to 'git add'?
Stop DWIMming an unknown or mistyped subcommand name as pathspec
given to unspelled "status" subcommand. "git submodule" without any
argument is still interpreted as "git submodule status", but its
value is questionable.
Adjust t7400 to match, and stop advertising the default subcommand
being 'status' which does not help much in practice, other than
promoting laziness and confusion.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
That patch does fix expansion of weird variables in some
simple tests, but it also seems to break other things, like
expansion of refs by "git checkout".
While we're sorting out the correct solution, we are much
better with the original bug (people with metacharacters in
their completions occasionally see an error message) than
the current bug (ref completion does not work at all).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jc/maint-blame-no-such-path' into maint
Even during a conflicted merge, "git blame $path" always meant to
blame uncommitted changes to the "working tree" version; make it
more useful by showing cleanly merged parts as coming from the other
branch that is being merged.
This incidentally fixes an unrelated problem on a case insensitive
filesystem, where "git blame MAKEFILE" run in a history that has
"Makefile" but not "MAKEFILE" did not say "No such file MAKEFILE in
HEAD" but pretended as if "MAKEFILE" was a newly added file.
* jc/maint-blame-no-such-path:
blame: allow "blame file" in the middle of a conflicted merge
blame $path: avoid getting fooled by case insensitive filesystems
"git fetch --all", when passed "--no-tags", did not honor the
"--no-tags" option while fetching from individual remotes (the same
issue existed with "--tags", but combination "--all --tags" makes
much less sense than "--all --no-tags").
* dj/fetch-all-tags:
fetch --all: pass --tags/--no-tags through to each remote
submodule: use argv_array instead of hand-building arrays
fetch: use argv_array instead of hand-building arrays
argv-array: fix bogus cast when freeing array
argv-array: add pop function
The pretty formats for GPG signatures were introduced but never
documented. Use the documentation from the commit that introduced them.
Do the same for the --show-signature option added to git log and
friends.
Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable info/refs gzip decompression in HTTP client
Some HTTP servers try to use gzip compression on the /info/refs
request to save transfer bandwidth. Repositories with many tags
may find the /info/refs request can be gzipped to be 50% of the
original size due to the few but often repeated bytes used (hex
SHA-1 and commonly digits in tag names).
For most HTTP requests enable "Accept-Encoding: gzip" ensuring
the /info/refs payload can use this encoding format.
Only request gzip encoding from servers. Although deflate is
supported by libcurl, most servers have standardized on gzip
encoding for compression as that is what most browsers support.
Asking for deflate increases request sizes by a few bytes, but is
unlikely to ever be used by a server.
Disable the Accept-Encoding header on probe RPCs as response bodies
are supposed to be exactly 4 bytes long, "0000". The HTTP headers
requesting and indicating compression use more space than the data
transferred in the body.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Retrying without the query parameter was added as a workaround
for a single broken HTTP server at git.debian.org[1]. The server
was misconfigured to route every request with a query parameter
into gitweb.cgi. Admins fixed the server's configuration within
16 hours of the bug report to the Git mailing list, but we still
patched Git with this fallback and have been paying for it since.
Most Git hosting services configure the smart HTTP protocol and the
retry logic confuses users when there is a transient HTTP error as
Git dropped the real error from the smart HTTP request. Removing the
retry makes root causes easier to identify.
As reported by Jeroen Meijer[1], the current code doesn't deal properly
with items (tags, branches, etc.) that have ${} in them because they get
expanded by bash while using compgen.
A simple solution is to quote the items so they get expanded properly
(\$\{\}).
In order to achieve that I took bash-completion's quote() function,
which is rather simple, and renamed it to __git_quote() as per Jeff
King's suggestion.
Merge branch 'jk/config-warn-on-inaccessible-paths' into maint
The attribute system may be asked about a path that no longer exists
in the working tree (either the path itself or one of its leading
directories), and it is fine if we cannot open a .gitattributes file
in such a case. Failure to open a per-directory .gitattributes file
with an error status other than ENOENT or ENOTDIR should be diagnosed.
* jk/config-warn-on-inaccessible-paths:
attr: failure to open a .gitattributes file is OK with ENOTDIR
warn_on_inaccessible(): a helper to warn on inaccessible paths
attr: warn on inaccessible attribute files
gitignore: report access errors of exclude files
config: warn on inaccessible files
Documentation/git-filter-branch: Move note about effect of removing commits
The note explaining that changes introduced by removed commits are
preserved should be placed directly after the paragraph that describes
the removal of such commits. Otherwise the reference to "the commits"
appears out of context.
Also the big example that follows "Consider this history" is about
rewriting part of the history DAG. Move the paragraph that
describes the operation close to it.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
mailinfo: do not concatenate charset= attribute values from mime headers
"Content-type: text/plain; charset=UTF-8" header should not appear
twice in the input, but it is always better to gracefully deal with
such a case. The current code concatenates the value to the values
we have seen previously, producing nonsense such as "utf8UTF-8".
Instead of concatenating, forget the previous value and use the last
value we see.
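
The difference between the two behaviours can be sketched as below.
This is an illustration only: struct toy_mail and both helpers are
hypothetical and much simpler than the real mailinfo code.

```c
#include <stdio.h>
#include <string.h>

struct toy_mail {
	char charset[64];
};

/* Old behaviour described above: append to what we saw earlier. */
static void toy_charset_concat(struct toy_mail *m, const char *value)
{
	size_t len = strlen(m->charset);
	snprintf(m->charset + len, sizeof(m->charset) - len, "%s", value);
}

/* New behaviour: forget the previous value and keep the last one. */
static void toy_charset_replace(struct toy_mail *m, const char *value)
{
	snprintf(m->charset, sizeof(m->charset), "%s", value);
}
```

Feeding "utf8" and then "UTF-8" to the first helper leaves "utf8UTF-8"
in the buffer, while the second helper leaves just "UTF-8".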
Documentation: indent-with-non-tab uses "equivalent tabs" not 8
Update the documentation of the core.whitespace option
"indent-with-non-tab" to correctly reflect that it catches the use of
spaces instead of the equivalent tabs, rather than a fixed number.
Signed-off-by: Wesley J. Landaker <wjl@icecavern.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The log --grep tests generate the expected output in different ways.
Make them all use command blocks so that subshells are avoided and the
expected output is easier to grasp visually.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'mz/cherry-pick-cmdline-order' into maint
* mz/cherry-pick-cmdline-order:
cherry-pick/revert: respect order of revisions to pick
demonstrate broken 'git cherry-pick three one two'
teach log --no-walk=unsorted, which avoids sorting
Merge branch 'jc/maint-checkout-fileglob-doc' into maint-1.7.11
* jc/maint-checkout-fileglob-doc:
gitcli: contrast wildcard given to shell and to git
gitcli: formatting fix
Document file-glob for "git checkout -- '*.c'"
log --grep/--author: honor --all-match honored for multiple --grep patterns
When we have both header expression (which has to be an OR node by
construction) and a pattern expression (which could be anything), we
create a new top-level OR node to bind them together, and the
resulting expression structure looks like this:
             OR
        /        \
       /          \
   pattern            OR
     / \           /     \
    .....    committer    OR
                         /   \
                     author   TRUE
The three elements on the top-level backbone that are inspected by
the "all-match" logic are "pattern", "committer" and "author". When
there is more than one element in the "pattern", the top-level
node of the "pattern" part of the subtree is an OR, and that node is
inspected by "all-match".
The result ends up ignoring the "--all-match" given from the command
line. A match on either side of the pattern is considered a match,
hence:
git log --grep=A --grep=B --author=C --all-match
shows the same "authored by C and has either A or B" that is correct
only when run without "--all-match".
Fix this by turning the resulting expression around when "--all-match"
is in effect, like this:
              OR
          /        \
         /          \
        /              OR
    committer        /    \
                 author    \
                           pattern
The set of nodes on the top-level backbone in the resulting
expression becomes "committer", "author", and the nodes that are on
the top-level backbone of the "pattern" subexpression. This makes
the "all-match" logic inspect the same nodes in "pattern" as the
case without the author and/or the committer restriction, and makes
the earlier "log" example show "authored by C and has A and has
B", which is what the command line expects.
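
The effect of turning the expression around can be sketched with a
toy evaluator. This is a simplification under stated assumptions: the
struct toy_expr type and both functions are hypothetical, a commit is
reduced to a bitmask of the patterns it matches, and the committer and
TRUE nodes are left out.

```c
#include <stddef.h>

enum toy_kind { TOY_ATOM, TOY_OR };

struct toy_expr {
	enum toy_kind kind;
	int bit;                       /* TOY_ATOM: which pattern this is */
	struct toy_expr *left, *right; /* TOY_OR: children */
};

/* Does x match a commit whose matching patterns are recorded in hits? */
static int toy_match(const struct toy_expr *x, unsigned hits)
{
	if (x->kind == TOY_ATOM)
		return (hits >> x->bit) & 1;
	return toy_match(x->left, hits) || toy_match(x->right, hits);
}

/*
 * "all-match": walk the top-level backbone (the chain of OR nodes
 * reached through ->right) and require every element hanging off it
 * to match.
 */
static int toy_all_match(const struct toy_expr *top, unsigned hits)
{
	const struct toy_expr *x = top;
	while (x->kind == TOY_OR) {
		if (!toy_match(x->left, hits))
			return 0;
		x = x->right;
	}
	return toy_match(x, hits);
}
```

With the pattern subtree on the left, the backbone holds "A or B" as a
single element, so a commit matching only A and the author passes.
With the pattern subtree at the end of the backbone, A and B become
separate backbone elements and both are required.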
When threaded grep is in effect, the patterns are duplicated and
recompiled for each thread. Avoid "--debug" output during the
recompilation so that the output is given once instead of "1+nthreads"
times.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "grep" allows complex boolean expressions to be formed to match
each individual line with operators like --and, '(', ')' and --not.
Introduce the "--debug" option to show the parse tree to help people
who want to debug and enhance it.
Also "log" learns "--grep-debug" option to do the same. The command
line parser to the log family is a lot more limited than the general
"git grep" parser, but it has special handling for header matching
(e.g. "--author"), and a parse tree is valuable when working on it.
Note that "--all-match" is *not* any individual node in the parse
tree. It is an instruction to the evaluator to check all the nodes
in the top-level backbone have matched and reject a document as
non-matching otherwise.
This reverts the i18n part of 7f81463 (Use correct grammar in diffstat
summary line - 2012-02-01) but still keeps the grammar correctness for
English. It also reverts b354f11 (Fix tests under GETTEXT_POISON on
diffstat - 2012-08-27). The result is diffstat always in English
for all commands.
This helps stop users from accidentally sending localized
format-patch'd patches.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
attr: failure to open a .gitattributes file is OK with ENOTDIR
Often we consult an in-tree .gitattributes file that exists per
directory. The majority of directories do not usually have such a file,
and it is perfectly fine if we cannot open it because there is no
such file, but we do want to know when there is an I/O or permission
error. Earlier, we made the codepath warn when we fail to open it
for reasons other than ENOENT for that reason.
We however sometimes have to attempt to open the .gitattributes file
from a directory that does not exist in the commit that is currently
checked out. "git pack-objects" wants to know if a path is marked
with "-delta" attributes, and "git archive" wants to know about
export-ignore and export-subst attributes. Both commands may and do
need to ask the attributes system about paths in an arbitrary
commit. "git diff", after removing an entire directory, may want to
know textconv on paths that used to be in that directory.
Make sure we also ignore a failure to open per-directory attributes
file due to ENOTDIR.
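
The resulting policy can be sketched as follows. This is a minimal
illustration, not the attr.c code: toy_should_warn and toy_open_attr
are hypothetical helpers and the warning text is made up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Report whether an open failure with the given errno value deserves
 * a warning. A missing file (ENOENT) or a leading path component that
 * is not a directory (ENOTDIR) is expected and stays quiet.
 */
static int toy_should_warn(int err)
{
	return err != ENOENT && err != ENOTDIR;
}

/* Open path for reading, warning only on unexpected failures. */
static FILE *toy_open_attr(const char *path)
{
	FILE *fp = fopen(path, "r");
	if (!fp && toy_should_warn(errno))
		fprintf(stderr, "warning: unable to access '%s': %s\n",
			path, strerror(errno));
	return fp;
}
```

A permission problem such as EACCES still produces a warning, while
the two "nothing there" cases return NULL silently.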
The discussion of email subject throughout the documentation is
misleading; it indicates that the first line will always become
the subject. In fact, the subject is generally all lines up until
the first full blank line.
This patch refines that, and makes more use of the concept of a
commit title, with the title being all text up to the first blank line.
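
The title rule can be sketched in a few lines. This is only an
illustration: toy_title_len is a hypothetical helper, and it treats
just an empty line as blank, ignoring lines that contain only
whitespace.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the title of msg: everything before the first
 * blank line, without the newline that ends the last title line.
 */
static size_t toy_title_len(const char *msg)
{
	const char *blank = strstr(msg, "\n\n");
	size_t len = blank ? (size_t)(blank - msg) : strlen(msg);
	while (len && msg[len - 1] == '\n')
		len--;
	return len;
}
```

For a message whose first paragraph spans two lines, the returned
length covers both lines, not just the first one.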
Signed-off-by: Jeremy White <jwhite@codeweavers.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jc/apply-binary-p0' into maint-1.7.11
"git apply -p0" did not parse pathnames on "diff --git" line
correctly. This caused patches that had pathnames in no other
places to be mistakenly rejected (most notably, binary patch that
does not rename nor change mode). Textual patches, renames or mode
changes have preimage and postimage pathnames in different places in
a form that can be parsed unambiguously and did not suffer from this
problem.
* jc/apply-binary-p0:
apply: compute patch->def_name correctly under -p0
Merge branch 'jc/dotdot-is-parent-directory' into maint-1.7.11
"git log .." errored out saying it is both rev range and a path when
there is no disambiguating "--" on the command line. Update the
command line parser to interpret ".." as a path in such a case.
* jc/dotdot-is-parent-directory:
specifying ranges: we did not mean to make ".." an empty set
Merge branch 'jc/maint-doc-checkout-b-always-takes-branch-name' into maint-1.7.11
The synopsis said "checkout [-B branch]" to make it clear the
branch name is a parameter to the option, but the heading for the
option description was "-B::", not "-B branch::", making the
documentation misleading.
* jc/maint-doc-checkout-b-always-takes-branch-name:
doc: "git checkout -b/-B/--orphan" always takes a branch name
Merge branch 'jk/maint-http-half-auth-push' into maint-1.7.11
Pushing to smart HTTP server with recent Git fails without having
the username in the URL to force authentication, if the server is
configured to allow GET anonymously, while requiring authentication
for POST.
* jk/maint-http-half-auth-push:
http: prompt for credentials on failed POST
http: factor out http error code handling
t: test http access to "half-auth" repositories
t: test basic smart-http authentication
t/lib-httpd: recognize */smart/* repos as smart-http
t/lib-httpd: only route auth/dumb to dumb repos
t5550: factor out http auth setup
t5550: put auth-required repo in auth/dumb
blame: allow "blame file" in the middle of a conflicted merge
"git blame file" has always meant "find the origin of each line of
the file in the history leading to HEAD, oh by the way, blame the
lines that are modified locally to the working tree".
This teaches "git blame" that during a conflicted merge, some
uncommitted changes may have come from the other history that is
being merged.
The verify_working_tree_path() function introduced in the previous
patch to notice a typo in the filename (primarily on case insensitive
filesystems) has been updated to allow a filename that does not exist
in HEAD (i.e. the tip of our history) as long as it exists in one of the
commits being merged, so that a "we deleted, the other side modified"
case tracks the history of the file in the history of the other side.
* jc/capabilities:
fetch-pack: mention server version with verbose output
parse_feature_request: make it easier to see feature values
fetch-pack: do not ask for unadvertised capabilities
do not send client agent unless server does first
send-pack: fix capability-sending logic
include agent identifier in capability string
* jk/check-docs-update:
check-docs: get documented command list from Makefile
check-docs: drop git-help special-case
check-docs: list git-gui as a command
check-docs: factor out command-list
command-list: mention git-credential-* helpers
command-list: add git-sh-i18n
check-docs: update non-command documentation list
check-docs: mention gitweb specially