When pack-objects collects the list of objects to pack
(either from stdin, or via its internal rev-list), it
filters each one through want_object_in_pack().
This function loops through each existing packfile, looking
for the object. When we find it, we mark the pack/offset
combo for later use. However, we can't just return "yes, we
want it" at that point. If --honor-pack-keep is in effect,
we must keep looking to find it in _all_ packs, to make sure
none of them has a .keep. Likewise, if --local is in effect,
we must make sure it is not present in any non-local pack.
As a result, the sum effort of these calls is effectively
O(nr_objects * nr_packs). In an ordinary repository, we have
only a handful of packs, and this doesn't make a big
difference. But in pathological cases, it can slow the
counting phase to a crawl.
This patch notices the case that we have neither "--local"
nor "--honor-pack-keep" in effect and breaks out of the loop
early, after finding the first instance. Note that our worst
case is still "objects * packs" (i.e., we might find each
object in the last pack we look in), but in practice we will
often break out early. On an "average" repo, my git.git with
8 packs, this shows a modest 2% (a few dozen milliseconds)
improvement in the counting-objects phase of "git
pack-objects --all <foo" (hackily instrumented by sticking
exit(0) right after list_objects).
But in a much more pathological case, it makes a bigger
difference. I ran the same command on a real-world example
with ~9 million objects across 1300 packs. The counting time
dropped from 413s to 45s, an improvement of about 89%.
Note that this patch won't do anything by itself for a
normal "git gc", as it uses both --honor-pack-keep and
--local.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
find_pack_entry: replace last_found_pack with MRU cache
Each pack has an index for looking up entries in O(log n)
time, but if we have multiple packs, we have to scan through
them linearly. This can produce a measurable overhead for
some operations.
We dealt with this long ago in f7c22cc (always start looking
up objects in the last used pack first, 2007-05-30), which
keeps what is essentially a 1-element most-recently-used
cache. In theory, we should be able to do better by keeping
a similar but longer cache, that is the same length as the
pack-list itself.
Since we now have a convenient generic MRU structure, we can
plug it in and measure. Here are the numbers for running
p5303 against linux.git:
The main numbers of interest here are the rev-list ones
(since that is exercising the normal object lookup code
path). The single-pack case shouldn't improve at all; the
260ms speedup there is just part of the run-to-run noise
(but it's important to note that we didn't make anything
worse with the overhead of maintaining our cache). In the
50-pack case, we see similar results. There may be a slight
improvement, but it's mostly within the noise.
The 1000-pack case does show a big improvement, though. That
carries over to the repack case, as well. Even though we
haven't touched its pack-search loop yet, it does still do a
lot of normal object lookups (e.g., for the internal
revision walk), and so improves.
As a point of reference, I also ran the 1000-pack test
against a version of HEAD^ with the last_found_pack
optimization disabled. It takes ~60s, so that gives an
indication of how much even the single-element cache is
helping.
For comparison, here's a smaller repository, git.git:
You can see that the percentage improvement is similar.
That's because the lookup we are optimizing is roughly
O(nr_objects * nr_packs). Since the number of packs is
constant in both tests, we'd expect the improvement to be
linear in the number of objects. But the whole process is
also linear in the number of objects, so the improvement
is a constant factor.
The exact improvement does also depend on the contents of
the packs. In p5303, the extra packs all have 5 first-parent
commits in them, which is a reasonable simulation of a
pushed-to repository. But it also means that only 250
first-parent commits are in those packs (compared to almost
50,000 total in linux.git), and the rest are in the huge
"base" pack. So once we start looking at history in taht big
pack, that's where we'll find most everything, and even the
1-element cache gets close to 100% cache hits. You could
almost certainly show better numbers with a more
pathological case (e.g., distributing the objects more
evenly across the packs). But that's simply not that
realistic a scenario, so it makes more sense to focus on
these numbers.
The implementation itself is a straightforward application
of the MRU code. We provide an MRU-ordered list of packs
that shadows the packed_git list. This is easy to do because
we only create and revise the pack list in one place. The
"reprepare" code path actually drops the whole MRU and
replaces it for simplicity. It would be more efficient to
just add new entries, but there's not much point in
optimizing here; repreparing happens rarely, and only after
doing a lot of other expensive work. The key things to keep
optimized are traversal (which is just a normal linked list,
albeit with one extra level of indirection over the regular
packed_git list), and marking (which is a constant number of
pointer assignments, though slightly more than the old
last_found_pack was; it doesn't seem to create a measurable
slowdown, though).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few places in Git that would benefit from a fast
most-recently-used cache (e.g., the list of packs, which we
search linearly but would like to order based on locality).
This patch introduces a generic list that can be used to
store arbitrary pointers in most-recently-used order.
The implementation is just a doubly-linked list, where
"marking" an item as used moves it to the front of the list.
Insertion and marking are O(1), and iteration is O(n).
There's no lookup support provided; if you need fast
lookups, you are better off with a different data structure
in the first place.
There is also no deletion support. This would not be hard to
do, but it's not necessary for handling pack structs, which
are created and never removed.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of this function is to drop an entry from the
"packed_git" cache that points to a file we might be
overwriting, because our contents may not be the same (and
hence the only caller was pack-objects as it moved a
temporary packfile into place).
In older versions of git, this could happen because the
names of packfiles were derived from the set of objects they
contained, not the actual bits on disk. But since 1190a1a
(pack-objects: name pack files after trailer hash,
2013-12-05), the name reflects the actual bits on disk, and
any two packfiles with the same name can be used
interchangeably.
Dropping this function not only saves a few lines of code,
it makes the lifetime of "struct packed_git" much easier to
reason about: namely, we now do not ever free these structs.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's pack storage does efficient (log n) lookups in a
single packfile's index, but if we have multiple packfiles,
we have to linearly search each for a given object. This
patch introduces some timing tests for cases where we have a
large number of packs, so that we can measure any
improvements we make in the following patches.
The main thing we want to time is object lookup. To do this,
we measure "git rev-list --objects --all", which does a
fairly large number of object lookups (essentially one per
object in the repository).
However, we also measure the time to do a full repack, which
is interesting for two reasons. One is that in addition to
the usual pack lookup, it has its own linear iteration over
the list of packs. And two is that because it it is the tool
one uses to go from an inefficient many-pack situation back
to a single pack, we care about its performance not only at
marginal numbers of packs, but at the extreme cases (e.g.,
if you somehow end up with 5,000 packs, it is the only way
to get back to 1 pack, so we need to make sure it performs
well).
We measure the performance of each command in three
scenarios: 1 pack, 50 packs, and 1,000 packs.
The 1-pack case is a baseline; any optimizations we do to
handle multiple packs cannot possibly perform better than
this.
The 50-pack case is as far as Git should generally allow
your repository to go, if you have auto-gc enabled with the
default settings. So this represents the maximum performance
improvement we would expect under normal circumstances.
The 1,000-pack case is hopefully rare, though I have seen it
in the wild where automatic maintenance was broken for some
time (and the repository continued to receive pushes). This
represents cases where we care less about general
performance, but want to make sure that a full repack
command does not take excessively long.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
difftool: use Git::* functions instead of passing around state
Call Git::command() and friends directly wherever possible.
This makes it clear that these operations can be invoked directly
without needing to manage the current directory and related GIT_*
environment variables.
Eliminate find_repository() since we can now use wc_path() and
not worry about side-effects involving environment variables.
Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Environment variables are global and hard to reason about.
Use the `--git-dir` and `--work-tree` arguments when invoking `git`
instead of relying on the environment.
Add a test to ensure that difftool's dir-diff feature works when these
variables are present in the environment.
Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase-interactive: trim leading whitespace from progress count
Interactive rebase uses 'wc -l' to write the current patch number
in a progress report. Some implementations of 'wc -l' produce spaces
before the number, leading to ugly output such as
Rebasing ( 3/8)
Remove the spaces using a trivial arithmetic evaluation.
Before 9588c52 (i18n: rebase-interactive: mark strings for
translation) this was not a problem because printf was used to
generate the text. Since that commit, the count is interpolated
directly from a shell variable into the text, where the spaces
remain. The total number of patches does not have this problem
even though it is interpolated from a shell variable in the same
manner, because the variable is set by an arithmetic evaluation.
Later in the script, there is a virtually identical case where
leading spaces are trimmed, but it uses a pattern substitution:
I did not choose this idiom because it adds a line of code, and
there is already an arithmetic evaluation in the vicinity of the
line that is changed here.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule-config: passing name reference for .gitmodule blobs
Commit 959b5455 (submodule: implement a config API for lookup of
.gitmodules values, 2015-08-18) implemented the initial version of the
submodule config cache. During development of that initial version we
extracted the function gitmodule_sha1_from_commit(). During that process
we missed that the strbuf rev was still used in config_from() and now is
left empty. Lets fix this by also returning this string.
This means that now when reading .gitmodules from revisions, the error
messages also contain a reference to the blob they are from.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
Merge branch 'sb/submodule-parallel-fetch' into maint
Fix recently introduced codepaths that are involved in parallel
submodule operations, which gave up on reading too early, and
could have wasted CPU while attempting to write under a corner
case condition.
* sb/submodule-parallel-fetch:
hoist out handle_nonblock function for xread and xwrite
xwrite: poll on non-blocking FDs
xread: retry after poll on EAGAIN/EWOULDBLOCK
The test framework learned a new helper test_match_signal to
check an exit code from getting killed by an expected signal.
* jk/test-match-signal:
t/lib-git-daemon: use test_match_signal
test_must_fail: use test_match_signal
t0005: use test_match_signal as appropriate
tests: factor portable signal check out of t0005
Merge branch 'js/am-call-theirs-theirs-in-fallback-3way' into maint
One part of "git am" had an oddball helper function that called
stuff from outside "his" as opposed to calling what we have "ours",
which was not gender-neutral and also inconsistent with the rest of
the system where outside stuff is usuall called "theirs" in
contrast to "ours".
Merge branch 'js/color-on-windows-comment' into maint
For a long time, we carried an in-code comment that said our
colored output would work only when we use fprintf/fputs on
Windows, which no longer is the case for the past few years.
* js/color-on-windows-comment:
color.h: remove obsolete comment about limitations on Windows
More mark-up updates to typeset strings that are expected to
literally typed by the end user in fixed-width font.
* mm/doc-tt:
doc: typeset HEAD and variants as literal
CodingGuidelines: formatting HEAD in documentation
doc: typeset long options with argument as literal
doc: typeset '--' as literal
doc: typeset long command-line options as literal
doc: typeset short command-line options as literal
Documentation/git-mv.txt: fix whitespace indentation
Merge branch 'js/sign-empty-commit-fix' into maint
"git commit --amend --allow-empty-message -S" for a commit without
any message body could have misidentified where the header of the
commit object ends.
* js/sign-empty-commit-fix:
commit -S: avoid invalid pointer with empty message
Git does not know what the contents in the index should be for a
path added with "git add -N" yet, so "git grep --cached" should not
show hits (or show lack of hits, with -L) in such a path, but that
logic does not apply to "git grep", i.e. searching in the working
tree files. But we did so by mistake, which has been corrected.
* nd/ita-cleanup:
grep: fix grepping for "intent to add" files
t7810-grep.sh: fix a whitespace inconsistency
t7810-grep.sh: fix duplicated test name
Merge branch 'js/find-commit-subject-ignore-leading-blanks' into maint
A helper function that takes the contents of a commit object and
finds its subject line did not ignore leading blank lines, as is
commonly done by other codepaths. Make it ignore leading blank
lines to match.
* js/find-commit-subject-ignore-leading-blanks:
reset --hard: skip blank lines when reporting the commit subject
sequencer: use skip_blank_lines() to find the commit subject
commit -C: skip blank lines at the beginning of the message
commit.c: make find_commit_subject() more robust
pretty: make the skip_blank_lines() function public
Recent FreeBSD stopped making perl available at /usr/bin/perl;
switch the default the built-in path to /usr/local/bin/perl on not
too ancient FreeBSD releases.
* ew/find-perl-on-freebsd-in-local:
config.mak.uname: correct perl path on FreeBSD
Recent update to "git daemon" tries to enable the socket-level
KEEPALIVE, but when it is spawned via inetd, the standard input
file descriptor may not necessarily be connected to a socket.
Suppress an ENOTSOCK error from setsockopt().
* ew/daemon-socket-keepalive:
Windows: add missing definition of ENOTSOCK
daemon: ignore ENOTSOCK from setsockopt
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
"git worktree prune" protected worktrees that are marked as
"locked" by creating a file in a known location. "git worktree"
command learned a dedicated command pair to create and remove such
a file, so that the users do not have to do this with editor.
"git notes merge" had a code to see if a path exists (and fails if
it does) and then open the path for writing (when it doesn't).
Replace it with open with O_EXCL.
* rs/notes-merge-no-toctou:
notes-merge: use O_EXCL to avoid overwriting existing files
A few tests that specifically target "git rebase -i" have been
added.
* js/rebase-i-tests:
rebase -i: we allow extra spaces after fixup!/squash!
rebase -i: demonstrate a bug with --autosquash
t3404: add a test for the --gpg-sign option
i18n: config: unfold error messages marked for translation
Introduced in 473166b ("config: add 'origin_type' to config_source
struct", 2016-02-19), Git can inform the user about the origin of a
config error, but the implementation does not allow translators to
translate the keywords 'file', 'blob, 'standard input', and
'submodule-blob'. Moreover, for the second message, a reason for the
error is appended to the message, not allowing translators to translate
that reason either.
Unfold the message into several templates for each known origin_type.
That would result in better translation at the expense of code
verbosity.
Add enum config_oringin_type to ease management of the various
configuration origin types (blob, file, etc). Previously origin type
was considered from command line if cf->origin_type == NULL, i.e.,
uninitialized. Now we set origin_type to CONFIG_ORIGIN_CMDLINE in
git_config_from_parameters() and configset_add_value().
For error message in git_parse_source(), use xstrfmt() function to
prepare the message string, instead of doing something like it's done
for die_bad_number(), because intelligibility and code conciseness are
improved for that instance.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prefer "test" over "[ ... ]", use double-quotes around variables, break
long lines, and properly indent "case" statements.
Helped-by: Johannes Sixt <j6t@kdbg.org> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"... in the internal raw Git format `%s %z` format." was clunky in
repeating "format" twice, and would not have helped those who do not
immediately get that these are strftime(3) conversion specifiers.
Explain them with words, and demote the mention of `%s %z` to a
hint to help those who know them.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have "--date=raw", which is a Unix epoch
timestamp plus a contextual timezone (either the author's or
the local). But one may not care about the timezone and just
want the epoch timestamp by itself. It's not hard to parse
the two apart, but if you are using a pretty-print format,
you may want git to show the "finished" form that the user
will see.
We can accomodate this by adding a new date format, "unix",
which is basically "raw" without the timezone.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "raw" format shows a Unix epoch timestamp, but with a
timezone tacked on. The timestamp is not _in_ that zone, but
it is extra information about the time (by default, the zone
the author was in).
The documentation claims that "raw-local" does not work. It
does, but the end result is rather subtle. Let's describe it
in better detail, and test to make sure it works (namely,
the epoch time doesn't change, but the zone does).
While we are rewording the documentation in this area, let's
not use the phrase "does not work" for the remaining option,
"--date=relative". It's vague; do we accept it or not? We do
accept it, but it has no effect (which is a reasonable
outcome). We should also refer to the option not as
"--relative" (which is the historical synonym, and does not
take "-local" at all), but as "--date=relative".
Helped-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our usual style in the test scripts is to indent here
documents with tabs, and use "<<-" to strip the tabs. The
result is easier to read.
This old test script did not do so in its inception, and
further tests added onto it followed the local style. Let's
bring it in line with our usual style.
Some of the tests actually care quite a bit about
whitespace, but none of them do so at the beginning of the
line (because they use things like qz_to_tab_space to avoid
depending on the literal whitespace), so we can do a fairly
mechanical conversion.
Most of the here-docs also use interpolation, so they have
been left as "<<-EOF". In a few cases, though, where
interpolation was not in use, I've converted them to
"<<-\EOF" to match our usual "don't interpolate unless you
need to" style.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid performing actions at the top-level of
the script (outside of a test_expect block) for two reasons:
1. The test harness is not checking and reporting if they
fail.
2. Their output is not handled correctly (not hidden by
default, nor shown with "-v").
Using &&-chains seems like it should help with (1), but it
doesn't. If either of the commands fails, we simply skip
running the follow-on test entirely, and the test harness
has no idea.
We can fix this by pushing that setup into its own block.
It _could_ go into the following test block, but since the
result in this case is used by multiple tests, it's more
clear to mark it explicitly as a distinct setup step.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge" in v2.9 prevents merging unrelated histories.
"git subtree split --rejoin" creates unrelated histories when
creating a split repo from a raw sub-directory that did not
originate from an invocation of "git subtree add".
Restore the original behavior by passing --allow-unrelated-histories
when merging subtrees. This ensures that the synthetic history
created by "git subtree split" can be merged.
Add a test to ensure that this feature works as advertised.
Reported-by: Brett Cundal <brett.cundal@iugome.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
push: allow pushing new branches with --force-with-lease
If there is no upstream information for a branch, it is likely that it
is newly created and can safely be pushed under the normal fast-forward
rules. Relax the --force-with-lease check so that we do not reject
these branches immediately but rather attempt to push them as new
branches, using the null SHA-1 as the expected value.
In fact, it is already possible to push new branches using the explicit
--force-with-lease=<branch>:<expect> syntax, so all we do here is make
this behaviour the default if no explicit "expect" value is specified.
Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Last October, we had to change this code to run `git merge-recursive`
in a child process: git-am wants to print some helpful advice when the
merge failed, but the code in question was not prepared to return, it
die()d instead.
We are finally at a point when the code *is* prepared to return errors,
and can avoid the child process again.
This reverts commit c63d4b2 (am -3: do not let failed merge from
completing the error codepath, 2015-10-09), with the necessary changes
to adjust for the fact that Git's source code changed in the meantime
(such as: using OIDs instead of hashes in the recursive merge, and a
removed gender bias).
Note: the code now calls merge_recursive_generic() again. Unlike
merge_trees() and merge_recursive(), this function returns 0 upon success,
as most of Git's functions. Therefore, the error value -1 naturally is
handled correctly, and we do not have to take care of it specifically.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: switch to returning errors instead of dying
The recursive merge machinery is supposed to be a library function, i.e.
it should return an error when it fails. Originally the functions were
part of the builtin "merge-recursive", though, where it was simpler to
call die() and be done with error handling.
The existing callers were already prepared to detect negative return
values to indicate errors and to behave as previously: exit with code 128
(which is the same thing that die() does, after printing the message).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to libify the recursive merge machinery, where we only
die() in case of a bug or memory contention. To that end, we must heed
negative return values as indicating errors.
This requires our functions to be careful to pass through error
conditions in call chains, and for quite a few functions this means
that they have to return values to begin with.
The next step will be to convert the places where we currently die() to
return negative values (read: -1) instead.
Note that we ignore errors reported by make_room_for_path(), consistent
with the previous behavior (update_file_flags() used the return value of
make_room_for_path() only to indicate an early return, but not a fatal
error): if the error is really a fatal error, we will notice later; If
not, it was not that serious a problem to begin with. (Witnesses in
favor of this reasoning are t4151-am-abort and t7610-mergetool, which
would start failing if we stopped on errors reported by
make_room_for_path()).
Also note: while this patch makes the code slightly less readable in
update_file_flags() (we introduce a new "goto free_buf;" instead of
an explicit "free(buf); return;"), it is a preparatory change for
the next patch where we will convert all of the die() calls in the same
function to go through the free_buf return path instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: allow write_tree_from_memory() to error out
It is possible that a tree cannot be written (think: disk full). We
will want to give the caller a chance to clean up instead of letting
the program die() in such a case.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: avoid returning a wholesale struct
It is technically allowed, as per C89, for functions' return type to
be complete structs (i.e. *not* just pointers to structs).
However, it was just an oversight of this developer when converting
Python code to C code in 6d297f8 (Status update on merge-recursive in
C, 2006-07-08) which introduced such a return type.
Besides, by converting this construct to pass in the struct, we can now
start returning a value that can indicate errors in future patches. This
will help the current effort to libify merge-recursive.c.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
prepare the builtins for a libified merge_recursive()
Previously, callers of merge_trees() or merge_recursive() expected that
code to die() with an error message. This used to be okay because we
called those commands from scripts, and had a chance to print out a
message in case the command failed fatally (read: with exit code 128).
As scripting incurs its own set of problems (portability, speed,
idiosyncrasies of different shells, limited data structures leading to
inefficient code), we are converting more and more of these scripts into
builtins, using library functions directly.
We already tried to use merge_recursive() directly in the builtin
git-am, for example. Unfortunately, we had to roll it back temporarily
because some of the code in merge-recursive.c still deemed it okay to
call die(), when the builtin am code really wanted to print out a useful
advice after the merge failed fatally. In the next commits, we want to
fix that.
The code touched by this commit expected merge_trees() to die() with
some useful message when there is an error condition, but merge_trees()
is going to be improved by converting all die() calls to return error()
instead (i.e. return value -1 after printing out the message as before),
so that the caller can react more flexibly.
This is a step to prepare for the version of merge_trees() that no
longer dies, even if we just imitate the previous behavior by calling
exit(128): this is what callers of e.g. `git merge` have come to expect.
Note that the callers of the sequencer (revert and cherry-pick) already
fail fast even for the return value -1; The only difference is that they
now get a chance to say "<command> failed".
A caller of merge_trees() might want handle error messages themselves
(or even suppress them). As this patch is already complex enough, we
leave that change for a later patch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be puzzling to see that was_tracked() asks to get an index entry
by name, but does not take a negative return value for an answer.
The reason we have to do this is that cache_name_pos() only looks for
entries in stage 0, even if nobody asked for any stage in particular.
Let's rewrite the logic a little bit, to handle the easy case early: if
cache_name_pos() returned a non-negative position, we know it is a match,
and we do not even have to compare the name again (cache_name_pos() did
that for us already). We can say right away: yes, this file was tracked.
Only if there was no exact match do we need to look harder for any
matching entry in stage 2.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While working on the patch series that avoids die()ing in recursive
merges, the issue came up that bug reports (i.e. die("BUG: ...")
constructs) should never be translated, as the target audience is the
Git developer community, not necessarily the current user, and hence
a translated message would make it *harder* to address the problem.
So let's stop translating the obvious ones. As it is really, really
outside the purview of this patch series to see whether there are more
die() statements that report bugs and are currently translated, that
task is left for another day and patch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".
As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t5520: verify that `pull --rebase` shows the helpful advice when failing
It was noticed by Brendan Forster last October that the builtin `git am`
regressed on that. Our hot fix reverted to spawning the recursive merge
instead of using it as a library function.
As we are about to revert that hot fix, after making the recursive merge a
true library function (i.e. a function that does not die() in case of
"normal" errors), let's add a test that verifies that we do not regress on
the same problem which made the hot fix necessary in the first place.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Existing autoconf generated test for the need to link with pthread
library did not check all the functions from pthread libraries;
recent FreeBSD has some functions in libc but not others, and we
mistakenly thought linking with libc is enough when it is not.
* ew/autoconf-pthread:
configure.ac: stronger test for pthread linkage
"git blame file" allowed the lineage of lines in the uncommitted,
unadded contents of "file" to be inspected, but it refused when
"file" did not appear in the current commit. When "file" was
created by renaming an existing file (but the change has not been
committed), this restriction was unnecessarily tight.
* mh/blame-worktree:
t/t8003-blame-corner-cases.sh: Use here documents
blame: allow to blame paths freshly added to the index
When "git fsck" reports a broken link (e.g. a tree object contains
a blob that does not exist), both containing object and the object
that is referred to were reported with their 40-hex object names.
The command learned the "--name-objects" option to show the path to
the containing object from existing refs (e.g. "HEAD~24^2:file.txt").
* js/fsck-name-object:
fsck: optionally show more helpful info for broken links
fsck: give the error function a chance to see the fsck_options
fsck_walk(): optionally name objects on the go
fsck: refactor how to describe objects
"git add -N dir/file && git write-tree" produced an incorrect tree
when there are other paths in the same directory that sorts after
"file".
* nd/cache-tree-ita:
cache-tree: do not generate empty trees as a result of all i-t-a subentries
cache-tree.c: fix i-t-a entry skipping directory updates sometimes
test-lib.sh: introduce and use $EMPTY_BLOB
test-lib.sh: introduce and use $EMPTY_TREE
* nd/test-helpers:
t/test-lib.sh: fix running tests with --valgrind
Makefile: use VCSSVN_LIB to refer to svn library
Makefile: drop extra dependencies for test helpers
Makefile assumed that -lrt is always available on platforms that
want to use clock_gettime() and CLOCK_MONOTONIC, which is not a
case for recent Mac OS X. The necessary symbols are often found in
libc on many modern systems and having -lrt on the command line, as
long as the library exists, had no effect, but when the platform
removes librt.a that is a different matter--having -lrt will break
the linkage.
This change could be seen as a regression for those who do need to
specify -lrt, as they now specifically ask for NEEDS_LIBRT when
building. Hopefully they are in the minority these days.
* rw/make-needs-librt:
config.mak.uname: define NEEDS_LIBRT under Linux, for now
Makefile: add NEEDS_LIBRT to optionally link with librt
The API to iterate over all the refs (i.e. for_each_ref(), etc.)
has been revamped.
* mh/ref-iterators:
for_each_reflog(): reimplement using iterators
dir_iterator: new API for iterating over a directory tree
for_each_reflog(): don't abort for bad references
do_for_each_ref(): reimplement using reference iteration
refs: introduce an iterator interface
ref_resolves_to_object(): new function
entry_resolves_to_object(): rename function from ref_resolves_to_object()
get_ref_cache(): only create an instance if there is a submodule
remote rm: handle symbolic refs correctly
delete_refs(): add a flags argument
refs: use name "prefix" consistently
do_for_each_ref(): move docstring to the header file
refs: remove unnecessary "extern" keywords
Error handling in the codepaths that updates refs has been
improved.
* mh/update-ref-errors:
lock_ref_for_update(): avoid a symref resolution
lock_ref_for_update(): make error handling more uniform
t1404: add more tests of update-ref error handling
t1404: document function test_update_rejected
t1404: remove "prefix" argument to test_update_rejected
t1404: rename file to t1404-update-ref-errors.sh
This allows us to use common test infrastructure and parallelize
the tests. For now, GIT_SVN_TEST_HTTPD=true needs to be set to
enable the SVN HTTP tests because we reuse the same test cases
for both file:// and http:// SVN repositories. SVN_HTTPD_PORT
is no longer honored.
Tested under Apache 2.2 and 2.4 on Debian 7.x (wheezy) and
8.x (jessie), respectively.
Cc: Clemens Buchacher <drizzd@aon.at> Cc: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When c5c31d33 (grep: move pattern-type bits support to top-level
grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper
function, the intention was to allow the users of grep API to having
to fiddle only with .pattern_type_option (which can be set to "fixed",
"basic", "extended", and "pcre"), and then immediately before compiling
the pattern strings for use, call grep_commit_pattern_type() to have
it prepare various bits in the grep_opt structure (like .fixed,
.regflags, etc.).
However, grep_set_pattern_type_option() helper function the grep API
internally uses were left as an external function by mistake. This
function shouldn't have been made callable by the users of the API.
Later when the grep API was used in revision traversal machinery,
the caller then mistakenly started calling the function around 34a4ae55 (log --grep: use the same helper to set -E/-F options as
"git grep", 2012-10-03), instead of setting the .pattern_type_option
field and letting the grep_commit_pattern_type() to take care of the
details.
This caused an unnecessary bug that made a configured
grep.patternType take precedence over the command line options
(e.g. --basic-regexp, --fixed-strings) in "git log" family of
commands.
The actual shortening rules aren't that interesting and
probably not worth getting into (I gloss over them here as
"shortened for human readability"). But the fact that %gD
shows whatever you gave on the command line is subtle and
worth mentioning. Since most people will feed a shortened
refname in the first place, it otherwise makes it hard to
understand the difference between the two.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc/pretty-formats: describe index/time formats for %gd
The "reflog selector" format changes based on a series of
heuristics, and that applies equally to both stock "log -g"
output, as well as "--format=%gd". The documentation for
"%gd" doesn't cover this. Let's mention the multiple formats
and refer the user back to the "-g" section for the complete
rules.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We document that asking for HEAD@{now} will switch the
output to show HEAD@{timestamp}, but not that specifying
`--date` has a similar effect, or that it can be overridden
with HEAD@{0}. Let's do so.
These rules come from 794151e (reflog-walk: always make
HEAD@{0} show indexed selectors, 2012-05-04), though that is
simply the culmination of years of these heuristics growing
organically.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
When "log -g" shows "HEAD@{1}", "HEAD@{2}", etc, calling
that "commit@{Nth}" is not really accurate. The "HEAD" part
is really the refname. By saying "commit", a reader may
misunderstand that to mean something related to the specific
commit we are showing, not the ref whose reflog we are
traversing.
While we're here, let's also switch these instances to use
literal backticks, as our style guide recommends. As a
bonus, that lets us drop some asciidoc quoting.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule-helper: fix indexing in clone retry error reporting path
'git submodule--helper update-clone' has logic to retry failed clones
a second time. For this purpose, there is a list of submodules to clone,
and a second list that is filled with the submodules to retry. Within
these lists, the submodules are identified by an index as if both lists
were just appended.
This works nicely except when the second clone attempt fails as well. To
report an error, the identifying index must be adjusted by an offset so
that it can be used as an index into the second list. However, the
calculation uses the logical total length of the lists so that the result
always points one past the end of the second list.
Pick the correct index.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-submodule: forward exit code of git-submodule--helper more faithfully
git-submodule--helper is invoked as the upstream of a pipe in several
places. Usually, the failure of a program in this position is not
detected by the shell. For this reason, the code inserts a token in the
output stream when git-submodule--helper fails that is detected
downstream, where the shell script is quit with exit code 1.
There happens to be a bug in git-submodule--helper that leads to a
segmentation fault. The test suite triggers the crash in several places,
all of which are protected by 'test_must_fail'. But due to the inspecific
exit code 1, the crash remains undiagnosed.
Extend the failure protocol such that git-submodule--helper's exit code
is passed downstream (only in the case of failure). This enables the
downstream to use it as its own exit code, and 'test_must_fail' to
identify the segmentation fault as an unexpected failure.
The bug itself is fixed in the next commit.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the transport protocol we use NAK to signal the non existence of a
common base, so fix the documentation. This helps readers of the document,
as they don't have to wonder about the difference between NAK and NACK.
As NACK is used in git archive and upload-archive, this is easy to get
wrong.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have whitespace errors in lines you've introduced, it
can be convenient to be able to jump directly to them for
fixing. You can't quite use "git jump diff" for this,
because though it passes arbitrary options to "git diff", it
expects to see an actual unified diff in the output.
Whereas "git diff --check" actually produces lines that look
like compiler quickfix lines already, meaning we just need
to run it and feed the output directly to the editor.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>