gitweb.git
fetch: use free_refs()Jeff King Sat, 13 Apr 2019 05:54:09 +0000 (01:54 -0400)

fetch: use free_refs()

There's no need for us to write this loop manually when a helper
function can already do it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pkt-line: prepare buffer before handling ERR packetsJeff King Sat, 13 Apr 2019 05:54:02 +0000 (01:54 -0400)

pkt-line: prepare buffer before handling ERR packets

Since 2d103c31c2 (pack-protocol.txt: accept error packets in any
context, 2018-12-29), the pktline code will detect an ERR packet and die
automatically, saving the caller from dealing with it. But we do so too
early in the function, before we have terminated the buffer with a NUL.

As a result, passing the ERR message to die() may result in us printing
random cruft from a previous packet. This doesn't trigger memory tools
like ASan because we reuse the same buffer over and over (so the
contents are valid and initialized; they're just stale).

We can see demonstrate this by tightening the regex we use to match the
error message in t5516; without this patch, git-fetch will accidentally
print the capabilities from the (much longer) initial packet we
received.

By moving the ERR code later in the function we get a few other
benefits, too:

- we'll now chomp any newline sent by the other side (which is what we
want, since die() will add its own newline)

- we'll now mention the ERR packet with GIT_TRACE_PACKET

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

upload-pack: send ERR packet for non-tip objectsJeff King Sat, 13 Apr 2019 05:53:34 +0000 (01:53 -0400)

upload-pack: send ERR packet for non-tip objects

Commit bdb31eada7 (upload-pack: report "not our ref" to client,
2017-02-23) catches the case where a client asks for an object we don't
have, and issues a message that the client can show to the user (in
addition to dying and writing to stderr).

There's a similar case (with the same message) when the client asks for
an object which we _do_ have, but which isn't a ref tip (or isn't
reachable, when uploadpack.allowReachableSHA1InWant is true). Let's give
that one the same treatment, for the same reason (namely that it's more
informative to the client than just hanging up, since they won't see our
stderr over some protocols).

There are two tests here. We cover it most directly in t5530 by invoking
upload-pack, which matches the existing "not our ref" test.

But a more end-to-end check is that "git fetch" actually shows the
message to the client. We're already checking in t5516 that this case
fails, so we can just check stderr there, too. Note that even after we
started ignoring SIGPIPE in 8bf4becf0c, this could in theory still be
racy as described in that commit (because we die() on write failures
before pumping the connection for any ERR packets).

In practice this should be OK for this case. The server will not
actually check reachability until it has received our whole group of
"want" lines. And since we have no objects in the repository, we won't
send any "have" lines, meaning we're always waiting to read the server
response.

Note also that this case cannot happen in the v2 protocol, since it
allows any available object to be requested. However, we don't have to
take any steps to protect against the upcoming GIT_TEST_PROTOCOL_VERSION
in our tests:

- the tests in t5516 would already need to be skipped under v2, and
that is covered by ab0c5f5096 (tests: always test fetch of
unreachable with v0, 2019-02-25)

- the tests in t5530 invoke upload-pack directly, which will continue
to default to v0. Eventually we may have a test setting which uses
v2 even for bare upload-pack calls, but we can't override it here
until we know what the setting looks like.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5530: check protocol response for "not our ref"Jeff King Sat, 13 Apr 2019 05:53:09 +0000 (01:53 -0400)

t5530: check protocol response for "not our ref"

Back in 9f9aa76130 (upload-pack: Improve error message when bad ref
requested, 2010-07-31), we added a test to make sure that we die with a
sensible message when the client asks for an object we don't have.

Much later, in bdb31eada7 (upload-pack: report "not our ref" to client,
2017-02-23), we started reporting that information via an "ERR" line in
the protocol. Let's check that part, as well.

While we're touching this test, let's drop the "-q" on the grep calls.
Our usual test style just relies on --verbose to control output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5516: drop ok=sigpipe from unreachable-want testsJeff King Sat, 13 Apr 2019 05:52:18 +0000 (01:52 -0400)

t5516: drop ok=sigpipe from unreachable-want tests

We annotated our test_must_fail calls in 8bf4becf0c (add "ok=sigpipe" to
test_must_fail and use it to fix flaky tests, 2015-11-27) because the
abrupt hangup of the server meant that we'd sometimes fail on read() and
sometimes get SIGPIPE on write().

But since 143588949c (fetch: ignore SIGPIPE during network operation,
2019-03-03), we make sure that we end up with a real die(), and our
tests no longer need to work around the race.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gettext tests: export the restored GIT_TEST_GETTEXT_POISONJunio C Hamano Mon, 15 Apr 2019 04:48:31 +0000 (13:48 +0900)

gettext tests: export the restored GIT_TEST_GETTEXT_POISON

6cdccfce ("i18n: make GETTEXT_POISON a runtime option", 2018-11-08)
made the gettext-poison test a runtime option (which was a good
move) and adjusted the test framework so that Git commands we run as
part of the framework, as opposed to the ones that are part of the
test proper, are not affected by the setting. The original value
for the GIT_TEST_GETTEXT_POISON environment variable is saved away
in another variable and gets unset, and then later the saved value
is restored to the environment variable.

But the code forgot to export the variable again, which is necessary
to restore the "export" bit that was lost when the variable was unset.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

progress: break too long progress bar linesSZEDER Gábor Fri, 12 Apr 2019 19:45:15 +0000 (21:45 +0200)

progress: break too long progress bar lines

Some of the recently added progress indicators have quite long titles,
which might be even longer when translated to some languages, and when
they are shown while operating on bigger repositories, then the
progress bar grows longer than the default 80 column terminal width.

When the progress bar exceeds the width of the terminal it gets
line-wrapped, and after that the CR at the end doesn't return to the
beginning of the progress bar, but to the first column of its last
line. Consequently, the first line of the previously shown progress
bar is not overwritten by the next, and we end up with a bunch of
truncated progress bar lines scrolling past:

$ LANG=es_ES.UTF-8 git commit-graph write
Encontrando commits para commit graph entre los objetos empaquetados: 2% (1599
Encontrando commits para commit graph entre los objetos empaquetados: 3% (1975
Encontrando commits para commit graph entre los objetos empaquetados: 4% (2633
Encontrando commits para commit graph entre los objetos empaquetados: 5% (3292
[...]

Prevent this by breaking progress bars after the title once they
exceed the width of the terminal, so the counter and optional
percentage and throughput, i.e. all changing parts, are on the last
line. Subsequent updates will from then on only refresh the changing
parts, but not the title, and it will look like this:

$ LANG=es_ES.UTF-8 ~/src/git/git commit-graph write
Encontrando commits para commit graph entre los objetos empaquetados:
100% (6584502/6584502), listo.
Calculando números de generación de commit graph: 100% (824705/824705), listo.
Escribiendo commit graph en 4 pasos: 100% (3298820/3298820), listo.

Note that the number of columns in the terminal is cached by
term_columns(), so this might not kick in when it should when a
terminal window is resized while the operation is running.
Furthermore, this change won't help if the terminal is so narrow that
the counters don't fit on one line, but I would put this in the "If it
hurts, don't do it" box.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

progress: clear previous progress update dynamicallySZEDER Gábor Fri, 12 Apr 2019 19:45:14 +0000 (21:45 +0200)

progress: clear previous progress update dynamically

When the progress bar includes throughput, its length can shorten as
the unit of display changes from KiB to MiB. To cover up remnants of
the previous progress bar when such a change of units happens we
always print three spaces at the end of the progress bar.

Alas, covering only three characters is not quite enough: when both
the total and the throughput happen to change units from KiB to MiB in
the same update, then the progress bar's length is shortened by four
characters (or maybe even more!):

Receiving objects: 25% (2901/11603), 772.01 KiB | 733.00 KiB/s
Receiving objects: 27% (3133/11603), 1.43 MiB | 1.16 MiB/s s

and a stray 's' is left behind.

So instead of hard-coding the three characters to cover, let's compare
the length of the current progress bar with the previous one, and
cover up as many characters as needed.

Sure, it would be much simpler to just print more spaces at the end of
the progress bar, but this approach is more future-proof, and it won't
print extra spaces when none are needed, notably when the progress bar
doesn't show throughput and thus never shrinks.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

macOS: make sure that gettext is foundJohannes Schindelin Sun, 14 Apr 2019 21:19:29 +0000 (14:19 -0700)

macOS: make sure that gettext is found

Due to reasons (some XCode versions seem to include gettext, some
don't?), Homebrew does not expose the libraries and headers in
/usr/local/ by default anymore.

Let's help find them again.

Note: for some reason, this is a change of behavior caused by the
upgrade to Mojave, identified in our Azure Pipeline; it seems that
Homebrew used to add the /usr/local/ directories to the include and link
search path before, but now it no longer does.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9822: skip tests if file names cannot be ISO-8859... Johannes Schindelin Sun, 14 Apr 2019 21:19:29 +0000 (14:19 -0700)

t9822: skip tests if file names cannot be ISO-8859-1 encoded

Most notably, it seems that macOS' APFS does not allow that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule foreach: fix "<command> --quiet" not being... Nguyễn Thái Ngọc Duy Fri, 12 Apr 2019 10:08:19 +0000 (17:08 +0700)

submodule foreach: fix "<command> --quiet" not being respected

Robin reported that

git submodule foreach --quiet git pull --quiet origin

is not really quiet anymore [1]. "git pull" behaves as if --quiet is not
given.

This happens because parseopt in submodule--helper will try to parse
both --quiet options as if they are foreach's options, not git-pull's.
The parsed options are removed from the command line. So when we do
pull later, we execute just this

git pull origin

When calling submodule helper, adding "--" in front of "git pull" will
stop parseopt for parsing options that do not really belong to
submodule--helper foreach.

PARSE_OPT_KEEP_UNKNOWN is removed as a safety measure. parseopt should
never see unknown options or something has gone wrong. There are also
a couple usage string update while I'm looking at them.

While at it, I also add "--" to other subcommands that pass "$@" to
submodule--helper. "$@" in these cases are paths and less likely to be
--something-like-this. But the point still stands, git-submodule has
parsed and classified what are options, what are paths. submodule--helper
should never consider paths passed by git-submodule to be options even
if they look like one.

The test case is also contributed by Robin.

[1] it should be quiet before fc1b9243cd (submodule: port submodule
subcommand 'foreach' from shell to C, 2018-05-10) because parseopt
can't accidentally eat options then.

Reported-by: Robin H. Johnson <robbat2@gentoo.org>
Tested-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tests: disallow the use of abbreviated options (by... Johannes Schindelin Fri, 12 Apr 2019 09:37:24 +0000 (02:37 -0700)

tests: disallow the use of abbreviated options (by default)

Git's command-line parsers support uniquely abbreviated options, e.g.
`git init --ba` would automatically expand `--ba` to `--bare`.

This is a very convenient feature in every day life for Git users, in
particular when tab completion is not available.

However, it is not a good idea to rely on that in Git's test suite, as
something that is a unique abbreviation of a command line option today
might no longer be a unique abbreviation tomorrow.

For example, if a future contribution added a new mode
`git init --babyproofing` and a previously-introduced test case used the
fact that `git init --ba` expanded to `git init --bare`, that future
contribution would now have to touch seemingly unrelated tests just to
keep the test suite from failing.

So let's disallow abbreviated options in the test suite by default.

Note: for ease of implementation, this patch really only touches the
`parse-options` machinery: more and more hand-rolled option parsers are
converted to use that internal API, and more and more scripts are
converted to built-ins (naturally using the parse-options API, too), so
in practice this catches most issues, and is definitely the biggest bang
for the buck.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

replace: peel tag when passing a tag first to --graftChristian Couder Sun, 31 Mar 2019 13:46:59 +0000 (15:46 +0200)

replace: peel tag when passing a tag first to --graft

When passing a tag as the first argument to `git replace --graft`,
it can be useful to accept it and use the underlying commit as a
the commit that will be replaced.

This already works for lightweight tags, but unfortunately
for annotated tags we have been using the hash of the tag object
instead of the hash of the underlying commit.

Especially we would pass the hash of the tag object to
replace_object_oid() where we would likely fail with an error
like:

"error: Objects must be of the same type.
'annotated_replaced_object' points to a replaced object of type 'tag'
while 'replacement' points to a replacement object of type 'commit'."

This patch fixes that by using the hash of the underlying commit
when an annotated tag is passed.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

replace: peel tag when passing a tag as parent to ... Christian Couder Sun, 31 Mar 2019 13:46:58 +0000 (15:46 +0200)

replace: peel tag when passing a tag as parent to --graft

When passing a tag as a parent argument to `git replace --graft`,
it can be useful to accept it and use the underlying commit as a
parent.

This already works for lightweight tags, but unfortunately
for annotated tags we have been using the hash of the tag object
instead of the hash of the underlying commit as a parent in the
replacement object we create.

This created invalid objects, but the replace succeeded even if
it showed an error like:

error: object A is a tag, not a commit

This patch fixes that by using the hash of the underlying commit
when an annotated tag is passed.

While at it, let's also update an error message to make it
clearer.

Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-email: default to quoted-printable when CR is... brian m. carlson Sat, 13 Apr 2019 22:45:51 +0000 (22:45 +0000)

send-email: default to quoted-printable when CR is present

In 7a36987fff ("send-email: add an auto option for transfer encoding",
2018-07-08), git send-email learned how to automatically determine the
transfer encoding for a patch. However, the only criterion considered
was the length of the lines.

Another case we need to consider is that of carriage returns. Because
emails have CRLF endings when canonicalized, we don't want to write raw
carriage returns into a patch, lest they be stripped off as an artifact
of the transport. Ensure that we choose quoted-printable encoding if the
patch we're sending contains carriage returns.

Note that we are guaranteed to always correctly encode carriage returns
when writing quoted-printable since we explicitly specify the line
ending as "\n", forcing MIME::QuotedPrint to encode our carriage return
as "=0D".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-objects: write objects packed to trace2Jonathan Tan Thu, 11 Apr 2019 17:36:26 +0000 (10:36 -0700)

pack-objects: write objects packed to trace2

This is useful when investigating performance of pushes, and other times
when no progress information is written (because the pack is written to
stdout).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

progress: use xmalloc/xcallocJeff King Thu, 11 Apr 2019 13:49:57 +0000 (09:49 -0400)

progress: use xmalloc/xcalloc

Since the early days of Git, the progress code allocates its struct with
a bare malloc(), not xmalloc(). If the allocation fails, we just avoid
showing progress at all.

While perhaps a noble goal not to fail the whole operation because of
optional progress, in practice:

1. Any failure to allocate a few dozen bytes here means critical path
allocations are likely to fail, too.

2. These days we use a strbuf for throughput progress (and there's a
patch under discussion to do the same for non-throughput cases,
too). And that uses xmalloc() under the hood, which means we'd
still die on some allocation failures.

Let's switch to xmalloc(). That makes us consistent with the rest of Git
and makes it easier to audit for other (less careful) bare mallocs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff: use xmalloc/xreallocJeff King Thu, 11 Apr 2019 13:49:25 +0000 (09:49 -0400)

xdiff: use xmalloc/xrealloc

Most of xdiff uses a bare malloc() to allocate memory, and returns an
error when we get NULL. However, there are a few spots which don't check
the return value and may segfault, including at least xdl_merge() and
xpatience.c's find_longest_common_sequence().

Let's use xmalloc() everywhere instead, so that we get a graceful die()
for these cases, without having to do further auditing. This does mean
the existing cases which check errors will now die() instead of
returning an error up the stack. But:

- that's how the rest of Git behaves already for malloc errors

- all of the callers of xdi_diff(), etc, die upon seeing an error

So while we might one day want to fully lib-ify the diff code and make
it possible to use as part of a long-running process, we're not close to
that now. And because we're just tweaking the xdl_malloc() macro here,
we're not really moving ourselves any further away from that. We
could, for example, simplify some of the functions which handle malloc()
errors which can no longer occur. But that would probably be taking us
in the wrong direction.

This also makes our malloc handling more consistent with the rest of
Git, including enforcing GIT_ALLOC_LIMIT and trying to reclaim pack
memory when needed.

Reported-by: 王健强 <jianqiang.wang@securitygossip.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff: use git-compat-utilJeff King Thu, 11 Apr 2019 13:48:33 +0000 (09:48 -0400)

xdiff: use git-compat-util

Since the xdiff library was not originally part of Git, it does its own
system includes. Let's instead use git-compat-util, which has two
benefits:

1. It adjusts for any system-specific quirks in how or what we should
include (though xdiff's needs are light enough that this hasn't
been a problem in the past).

2. It lets us use wrapper functions like xmalloc().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-prio-queue: use xmallocJeff King Thu, 11 Apr 2019 13:48:14 +0000 (09:48 -0400)

test-prio-queue: use xmalloc

test-prio-queue.c doesn't check the return value of malloc, and could
segfault.

It's unlikely for this to matter in practice; it's a small allocation,
and this code isn't even installed alongside the rest of Git. But let's
use xmalloc(), which makes auditing for other accidental uses of bare
malloc() easier.

Reported-by: 王健强 <jianqiang.wang@securitygossip.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tag: advise on nested tagsDenton Liu Thu, 4 Apr 2019 18:25:15 +0000 (11:25 -0700)

tag: advise on nested tags

Robert Dailey reported confusion on the mailing list about a nested
tag which was most likely created by mistake. Jeff King noted that
this isn't a very common case and creating a tag-to-a-tag can be a
user-error.

Suggest that it may be a mistake with an advice message when
creating such a tag. Those who do want to create a tag that point
at another tag regularly can turn it off with the usual advice
mechanism.

Reported-by: Robert Dailey <rcdailey.lists@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
[jc: fixed test style and tweaked the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3000 (ls-files -o): widen description to reflect curre... Kyle Meyer Wed, 10 Apr 2019 15:28:57 +0000 (11:28 -0400)

t3000 (ls-files -o): widen description to reflect current tests

Remove the mention of symlinks from the test description because
several tests that are not related to symlinks have been added since
this file was introduced long ago.

Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

untracked cache: fix off-by-oneJohannes Schindelin Wed, 10 Apr 2019 12:56:48 +0000 (05:56 -0700)

untracked cache: fix off-by-one

In f9e6c649589e (untracked cache: load from UNTR index extension,
2015-03-08), code was added to read back the untracked cache from an
index extension.

Probably in the endeavor to avoid the `calloc()` implied by
`FLEX_ALLOC_STR()` (it is hard to know why exactly, the commit message
of that commit is a bit parsimonious with information), it calls
`malloc()` manually and then `memcpy()`s the bits and pieces into place.

It allocates the size of `struct untracked_cache_dir` plus the string
length of the untracked file name, then copies the information in two
steps: first the fixed-size metadata, then the name. And here lies the
rub: it includes the trailing NUL byte in the name.

If `FLEX_ARRAY` is defined as 0, this results in a buffer overrun.

To fix this, let's just add 1, for the trailing NUL byte. Technically,
this overallocates on platforms where `FLEX_ARRAY` is 1, but it should
not matter much in reality, as `malloc()` usually overallocates anyway,
unless the size to allocate aligns exactly with some internal chunk size
(see below for more on that).

The real strange thing is that neither valgrind nor DrMemory catches
this bug. In this developer's tests, a `memcpy()` (but not a
`memset()`!) could write up to 4 bytes after the allocated memory range
before valgrind would start reporting an issue.

However, when running Git built with nedmalloc as allocator, under rare
conditions (and inconsistently at that), this bug triggered an `abort()`
because nedmalloc rounds up the size to be `malloc()`ed to a multiple of
a certain chunk size, then adds a few bytes to be used for storing some
internal state. If there is no rounding up to do (because the size is
already a multiple of that chunk size), and if the buffer is overrun as
in the code patched in this commit, the internal state is corrupted.

The scenario that triggered this here bug fix entailed a git.git
checkout with an extra copy of the source code in an untracked
subdirectory, meaning that there was an untracked subdirectory called
"thunderbird-patch-inline" whose name's length is exactly 24 bytes,
which, added to the size of above-mentioned `struct untracked_cache_dir`
that weighs in with 104 bytes on a 64-bit system, amounts to 128,
aligning perfectly with nedmalloc's chunk size.

As there is no obvious way to trigger this bug reliably, on all
platforms supported by Git, and as the bug is obvious enough, this patch
comes without a regression test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: detect broken root treesJeff King Wed, 10 Apr 2019 02:13:25 +0000 (19:13 -0700)

rev-list: detect broken root trees

When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.

Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: let traversal die when --missing is not in useJeff King Wed, 10 Apr 2019 02:13:23 +0000 (19:13 -0700)

rev-list: let traversal die when --missing is not in use

Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.

However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:

- type mismatches (e.g., we expected a tree but got a blob)

- failure to read the object data (e.g., due to bitrot on disk)

This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).

There are a few options to fix this:

1. we could check these properties in rev-list when we do the existence
check. This is probably too expensive in practice (perhaps even for
a type check, but definitely for checking the whole content again,
which implies loading each object into memory twice).

2. teach the traversal machinery to differentiate between a missing
object, and one that could not be loaded as expected. This probably
wouldn't be too hard to detect type mismatches, but detecting bitrot
versus a truly missing object would require deep changes to the
object-loading code.

3. have the traversal machinery communicate the failure to the caller,
so that it can decide how to proceed without re-evaluting the object
itself.

Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).

Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

get_commit_tree(): return NULL for broken treeJeff King Wed, 10 Apr 2019 02:13:20 +0000 (19:13 -0700)

get_commit_tree(): return NULL for broken tree

Return NULL from 'get_commit_tree()' when a commit's root tree is
corrupt, doesn't exist, or points to an object which is not a tree.

In [1], this situation became a BUG(), but it can certainly occur in
cases which are not a bug in Git, for e.g., if a caller manually crafts
a commit whose tree is corrupt in any of the above ways.

Note that the expect_failure test in t6102 triggers this BUG(), but we
can't flip it to expect_success yet. Solving this problem actually
reveals a second bug.

[1]: 7b8a21dba1 (commit-graph: lazy-load trees for commits, 2018-04-06)

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

list-objects.c: handle unexpected non-tree entriesTaylor Blau Wed, 10 Apr 2019 02:13:19 +0000 (19:13 -0700)

list-objects.c: handle unexpected non-tree entries

Apply similar treatment as the previous commit for non-tree entries,
too.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

list-objects.c: handle unexpected non-blob entriesTaylor Blau Wed, 10 Apr 2019 02:13:17 +0000 (19:13 -0700)

list-objects.c: handle unexpected non-blob entries

Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.

When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: introduce tests for unexpected object typesTaylor Blau Wed, 10 Apr 2019 02:13:14 +0000 (19:13 -0700)

t: introduce tests for unexpected object types

Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

We might hit an unexpected type in two different ways (imagine we have a
tree entry that claims to be a tree but actually points to a blob):

- when we call lookup_tree(), we might find that we've already seen
the object referenced as a blob, in which case we'd get NULL. We
can exercise this with "git rev-list --objects $blob $tree", which
guarantees that the blob will have been parsed before we look in
the tree. These tests are marked as "seen" in the test script.

- we call lookup_tree() successfully, but when we try to read the
object, we find out it's something else. We construct our tests
such that $blob is not otherwise mentioned in $tree. These tests
are marked as "lone" in the script.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add: error appropriately on repository with no commitsKyle Meyer Tue, 9 Apr 2019 23:07:37 +0000 (19:07 -0400)

add: error appropriately on repository with no commits

The previous commit made 'git add' abort when given a repository that
doesn't have a commit checked out. However, the output upon failure
isn't appropriate:

% git add repo
warning: adding embedded git repository: repo
hint: You've added another git repository inside your current repository.
hint: [...]
error: unable to index file 'repo/'
fatal: adding files failed

The hint doesn't apply in this case, and the error message doesn't
tell the user why 'repo' couldn't be added to the index.

Provide better output by teaching add_to_index() to error when given a
git directory where HEAD can't be resolved. To avoid the embedded
repository warning and hint, call check_embedded_repo() only after
add_file_to_index() succeeds because, in general, its output doesn't
make sense if adding to the index fails.

Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir: do not traverse repositories with no commitsKyle Meyer Tue, 9 Apr 2019 23:07:36 +0000 (19:07 -0400)

dir: do not traverse repositories with no commits

When treat_directory() encounters a directory that is not in the index
and DIR_NO_GITLINKS is unset, it calls resolve_gitlink_ref() to decide
if a directory looks like a repository, in which case the directory
won't be traversed. As a result, 'status -uall' and 'ls-files -o'
will show only the directory, even when there are untracked files
within the directory.

For the unusual case where a repository doesn't have a commit checked
out, resolve_gitlink_ref() returns -1 because HEAD cannot be resolved,
and the directory is treated as a normal directory (i.e. traversal
does not stop at the repository boundary). The status and ls-files
commands above list untracked files within the repository rather than
showing only the top-level directory. And if 'git add' is called on a
repository with no commit checked out, any untracked files under the
repository are added as blobs in the top-level project, a behavior
that is unlikely to be what the caller intended.

The above case is a corner case in an already unusual situation of the
working tree containing a repository that is not a tracked submodule,
but we might as well treat anything that looks like a repository
consistently. Loosen the "looks like a repository" criteria in
treat_directory() by replacing resolve_gitlink_ref() with
is_nonbare_repository_dir(), one of the checks that is performed
downstream when resolve_gitlink_ref() is called.

As the required update to t3700-add shows, calling 'git add' on a
repository with no commit checked out will now raise an error. While
this is the desired behavior, note that the output isn't yet
appropriate. The next commit will improve this output.

Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule: refuse to add repository with no commitsKyle Meyer Tue, 9 Apr 2019 23:07:35 +0000 (19:07 -0400)

submodule: refuse to add repository with no commits

When the path given to 'git submodule add' is an existing repository
that is not in the index, the repository is passed to 'git add'. If
this repository doesn't have a commit checked out, we don't get a
useful result: there is no subproject OID to track, and any untracked
files in the sub-repository are added as blobs in the top-level
repository.

To avoid getting into this state, abort if the path is a repository
that doesn't have a commit checked out. Note that this check must
come before the 'git add --dry-run' check because the next commit will
make 'git add' fail when given a repository that doesn't have a commit
checked out.

Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule: teach set-branch subcommandDenton Liu Fri, 8 Feb 2019 11:21:34 +0000 (03:21 -0800)

submodule: teach set-branch subcommand

This teaches git-submodule the set-branch subcommand which allows the
branch of a submodule to be set through a porcelain command without
having to manually manipulate the .gitmodules file.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/git-show-branch: avoid literal {apostrophe}Todd Zullinger Wed, 10 Apr 2019 00:37:33 +0000 (20:37 -0400)

Documentation/git-show-branch: avoid literal {apostrophe}

The {apostrophe} was needed at the time of a521845800 ("Documentation:
remove stray backslash in show-branch discussion", 2010-08-20). All
other uses of {apostrophe} were removed in 6cf378f0cb ("docs: stop using
asciidoc no-inline-literal", 2012-04-26).

Unfortunately, the {apostrophe} is rendered literally with Asciidoctor
(at least with 1.5.5-2.0.3). Avoid this by using single-quotes.

Escaping the leading single-quote allows the content to render properly
in AsciiDoc and Asciidoctor.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/git-svn: improve asciidoctor compatibilityTodd Zullinger Wed, 10 Apr 2019 00:37:34 +0000 (20:37 -0400)

Documentation/git-svn: improve asciidoctor compatibility

The second paragraph in the CONFIGURATION section intends to emphasize
the word 'must' with bold type. It does so by writing it as *must*, and
this works fine with AsciiDoc. It usually works great with Asciidoctor,
too, but in this particular instance, we have another "*" earlier in the
paragraph. We do escape it, and it is rendered literally just like we
want it to, but Asciidoctor then ends up tripping on the second (or
third) of the asterisks in this paragraph.

Since that asterisk is (part of) a literal example, we can set it in
monospace, by giving it as `*`. Adjust the whole paragraph in this way.
There's lots more monospacing to be done in this document, but since our
main motivation is addressing AsciiDoc/Asciidoctor discrepancies like
this one, let's just convert this one paragraph.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The fourth batchJunio C Hamano Tue, 9 Apr 2019 17:19:09 +0000 (02:19 +0900)

The fourth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jt/submodule-fetch-errmsg'Junio C Hamano Tue, 9 Apr 2019 17:14:26 +0000 (02:14 +0900)

Merge branch 'jt/submodule-fetch-errmsg'

Error message update.

* jt/submodule-fetch-errmsg:
submodule: explain first attempt failure clearly

Merge branch 'jk/sha1dc'Junio C Hamano Tue, 9 Apr 2019 17:14:26 +0000 (02:14 +0900)

Merge branch 'jk/sha1dc'

Build update for SHA-1 with collision detection.

* jk/sha1dc:
Makefile: fix unaligned loads in sha1dc with UBSan

Merge branch 'jk/promote-ggg'Junio C Hamano Tue, 9 Apr 2019 17:14:25 +0000 (02:14 +0900)

Merge branch 'jk/promote-ggg'

Suggest GitGitGadget instead of submitGit as a way to submit
patches based on GitHub PR to us.

* jk/promote-ggg:
point pull requesters to GitGitGadget

Merge branch 'ar/t4150-remove-cruft'Junio C Hamano Tue, 9 Apr 2019 17:14:25 +0000 (02:14 +0900)

Merge branch 'ar/t4150-remove-cruft'

Test cleanup.

* ar/t4150-remove-cruft:
t4150: remove unused variable

Merge branch 'js/rebase-deprecate-preserve-merges'Junio C Hamano Tue, 9 Apr 2019 17:14:24 +0000 (02:14 +0900)

Merge branch 'js/rebase-deprecate-preserve-merges'

"git rebase --rebase-merges" replaces its old "--preserve-merges"
option; the latter is now marked as deprecated.

* js/rebase-deprecate-preserve-merges:
rebase: deprecate --preserve-merges

Merge branch 'ms/worktree-add-atomic-mkdir'Junio C Hamano Tue, 9 Apr 2019 17:14:24 +0000 (02:14 +0900)

Merge branch 'ms/worktree-add-atomic-mkdir'

"git worktree add" used to do a "find an available name with stat
and then mkdir", which is race-prone. This has been fixed by using
mkdir and reacting to EEXIST in a loop.

* ms/worktree-add-atomic-mkdir:
worktree: fix worktree add race

Merge branch 'jk/line-log-with-patch'Junio C Hamano Tue, 9 Apr 2019 17:14:23 +0000 (02:14 +0900)

Merge branch 'jk/line-log-with-patch'

"git log -L<from>,<to>:<path>" with "-s" did not suppress the patch
output as it should. This has been corrected.

* jk/line-log-with-patch:
line-log: detect unsupported formats
line-log: suppress diff output with "-s"

Merge branch 'ra/t3600-test-path-funcs'Junio C Hamano Tue, 9 Apr 2019 17:14:23 +0000 (02:14 +0900)

Merge branch 'ra/t3600-test-path-funcs'

A GSoC micro.

* ra/t3600-test-path-funcs:
t3600: use helpers to replace test -d/f/e/s <path>
t3600: modernize style
test functions: add function `test_file_not_empty`

Merge branch 'nd/rewritten-ref-is-per-worktree'Junio C Hamano Tue, 9 Apr 2019 17:14:23 +0000 (02:14 +0900)

Merge branch 'nd/rewritten-ref-is-per-worktree'

"git rebase" uses the refs/rewritten/ hierarchy to store its
intermediate states, which inherently makes the hierarchy per
worktree, but it didn't quite work well.

* nd/rewritten-ref-is-per-worktree:
Make sure refs/rewritten/ is per-worktree
files-backend.c: reduce duplication in add_per_worktree_entries_to_dir()
files-backend.c: factor out per-worktree code in loose_fill_ref_dir()

Merge branch 'jh/resize-convert-scratch-buffer'Junio C Hamano Tue, 9 Apr 2019 17:14:22 +0000 (02:14 +0900)

Merge branch 'jh/resize-convert-scratch-buffer'

When the "clean" filter can reduce the size of a huge file in the
working tree down to a small "token" (a la Git LFS), there is no
point in allocating a huge scratch area upfront, but the buffer is
sized based on the original file size. The convert mechanism now
allocates very minimum and reallocates as it receives the output
from the clean filter process.

* jh/resize-convert-scratch-buffer:
convert: avoid malloc of original file size

Merge branch 'dl/ignore-docs'Junio C Hamano Tue, 9 Apr 2019 17:14:22 +0000 (02:14 +0900)

Merge branch 'dl/ignore-docs'

Doc update.

* dl/ignore-docs:
docs: move core.excludesFile from git-add to gitignore
git-clean.txt: clarify ignore pattern files

Merge branch 'ja/dir-rename-doc-markup-fix'Junio C Hamano Tue, 9 Apr 2019 17:14:21 +0000 (02:14 +0900)

Merge branch 'ja/dir-rename-doc-markup-fix'

Doc update.

* ja/dir-rename-doc-markup-fix:
Doc: fix misleading asciidoc formating

Merge branch 'dl/reset-doc-no-wrt-abbrev'Junio C Hamano Tue, 9 Apr 2019 17:14:20 +0000 (02:14 +0900)

Merge branch 'dl/reset-doc-no-wrt-abbrev'

Doc update.

* dl/reset-doc-no-wrt-abbrev:
git-reset.txt: clarify documentation

MSVC: include compat/win32/path-utils.h for MSVC, too... Sven Strickroth Mon, 8 Apr 2019 11:26:16 +0000 (13:26 +0200)

MSVC: include compat/win32/path-utils.h for MSVC, too, for real_path()

A path such as 'c:/somepath/submodule/../.git/modules/submodule' wasn't
resolved correctly any more, because the *nix variant of offset_1st_component
is used instead of the Win32 specific version.

Regression was introduced in commit 1cadad6f6 when mingw_offset_1st_component
was moved from mingw.c which is included by msvc.c to a separate file. Then,
the new file "compat/win32/path-utils.h" was only included for the __CYGWIN__
and __MINGW32__ cases in git-compat-util.h, the case for _MSC_VER was missing.

Signed-off-by: Sven Strickroth <email@cs-ware.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3301: fix false negativeJohannes Schindelin Tue, 9 Apr 2019 10:41:23 +0000 (03:41 -0700)

t3301: fix false negative

In 6956f858f6 (notes: implement helpers needed for note copying during
rewrite, 2010-03-12), we introduced a test case that verifies that the
config setting `notes.rewriteRef` can be overridden via the environment
variable `GIT_NOTES_REWRITE_REF`.

Back when it was introduced, it relied on a side effect of an earlier
test case that configured `core.noteRef` to point to `refs/notes/other`.

In 908a320363 (t3301: modernize style, 2014-11-12), this side effect was
removed.

The test case *still* passed, but for the wrong reason: we no longer
overrode the rewrite ref, but there simply was nothing to rewrite
anymore, as the overridden notes ref was "modernized" away.

Let's let that test case pass for the correct reason again.

To make sure of that, let's change the idea of the original test case:
it configured `notes.rewriteRef` to point to the actual notes ref,
forced that to be ignored and then verified that the notes were *not*
rewritten.

By turning that idea upside down (configure the `notes.rewriteRef` to
another notes ref, override it via the environment variable to force the
notes to be copied, and then verify that the notes *were* rewritten), we
make it much harder for that test case to pass for the wrong reason.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: remove the_repo from read_ref_at()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:30 +0000 (18:34 +0700)

refs.c: remove the_repo from read_ref_at()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: add repo_dwim_log()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:29 +0000 (18:34 +0700)

refs.c: add repo_dwim_log()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: add repo_dwim_ref()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:28 +0000 (18:34 +0700)

refs.c: add repo_dwim_ref()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: remove the_repo from expand_ref()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:27 +0000 (18:34 +0700)

refs.c: remove the_repo from expand_ref()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: remove the_repo from substitute_branch_name()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:26 +0000 (18:34 +0700)

refs.c: remove the_repo from substitute_branch_name()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: add refs_shorten_unambiguous_ref()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:25 +0000 (18:34 +0700)

refs.c: add refs_shorten_unambiguous_ref()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: add refs_ref_exists()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:24 +0000 (18:34 +0700)

refs.c: add refs_ref_exists()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile.c: add repo_approximate_object_count()Nguyễn Thái Ngọc Duy Sat, 6 Apr 2019 11:34:23 +0000 (18:34 +0700)

packfile.c: add repo_approximate_object_count()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin rebase: use oideq()SZEDER Gábor Sat, 6 Apr 2019 11:34:22 +0000 (18:34 +0700)

builtin rebase: use oideq()

Use oideq() instead of !oidcmp(), as it is more idiomatic, and might
give the compiler more opportunities to optimize.

Patch generated with 'contrib/coccinelle/free.cocci' and Coccinelle
v1.0.7 (previous Coccinelle versions don't notice this).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin rebase: use FREE_AND_NULLSZEDER Gábor Sat, 6 Apr 2019 11:34:21 +0000 (18:34 +0700)

builtin rebase: use FREE_AND_NULL

Use the macro FREE_AND_NULL to release memory allocated for
'head_name' and clear its pointer.

Patch generated with 'contrib/coccinelle/free.cocci' and Coccinelle
v1.0.7 (previous Coccinelle versions don't notice this).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

describe doc: remove '7-char' abbreviation referencePhilip Oakley Sat, 6 Apr 2019 13:27:47 +0000 (14:27 +0100)

describe doc: remove '7-char' abbreviation reference

While the minimum is 7-char, the unambiguous length can be longer.

Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rerere doc: quote `rerere.enabled`Philip Oakley Sat, 6 Apr 2019 13:27:46 +0000 (14:27 +0100)

rerere doc: quote `rerere.enabled`

Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

blame: default to HEAD in a bare repo when no start... SZEDER Gábor Sun, 7 Apr 2019 23:43:27 +0000 (01:43 +0200)

blame: default to HEAD in a bare repo when no start commit is given

When 'git blame' is invoked without specifying the commit to start
blaming from, it starts from the given file's state in the work tree.
However, when invoked in a bare repository without a start commit,
then there is no work tree state to start from, and it dies with the
following error message:

$ git rev-parse --is-bare-repository
true
$ git blame file.c
fatal: this operation must be run in a work tree

This is misleading, because it implies that 'git blame' doesn't work
in bare repositories at all, but it does, in fact, work just fine when
it is given a commit to start from.

We could improve the error message, of course, but let's just default
to HEAD in a bare repository instead, as most likely that is what the
user wanted anyway (if they wanted to start from an other commit, then
they would have specified that in the first place).

'git annotate' is just a thin wrapper around 'git blame', so in the
same situation it printed the same misleading error message, and this
patch fixes it, too.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ls-files: use correct format stringThomas Gummerer Sun, 7 Apr 2019 18:47:51 +0000 (19:47 +0100)

ls-files: use correct format string

struct stat_data and struct cache_time both use unsigned ints for all
their members. However the format string for 'git ls-files --debug'
currently uses %d for formatting these numbers. This means that we
potentially print these values incorrectly if they are greater than
INT_MAX.

This has been the case since the --debug option was introduced in 'git
ls-files' in 8497421715 ("ls-files: learn a debugging dump format",
2010-07-31).

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: remove incorrect reference to gc.auto=0Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:17 +0000 (21:52 +0200)

gc docs: remove incorrect reference to gc.auto=0

The chance of a repository being corrupted due to a "gc" has nothing
to do with whether or not that "gc" was invoked via "gc --auto", but
whether there's other concurrent operations happening.

This is already noted earlier in the paragraph, so there's no reason
to suggest this here. The user can infer from the rest of the
documentation that "gc" will run automatically unless gc.auto=0 is
set, and we shouldn't confuse the issue by implying that "gc --auto"
is somehow more prone to produce corruption than a normal "gc".

Well, it is in the sense that a blocking "gc" would stop you from
doing anything else in *that* particular terminal window, but users
are likely to have another window, or to be worried about how
concurrent "gc" on a server might cause corruption.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: clarify that "gc" doesn't throw away reference... Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:16 +0000 (21:52 +0200)

gc docs: clarify that "gc" doesn't throw away referenced objects

Amend the "NOTES" section to fix up wording that's been with us since
3ffb58be0a ("doc/git-gc: add a note about what is collected",
2008-04-23).

I can't remember when/where anymore (I think Freenode #Git), but at
some point I was having a conversation with someone who was convinced
that "gc" would prune things only referenced by e.g. refs/pull/*, and
pointed to this section as proof.

It turned out that they'd read the "branches and tags" wording here
and thought just refs/{heads,tags}/* and refs/remotes/* etc. would be
kept, which is what we enumerate explicitly.

So let's say "other refs", even though just above we say "objects that
are referenced anywhere in your repository".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: note "gc --aggressive" in "fast-import"Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:15 +0000 (21:52 +0200)

gc docs: note "gc --aggressive" in "fast-import"

Amend the "PACKFILE OPTIMIZATION" section in "fast-import" to explain
that simply running "git gc --aggressive" after a "fast-import" should
properly optimize the repository. This is simpler and more effective
than the existing "repack" advice (which I'm keeping as it helps
explain things) because it e.g. also packs the newly imported refs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: downplay the usefulness of --aggressiveÆvar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:14 +0000 (21:52 +0200)

gc docs: downplay the usefulness of --aggressive

The existing "gc --aggressive" docs come just short of recommending to
users that they run it regularly. I've personally talked to many users
who've taken these docs as an advice to use this option, and have,
usually it's (mostly) a waste of time.

So let's clarify what it really does, and let the user draw their own
conclusions.

Let's also clarify the "The effects [...] are persistent" to
paraphrase a brief version of Jeff King's explanation at [1].

1. https://public-inbox.org/git/20190318235356.GK29661@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: note how --aggressive impacts --window & ... Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:13 +0000 (21:52 +0200)

gc docs: note how --aggressive impacts --window & --depth

Since 07e7dbf0db (gc: default aggressive depth to 50, 2016-08-11) we
somewhat confusingly use the same depth under --aggressive as we do by
default.

As noted in that commit that makes sense, it was wrong to make more
depth the default for "aggressive", and thus save disk space at the
expense of runtime performance, which is usually the opposite of
someone who'd like "aggressive gc" wants.

But that's left us with a mostly-redundant configuration variable, so
let's clearly note in its documentation that it doesn't change the
default.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: fix formatting for "gc.writeCommitGraph"Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:12 +0000 (21:52 +0200)

gc docs: fix formatting for "gc.writeCommitGraph"

Change the AsciiDoc formatting so that an example of "gc --auto" isn't
rendered as "git-gc(1) --auto", but as "git gc --auto". This is
consistent with the rest of the links and command examples in this
documentation.

The formatting I'm changing was initially introduced in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: re-flow the "gc.*" section in "config"Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:11 +0000 (21:52 +0200)

gc docs: re-flow the "gc.*" section in "config"

Re-flow the "gc.*" section in "config". A previous commit moved this
over from the "gc" docs, but tried to keep as many of the lines
identical to benefit from diff's move detection.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gc docs: include the "gc.*" section from "config" in... Ævar Arnfjörð Bjarmason Sun, 7 Apr 2019 19:52:10 +0000 (21:52 +0200)

gc docs: include the "gc.*" section from "config" in "gc"

Rather than duplicating the documentation for the various "gc" options
let's include the "gc" docs from git-config. They were mostly better
already, and now we don't have the same docs in two places with subtly
different wording.

In the cases where the git-gc(1) docs were saying something the "gc"
docs in git-config(1) didn't cover move the relevant section over to
the git-config(1) docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: switch directory rename detection... Elijah Newren Fri, 5 Apr 2019 15:00:26 +0000 (08:00 -0700)

merge-recursive: switch directory rename detection default

When all of x/a, x/b, and x/c have moved to z/a, z/b, and z/c on one
branch, there is a question about whether x/d added on a different
branch should remain at x/d or appear at z/d when the two branches are
merged. There are different possible viewpoints here:

A) The file was placed at x/d; it's unrelated to the other files in
x/ so it doesn't matter that all the files from x/ moved to z/ on
one branch; x/d should still remain at x/d.

B) x/d is related to the other files in x/, and x/ was renamed to z/;
therefore x/d should be moved to z/d.

Since there was no ability to detect directory renames prior to
git-2.18, users experienced (A) regardless of context. Choice (B) was
implemented in git-2.18, with no option to go back to (A), and has been
in use since. However, one user reported that the merge results did not
match their expectations, making the change of default problematic,
especially since there was no notice printed when directory rename
detection moved files.

Note that there is also a third possibility here:

C) There are different answers depending on the context and content
that cannot be determined by git, so this is a conflict. Use a
higher stage in the index to record the conflict and notify the
user of the potential issue instead of silently selecting a
resolution for them.

Add an option for users to specify their preference for whether to use
directory rename detection, and default to (C). Even when directory
rename detection is on, add notice messages about files moved into new
directories.

As a sidenote, x/d did not have to be a new file here; it could have
already existed at some other path and been renamed to x/d, with
directory rename detection just renaming it again to z/d. Thus, it's
not just new files, but also a modification to all rename types (normal
renames, rename/add, rename/delete, rename/rename(1to1),
rename/rename(1to2), and rename/rename(2to1)).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: give callers of handle_content_merge... Elijah Newren Fri, 5 Apr 2019 15:00:25 +0000 (08:00 -0700)

merge-recursive: give callers of handle_content_merge() access to contents

Pass a merge_file_info struct to handle_content_merge() so that the
callers can access the oid and mode of the result afterward.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: track information associated with... Elijah Newren Fri, 5 Apr 2019 15:00:24 +0000 (08:00 -0700)

merge-recursive: track information associated with directory renames

Directory rename detection previously silently applied. In order to
allow printing information about paths that changed or printing a
conflict notification (and only doing so near other potential conflict
messages associated with the paths), save this information inside the
rename struct for later use. A subsequent patch will make use of the
additional information.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6043: fix copied test description to match its purposeElijah Newren Fri, 5 Apr 2019 15:00:23 +0000 (08:00 -0700)

t6043: fix copied test description to match its purpose

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: switch from (oid,mode) pairs to a... Elijah Newren Fri, 5 Apr 2019 15:00:22 +0000 (08:00 -0700)

merge-recursive: switch from (oid,mode) pairs to a diff_filespec

There was a significant inconsistency in the various parts of the API
used in merge-recursive; many places used a pair of (oid, mode) to track
file version/contents, while other parts used a diff_filespec (which
have an oid and mode embedded in it). This inconsistency caused lots of
places to need to pack and unpack data to call into other functions.
This has been the subject of some past cleanups (see e.g. commit
0270a07ad0b2 ("merge-recursive: remove final remaining caller of
merge_file_one()", 2018-09-19)), but let's just remove the underlying
mess altogether by switching to use diff_filespec.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: cleanup handle_rename_* function signa... Elijah Newren Fri, 5 Apr 2019 15:00:21 +0000 (08:00 -0700)

merge-recursive: cleanup handle_rename_* function signatures

Instead of passing various bits and pieces of 'ci', just pass it
directly.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: track branch where rename occurred... Elijah Newren Fri, 5 Apr 2019 15:00:20 +0000 (08:00 -0700)

merge-recursive: track branch where rename occurred in rename struct

We previously tracked the branch associated with a rename in a separate
field in rename_conflict_info, but since it is directly associated with
the rename it makes more sense to move it into the rename struct.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: remove ren[12]_other fields from renam... Elijah Newren Fri, 5 Apr 2019 15:00:19 +0000 (08:00 -0700)

merge-recursive: remove ren[12]_other fields from rename_conflict_info

The ren1_other and ren2_other fields were synthesized from information
in ren1->src_entry and ren2->src_entry. Since we already have the
necessary information in ren1 and ren2, just use those.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: shrink rename_conflict_infoElijah Newren Fri, 5 Apr 2019 15:00:18 +0000 (08:00 -0700)

merge-recursive: shrink rename_conflict_info

The rename_conflict_info struct used both a pair and a stage_data which
were taken from a rename struct. Just use the original rename struct.
This will also allow us to start making other simplifications to the
code.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: move some struct declarations togetherElijah Newren Fri, 5 Apr 2019 15:00:17 +0000 (08:00 -0700)

merge-recursive: move some struct declarations together

These structs are related and reference each other, so move them
together to make it easier for folks to determine what they hold and
what their purpose is.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: use 'ci' for rename_conflict_info... Elijah Newren Fri, 5 Apr 2019 15:00:16 +0000 (08:00 -0700)

merge-recursive: use 'ci' for rename_conflict_info variable name

We used a couple different names, but used 'ci' the most. Use the same
variable name throughout for a little extra consistency.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: rename locals 'o' and 'a' to 'obuf... Elijah Newren Fri, 5 Apr 2019 15:00:15 +0000 (08:00 -0700)

merge-recursive: rename locals 'o' and 'a' to 'obuf' and 'abuf'

Since we want to replace oid,mode pairs with a single diff_filespec,
we will soon want to be able to use the names 'o', 'a', and 'b' for
the three different file versions. Rename some local variables in
blob_unchanged() that would otherwise conflict.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: rename diff_filespec 'one' to 'o'Elijah Newren Fri, 5 Apr 2019 15:00:14 +0000 (08:00 -0700)

merge-recursive: rename diff_filespec 'one' to 'o'

In the previous commit, we noted that several places throughout merge
recursive both had a reason to use 'o'; some for a merge_options struct,
and others for a diff_filespec struct. Some places had both, forcing
one of the two to be renamed, though the choice was inconsistent. Now
that the merge_options struct has been renamed to 'opt' everywhere, we
can replace the few places that used 'one' for the diff_filespec to 'o'.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: rename merge_options argument from... Elijah Newren Fri, 5 Apr 2019 15:00:13 +0000 (08:00 -0700)

merge-recursive: rename merge_options argument from 'o' to 'opt'

The name 'o' was used for the merge_options struct pointer taken by many
functions, but in a few places it was named 'opt'. Several functions
that didn't need merge_options instead used 'o' for a diff_filespec
argument or local. Some functions needed both an inconsistently either
renamed the merge_options to 'opt' or the diff_filespec to 'one'. I
want to remove the weird split in the codebase between using a
diff_filespec and a pair of (oid,mode) values in favor of using a
diff_filespec everywhere, but that dramatically increases the number of
cases where we want to use 'o' as a diff_filespec. Rename the
merge_options argument to 'opt' to make room.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Use 'unsigned short' for mode, like diff_filespec doesElijah Newren Fri, 5 Apr 2019 15:00:12 +0000 (08:00 -0700)

Use 'unsigned short' for mode, like diff_filespec does

struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: batch fetching of missing blobsJonathan Tan Fri, 5 Apr 2019 17:09:34 +0000 (10:09 -0700)

diff: batch fetching of missing blobs

When running a command like "git show" or "git diff" in a partial clone,
batch all missing blobs to be fetched as one request.

This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
blobs", 2017-12-08), but for another command.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: move 'hex2oct' into test-lib-functions.shTaylor Blau Fri, 5 Apr 2019 03:37:42 +0000 (20:37 -0700)

t: move 'hex2oct' into test-lib-functions.sh

The helper 'hex2oct' is used to convert base-16 encoded data into a
base-8 binary form, and is useful for preparing data for commands that
accept input in a binary format, such as 'git hash-object', via
'printf'.

This helper is defined identically in three separate places throughout
't'. Move the definition to test-lib-function.sh, so that it can be used
in new test suites, and its definition is not redundant.

This will likewise make our job easier in the subsequent commit, which
also uses 'hex2oct'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

progress: assemble percentage and counters in a strbuf... SZEDER Gábor Fri, 5 Apr 2019 00:45:37 +0000 (02:45 +0200)

progress: assemble percentage and counters in a strbuf before printing

The following patches in this series want to handle the progress bar's
title and changing parts (i.e. the counter and the optional percentage
and throughput combined) differently, and need to know the length
of the changing parts of the previously displayed progress bar.

To prepare for those changes assemble the changing parts in a separate
strbuf kept in 'struct progress' before printing.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

progress: make display_progress() return voidSZEDER Gábor Fri, 5 Apr 2019 00:45:36 +0000 (02:45 +0200)

progress: make display_progress() return void

Ever since the progress infrastructure was introduced in 96a02f8f6d
(common progress display support, 2007-04-18), display_progress() has
returned an int, telling callers whether it updated the progress bar
or not. However, this is:

- useless, because over the last dozen years there has never been a
single caller that cared about that return value.

- not quite true, because it doesn't print a progress bar when
running in the background, yet it returns 1; see 85cb8906f0
(progress: no progress in background, 2015-04-13).

The related display_throughput() function returned void already upon
its introduction in cf84d51c43 (add throughput to progress display,
2007-10-30).

Let's make display_progress() return void, too. While doing so
several return statements in display() become unnecessary, remove
them.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tag: fix formattingDenton Liu Thu, 4 Apr 2019 18:25:13 +0000 (11:25 -0700)

tag: fix formatting

Wrap usage line at '<tagname>'. Also, wrap strings with '\n' at the end
of string fragments instead of at the beginning of the next string
fragment.

Convert a space-indent into a tab-indent for style.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: fix AsciiDoc/Asciidoctor stderr check in the docume... SZEDER Gábor Fri, 29 Mar 2019 12:35:20 +0000 (13:35 +0100)

ci: fix AsciiDoc/Asciidoctor stderr check in the documentation build job

In 'ci/test-documentation.sh' we save the standard error of 'make
doc', and, in an attempt to make sure that neither AsciiDoc nor
Asciidoctor printed any warnings, we check the emptiness of the
resulting file with '! test -s stderr.log'. This check has never
actually worked, because in our 'ci/*' build scripts we rely on 'set
-e' aborting the build job when a command exits with error, and,
unfortunately, the combination of the two doesn't work as intended.
According to POSIX [1]:

"The -e setting shall be ignored when executing [...] a pipeline
beginning with the ! reserved word" [2]

Watch and learn:

$ echo unexpected >file
$ ( set -e; ! test -s file ; echo "should not reach this" ) ; echo $?
should not reach this
0

This is why we haven't noticed the warnings from Asciidoctor that were
fixed in the first patches of this patch series, though some of them
were already there in the build of v2.18.0-rc0 [3].

Check the emptiness of that file with 'test ! -s' instead, which works
properly with 'set -e':

$ ( set -e; test ! -s file ; echo "should not reach this" ) ; echo $?
1

Furthermore, dump the contents of that file to the log for our
convenience, so if it were to unexpectedly end up being non-empty,
then we wouldn't have to scroll through all that long build log
looking for warnings, but could see them right away near the end of
the log.

Note that we are only really interested in the standard error of
AsciiDoc and Asciidoctor, but by saving the stderr of 'make doc' we
also save any error output from the make rules. Currently there is
only one such line: we build the docs with Asciidoctor right after a
'make clean', meaning that 'make USE_ASCIIDOCTOR=1 doc' always starts
with running 'GIT-VERSION-GEN', which in turn prints the version to
stderr. A 'sed' command was supposed to remove this version line to
prevent it from triggering that (previously defunct) emptiness check,
but, unfortunately, this command doesn't work as intended, either,
because it leaves the file to be checked intact, but that defunct
emptiness check hid this issue, too... Furthermore, in the near
future there will be an other line on stderr, because commit
9a71722b4d (Doc: auto-detect changed build flags, 2019-03-17) in the
currently cooking branch 'ma/doc-diff-doc-vs-doctor-comparison' will
print "* new asciidoc flags" at the beginning of both 'make doc'
invokations.

Extend that 'sed' command to remove this line, too, wrap it in a
helper function so the output of both 'make doc' is filtered the same
way, and change its invokation to actually write the logfile to be
checked.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#set

[2] POSIX doesn't discuss the meaning of '! cmd' in case of simple
commands, but it defines that "A pipeline is a sequence of one or
more commands separated by the control operator '|'", so
apparently a simple command is considered as pipeline as well.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_02

[3] https://travis-ci.org/git/git/jobs/385932007#L1463

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: stick with Asciidoctor v1.5.8 for nowSZEDER Gábor Fri, 29 Mar 2019 19:52:46 +0000 (20:52 +0100)

ci: stick with Asciidoctor v1.5.8 for now

The recent release of Asciidoctor v2.0.0 broke our documentation
build job on Travis CI, where we 'gem install asciidoctor', which
always brings us the latest and (supposedly) greatest. Alas, we are
not ready for that just yet, because it removed support for DocBook
4.5, and we have been requiring that particular DocBook version to
build 'user-manual.xml' with Asciidoctor, resulting in:

ASCIIDOC user-manual.xml
asciidoctor: FAILED: missing converter for backend 'docbook45'. Processing aborted.
Use --trace for backtrace
make[1]: *** [user-manual.xml] Error 1

Unfortunately, we can't simply switch to DocBook 5 right away, as
doing so leads to validation errors from 'xmlto', and working around
those leads to yet another errors... [1]

So let's stick with Asciidoctor v1.5.8 (latest stable release before
v2.0.0) in our documentation build job on Travis CI for now, until we
figure out how to deal with the fallout from Asciidoctor v2.0.0.

[1] https://public-inbox.org/git/20190324162131.GL4047@pobox.com/

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-email: don't cc *-by lines with '-' prefixBaruch Siach Sat, 16 Mar 2019 19:26:50 +0000 (21:26 +0200)

send-email: don't cc *-by lines with '-' prefix

Since commit ef0cc1df90f6b ("send-email: also pick up cc addresses from
-by trailers") in git version 2.20, git send-email adds to cc list
addresses from all *-by lines. As a side effect a line with
'-Signed-off-by' is now also added to cc. This makes send-email pick
lines from patches that remove patch files from the git repo. This is
common in the Buildroot project that often removes (and adds) patch
files that have 'Signed-off-by' in their patch description part.

Consider only *-by lines that start with [a-z] (case insensitive) to
avoid unrelated addresses in cc.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cocci: FLEX_ALLOC_MEM to FLEX_ALLOC_STRDenton Liu Wed, 3 Apr 2019 22:00:06 +0000 (15:00 -0700)

cocci: FLEX_ALLOC_MEM to FLEX_ALLOC_STR

Ensure that a FLEX_MALLOC_MEM that uses 'strlen' for its 'len' uses
FLEX_ALLOC_STR instead, since these are equivalent forms.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx.c: convert FLEX_ALLOC_MEM to FLEX_ALLOC_STRDenton Liu Wed, 3 Apr 2019 22:00:05 +0000 (15:00 -0700)

midx.c: convert FLEX_ALLOC_MEM to FLEX_ALLOC_STR

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: use a prio_queue to hold rewritten parentsJeff King Thu, 4 Apr 2019 01:41:09 +0000 (21:41 -0400)

revision: use a prio_queue to hold rewritten parents

This patch fixes a quadratic list insertion in rewrite_one() when
pathspec limiting is combined with --parents. What happens is something
like this:

1. We see that some commit X touches the path, so we try to rewrite
its parents.

2. rewrite_one() loops forever, rewriting parents, until it finds a
relevant parent (or hits the root and decides there are none). The
heavy lifting is done by process_parent(), which uses
try_to_simplify_commit() to drop parents.

3. process_parent() puts any intermediate parents into the
&revs->commits list, inserting by commit date as usual.

So if commit X is recent, and then there's a large chunk of history that
doesn't touch the path, we may add a lot of commits to &revs->commits.
And insertion by commit date is O(n) in the worst case, making the whole
thing quadratic.

We tried to deal with this long ago in fce87ae538 (Fix quadratic
performance in rewrite_one., 2008-07-12). In that scheme, we cache the
oldest commit in the list; if the new commit to be added is older, we
can start our linear traversal there. This often works well in practice
because parents are older than their descendants, and thus we tend to
add older and older commits as we traverse.

But this isn't guaranteed, and in fact there's a simple case where it is
not: merges. Imagine we look at the first parent of a merge and see a
very old commit (let's say 3 years old). And on the second parent, as we
go back 3 years in history, we might have many commits. That one
first-parent commit has polluted our oldest-commit cache; it will remain
the oldest while we traverse a huge chunk of history, during which we
have to fall back to the slow, linear method of adding to the list.

Naively, one might imagine that instead of caching the oldest commit,
we'd start at the last-added one. But that just makes some cases faster
while making others slower (and indeed, while it made a real-world test
case much faster, it does quite poorly in the perf test include here).
Fundamentally, these are just heuristics; our worst case is still
quadratic, and some cases will approach that.

Instead, let's use a data structure with better worst-case performance.
Swapping out revs->commits for something else would have repercussions
all over the code base, but we can take advantage of one fact: for the
rewrite_one() case, nobody actually needs to see those commits in
revs->commits until we've finished generating the whole list.

That leaves us with two obvious options:

1. We can generate the list _unordered_, which should be O(n), and
then sort it afterwards, which would be O(n log n) total. This is
"sort-after" below.

2. We can insert the commits into a separate data structure, like a
priority queue. This is "prio-queue" below.

I expected that sort-after would be the fastest (since it saves us the
extra step of copying the items into the linked list), but surprisingly
the prio-queue seems to be a bit faster.

Here are timings for the new p0001.6 for all three techniques across a
few repositories, as compared to master:

master cache-last sort-after prio-queue
--------------------------------------------------------------------------------------------
GIT_PERF_REPO=git.git
0.52(0.50+0.02) 0.53(0.51+0.02) +1.9% 0.37(0.33+0.03) -28.8% 0.37(0.32+0.04) -28.8%

GIT_PERF_REPO=linux.git
20.81(20.74+0.07) 20.31(20.24+0.07) -2.4% 0.94(0.86+0.07) -95.5% 0.91(0.82+0.09) -95.6%

GIT_PERF_REPO=llvm-project.git
83.67(83.57+0.09) 4.23(4.15+0.08) -94.9% 3.21(3.15+0.06) -96.2% 2.98(2.91+0.07) -96.4%

A few items to note:

- the cache-list tweak does improve the bad case for llvm-project.git
that started my digging into this problem. But it performs terribly
on linux.git, barely helping at all.

- the sort-after and prio-queue techniques work well. They approach
the timing for running without --parents at all, which is what you'd
expect (see below for more data).

- prio-queue just barely outperforms sort-after. As I said, I'm not
really sure why this is the case, but it is. You can see it even
more prominently in this real-world case on llvm-project.git:

git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll

where prio-queue routinely outperforms sort-after by about 7%. One
guess is that the prio-queue may just be more efficient because it
uses a compact array.

There are three new perf tests:

- "rev-list --parents" gives us a baseline for running with --parents.
This isn't sped up meaningfully here, because the bad case is
triggered only with simplification. But it's good to make sure we
don't screw it up (now, or in the future).

- "rev-list -- dummy" gives us a baseline for just traversing with
pathspec limiting. This gives a lower bound for the next test (and
it's also a good thing for us to be checking in general for
regressions, since we don't seem to have any existing tests).

- "rev-list --parents -- dummy" shows off the problem (and our fix)

Here are the timings for those three on llvm-project.git, before and
after the fix:

Test master prio-queue
------------------------------------------------------------------------------
0001.3: rev-list --parents 2.24(2.12+0.12) 2.22(2.11+0.11) -0.9%
0001.5: rev-list -- dummy 2.89(2.82+0.07) 2.92(2.89+0.03) +1.0%
0001.6: rev-list --parents -- dummy 83.67(83.57+0.09) 2.98(2.91+0.07) -96.4%

Changes in the first two are basically noise, and you can see we
approach our lower bound in the final one.

Note that we can't fully get rid of the list argument from
process_parents(). Other callers do have lists, and it would be hard to
convert them. They also don't seem to have this problem (probably
because they actually remove items from the list as they loop, meaning
it doesn't grow so large in the first place). So this basically just
drops the "cache_ptr" parameter (which was used only by the one caller
we're fixing here) and replaces it with a prio_queue. Callers are free
to use either data structure, depending on what they're prepared to
handle.

Reported-by: Björn Pettersson A <bjorn.a.pettersson@ericsson.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

contrib/completion: add smerge to the mergetool complet... David Aguilar Thu, 4 Apr 2019 07:34:39 +0000 (00:34 -0700)

contrib/completion: add smerge to the mergetool completion candidates

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>