gitweb.git
Merge branch 'rj/submodule-fsck-skip'Junio C Hamano Tue, 24 Jul 2018 21:50:42 +0000 (14:50 -0700)

Merge branch 'rj/submodule-fsck-skip'

"fsck.skipList" did not prevent a blob object listed there from
being inspected for is contents (e.g. we recently started to
inspect the contents of ".gitmodules" for certain malicious
patterns), which has been corrected.

* rj/submodule-fsck-skip:
fsck: check skiplist for object in fsck_blob()

pack-protocol: mention and point to docs for protocol v2Brandon Williams Mon, 23 Jul 2018 17:48:07 +0000 (10:48 -0700)

pack-protocol: mention and point to docs for protocol v2

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_humanise: use unsigned variablesJeff King Tue, 24 Jul 2018 10:52:29 +0000 (06:52 -0400)

strbuf_humanise: use unsigned variables

All of the numeric formatting done by this function uses
"%u", but we pass in a signed "int". The actual range
doesn't matter here, since the conditional makes sure we're
always showing reasonably small numbers. And even gcc's
format-checker does not seem to mind. But it's potentially
confusing to a reader of the code to see the mismatch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pass st.st_size as hint for strbuf_readlink()Jeff King Tue, 24 Jul 2018 10:51:39 +0000 (06:51 -0400)

pass st.st_size as hint for strbuf_readlink()

When we initially added the strbuf_readlink() function in
b11b7e13f4 (Add generic 'strbuf_readlink()' helper function,
2008-12-17), the point was that we generally have a _guess_
as to the correct size based on the stat information, but we
can't necessarily trust it.

Over the years, a few callers have grown up that simply pass
in 0, even though they have the stat information. Let's have
them pass in their hint for consistency (and in theory
efficiency, since it may avoid an extra resize/syscall loop,
but neither location is probably performance critical).

Note that st.st_size is actually an off_t, so in theory we
need xsize_t() here. But none of the other callsites use it,
and since this is just a hint, it doesn't matter either way
(if we wrap we'll simply start with a too-small hint and
then eventually complain when we cannot allocate the
memory).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_readlink: use ssize_tJeff King Tue, 24 Jul 2018 10:51:25 +0000 (06:51 -0400)

strbuf_readlink: use ssize_t

The return type of readlink() is ssize_t, not int. This
probably doesn't matter in practice, as it would require a
2GB symlink destination, but it doesn't hurt to be careful.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf: use size_t for length in intermediate variablesJeff King Tue, 24 Jul 2018 10:51:08 +0000 (06:51 -0400)

strbuf: use size_t for length in intermediate variables

A few strbuf functions store the length of a strbuf in a
temporary variable. We should always use size_t for this, as
it's possible for a strbuf to exceed an "int" (e.g., a 2GB
string on a 64-bit system). This is unlikely in practice,
but we should try to behave sensibly on silly or malicious
input.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reencode_string: use size_t for string lengthsJeff King Tue, 24 Jul 2018 10:50:33 +0000 (06:50 -0400)

reencode_string: use size_t for string lengths

The iconv interface takes a size_t, which is the appropriate
type for an in-memory buffer. But our reencode_string_*
functions use integers, meaning we may get confusing results
when the sizes exceed INT_MAX. Let's use size_t
consistently.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reencode_string: use st_add/st_mult helpersJeff King Tue, 24 Jul 2018 10:50:10 +0000 (06:50 -0400)

reencode_string: use st_add/st_mult helpers

When converting a string with iconv, if the output buffer
isn't big enough, we grow it. But our growth is done without
any concern for integer overflow. So when we add:

outalloc = sofar + insz * 2 + 32;

we may end up wrapping outalloc (which is a size_t), and
allocating a too-small buffer. We then manipulate it
further:

outsz = outalloc - sofar - 1;

and feed outsz back to iconv. If outalloc is wrapped and
smaller than sofar, we'll end up with a small allocation but
feed a very large outsz to iconv, which could result in it
overflowing the buffer.

Can we use this to construct an attack wherein the victim
clones a repository with a very large commit object with an
encoding header, and running "git log" reencodes it into
utf8, causing an overflow?

An attack of this sort is likely impossible in practice.
"sofar" is how many output bytes we've written total, and
"insz" is the number of input bytes remaining. Imagine our
input doubles in size as we output it (which is easy to do
by converting latin1 to utf8, for example), and that we
start with N input bytes. Our initial output buffer also
starts at N bytes, so after the first call we'd have N/2
input bytes remaining (insz), and have written N bytes
(sofar). That means our next allocation will be
(N + N/2 * 2 + 32) bytes, or (2N + 32).

We can therefore overflow a 32-bit size_t with a commit
message that's just under 2^31 bytes, assuming it consists
mostly of "doubling" sequences (e.g., latin1 0xe1 which
becomes utf8 0xc3 0xa1).

But we'll never make it that far with such a message. We'll
be spending 2^31 bytes on the original string. And our
initial output buffer will also be 2^31 bytes. Which is not
going to succeed on a system with a 32-bit size_t, since
there will be other things using the address space, too. The
initial malloc will fail.

If we imagine instead that we can triple the size when
converting, then our second allocation becomes
(N + 2/3N * 2 + 32), or (7/3N + 32). That still requires two
allocations of 3/7 of our address space (6/7 of the total)
to succeed.

If we imagine we can quadruple, it becomes (5/2N + 32); we
need to be able to allocate 4/5 of the address space to
succeed.

This might start to get plausible. But is it possible to get
a 4-to-1 increase in size? Probably if you're converting to
some obscure encoding. But since git defaults to utf8 for
its output, that's the likely destination encoding for an
attack. And while there are 4-character utf8 sequences, it's
unlikely that you'd be able find a single-byte source
sequence in any encoding.

So this is certainly buggy code which should be fixed, but
it is probably not a useful attack vector.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'sb/blame-color' into jk/banned-functionJunio C Hamano Fri, 20 Jul 2018 21:42:53 +0000 (14:42 -0700)

Merge branch 'sb/blame-color' into jk/banned-function

* sb/blame-color:
blame: prefer xsnprintf to strcpy for colors

fetch: send "refs/tags/" prefix upon CLI refspecsJonathan Tan Tue, 5 Jun 2018 21:40:36 +0000 (14:40 -0700)

fetch: send "refs/tags/" prefix upon CLI refspecs

When performing tag following, in addition to using the server's
"include-tag" capability to send tag objects (and emulating it if the
server does not support that capability), "git fetch" relies upon the
presence of refs/tags/* entries in the initial ref advertisement to
locally create refs pointing to the aforementioned tag objects. When
using protocol v2, refs/tags/* entries in the initial ref advertisement
may be suppressed by a ref-prefix argument, leading to the tag object
being downloaded, but the ref not being created.

Commit dcc73cf7ff ("fetch: generate ref-prefixes when using a configured
refspec", 2018-05-18) ensured that "refs/tags/" is always sent as a ref
prefix when "git fetch" is invoked with no refspecs, but not when "git
fetch" is invoked with refspecs. Extend that functionality to make it
work in both situations.

This also necessitates a change another test which tested ref
advertisement filtering using tag refs - since tag refs are sent by
default now, the test has been switched to using branch refs instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5702: test fetch with multiple refspecs at a timeJonathan Tan Tue, 5 Jun 2018 21:40:35 +0000 (14:40 -0700)

t5702: test fetch with multiple refspecs at a time

Extend the protocol v2 tests to also test fetches with multiple refspecs
specified. This also covers the previously uncovered cases of fetching
with prefix matching and fetching by SHA-1.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch-pack: mark die strings for translationBrandon Williams Mon, 23 Jul 2018 17:56:35 +0000 (10:56 -0700)

fetch-pack: mark die strings for translation

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

format-patch: allow --interdiff to apply to a lone... Eric Sunshine Sun, 22 Jul 2018 09:57:09 +0000 (05:57 -0400)

format-patch: allow --interdiff to apply to a lone-patch

When submitting a revised version of a patch or series, it can be
helpful (to reviewers) to include a summary of changes since the
previous attempt in the form of an interdiff, typically in the cover
letter. However, it is occasionally useful, despite making for a noisy
read, to insert an interdiff into the commentary section of the lone
patch of a 1-patch series.

Therefore, extend "git format-patch --interdiff=<prev>" to insert an
interdiff into the commentary section of a lone patch rather than
requiring a cover letter. The interdiff is indented to avoid confusing
git-am and human readers into considering it part of the patch proper.

Implementation note: Generating an interdiff for insertion into the
commentary section of a patch which itself is currently being generated
requires invoking the diffing machinery recursively. However, the
machinery does not (presently) support this since it uses global state.
Consequently, we need to take care to stash away the state of the
in-progress operation while generating the interdiff, and restore it
after.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

log-tree: show_log: make commentary block delimiting... Eric Sunshine Sun, 22 Jul 2018 09:57:08 +0000 (05:57 -0400)

log-tree: show_log: make commentary block delimiting reusable

In patches generated by git-format-patch, the area below the "---" line
following the commit message and before the actual 'diff' can be used
for commentary which the patch author wants to convey to readers of the
patch itself but not include in the commit message proper.

By default, the commentary area is empty, however, the --notes option
causes it to be populated with notes associated with the commit. In the
future, other options may be added which also insert content into the
commentary section.

To accommodate this, factor out the logic which delimits commentary
blocks from the commit message so that it can be re-used for upcoming
optional inserted content.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interdiff: teach show_interdiff() to indent interdiffEric Sunshine Sun, 22 Jul 2018 09:57:07 +0000 (05:57 -0400)

interdiff: teach show_interdiff() to indent interdiff

A future change will allow "git format-patch --interdiff=<prev> -1" to
insert an interdiff into the commentary section of the lone patch of a
1-patch series. However, to prevent the inserted interdiff from
confusing git-am, as well as human readers, it needs to be indented.
Therefore, teach show_interdiff() how to indent.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

format-patch: teach --interdiff to respect -v/--reroll... Eric Sunshine Sun, 22 Jul 2018 09:57:06 +0000 (05:57 -0400)

format-patch: teach --interdiff to respect -v/--reroll-count

The --interdiff option introduces the embedded interdiff generically as
"Interdiff:", however, we can do better when --reroll-count is specified
by emitting "Interdiff against v{n}:" instead.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

format-patch: add --interdiff option to embed diff... Eric Sunshine Sun, 22 Jul 2018 09:57:05 +0000 (05:57 -0400)

format-patch: add --interdiff option to embed diff in cover letter

When submitting a revised version of a patch series, it can be helpful
(to reviewers) to include a summary of changes since the previous
attempt in the form of an interdiff, however, doing so involves manually
copy/pasting the diff into the cover letter.

Add an --interdiff option to automate this process. The argument to
--interdiff specifies the tip of the previous attempt against which to
generate the interdiff. For example:

git format-patch --cover-letter --interdiff=v1 -3 v2

The previous attempt and the patch series being formatted must share a
common base.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

format-patch: allow additional generated content in... Eric Sunshine Sun, 22 Jul 2018 09:57:04 +0000 (05:57 -0400)

format-patch: allow additional generated content in make_cover_letter()

make_cover_letter() returns early when it lacks sufficient state to emit
a diffstat, which makes it difficult to extend the function to reliably
emit additional generated content. Work around this shortcoming by
factoring diffstat-printing logic out to its own function and calling it
as needed without otherwise inhibiting normal control flow.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: extract dedicated make target to clean... SZEDER Gábor Mon, 23 Jul 2018 13:51:00 +0000 (15:51 +0200)

coccinelle: extract dedicated make target to clean Coccinelle's results

Sometimes I want to remove only Coccinelle's results, but keep all
other build artifacts left after my usual 'make all man' build. This
new 'cocciclean' make target will allow just that.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: put sane filenames into output patchesSZEDER Gábor Mon, 23 Jul 2018 13:50:59 +0000 (15:50 +0200)

coccinelle: put sane filenames into output patches

Coccinelle outputs its suggested transformations as patches, whose
header looks something like this:

--- commit.c
+++ /tmp/cocci-output-19250-7ae78a-commit.c

Note the lack of 'diff --opts <old> <new>' line, the differing number
of path components on the --- and +++ lines, and the nonsensical
filename on the +++ line. 'patch -p0' can still apply these patches,
as it takes the filename to be modified from the --- line. Alas, 'git
apply' can't, because it takes the filename from the +++ line, and
then complains about the nonexisting file.

Pass the '--patch .' options to Coccinelle via the SPATCH_FLAGS 'make'
variable, as it seems to make it generate proper context diff patches,
with the header starting with a 'diff ...' line and containing sane
filenames. The resulting 'contrib/coccinelle/*.cocci.patch' files
then can be applied both with 'git apply' and 'patch' (even without
'-p0').

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: exclude sha1dc source files from static... SZEDER Gábor Mon, 23 Jul 2018 13:50:58 +0000 (15:50 +0200)

coccinelle: exclude sha1dc source files from static analysis

sha1dc is an external library, that we carry in-tree for convenience
or grab as a submodule, so there is no use in applying our semantic
patches to its source files.

Therefore, exclude sha1dc's source files from Coccinelle's static
analysis.

This change also makes the static analysis somewhat faster: presumably
because of the heavy use of repetitive macro declarations, applying
the semantic patches 'array.cocci' and 'swap.cocci' to 'sha1dc/sha1.c'
takes over half a minute each on my machine, which amounts to about a
third of the runtime of applying these two semantic patches to the
whole git source tree.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: use $(addsuffix) in 'coccicheck' make targetSZEDER Gábor Mon, 23 Jul 2018 13:50:57 +0000 (15:50 +0200)

coccinelle: use $(addsuffix) in 'coccicheck' make target

The dependencies of the 'coccicheck' make target are listed with the
help of the $(patsubst) make function, which in this case doesn't do
any pattern substitution, but only adds the '.patch' suffix.

Use the shorter and more idiomatic $(addsuffix) make function instead.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: mark the 'coccicheck' make target as .PHONYSZEDER Gábor Mon, 23 Jul 2018 13:50:56 +0000 (15:50 +0200)

coccinelle: mark the 'coccicheck' make target as .PHONY

The 'coccicheck' target doesn't create a file with the same name, so
mark it as .PHONY.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7406: avoid failures solely due to timing issuesJohannes Schindelin Mon, 23 Jul 2018 13:39:42 +0000 (06:39 -0700)

t7406: avoid failures solely due to timing issues

Regression tests are automated tests which try to ensure a specific
behavior. The idea is: if the test case fails, the behavior indicated in
the test case's title regressed.

If a regression test that fails, even occasionally, for any reason other
than to indicate the particular regression(s) it tries to catch, it is
less useful than when it really only fails when there is a bug in the
(non-test) code that needs to be fixed.

In the instance of the test case "submodule update --init --recursive
from subdirectory" of the script t7406-submodule-update.sh, the exact
output of a recursive clone is compared with a pre-generated one. And
this is a racy test because the structure of the submodules only
guarantees a *partial* order. The 'none' and the 'rebasing' submodules
*can* be cloned in any order, which means that a mismatch with the
hard-coded order does not necessarily indicate a bug in the tested code.

See for example:
https://git-for-windows.visualstudio.com/git/_build/results?buildId=14035&view=logs

To prevent such false positives from unnecessarily costing time when
investigating test failures, let's take the exact order of the lines out
of the equation by sorting them before comparing them.

This test script seems not to have any more test cases that try to
verify any specific order in which recursive clones process the
submodules, therefore this is the only test case that is changed in this
manner.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

travis-ci: fail if Coccinelle static analysis found... SZEDER Gábor Mon, 23 Jul 2018 13:02:30 +0000 (15:02 +0200)

travis-ci: fail if Coccinelle static analysis found something to transform

Coccinelle's and in turn 'make coccicheck's exit code only indicates
that Coccinelle managed to finish its analysis without any errors
(e.g. no unknown --options, no missing files, no syntax errors in the
semantic patches, etc.), but it doesn't indicate whether it found any
undesired code patterns to transform or not. To find out the latter,
one has to look closer at 'make coccicheck's standard output and look
for lines like:

SPATCH result: contrib/coccinelle/<something>.cocci.patch

And this only indicates that there is something to transform, but to
see what the suggested transformations are one has to actually look
into those '*.cocci.patch' files.

This makes the automated static analysis build job on Travis CI not
particularly useful, because it neither draws our attention to
Coccinelle's findings, nor shows the actual findings. Consequently,
new topics introducing undesired code patterns graduated to master
on several occasions without anyone noticing.

The only way to draw attention in such an automated setting is to fail
the build job. Therefore, modify the 'ci/run-static-analysis.sh'
build script to check all the resulting '*.cocci.patch' files, and
fail the build job if any of them turns out to be not empty. Include
those files' contents, i.e. Coccinelle's suggested transformations, in
the build job's trace log, so we'll know why it failed.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

travis-ci: run Coccinelle static analysis with two... SZEDER Gábor Mon, 23 Jul 2018 13:02:29 +0000 (15:02 +0200)

travis-ci: run Coccinelle static analysis with two parallel jobs

Currently the static analysis build job runs Coccinelle using a single
'make' job. Using two parallel jobs cuts down the build job's run
time from around 10-12mins to 6-7mins, sometimes even under 6mins
(there is quite large variation between build job runtimes). More
than two parallel jobs don't seem to bring further runtime benefits.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport-helper.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:41 +0000 (09:49 +0200)

transport-helper.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:40 +0000 (09:49 +0200)

transport.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sha1-file.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:39 +0000 (09:49 +0200)

sha1-file.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:38 +0000 (09:49 +0200)

sequencer.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

replace-object.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:37 +0000 (09:49 +0200)

replace-object.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refspec.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:36 +0000 (09:49 +0200)

refspec.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:35 +0000 (09:49 +0200)

refs.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pkt-line.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:34 +0000 (09:49 +0200)

pkt-line.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:33 +0000 (09:49 +0200)

object.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

exec-cmd.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:32 +0000 (09:49 +0200)

exec-cmd.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

environment.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:31 +0000 (09:49 +0200)

environment.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:30 +0000 (09:49 +0200)

dir.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

convert.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:29 +0000 (09:49 +0200)

convert.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

connect.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:28 +0000 (09:49 +0200)

connect.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:27 +0000 (09:49 +0200)

config.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:26 +0000 (09:49 +0200)

commit-graph.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/replace.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:25 +0000 (09:49 +0200)

builtin/replace.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/pack-objects.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:24 +0000 (09:49 +0200)

builtin/pack-objects.c: mark more strings for translation

Most of these are straight forward. GETTEXT_POISON does catch the last
string in cmd_pack_objects(), but since this is --progress output, it's
not supposed to be machine-readable.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/grep.c: mark strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:23 +0000 (09:49 +0200)

builtin/grep.c: mark strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/config.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:22 +0000 (09:49 +0200)

builtin/config.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archive-zip.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:21 +0000 (09:49 +0200)

archive-zip.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archive-tar.c: mark more strings for translationNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:20 +0000 (09:49 +0200)

archive-tar.c: mark more strings for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Update messages in preparation for i18nNguyễn Thái Ngọc Duy Sat, 21 Jul 2018 07:49:19 +0000 (09:49 +0200)

Update messages in preparation for i18n

Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are

- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
(on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
if not redundant
- errno_errno() is used instead of explicit strerror()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-objects: fix performance issues on packing large... Nguyễn Thái Ngọc Duy Sun, 22 Jul 2018 08:04:21 +0000 (10:04 +0200)

pack-objects: fix performance issues on packing large deltas

Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.

These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 1MB. So if any new delta is produced and larger than
1MB, it's dropped because we can't really save such a large size
anywhere. Fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.

While this should help small machines repacking large repos without
large deltas (i.e. less memory pressure), dropping large deltas during
the delta selection process could end up with worse pack files. And if
existing packfiles already have >1MB delta and pack-objects is
instructed to not reuse deltas, all of them will be dropped on the
floor, and the resulting pack would be definitely bigger.

There is also a regression in terms of CPU/IO if we have large on-disk
deltas because fallback code needs to parse the pack every time the
delta size is needed and just access to the mmap'd pack data is enough
for extra page faults when memory is under pressure.

Both of these issues were reported on the mailing list. Here's some
numbers for comparison.

Version Pack (MB) MaxRSS(kB) Time (s)
------- --------- ---------- --------
2.17.0 5498 43513628 2494.85
2.18.0 10531 40449596 4168.94

This patch provides a better fallback that is

- cheaper in terms of cpu and io because we won't have to read
existing pack files as much

- better in terms of pack size because the pack heuristics is back to
2.17.0 time, we do not drop large deltas at all

If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 1MB limit, we stop using delta_size_ field for
this because it can't contain such size anyway. A new array of delta
size is dynamically allocated and can hold all the deltas that 2.17.0
can. This array only contains delta sizes that delta_size_ can't
contain.

With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.

Delta size limit is also raised from 1MB to 16MB to better cover
common case and avoid that extra memory consumption (99.999% deltas in
this reported repo are under 12MB; Jeff noted binary artifacts topped
out at about 3MB in some other private repos). Other fields are
shuffled around to keep this struct packed tight. We don't use more
memory in common case even with this limit update.

A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.

Access to e->delta_size_ (and by extension
pack->delta_size[e - pack->objects]) is unprotected as before, the
thread scheduler in pack-objects must make sure "e" is never updated
by two different threads.

The area under the new lock is as small as possible, avoiding locking
at all in common case, since lock contention with high thread count
could be expensive (most blobs are small enough that delta compute
time is short and we end up taking the lock very often). The previous
attempt to always hold a lock in oe_delta_size() and
oe_set_delta_size() increases execution time by 33% when repacking
linux.git with with 40 threads.

Reported-by: Elijah Newren <newren@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff/histogram: remove tail recursionStefan Beller Thu, 19 Jul 2018 22:19:42 +0000 (15:19 -0700)

xdiff/histogram: remove tail recursion

When running the same reproduction script as the previous patch,
it turns out the stack is too small, which can be easily avoided.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: use can_all_from_reachDerrick Stolee Fri, 20 Jul 2018 16:33:30 +0000 (16:33 +0000)

commit-reach: use can_all_from_reach

The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:

1. The performance is quadratic in worst-case.

2. A single in_merge_bases() call requires walking beyond the target
commit in order to find the full set of boundary commits that may be
merge-bases.

The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.

Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
naturally found cutoffs to avoid walking the entire commit graph. Since
we want to always return the correct result, we cannot use the
min_commit_date cutoff in can_all_from_reach. We then rely on generation
numbers to provide the cutoff.

Since not all repos will have a commit-graph file, nor will we always
have generation numbers computed for a commit-graph file, create a new
method, generation_numbers_enabled(), that checks for a commit-graph
file and sees if the first commit in the file has a non-zero generation
number. In the case that we do not have generation numbers, use the old
logic for is_descendant_of().

Performance was meausured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:

A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X.v3.0

Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.

Before: 0.26 s
After: 0.21 s

Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer' with this input:

A:v4.9
B:v3.19

Before: 0.10 s
After: 0.08 s

By adding a new commit with parent v3.19, we test the non-reachable case
of ref_newer:

Before: 0.09 s
After: 0.08 s

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: make can_all_from_reach... linearDerrick Stolee Fri, 20 Jul 2018 16:33:28 +0000 (16:33 +0000)

commit-reach: make can_all_from_reach... linear

The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotitation (haves/wants).

The first step includes using a depth-first-search (DFS) from each
'from' commit, sorted by ascending generation number. We do not walk
beyond the minimum generation number or the minimum commit date. This
DFS is likely to be faster than the existing reachable() method because
we expect previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other 'from' commits.
We also mark the visited commits using 'assign_flag' to prevent re-
walking the same commits.

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case:

Before: 1.52 s
After: 0.26 s

Large Case:

Before: 3.50 s
After: 0.27 s

Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: replace ref_newer logicDerrick Stolee Fri, 20 Jul 2018 16:33:27 +0000 (16:33 +0000)

commit-reach: replace ref_newer logic

The ref_newer method is used by 'git push' to check if a force-push is
required. This method does not use any kind of cutoff when walking, so
in the case of a force-push will walk all reachable commits.

The is_descendant_of method already uses paint_down_to_common along with
cutoffs. By translating the ref_newer arguments into the commit and
commit_list required by is_descendant_of, we can have one fewer commit
walk and also improve our performance!

For a copy of the Linux repository, 'test-tool reach ref_newer' presents
the following improvements with the specified input. In the case that
ref_newer returns 1, there is no improvement. The improvement is in the
second case where ref_newer returns 0.

Input:
A:v4.9
B:v3.19

Before: 0.09 s
After: 0.09 s

To test the negative case, add a new commit with parent v3.19,
regenerate the commit-graph, and then run with B pointing at that
commit.

Before: 0.43 s
After: 0.09 s

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: test commit_containsDerrick Stolee Fri, 20 Jul 2018 16:33:25 +0000 (16:33 +0000)

test-reach: test commit_contains

The commit_contains method has two modes which depend on the given
ref_filter struct. We have the "normal" algorithm (which is also the
typically-slow operation) and the "tag" algorithm. This difference is
essentially what changes performance for 'git branch --contains' versus
'git tag --contains'. There are thoughts that the data shapes used by
these two applications justify the different implementations.

Create tests using 'test-tool reach commit_contains [--tag]' to cover
both methods.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: test can_all_from_reach_with_flagsDerrick Stolee Fri, 20 Jul 2018 16:33:23 +0000 (16:33 +0000)

test-reach: test can_all_from_reach_with_flags

The can_all_from_reach_with_flags method is used by ok_to_give_up in
upload-pack.c to see if we have done enough negotiation during a fetch.
This method is intentionally created to preserve state between calls to
assist with stateful negotiation, such as over SSH.

To make this method testable, add a new can_all_from_reach method that
does the initial setup and final tear-down. We will later use this
method in production code. Call the method from 'test-tool reach' for
now.

Since this is a many-to-many reachability query, add a new type of input
to the 'test-tool reach' input format. Lines "Y:<committish>" create a
list of commits to be the reachability targets from the commits in the
'X' list. In the context of fetch negotiation, the 'X' commits are the
'want' commits and the 'Y' commits are the 'have' commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: test reduce_headsDerrick Stolee Fri, 20 Jul 2018 16:33:22 +0000 (16:33 +0000)

test-reach: test reduce_heads

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: test get_merge_bases_manyDerrick Stolee Fri, 20 Jul 2018 16:33:20 +0000 (16:33 +0000)

test-reach: test get_merge_bases_many

The get_merge_bases_many method returns a list of merge bases for a
single commit (A) against a list of commits (X). Some care is needed in
constructing the expected behavior because the result is not the
expected merge-base for an octopus merge with those parents but instead
the set of maximal commits that are reachable from A and at least one of
the commits in X.

Add get_merge_bases_many to 'test-tool reach' and create a test that
demonstrates that this output returns multiple results. Specifically, we
select a list of three commits such that we output two commits that are
reachable from one of the first two, respectively, and none are
reachable from the third.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: test is_descendant_ofDerrick Stolee Fri, 20 Jul 2018 16:33:18 +0000 (16:33 +0000)

test-reach: test is_descendant_of

The is_descendant_of method takes a single commit as its first parameter
and a list of commits as its second parameter. Extend the input of the
'test-tool reach' command to take multiple lines of the form
"X:<committish>" to construct a list of commits. Pass these to
is_descendant_of and create tests that check each result.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: test in_merge_basesDerrick Stolee Fri, 20 Jul 2018 16:33:17 +0000 (16:33 +0000)

test-reach: test in_merge_bases

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-reach: create new test tool for ref_newerDerrick Stolee Fri, 20 Jul 2018 16:33:15 +0000 (16:33 +0000)

test-reach: create new test tool for ref_newer

As we prepare to change the behavior of the algorithms in
commit-reach.c, create a new test-tool subcommand 'reach' to test these
methods on interesting commit-graph shapes.

To use the new test-tool, use 'test-tool reach <method>' and provide
input to stdin that describes the inputs to the method. Currently, we
only implement the ref_newer method, which requires two commits. Use
lines "A:<committish>" and "B:<committish>" for the two inputs. We will
expand this input later to accommodate methods that take lists of
commits.

The test t6600-test-reach.sh creates a repo whose commits form a
two-dimensional grid. This grid makes it easy for us to determine
reachability because commit-A-B can reach commit-X-Y if and only if A is
at least X and B is at least Y. This helps create interesting test cases
for each result of the methods in commit-reach.c.

We test all methods in three different states of the commit-graph file:
Non-existent (no generation numbers), fully computed, and mixed (some
commits have generation numbers and others do not).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: move can_all_from_reach_with_flagsDerrick Stolee Fri, 20 Jul 2018 16:33:13 +0000 (16:33 +0000)

commit-reach: move can_all_from_reach_with_flags

There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The can_all_from_reach_with_flags method is used in a stateful way by
upload-pack.c. The parameters are very flexible, so we will be able to
use its commit walking logic for many other callers.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

upload-pack: generalize commit date cutoffDerrick Stolee Fri, 20 Jul 2018 16:33:12 +0000 (16:33 +0000)

upload-pack: generalize commit date cutoff

The ok_to_give_up() method uses the commit date as a cutoff to avoid
walking the entire reachble set of commits. Before moving the
reachable() method to commit-reach.c, pull out the dependence on the
global constant 'oldest_have' with a 'min_commit_date' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

upload-pack: refactor ok_to_give_up()Derrick Stolee Fri, 20 Jul 2018 16:33:11 +0000 (16:33 +0000)

upload-pack: refactor ok_to_give_up()

In anticipation of consolidating all commit reachability algorithms,
refactor ok_to_give_up() in order to allow splitting its logic into
an external method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

upload-pack: make reachable() more genericDerrick Stolee Fri, 20 Jul 2018 16:33:09 +0000 (16:33 +0000)

upload-pack: make reachable() more generic

In anticipation of moving the reachable() method to commit-reach.c,
modify the prototype to be more generic to flags known outside of
upload-pack.c. Also rename 'want' to 'from' to make the statement
more clear outside of the context of haves/wants negotiation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: move commit_contains from ref-filterDerrick Stolee Fri, 20 Jul 2018 16:33:08 +0000 (16:33 +0000)

commit-reach: move commit_contains from ref-filter

There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

All methods are direct moves, except we also make the commit_contains()
method public so its consumers in ref-filter.c can still call it. We can
also test this method in a test-tool in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: move ref_newer from remote.cDerrick Stolee Fri, 20 Jul 2018 16:33:06 +0000 (16:33 +0000)

commit-reach: move ref_newer from remote.c

There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The ref_newer() method is used by 'git push -f' to check if a force-push
is necessary. By making the method public, we make it possible to test
the method directly without setting up an envieronment where a 'git
push' call makes sense.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit.h: remove method declarationsDerrick Stolee Fri, 20 Jul 2018 16:33:04 +0000 (16:33 +0000)

commit.h: remove method declarations

These methods are now declared in commit-reach.h. Remove them from
commit.h and add new include statements in all files that require these
declarations.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: move walk methods from commit.cDerrick Stolee Fri, 20 Jul 2018 16:33:02 +0000 (16:33 +0000)

commit-reach: move walk methods from commit.c

There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The method declarations in commit.h are not touched by this commit and
will be moved in a following commit. Many consumers need to point to
commit-reach.h and that would bloat this commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clone: send ref-prefixes when using protocol v2Brandon Williams Fri, 20 Jul 2018 22:07:54 +0000 (15:07 -0700)

clone: send ref-prefixes when using protocol v2

Teach clone to send a list of ref-prefixes, when using protocol v2, to
allow the server to filter out irrelevant references from the
ref-advertisement. This reduces wasted time and bandwidth when cloning
repositories with a larger number of references.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/git-interpret-trailers: explain possible... Stefan Beller Fri, 20 Jul 2018 21:53:49 +0000 (14:53 -0700)

Documentation/git-interpret-trailers: explain possible values

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: clear midx on repackDerrick Stolee Thu, 12 Jul 2018 19:39:40 +0000 (15:39 -0400)

midx: clear midx on repack

If a 'git repack' command replaces existing packfiles, then we must
clear the existing multi-pack-index before moving the packfiles it
references.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: skip loading index if in multi-pack-indexDerrick Stolee Thu, 12 Jul 2018 19:39:39 +0000 (15:39 -0400)

packfile: skip loading index if in multi-pack-index

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: prevent duplicate packfile loadsDerrick Stolee Thu, 12 Jul 2018 19:39:38 +0000 (15:39 -0400)

midx: prevent duplicate packfile loads

The multi-pack-index, when present, tracks the existence of objects and
their offsets within a list of packfiles. This allows us to use the
multi-pack-index for object lookups, abbreviations, and object counts.

When the multi-pack-index tracks a packfile, then we do not need to add
that packfile to the packed_git linked list or the MRU list.

We still need to load the packfiles that are not tracked by the
multi-pack-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: use midx in approximate_object_countDerrick Stolee Thu, 12 Jul 2018 19:39:37 +0000 (15:39 -0400)

midx: use midx in approximate_object_count

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: use existing midx when writing new oneDerrick Stolee Thu, 12 Jul 2018 19:39:36 +0000 (15:39 -0400)

midx: use existing midx when writing new one

Due to how Windows handles replacing a lockfile when there is an open
handle, create the close_midx() method to close the existing midx before
writing the new one.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: use midx in abbreviation calculationsDerrick Stolee Thu, 12 Jul 2018 19:39:35 +0000 (15:39 -0400)

midx: use midx in abbreviation calculations

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: read objects from multi-pack-indexDerrick Stolee Thu, 12 Jul 2018 19:39:34 +0000 (15:39 -0400)

midx: read objects from multi-pack-index

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: create core.multiPackIndex settingDerrick Stolee Thu, 12 Jul 2018 19:39:33 +0000 (15:39 -0400)

config: create core.multiPackIndex setting

The core.multiPackIndex config setting controls the multi-pack-
index (MIDX) feature. If false, the setting will disable all reads
from the multi-pack-index file.

Read this config setting in the new prepare_multi_pack_index_one()
which is called during prepare_packed_git(). This check is run once
per repository.

Add comparison commands in t5319-multi-pack-index.sh to check
typical Git behavior remains the same as the config setting is turned
on and off. This currently includes 'git rev-list' and 'git log'
commands to trigger several object database reads. Currently, these
would only catch an error in the prepare_multi_pack_index_one(), but
with later commits will catch errors in object lookups, abbreviations,
and approximate object counts.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: write object offsetsDerrick Stolee Thu, 12 Jul 2018 19:39:32 +0000 (15:39 -0400)

midx: write object offsets

The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.

We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: write object id fanout chunkDerrick Stolee Thu, 12 Jul 2018 19:39:31 +0000 (15:39 -0400)

midx: write object id fanout chunk

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: write object ids in a chunkDerrick Stolee Thu, 12 Jul 2018 19:39:30 +0000 (15:39 -0400)

midx: write object ids in a chunk

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: sort and deduplicate objects from packfilesDerrick Stolee Thu, 12 Jul 2018 19:39:29 +0000 (15:39 -0400)

midx: sort and deduplicate objects from packfiles

Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.

It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.

Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: read pack names into arrayDerrick Stolee Thu, 12 Jul 2018 19:39:28 +0000 (15:39 -0400)

midx: read pack names into array

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

multi-pack-index: write pack names in chunkDerrick Stolee Thu, 12 Jul 2018 19:39:27 +0000 (15:39 -0400)

multi-pack-index: write pack names in chunk

The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.

Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.

Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

multi-pack-index: read packfile listDerrick Stolee Thu, 12 Jul 2018 19:39:26 +0000 (15:39 -0400)

multi-pack-index: read packfile list

When constructing a multi-pack-index file for a given object directory,
read the files within the enclosed pack directory and find matches that
end with ".idx" and find the correct paired packfile using
add_packed_git().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: generalize pack directory listDerrick Stolee Thu, 12 Jul 2018 19:39:25 +0000 (15:39 -0400)

packfile: generalize pack directory list

In anticipation of sharing the pack directory listing with the
multi-pack-index, generalize prepare_packed_git_one() into
for_each_file_in_pack_dir().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5319: expand test dataDerrick Stolee Thu, 12 Jul 2018 19:39:24 +0000 (15:39 -0400)

t5319: expand test data

As we build the multi-pack-index file format, we want to test the format
on real repositories. Add tests that create repository data including
multiple packfiles with both version 1 and version 2 formats.

The current 'git multi-pack-index write' command will always write the
same file with no "real" data. This will be expanded in future commits,
along with the test expectations.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

multi-pack-index: load into memoryDerrick Stolee Thu, 12 Jul 2018 19:39:23 +0000 (15:39 -0400)

multi-pack-index: load into memory

Create a new multi_pack_index struct for loading multi-pack-indexes into
memory. Create a test-tool builtin for reading basic information about
that multi-pack-index to verify the correct data is written.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: write header information to lockfileDerrick Stolee Thu, 12 Jul 2018 19:39:22 +0000 (15:39 -0400)

midx: write header information to lockfile

As we begin writing the multi-pack-index format to disk, start with
the basics: the 12-byte header and the 20-byte checksum footer. Start
with these basics so we can add the rest of the format in small
increments.

As we implement the format, we will use a technique to check that our
computed offsets within the multi-pack-index file match what we are
actually writing. Each method that writes to the hashfile will return
the number of bytes written, and we will track that those values match
our expectations.

Currently, write_midx_header() returns 12, but is not checked. We will
check the return value in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

multi-pack-index: add 'write' verbDerrick Stolee Thu, 12 Jul 2018 19:39:21 +0000 (15:39 -0400)

multi-pack-index: add 'write' verb

In anticipation of writing multi-pack-indexes, add a skeleton
'git multi-pack-index write' subcommand and send the options to a
write_midx_file() method. Also create a skeleton test script that
tests the 'write' subcommand.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

multi-pack-index: add builtinDerrick Stolee Thu, 12 Jul 2018 19:39:20 +0000 (15:39 -0400)

multi-pack-index: add builtin

This new 'git multi-pack-index' builtin will be the plumbing access
for writing, reading, and checking multi-pack-index files. The
initial implementation is a no-op.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9300: wait for background fast-import process to die... SZEDER Gábor Thu, 19 Jul 2018 22:26:41 +0000 (00:26 +0200)

t9300: wait for background fast-import process to die after killing it

The five new tests added to 't9300-fast-import.sh' in 30e215a65c
(fast-import: checkpoint: dump branches/tags/marks even if
object_count==0, 2017-09-28), all with the prefix "V:" in their test
description, run 'git fast-import' in the background and then 'kill'
it as part of a 'test_when_finished' cleanup command. When this test
script is executed with Bash, some or even all of these tests tend to
pollute the test script's stderr, and messages about terminated
processes end up on the terminal:

$ bash ./t9300-fast-import.sh
<... snip ...>
ok 179 - V: checkpoint helper does not get stuck with extra output
/<...>/test-lib-functions.sh: line 388: 28383 Terminated git fast-import $options 0<&8 1>&9
ok 180 - V: checkpoint updates refs after reset
./t9300-fast-import.sh: line 3210: 28401 Terminated git fast-import $options 0<&8 1>&9
ok 181 - V: checkpoint updates refs and marks after commit
ok 182 - V: checkpoint updates refs and marks after commit (no new objects)
./test-lib.sh: line 634: line 3250: 28485 Terminated git fast-import $options 0<&8 1>&9
ok 183 - V: checkpoint updates tags after tag
./t9300-fast-import.sh: line 3264: 28510 Terminated git fast-import $options 0<&8 1>&9

After a background child process terminates, its parent Bash process
always outputs a message like those above to stderr, even when in
non-interactive mode.

But how do some of these messages end up on the test script's stderr,
why don't we get them from all five tests, and why do they come from
different file/line locations? Well, after sending the TERM signal to
the background child process, it takes a little while until that
process receives the signal and terminates, and then it takes another
while until the parent process notices it. During this time the
parent Bash process is continuing execution, and by the time it
notices that its child terminated it might have already left
'test_eval_inner_' and its stderr is not redirected to /dev/null
anymore. That's why such a message can appear on the test script's
stderr, while other times, when the child terminates fast and/or the
parent shell is slow enough, the message ends up in /dev/null, just
like any other output of the test does. Bash always adds the file
name and line number of the code location it was about to execute when
it notices the termination of its child process as a prefix to that
message, hence the varying and sometimes totally unrelated location
prefixes in those messages (e.g. line 388 in 'test-lib-functions.sh'
is 'test_verify_prereq', and I saw such a message pointing to
'say_color' as well).

Prevent these messages from appearing on the test script's stderr by
'wait'-ing on the pid of the background 'git fast-import' process
after sending it the TERM signal. This ensures that the executing
shell's stderr is still redirected when the shell notices the
termination of its child process in the background, and that these
messages get a consistent file/line location prefix.

Note that this is not an issue when the test script is run with Bash
and '-v', because then these messages are supposed to go to the test
script's stderr anyway, and indeed all of them do; though the
sometimes seemingly random file/line prefixes could be confusing
still. Similarly, it's not an issue with Bash and '--verbose-log'
either, because then all messages go to the log file as they should.
Finally, it's not an issue with some other shells (I tried dash, ksh,
ksh93 and mksh) even without any of the verbose options, because they
don't print messages like these in non-interactive mode in the first
place.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gpg-interface t: extend the existing GPG tests with... Henning Schild Fri, 20 Jul 2018 08:28:07 +0000 (10:28 +0200)

gpg-interface t: extend the existing GPG tests with GPGSM

Add test cases to cover the new X509/gpgsm support. Most of them
resemble existing ones. They just switch the format to x509 and set the
signingkey when creating signatures. Validation of signatures does not
need any configuration of git, it does need gpgsm to be configured to
trust the key(-chain).
Several of the testcases build on top of existing gpg testcases.
The commit ships a self-signed key for committer@example.com and
configures gpgsm to trust it.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff/xhistogram: move index allocation into find_lcsStefan Beller Thu, 19 Jul 2018 18:56:20 +0000 (11:56 -0700)

xdiff/xhistogram: move index allocation into find_lcs

This fixes a memory issue when recursing a lot, which can be reproduced as

seq 1 100000 >one
seq 1 4 100000 >two
git diff --no-index --histogram one two

Before this patch, histogram_diff would call itself recursively before
calling free_index, which would mean a lot of memory is allocated during
the recursion and only freed afterwards. By moving the memory allocation
(and its free call) into find_lcs, the memory is free'd before we recurse,
such that memory is reused in the next step of the recursion instead of
using new memory.

This addresses only the memory pressure, not the run time complexity,
that is also awful for the corner case outlined above.

Helpful in understanding the code (in addition to the sparse history of
this file), was https://stackoverflow.com/a/32367597 which reproduces
most of the code comments of the JGit implementation.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff/xhistogram: factor out memory cleanup into free_i... Stefan Beller Thu, 19 Jul 2018 18:56:19 +0000 (11:56 -0700)

xdiff/xhistogram: factor out memory cleanup into free_index()

This will be useful in the next patch as we'll introduce multiple
callers.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff/xhistogram: pass arguments directly to fall_back_... Stefan Beller Thu, 19 Jul 2018 18:56:18 +0000 (11:56 -0700)

xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff

By passing the 'xpp' and 'env' argument directly to the function
'fall_back_to_classic_diff', we eliminate an occurrence of the 'index'
in histogram_diff, which will prove useful in a bit.

While at it, move it up in the file. This will make the diff of
one of the next patches more legible.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff.c: offer config option to control ws handling... Stefan Beller Wed, 18 Jul 2018 19:31:56 +0000 (12:31 -0700)

diff.c: offer config option to control ws handling in move detection

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff.c: add white space mode to move detection that... Stefan Beller Wed, 18 Jul 2018 19:31:55 +0000 (12:31 -0700)

diff.c: add white space mode to move detection that allows indent changes

The option of --color-moved has proven to be useful as observed on the
mailing list. However when refactoring sometimes the indentation changes,
for example when partitioning a functions into smaller helper functions
the code usually mostly moved around except for a decrease in indentation.

To just review the moved code ignoring the change in indentation, a mode
to ignore spaces in the move detection as implemented in a previous patch
would be enough. However the whole move coloring as motivated in commit
2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
up the notion of the reviewer being able to trust the move of a "block".

As there are languages such as python, which depend on proper relative
indentation for the control flow of the program, ignoring any white space
change in a block would not uphold the promises of 2e2d5ac that allows
reviewers to pay less attention to the inside of a block, as inside
the reviewer wants to assume the same program flow.

This new mode of white space ignorance will take this into account and will
only allow the same white space changes per line in each block. This patch
even allows only for the same change at the beginning of the lines.

As this is a white space mode, it is made exclusive to other white space
modes in the move detection.

This patch brings some challenges, related to the detection of blocks.
We need a wide net to catch the possible moved lines, but then need to
narrow down to check if the blocks are still intact. Consider this
example (ignoring block sizes):

- A
- B
- C
+ A
+ B
+ C

At the beginning of a block when checking if there is a counterpart
for A, we have to ignore all space changes. However at the following
lines we have to check if the indent change stayed the same.

Checking if the indentation change did stay the same, is done by computing
the indentation change by the difference in line length, and then assume
the change is only in the beginning of the longer line, the common tail
is the same. That is why the test contains lines like:

- <TAB> A
...
+ A <TAB>
...

As the first line starting a block is caught using a compare function that
ignores white spaces unlike the rest of the block, where the white space
delta is taken into account for the comparison, we also have to think about
the following situation:

- A
- B
- A
- B
+ A
+ B
+ A
+ B

When checking if the first A (both in the + and - lines) is a start of
a block, we have to check all 'A' and record all the white space deltas
such that we can find the example above to be just one block that is
indented.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add core.usereplacerefs config optionJeff King Wed, 18 Jul 2018 20:45:25 +0000 (16:45 -0400)

add core.usereplacerefs config option

We can already disable replace refs using a command line
option or environment variable, but those are awkward to
apply universally. Let's add a config option to do the same
thing.

That raises the question of why one might want to do so
universally. The answer is that replace refs violate the
immutability of objects. For instance, if you wanted to
cache the diff between commit XYZ and its parent, then in
theory that never changes; the hash XYZ represents the total
state. But replace refs violate that; pushing up a new ref
may create a completely new diff.

The obvious "if it hurts, don't do it" answer is not to
create replace refs if you're doing this kind of caching.
But for a site hosting arbitrary repositories, they may want
to allow users to share replace refs with each other, but
not actually respect them on the site (because the caching
is more important than the replace feature).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>