Andrew's git - gitweb.git/log

Merge branch 'jk/fsck-gitmodules-gently' Junio C Hamano Thu, 2 Aug 2018 22:30:39 +0000 (15:30 -0700)

Merge branch 'jk/fsck-gitmodules-gently'

A recent "security fix" that pays attention to the contents of
".gitmodules" while accepting "git push" was stricter than necessary;
this has been adjusted.

* jk/fsck-gitmodules-gently:
fsck: downgrade gitmodulesParse default to "info"
fsck: split ".gitmodules too large" error from parse failure
fsck: silence stderr when parsing .gitmodules
config: add options parameter to git_config_from_mem
config: add CONFIG_ERROR_SILENT handler
config: turn die_on_error into caller-facing enum

Merge branch 'bc/object-id' Junio C Hamano Thu, 2 Aug 2018 22:30:39 +0000 (15:30 -0700)

Merge branch 'bc/object-id'

Conversion from uchar[20] to struct object_id continues.

* bc/object-id:
pretty: switch hard-coded constants to the_hash_algo
sha1-file: convert constants to uses of the_hash_algo
log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
builtin/merge-recursive: make hash independent
builtin/merge: switch to use the_hash_algo
builtin/fmt-merge-msg: make hash independent
builtin/update-index: simplify parsing of cacheinfo
builtin/update-index: convert to using the_hash_algo
refs/files-backend: use the_hash_algo for writing refs
sha1-name: use the_hash_algo when parsing object names
strbuf: allocate space with GIT_MAX_HEXSZ
commit: express tree entry constants in terms of the_hash_algo
hex: switch to using the_hash_algo
tree-walk: replace hard-coded constants with the_hash_algo
cache: update object ID functions for the_hash_algo

Merge branch 'en/t6036-recursive-corner-cases' Junio C Hamano Thu, 2 Aug 2018 22:30:39 +0000 (15:30 -0700)

Merge branch 'en/t6036-recursive-corner-cases'

Tests to cover more D/F conflict cases have been added for
merge-recursive.

* en/t6036-recursive-corner-cases:
t6036: fix broken && chain in sub-shell
t6036: add lots of detail for directory/file conflicts in recursive case

Merge branch 'sg/httpd-test-unflake' Junio C Hamano Thu, 2 Aug 2018 22:30:39 +0000 (15:30 -0700)

Merge branch 'sg/httpd-test-unflake'

httpd tests saw occasional breakage due to the way its access log
gets inspected by the tests, which has been updated to make them
less flaky.

* sg/httpd-test-unflake:
t/lib-httpd: avoid occasional failures when checking access.log
t/lib-httpd: add the strip_access_log() helper function
t5541: clean up truncating access log

Merge branch 'bp/test-drop-caches-for-windows' Junio C Hamano Thu, 2 Aug 2018 22:30:38 +0000 (15:30 -0700)

Merge branch 'bp/test-drop-caches-for-windows'

A test helper update for Windows.

* bp/test-drop-caches-for-windows:
handle lower case drive letters on Windows

Merge branch 'jk/has-uncommitted-changes-fix' Junio C Hamano Thu, 2 Aug 2018 22:30:37 +0000 (15:30 -0700)

Merge branch 'jk/has-uncommitted-changes-fix'

"git pull --rebase" on a corrupt HEAD caused a segfault. In
general we substitute an empty tree object when running the in-core
equivalent of the diff-index command, and the codepath has been
corrected to do so as well to fix this issue.

* jk/has-uncommitted-changes-fix:
has_uncommitted_changes(): fall back to empty tree

sha1dc: update from upstream Ævar Arnfjörð Bjarmason Thu, 2 Aug 2018 20:50:44 +0000 (20:50 +0000)

sha1dc: update from upstream

Update sha1dc from the latest version by the upstream
maintainer[1]. See 2db87328ef ("Merge branch 'ab/sha1dc'", 2017-07-10)
for the last update.

This fixes an issue where AIX was wrongly detected as a little-endian
instead of a big-endian system. See [2][3][4].

1. https://github.com/cr-marcstevens/sha1collisiondetection/commit/232357eb2ea0397388254a4b188333a227bf5b10
2. https://github.com/cr-marcstevens/sha1collisiondetection/pull/45
3. https://github.com/cr-marcstevens/sha1collisiondetection/pull/42
4. https://public-inbox.org/git/20180729200623.GF945730@genre.crustytoothpaste.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

score_trees(): fix iteration over trees with missing... Jeff King Thu, 2 Aug 2018 18:58:21 +0000 (14:58 -0400)

score_trees(): fix iteration over trees with missing entries

In score_trees(), we walk over two sorted trees to find
which entries are missing or have different content between
the two. So if we have two trees with these entries:

    one   two
    ---   ---
    a     a
    b     c
    c     d

we'd expect the loop to:

- compare "a" to "a"

- compare "b" to "c"; because these are sorted lists, we
know that the second tree does not have "b"

- compare "c" to "c"

- compare "d" to end-of-list; we know that the first tree
does not have "d"

And prior to d8febde370 (match-trees: simplify score_trees()
using tree_entry(), 2013-03-24) that worked. But after that
commit, we mistakenly increment the tree pointers for every
loop iteration, even when we've processed the entry for only
one side. As a result, we end up doing this:

- compare "a" to "a"

- compare "b" to "c"; we know that we do not have "b", but
we still increment both tree pointers; at this point
we're out of sync and all further comparisons are wrong

- compare "c" to "d" and mistakenly claim that the second
tree does not have "c"

- exit the loop, mistakenly not realizing that the first
tree does not have "d"

So contrary to the claim in d8febde370, we really do need to
manually use update_tree_entry(), because advancing the tree
pointer depends on the entry comparison.

That means we must stop using tree_entry() to access each
entry, since it auto-advances the pointer. Instead:

- we'll use tree_desc.size directly to know if there's
anything left to look at (which is what tree_entry() was
doing under the hood)

- rather than do an extra struct assignment to "e1" and
"e2", we can just access the "entry" field of tree_desc
directly

That makes us a little more intimate with the tree_desc
code, but that's not uncommon for its callers.

The included test shows off the bug by adding a new entry
"bar.t", which sorts early in the tree and de-syncs the
comparison for "foo.t", which comes after.
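
The corrected walk can be sketched in Python as follows. This is only an
illustration of the idea, not the C code in match-trees.c: the function
name is hypothetical, and plain string comparison stands in for Git's
tree-entry ordering. The point is that each pointer advances only when
its side's entry has actually been consumed.

```python
def diff_sorted_entries(one, two):
    """Walk two sorted name lists and classify every entry.

    Returns (missing_from_two, missing_from_one, common).
    Hypothetical helper; plain string order is an assumption.
    """
    missing_from_two, missing_from_one, common = [], [], []
    i = j = 0
    # Loop while either side still has entries left to look at.
    while i < len(one) or j < len(two):
        if j >= len(two) or (i < len(one) and one[i] < two[j]):
            # Only "one" has this entry: advance "one" alone.
            missing_from_two.append(one[i])
            i += 1
        elif i >= len(one) or two[j] < one[i]:
            # Only "two" has this entry: advance "two" alone.
            missing_from_one.append(two[j])
            j += 1
        else:
            # Same name on both sides: advance both pointers.
            common.append(one[i])
            i += 1
            j += 1
    return missing_from_two, missing_from_one, common
```

Called with the two example trees, diff_sorted_entries(["a", "b", "c"],
["a", "c", "d"]) reports "b" as missing from the second tree and "d" as
missing from the first, with "a" and "c" compared against each other.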

Reported-by: George Shammas <georgyo@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: make refspec follow the same disambiguation... Junio C Hamano Wed, 1 Aug 2018 16:22:37 +0000 (09:22 -0700)

remote: make refspec follow the same disambiguation rule as local refs

When matching a non-wildcard LHS of a refspec against a list of
refs, find_ref_by_name_abbrev() returns the first ref that matches
using any DWIM rules used by refname_match() in refs.c, even if a
better match occurs later in the list of refs.

This causes unexpected behavior when (for example) fetching using
the refspec "refs/heads/s:<something>" from a remote with both
"refs/heads/refs/heads/s" and "refs/heads/s"; even if the former was
inadvertently created, one would still expect the latter to be
fetched. Similarly, when both a tag T and a branch T exist,
fetching T should favor the tag, just like how local refname
disambiguation rule works. But because the code walks over
ls-remote output from the remote, which happens to be sorted in
alphabetical order and has refs/heads/T before refs/tags/T, a
request to fetch T is (mis)interpreted as fetching refs/heads/T.

Update refname_match(), all of whose current callers care only if it
returns non-zero (i.e. matches) to see if an abbreviated name can
mean the full name being tested, so that it returns a positive
integer whose magnitude can be used to tell the precedence, and fix
the find_ref_by_name_abbrev() function not to stop at the first
match but find the match with the highest precedence.
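
As a rough Python sketch of the "highest precedence wins" idea (not the
actual code in refs.c or remote.c): the rule list below follows the
documented refname disambiguation order, while the function names and the
exact rank values are assumptions made for illustration.

```python
# Lookup rules in precedence order, following the documented
# disambiguation order: exact name, refs/<name>, tags, heads,
# remotes, remotes/<name>/HEAD.
RULES = [
    "{}",
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
]


def match_precedence(abbrev, full_name):
    """Return 0 for no match, else a positive rank (larger is better).

    Hypothetical stand-in for the updated refname_match().
    """
    for index, rule in enumerate(RULES):
        if rule.format(abbrev) == full_name:
            return len(RULES) - index
    return 0


def find_best_ref(abbrev, refs):
    """Scan every ref and keep the best match instead of the first one."""
    best, best_rank = None, 0
    for ref in refs:
        rank = match_precedence(abbrev, ref)
        if rank > best_rank:
            best, best_rank = ref, rank
    return best
```

With this sketch, asking for "T" among refs/heads/T and refs/tags/T picks
the tag, and asking for "refs/heads/s" prefers the exact name over
refs/heads/refs/heads/s, regardless of the order of the list.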

This is based on an earlier work, which special cased only the exact
matches, by Jonathan Tan.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch-pack: unify ref in and out param Jonathan Tan Wed, 1 Aug 2018 20:13:20 +0000 (13:13 -0700)

fetch-pack: unify ref in and out param

When a user fetches:
- at least one up-to-date ref and at least one non-up-to-date ref,
- using HTTP with protocol v0 (or something else that uses the fetch
command of a remote helper)
some refs might not be updated after the fetch.

This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow
info in output parameter", 2018-06-28) which allowed transports to
report the refs that they have fetched in a new out-parameter
"fetched_refs". If they do so, transport_fetch_refs() makes this
information available to its caller.

Users of "fetched_refs" rely on the following 3 properties:
(1) it is the complete list of refs that was passed to
transport_fetch_refs(),
(2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if
relevant), and
(3) it has updated OIDs if ref-in-want was used (introduced after
989b8c4452).

In an effort to satisfy (1), whenever transport_fetch_refs()
filters the refs sent to the transport, it re-adds the filtered refs to
whatever the transport supplies before returning it to the user.
However, the implementation in 989b8c4452 unconditionally re-adds the
filtered refs without checking if the transport refrained from reporting
anything in "fetched_refs" (which it is allowed to do), resulting in an
incomplete list, no longer satisfying (1).

An earlier effort to resolve this [1] solved the issue by re-adding the
filtered refs only if the transport did not refrain from reporting in
"fetched_refs", but after further discussion, it seems that the better
solution is to revert the API change that introduced "fetched_refs".
This API change was first suggested as part of a ref-in-want
implementation that allowed for ref patterns and, thus, there could be
drastic differences between the input refs and the refs actually fetched
[2]; we eventually decided to only allow exact ref names, but this API
change remained even though its necessity was decreased.

Therefore, revert this API change by reverting commit 989b8c4452, and
make receive_wanted_refs() update the OIDs in the sought array (like how
update_shallow() updates shallow information in the sought array)
instead. A test is also included to show that the user-visible bug
discussed at the beginning of this commit message no longer exists.

[1] https://public-inbox.org/git/20180801171806.GA122458@google.com/
[2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-p4: add the `p4-pre-submit` hook Chen Bin Fri, 27 Jul 2018 11:22:22 +0000 (21:22 +1000)

git-p4: add the `p4-pre-submit` hook

The `p4-pre-submit` hook is executed before git-p4 submits code.
If the hook exits with a non-zero value, the submit process does not start.

Signed-off-by: Chen Bin <chenbin.sh@gmail.com>
Reviewed-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff: reduce indent heuristic overhead Stefan Beller Fri, 27 Jul 2018 22:23:56 +0000 (15:23 -0700)

xdiff: reduce indent heuristic overhead

Skip searching for better indentation heuristics if we'd slide a hunk more
than its size. This is the easiest fix proposed in the analysis[1] in
response to a patch that mercurial took for xdiff to limit searching
by a constant. Consider the following performance test:

#!python
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 1000001)

This patch reduces the execution time of "git diff --no-index a b" from
0.70s to 0.31s. However, limiting the sliding to the size of the diff hunk,
which was proposed as a solution (and which I found easiest to implement
for now), is not optimal for cases like

open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 2000000)

as then we'd still slide 1000000 times.

In addition to limiting the sliding to size of the hunk, also limit by a
constant. Choose 100 lines as the constant as that fits more than a screen,
which really means that the diff sliding is probably not providing a lot
of benefit anyway.
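
The combined limit can be stated as a one-line rule. The Python sketch
below only illustrates that rule; the names are hypothetical and it is
not the xdiff code itself.

```python
# 100 is the constant chosen in the message above; the names are
# hypothetical and exist only for this illustration.
MAX_SLIDING = 100


def sliding_limit(hunk_size, max_sliding=MAX_SLIDING):
    """Return how many lines a hunk may be slid while searching for a
    better split: never more than its own size, never more than the
    constant."""
    return min(hunk_size, max_sliding)
```

For the first test case the hunk is one line long, so sliding_limit(1)
is 1; for the second the hunk is 1000000 lines long, and
sliding_limit(1000000) is capped at 100.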

[1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@alum.mit.edu/

Reported-by: Jun Wu <quark@fb.com>
Analysis-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch doc: cross-link two new negotiation options Ævar Arnfjörð Bjarmason Wed, 1 Aug 2018 15:18:35 +0000 (15:18 +0000)

fetch doc: cross-link two new negotiation options

Users interested in the fetch.negotiationAlgorithm variable added in
42cc7485a2 ("negotiator/skipping: skip commits during fetch",
2018-07-16) are probably interested in the related --negotiation-tip
option added in 3390e42adb ("fetch-pack: support negotiation tip
whitelist", 2018-07-02).

Change the documentation for those two to reference one another to
point readers in the right direction.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

negotiator: unknown fetch.negotiationAlgorithm should... Ævar Arnfjörð Bjarmason Wed, 1 Aug 2018 15:18:34 +0000 (15:18 +0000)

negotiator: unknown fetch.negotiationAlgorithm should error out

Change the handling of fetch.negotiationAlgorithm=<str> to error out
on unknown strings, i.e. everything except "default" or "skipping".

This changes the behavior added in 42cc7485a2 ("negotiator/skipping:
skip commits during fetch", 2018-07-16) which would ignore all unknown
values and silently fall back to the "default" value.

For a feature like this it's much better to produce an error than
proceed. We don't want users to debug some amazingly slow fetch that
should benefit from "skipping", only to find that they'd forgotten to
deploy the new git version on that particular machine.
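
The behavior change amounts to the following, shown here as a Python
sketch rather than the C implementation. The function name and the exact
wording of the error are assumptions; only the two accepted values come
from the message above.

```python
KNOWN_ALGORITHMS = ("default", "skipping")


def select_negotiation_algorithm(value):
    """Map a fetch.negotiationAlgorithm setting to an algorithm name.

    Hypothetical helper: an unset value means "default", a known value
    is used as-is, and anything else is an error instead of a silent
    fallback.
    """
    if value is None:
        return "default"
    if value in KNOWN_ALGORITHMS:
        return value
    raise ValueError("unknown fetch negotiation algorithm '%s'" % value)
```

A misspelled value such as "skiping" now fails loudly at configuration
time instead of quietly running the slower default negotiation.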

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jt/fetch-nego-tip' into ab/fetch-nego Junio C Hamano Wed, 1 Aug 2018 18:07:35 +0000 (11:07 -0700)

Merge branch 'jt/fetch-nego-tip' into ab/fetch-nego

* jt/fetch-nego-tip:
fetch-pack: support negotiation tip whitelist

travis-ci: include the trash directories of failed... SZEDER Gábor Tue, 31 Jul 2018 22:56:12 +0000 (00:56 +0200)

travis-ci: include the trash directories of failed tests in the trace log

The trash directory of a failed test might contain invaluable
information about the cause of the failure, but we have no access to
the trash directories of Travis CI build jobs. The only feedback we
get from there is the build job's trace log, so...

Modify 'ci/print-test-failures.sh' to create a tar.gz archive of the
trash directory of each failed test, encode that archive with base64,
and print the resulting block of ASCII text, so it gets embedded in
the trace log. Furthermore, run tests with '--immediate' to
faithfully preserve the failed state.

Extracting the trash directories from the trace log turned out to be a
bit of a hassle, partly because of the size of these logs (usually
resulting in several hundreds or even thousands of lines of
base64-encoded text), and partly because these logs have CRLF, CRCRLF
and occasionally even CRCRCRLF line endings, which cause 'base64 -d'
from coreutils to complain about "invalid input". For convenience add
a small script 'ci/util/extract-trash-dirs.sh', which will extract and
unpack all base64-encoded trash directories embedded in the log fed to
its standard input, and include an example command to be copy-pasted
into a terminal to do it all at the end of the failure report.

A few of our tests create sizeable trash directories, so limit the
size of each included base64-encoded block, let's say, to 1MB. And
just in case something fundamental gets broken and a lot of tests fail
at once, don't include trash directories when the combined size of the
included base64-encoded blocks would exceed 1MB.
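
The round trip can be illustrated with a short Python sketch. It is not
the shell code in 'ci/print-test-failures.sh' or
'ci/util/extract-trash-dirs.sh'; the function names are hypothetical and
a dict of file contents stands in for a real trash directory. It shows
the three ideas described above: archive and encode, cap the block size,
and strip stray carriage returns before decoding.

```python
import base64
import io
import tarfile

MAX_BLOCK = 1024 * 1024  # 1MB cap per encoded block, as described above


def encode_trash_dir(files):
    """Pack {name: bytes} into a tar.gz and return it as base64 text.

    Returns None when the encoded block would exceed MAX_BLOCK.
    The dict is a hypothetical stand-in for a trash directory.
    """
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    text = base64.encodebytes(raw.getvalue()).decode("ascii")
    return text if len(text) <= MAX_BLOCK else None


def decode_block(log_text):
    """Recover {name: bytes} from a block copied out of a trace log."""
    # Drop every CR so CRLF, CRCRLF and CRCRCRLF endings become plain LF.
    cleaned = log_text.replace("\r", "")
    raw = base64.b64decode(cleaned)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }
```

Encoding a small directory, replacing each newline with "\r\r\n" to mimic
the trace log, and decoding again returns the original contents.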

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: clear string_list after use in mv() René Scharfe Wed, 1 Aug 2018 10:19:07 +0000 (12:19 +0200)

remote: clear string_list after use in mv()

Switch to the _DUP variant of string_list for remote_branches to allow
string_list_clear() to release the allocated memory at the end, and
actually call that function. Free the util pointer as well; it is
allocated in read_remote_branches().

NB: This string_list is empty until read_remote_branches() is called
via for_each_ref(), so there is no need to clean it up when returning
before that point.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: fix directory/file conflict handling in... Elijah Newren Tue, 31 Jul 2018 17:12:05 +0000 (10:12 -0700)

read-cache: fix directory/file conflict handling in read_index_unmerged()

read_index_unmerged() has two intended purposes:
* return 1 if there are any unmerged entries, 0 otherwise
* drop any higher-stage entries down to stage #0

There are several callers of read_index_unmerged() that check the return
value to see if it is non-zero, all of which then die() if that condition
is met. For these callers, dropping higher-stage entries down to stage #0
is a waste of resources, and returning immediately on first unmerged entry
would be better. But it's probably only a very minor difference and isn't
the focus of this series.

The remaining callers ignore the return value and call this function for
the side effect of dropping higher-stage entries down to stage #0. As
mentioned in commit e11d7b596970 ("'reset --merge': fix unmerged case",
2009-12-31),

The _only_ reason we want to keep a previously unmerged entry in the
index at stage #0 is so that we don't forget the fact that we have
corresponding file in the work tree in order to be able to remove it
when the tree we are resetting to does not have the path.

In fact, prior to commit d1a43f2aa4bf ("reset --hard/read-tree --reset -u:
remove unmerged new paths", 2008-10-15), read_index_unmerged() did just
remove unmerged entries from the cache immediately but that had the
unwanted effect of leaving around new untracked files in the tree from
aborted merges.

So, that's the intended purpose of this function. The problem is that
when directory/files conflicts are present, trying to add the file to the
index at stage 0 fails (because there is still a directory in the way),
and the function returns early with a -1 return code to signify the error.
As noted above, none of the callers who want the drop-to-stage-0 behavior
check the return status, though, so this means all remaining unmerged
entries remain in the index and the callers proceed assuming otherwise.
Users then see errors of the form:

error: 'DIR-OR-FILE' appears as both a file and as a directory
error: DIR-OR-FILE: cannot drop to stage #0

and potentially also messages about other unmerged entries which came
lexicographically later than whatever pathname was both a file and a
directory. Google finds a few hits searching for those messages,
suggesting there were probably a couple people who hit this besides me.
Luckily, calling `git reset --hard` multiple times would work around
this bug.

Since the whole purpose here is to just put the entry *temporarily* into
the index so that any associated file in the working copy can be removed,
we can just skip the DFCHECK and allow both the file and directory to
appear in the index. The temporary simultaneous appearance of the
directory and file entries in the index will be removed by the callers
by calling unpack_trees(), which excludes these unmerged entries marked
with CE_CONFLICTED flag from the resulting index, before they attempt to
write the index anywhere.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t1015: demonstrate directory/file conflict recovery... Elijah Newren Tue, 31 Jul 2018 17:12:04 +0000 (10:12 -0700)

t1015: demonstrate directory/file conflict recovery failures

Several "recovery" commands outright fail or do not fully recover
when directory-file conflicts are present. This includes:
* git read-tree --reset HEAD
* git am --skip
* git am --abort
* git merge --abort
* git reset --hard

Add testcases documenting these shortcomings.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: don't die() on bogus user-edited timestamp Eric Sunshine Tue, 31 Jul 2018 07:33:31 +0000 (03:33 -0400)

sequencer: don't die() on bogus user-edited timestamp

read_author_ident() is careful to handle errors "gently" when parsing
"rebase-merge/author-script" by printing a suitable warning and
returning NULL; it never die()'s. One possible reason that parsing might
fail is that "rebase-merge/author-script" has been hand-edited in a
way that corrupts it or the information it contains.

However, read_author_ident() invokes fmt_ident() which is not so careful
about failing "gently". It will die() if it encounters a malformed
timestamp. Since read_author_ident() doesn't want to die() and since
it's dealing with possibly hand-edited data, take care to avoid passing
a bogus timestamp to fmt_ident().

A more "correctly engineered" fix would be to add a "gentle" version of
fmt_ident(); however, such a change is outside the scope of this bug-fix
series. If fmt_ident() ever does grow a "gentle" cousin, then the manual
timestamp check added here can be retired.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: fix "rebase -i --root" corrupting author... Eric Sunshine Tue, 31 Jul 2018 07:33:30 +0000 (03:33 -0400)

sequencer: fix "rebase -i --root" corrupting author header timestamp

When "git rebase -i --root" creates a new root commit, it corrupts the
"author" header's timestamp by prepending a "@":

author A U Thor <author@example.com> @1112912773 -0700

The commit parser is very strict about the format of the "author"
header, and does not allow a "@" in that position.

The "@" comes from GIT_AUTHOR_DATE in "rebase-merge/author-script",
signifying a Unix epoch-based timestamp, however, read_author_ident()
incorrectly allows it to slip into the commit's "author" header, thus
corrupting it.

One possible fix would be simply to filter out the "@" when constructing
the "author" header timestamp, however, a more correct fix is to parse
the GIT_AUTHOR_DATE date (via parse_date()) and format the parsed result
into the "author" header. Since "rebase-merge/author-script" may be
edited by the user, this approach has the extra benefit of catching
other potential timestamp corruption due to hand-editing.
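
The difference between filtering and parsing can be sketched in Python.
This is a deliberately narrow illustration: it only understands the
"@<epoch> <timezone>" form shown above, whereas parse_date() accepts many
more formats, and the function name is hypothetical.

```python
import re

# Only the "@<seconds> <+/-hhmm>" form; an assumption for illustration.
_EPOCH_DATE = re.compile(r"@(\d+) ([+-]\d{4})")


def author_header_date(git_author_date):
    """Parse a GIT_AUTHOR_DATE value and re-format it for the header.

    Returns None when the value does not parse, so hand-edited garbage
    is rejected rather than copied into the commit object.
    """
    match = _EPOCH_DATE.fullmatch(git_author_date)
    if match is None:
        return None
    return "%s %s" % (match.group(1), match.group(2))
```

Because the value is parsed and then re-formatted, "@1112912773 -0700"
becomes "1112912773 -0700", while a damaged value such as
"@1112912773 -07000" is rejected instead of merely losing its "@".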

We can do better than calling parse_date() ourselves and constructing
the "author" header manually, however, by instead taking advantage of
fmt_ident() which does this work for us.

The benefits of using fmt_ident() are twofold. First, it simplifies the
logic considerably by allowing us to avoid the complexity of building
the "author" header in parallel with and in the same buffer from which
"rebase-merge/author-script" is being parsed. Instead, fmt_ident() is
invoked to compose the header after parsing is complete.

Second, fmt_ident() is careful to prevent "crud" from polluting the
composed ident. As with validating GIT_AUTHOR_DATE, this "crud"
avoidance prevents other (possibly hand-edited) bogus author information
from "rebase-merge/author-script" from corrupting the commit object.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: fix "rebase -i --root" corrupting author... Eric Sunshine Tue, 31 Jul 2018 07:33:29 +0000 (03:33 -0400)

sequencer: fix "rebase -i --root" corrupting author header timezone

When "git rebase -i --root" creates a new root commit, it corrupts the
"author" header's timezone by repeating the last digit:

author A U Thor <author@example.com> @1112912773 -07000

This is due to two bugs.

First, write_author_script() neglects to add the closing quote to the
value of GIT_AUTHOR_DATE when generating "rebase-merge/author-script".

Second, although sq_dequote() correctly diagnoses the missing closing
quote, read_author_ident() ignores sq_dequote()'s return value and
blindly uses the result of the aborted dequote.

sq_dequote() performs dequoting in-place by removing quoting and
shifting content downward. When it detects misquoting (lack of closing
quote, in this case), it gives up and returns an error without inserting
a NUL-terminator at the end of the shifted content, which explains the
duplicated last digit in the timezone.
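
The effect is easy to reproduce with a simplified model. The Python
sketch below mimics in-place dequoting on a NUL-terminated byte buffer;
it is not the real sq_dequote() (it ignores escaped quotes entirely) and
the helper names are hypothetical.

```python
QUOTE = ord("'")


def sq_dequote_simplified(buf):
    """Dequote a NUL-terminated single-quoted string in place.

    Returns True on success. On failure the content has already been
    shifted down by one byte but no new terminator is written, which
    is the situation described above.
    """
    if buf[0] != QUOTE:
        return False
    src, dst = 0, 0
    while True:
        src += 1
        c = buf[src]
        if c == 0:
            return False  # hit the terminator: closing quote is missing
        if c != QUOTE:
            buf[dst] = c  # shift content downward
            dst += 1
            continue
        if buf[src + 1] == 0:
            buf[dst] = 0  # properly quoted: terminate the shifted content
            return True
        return False


def c_string(buf):
    """Read the buffer the way C would: up to the first NUL byte."""
    return bytes(buf[: buf.index(0)])
```

Feeding it bytearray(b"'@1112912773 -0700\x00"), which lacks the closing
quote, returns False, and reading the buffer afterwards as a C string
yields b"@1112912773 -07000": the original final "0" is still sitting
just past the shifted content.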

(Note that the "@" preceding the timestamp is a separate bug which
will be fixed subsequently.)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: fix "rebase -i --root" corrupting author... Eric Sunshine Tue, 31 Jul 2018 07:33:28 +0000 (03:33 -0400)

sequencer: fix "rebase -i --root" corrupting author header

When "git rebase -i --root" creates a new root commit (say, by swapping
in a different commit for the root), it corrupts the commit's "author"
header with trailing garbage:

author A U Thor <author@example.com> @1112912773 -07000or@example.com

This is a result of read_author_ident() neglecting to NUL-terminate the
buffer into which it composes the "author" header.

(Note that the "@" preceding the timestamp and the extra "0" in the
timezone are separate bugs which will be fixed subsequently.)

Security considerations: Construction of the "author" header by
read_author_ident() happens in-place and in parallel with parsing the
content of "rebase-merge/author-script" which occupies the same buffer.
This is possible because the constructed "author" header is always
smaller than the content of "rebase-merge/author-script". Despite
neglecting to NUL-terminate the constructed "author" header, memory is
never accessed (either by read_author_ident() or its caller) beyond the
allocated buffer since a NUL-terminator is present at the end of the
loaded "rebase-merge/author-script" content, and additional NUL's are
inserted as part of the parsing process.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/chainlint.sed: drop extra spaces from regex character... Eric Sunshine Tue, 31 Jul 2018 05:03:20 +0000 (01:03 -0400)

t/chainlint.sed: drop extra spaces from regex character class

This character class, like many others in this script, matches
horizontal whitespace consisting of spaces and tabs, however, a few
extra, entirely harmless, spaces somehow slipped into the expression.
Removing them is purely a cosmetic fix.

While at it, re-indent three lines with a single TAB each which were
incorrectly indented with six spaces. Also, a purely cosmetic fix.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mw-to-git/t9360: fix broken &&-chain Eric Sunshine Mon, 30 Jul 2018 20:46:46 +0000 (16:46 -0400)

mw-to-git/t9360: fix broken &&-chain

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

subtree test: simplify preparation of expected results Jonathan Nieder Mon, 30 Jul 2018 19:07:38 +0000 (12:07 -0700)

subtree test: simplify preparation of expected results

This mixture of quoting, pipes, and here-docs to produce expected
results in shell variables is difficult to follow. Simplify by using
simpler constructs that write output to files instead.

Noticed because without this patch, t/chainlint is not able to
understand the script in order to validate that its subshells use an
unbroken &&-chain, causing "make -C contrib/subtree test" to fail with

error: bug in the test script: broken &&-chain or run-away HERE-DOC:

in t7900.21.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

subtree test: add missing && to &&-chain Jonathan Nieder Mon, 30 Jul 2018 19:07:02 +0000 (12:07 -0700)

subtree test: add missing && to &&-chain

Detected using t/chainlint.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vscode: let cSpell work on commit messages, too Johannes Schindelin Mon, 30 Jul 2018 15:42:58 +0000 (08:42 -0700)

vscode: let cSpell work on commit messages, too

By default, the cSpell extension ignores all files under .git/. That
includes, unfortunately, COMMIT_EDITMSG, i.e. commit messages. However,
spell checking is *quite* useful when writing commit messages... And
since the user hardly ever opens any file inside .git (apart from commit
messages, the config, and sometimes interactive rebase's todo lists),
there is really not much harm in *not* ignoring .git/.

The default also ignores `node_modules/`, but that does not apply to
Git, so let's skip ignoring that, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vscode: add a dictionary for cSpell Johannes Schindelin Mon, 30 Jul 2018 15:42:57 +0000 (08:42 -0700)

vscode: add a dictionary for cSpell

The quite useful cSpell extension allows VS Code to have "squiggly"
lines under spelling mistakes. By default, this would add too much
clutter, though, because so much of Git's source code uses words that
would trigger cSpell.

Let's add a few words to make the spell checking more useful by reducing
the number of false positives.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vscode: use 8-space tabs, no trailing ws, etc for Git... Johannes Schindelin Mon, 30 Jul 2018 15:42:55 +0000 (08:42 -0700)

vscode: use 8-space tabs, no trailing ws, etc for Git's source code

This adds a couple settings for the .c/.h files so that it is easier to
conform to Git's conventions while editing the source code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vscode: wrap commit messages at column 72 by default Johannes Schindelin Mon, 30 Jul 2018 15:42:54 +0000 (08:42 -0700)

vscode: wrap commit messages at column 72 by default

When configuring VS Code as core.editor (via `code --wait`), we really
want to adhere to the Git conventions of wrapping commit messages.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vscode: only overwrite C/C++ settings Johannes Schindelin Mon, 30 Jul 2018 15:42:52 +0000 (08:42 -0700)

vscode: only overwrite C/C++ settings

The C/C++ settings are special, as they are the only generated VS Code
configurations that *will* change over the course of Git's development,
e.g. when a new constant is defined.

Therefore, let's only update the C/C++ settings, also to prevent user
modifications from being overwritten.

Ideally, we would keep user modifications in the C/C++ settings, but
that would require parsing JSON, a task for which a Unix shell script is
distinctly unsuited. So we write out .new files instead, and warn the
user if they may want to reconcile their changes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: define WIN32 explicitly Johannes Schindelin Mon, 30 Jul 2018 15:42:51 +0000 (08:42 -0700)

mingw: define WIN32 explicitly

This helps VS Code's intellisense to figure out that we want to include
windows.h, and that we want to define the minimum target Windows version
as Windows Vista/2008R2.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cache.h: extract enum declaration from inside a struct... Johannes Schindelin Mon, 30 Jul 2018 15:42:49 +0000 (08:42 -0700)

cache.h: extract enum declaration from inside a struct declaration

While it is technically possible, it is confusing, not only to the user
but also to VS Code's intellisense.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vscode: hard-code a couple definesJohannes Schindelin Mon, 30 Jul 2018 15:42:48 +0000 (08:42 -0700)

vscode: hard-code a couple defines

Sadly, we do not get all of the definitions via ALL_CFLAGS. Some defines
are passed to GCC *only* when compiling specific files, such as git.o.

Let's just hard-code them into the script for the time being.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

contrib: add a script to initialize VS Code configurationJohannes Schindelin Mon, 30 Jul 2018 15:42:46 +0000 (08:42 -0700)

contrib: add a script to initialize VS Code configuration

VS Code is a lightweight but powerful source code editor which runs on
your desktop and is available for Windows, macOS and Linux. Among other
languages, it has support for C/C++ via an extension, which offers not
only building and debugging the code, but also Intellisense, i.e.
code-aware completion and similar niceties.

This patch adds a script that helps set up the environment to work
effectively with VS Code: simply run the Unix shell script
contrib/vscode/init.sh, which creates the relevant files, and open the
top level folder of Git's source code in VS Code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-objects: document about thread synchronizationNguyễn Thái Ngọc Duy Sun, 29 Jul 2018 15:36:05 +0000 (17:36 +0200)

pack-objects: document about thread synchronization

These extra comments should make it easier to understand how to use
locks in pack-objects delta search code. For reference, see

8ecce684a3 (basic threaded delta search - 2007-09-06)
384b32c09b (pack-objects: fix threaded load balancing - 2007-12-08)
50f22ada52 (threaded pack-objects: Use condition... - 2007-12-16)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5562: avoid non-portable "export FOO=bar" constructRamsay Jones Sat, 28 Jul 2018 22:51:28 +0000 (23:51 +0100)

t5562: avoid non-portable "export FOO=bar" construct

Commit 6c213e863a ("http-backend: respect CONTENT_LENGTH for
receive-pack", 2018-07-27) adds a test which uses the non-portable
export construct. Replace it with "FOO=bar && export FOO" instead.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: fix want-capability separatorMasaya Suzuki Sat, 28 Jul 2018 21:16:30 +0000 (14:16 -0700)

doc: fix want-capability separator

Unlike ref advertisement, client capabilities and the first want are
separated by SP, not NUL, in the implementation. Fix the documentation
to align with the implementation. pack-protocol.txt is already fixed.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tests: make use of the test_must_be_empty functionÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 17:48:11 +0000 (17:48 +0000)

tests: make use of the test_must_be_empty function

Change various tests that use an idiom of the form:

>expect &&
test_cmp expect actual

To instead use:

test_must_be_empty actual

The test_must_be_empty() wrapper was introduced in ca8d148daf ("test:
test_must_be_empty helper", 2013-06-09). Many of these tests have been
added after that time. This was mostly found with, and manually pruned
from:

git grep '^\s+>.*expect.* &&$' t

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: test and document unknown fsck.<msg-id> valuesÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:20 +0000 (14:37 +0000)

fsck: test and document unknown fsck.<msg-id> values

When fsck.<msg-id> is set to an unknown value it'll cause "fsck" to
die, but the same is not true of the "fetch" and "receive"
variants. Document this and test for it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: add stress tests for fsck.skipListÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:19 +0000 (14:37 +0000)

fsck: add stress tests for fsck.skipList

Stress test the parsing logic shared by fsck.skipList and
{fetch,receive}.fsck.skipList added in cd94c6f91e ("fsck: git
receive-pack: support excluding objects from fsck'ing",
2015-06-22). There were no tests for the work done by the
init_skiplist() routine, e.g. how it dies on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: test & document {fetch,receive}.fsck.* config... Ævar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:18 +0000 (14:37 +0000)

fsck: test & document {fetch,receive}.fsck.* config fallback

Test and document that the {fetch,receive}.fsck.* family of variables
doesn't fall back on the corresponding .fsck.* variables.

This was alluded to in the existing documentation by saying that
"receive" looks at receive.fsck.* and "fsck" looks at fsck.* etc., but
it wasn't explicitly stated that there was no fallback, and if you'd
e.g. like to configure the skipList you need to do that for all three.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch: implement fetch.fsck.*Ævar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:17 +0000 (14:37 +0000)

fetch: implement fetch.fsck.*

Implement support for fetch.fsck.* corresponding with the existing
receive.fsck.*. This allows for pedantically cloning repositories with
specific issues without turning off fetch.fsckObjects.

One such repository is https://github.com/robbyrussell/oh-my-zsh.git
which before this change will emit this error when cloned with
fetch.fsckObjects:

error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes
fatal: Error in object
fatal: index-pack failed

Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that
issue, but the clone will succeed:

warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes
warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes
warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes

The motivation for this is to be able to turn on fetch.fsckObjects
globally across a fleet of computers but still be able to manually
clone various legacy repositories by either whitelisting specific
issues, or better yet whitelisting specific objects.

The use of --git-dir=* instead of -C in the tests could be considered
somewhat archaic, but the tests I'm adding here are duplicating the
corresponding receive.* tests with as few changes as possible.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transfer.fsckObjects tests: untangle confusing setupÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:16 +0000 (14:37 +0000)

transfer.fsckObjects tests: untangle confusing setup

The tests for transfer.fsckObjects have grown organically over time to
not make much sense.

Initially when these were added in b10a53583f ("test: fetch/receive
with fsckobjects", 2011-09-04) they were only testing the "corrupt or
missing object" case, but later on in 70a4ae73d8 ("fsck: add a simple
test for receive.fsck.<msg-id>", 2015-06-22) they were expanded to
check for the fsck.<msg-id> feature.

The problem was that we still kept the same corrupt test repo, making
it harder to add new tests that check the entirety of the repository
between operations via "git fsck" to see whether only known issues
that can be ignored with fsck.<msg-id> have occurred.

The tests only did the right thing because such a full "git fsck" was
never done after a certain point, and instead we were only
manipulating specific refs. This makes it harder to add new tests, and
none of the fsck.<msg-id> tests relied on this.

So let's not confuse the two and repair the corrupt repository before
we run the fsck.<msg-id> tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config doc: elaborate on fetch.fsckObjects securityÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:15 +0000 (14:37 +0000)

config doc: elaborate on fetch.fsckObjects security

Change the transfer.fsckObjects documentation to explicitly note the
unique security and/or corruption issues fetch.fsckObjects suffers
from, since it doesn't have a quarantine environment.

This was already alluded to in the existing documentation, but let's
spell it out so there's no confusion here, and give a concrete example
of how to work around this limitation.

Let's also prominently note that this is considered to be a limitation
of the current implementation, rather than something that's intended
and by design, since we might change this in the future.

See
https://public-inbox.org/git/20180531060259.GE17344@sigill.intra.peff.net/
for further details.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config doc: elaborate on what transfer.fsckObjects... Ævar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:14 +0000 (14:37 +0000)

config doc: elaborate on what transfer.fsckObjects does

The existing documentation led the user to believe that all we were
doing were basic reachability sanity checks, but that hasn't been true
for a very long time. Update the description to match reality, and
note the caveat that there's a quarantine for accepting pushes, but
not for fetching.

Also mention that the fsck checks for security issues, which was my
initial motivation for writing this fetch.fsck.* series.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config doc: unify the description of fsck.* and receive... Ævar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:13 +0000 (14:37 +0000)

config doc: unify the description of fsck.* and receive.fsck.*

The documentation for the fsck.<msg-id> and receive.fsck.<msg-id>
variables was mostly duplicated in two places, with fsck.<msg-id>
making no mention of the corresponding receive.fsck.<msg-id>, and the
same for fsck.skipList.

I spent quite a lot of time today wondering why setting the
fsck.<msg-id> variant wasn't working to clone a legacy repository (not
that that would have worked anyway, but a subsequent patch implements
fetch.fsck.<msg-id>).

Rectify this situation by describing the feature in general terms
under the fsck.* documentation, and make the receive.fsck.*
documentation refer to those variables instead.

This documentation was initially added in 2becf00ff7 ("fsck: support
demoting errors to warnings", 2015-06-22) and 4b55b9b479 ("fsck:
document the new receive.fsck.<msg-id> options", 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config doc: don't describe *.fetchObjects twiceÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:12 +0000 (14:37 +0000)

config doc: don't describe *.fetchObjects twice

Refer readers of fetch.fsckObjects and receive.fsckObjects to
transfer.fsckObjects instead of repeating the description at each
location.

I don't think this description of them makes much sense, but for now
I'm just moving the existing documentation around. Making it better
will be done in a later patch.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

receive.fsck.<msg-id> tests: remove dead codeÆvar Arnfjörð Bjarmason Fri, 27 Jul 2018 14:37:11 +0000 (14:37 +0000)

receive.fsck.<msg-id> tests: remove dead code

Remove the setting of a receive.fsck.badDate config variable to
"ignore". This was added in efaba7cc77 ("fsck: optionally ignore
specific fsck issues completely", 2015-06-22) but never did anything,
presumably it was part of some work-in-progress code that never made
it into git.git.

None of these tests will emit the "invalid author/committer line - bad
date" warning. The dates on the commit objects we're setting up are
not invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: preserve skip_worktree bit when necessaryElijah Newren Fri, 27 Jul 2018 12:59:44 +0000 (12:59 +0000)

merge-recursive: preserve skip_worktree bit when necessary

merge-recursive takes any files marked as unmerged by unpack_trees,
tries to figure out whether they can be resolved (e.g. using renames
or a file-level merge), and then if they can be it will delete the old
cache entries and write new ones. This means that any ce_flags for
those cache entries are essentially cleared when merging.

Unfortunately, if a file was marked as skip_worktree and it needs a
file-level merge but the merge results in the same version of the file
that was found in HEAD, we skip updating the worktree (because the
file was unchanged) but clear the skip_worktree bit (because of the
delete-cache-entry-and-write-new-one). This makes git treat the file
as having a local change in the working copy, namely a delete, when it
should appear as unchanged despite not being present. Avoid this
problem by copying the skip_worktree flag in this case.
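
The flag handling described above can be sketched roughly as follows. This is an illustration only, not the code from this patch: `struct sketch_entry`, `SKETCH_SKIP_WORKTREE` and `replace_entry()` are hypothetical names, and an integer `content_id` stands in for a real object ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit and entry type; Git's real cache entries differ. */
#define SKETCH_SKIP_WORKTREE (1u << 0)

struct sketch_entry {
	unsigned int flags;
	int content_id;	/* stands in for the blob's object ID */
};

/*
 * Replace "old" with a freshly built entry for the merge result.
 * A fresh entry starts with no flags, so the skip-worktree bit is
 * copied over explicitly when the result matches what HEAD had.
 */
static struct sketch_entry replace_entry(const struct sketch_entry *old,
					 int merged_id, int head_id)
{
	struct sketch_entry fresh;

	assert(old != NULL);
	fresh.flags = 0;
	fresh.content_id = merged_id;
	if ((old->flags & SKETCH_SKIP_WORKTREE) && merged_id == head_id)
		fresh.flags |= SKETCH_SKIP_WORKTREE;
	return fresh;
}
```

In this sketch the bit is dropped whenever the merge result differs from HEAD, since in that case the file really does need to be written to the worktree.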

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3507: add a testcase showing failure with sparse checkoutBen Peart Fri, 27 Jul 2018 12:59:42 +0000 (12:59 +0000)

t3507: add a testcase showing failure with sparse checkout

Recent changes in merge_content() induced a bug when merging files that are
not present in the local working directory due to sparse-checkout. Add a
test case to demonstrate the bug so that we can ensure the fix resolves
it and to prevent future regressions.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http-backend: respect CONTENT_LENGTH for receive-packMax Kirillov Fri, 27 Jul 2018 03:48:59 +0000 (06:48 +0300)

http-backend: respect CONTENT_LENGTH for receive-pack

Push passes its input on to other commands, as described in
https://public-inbox.org/git/20171129032214.GB32345@sigill.intra.peff.net/

As it gets complicated to correctly track the data length, instead transfer
the data through the parent process and cut the pipe as the specified length is
reached. Do it only when CONTENT_LENGTH is set, otherwise pass the input
directly to the forked commands.
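
A minimal sketch of the idea, assuming POSIX file descriptors: parse the length, reject values that do not fit into ssize_t, then forward exactly that many bytes and stop. The helper names `parse_content_length()` and `copy_exactly()` are hypothetical and are not taken from http-backend.c.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SSIZE_MAX
#define SSIZE_MAX ((ssize_t)(SIZE_MAX / 2))
#endif

/*
 * Parse a decimal length. Returns 0 on success, -1 if the string is
 * malformed or the value does not fit into ssize_t.
 */
static int parse_content_length(const char *str, ssize_t *out)
{
	char *end;
	uintmax_t val;

	if (!str || *str < '0' || *str > '9')
		return -1;
	errno = 0;
	val = strtoumax(str, &end, 10);
	if (errno == ERANGE || *end != '\0' || val > (uintmax_t)SSIZE_MAX)
		return -1;
	*out = (ssize_t)val;
	return 0;
}

/*
 * Forward exactly "remaining" bytes from "in" to "out" and then stop,
 * leaving any extra input unread. Returns 0 on success, -1 on error
 * or if the input ends early.
 */
static int copy_exactly(int in, int out, ssize_t remaining)
{
	char buf[8192];

	assert(remaining >= 0);
	while (remaining > 0) {
		size_t want = remaining < (ssize_t)sizeof(buf) ?
			(size_t)remaining : sizeof(buf);
		ssize_t got = read(in, buf, want);
		ssize_t off = 0;

		if (got < 0 && errno == EINTR)
			continue;
		if (got <= 0)
			return -1;
		while (off < got) {
			ssize_t written = write(out, buf + off, (size_t)(got - off));

			if (written < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += written;
		}
		remaining -= got;
	}
	return 0;
}
```

After `copy_exactly()` returns, the caller would close the write end so the child sees end-of-file at the declared length.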

Add tests for cases:

* CONTENT_LENGTH is set, script's stdin has more data, with all combinations
of variations: fetch or push, plain or compressed body, correct or truncated
input.

* CONTENT_LENGTH is specified to a value which does not fit into ssize_t.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Max Kirillov <max@max630.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: ensure that enum object_type is definedBeat Bolli Wed, 25 Jul 2018 21:56:07 +0000 (23:56 +0200)

packfile: ensure that enum object_type is defined

When compiling under Apple LLVM version 9.1.0 (clang-902.0.39.2) with
"make DEVELOPER=1 DEVOPTS=pedantic", the compiler says

error: redeclaration of already-defined enum 'object_type' is a GNU
extension [-Werror,-Wgnu-redeclared-enum]

According to https://en.cppreference.com/w/c/language/declarations
(section "Redeclaration"), a repeated declaration after the definition
is only legal for structs and unions, but not for enums.

Drop the belated declaration of enum object_type and include cache.h
instead to make sure the enum is defined.
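
To illustrate the rule, here is a small sketch with hypothetical names (`enum object_kind`, `struct packed_thing`); it is not the declaration from cache.h. The struct is redeclared after its definition, which is legal, while the equivalent enum redeclaration is left in a comment because strict ISO C rejects it.

```c
#include <assert.h>
#include <stddef.h>

/* Definition, as a shared header would provide it (values are made up). */
enum object_kind {
	KIND_NONE = 0,
	KIND_COMMIT = 1,
	KIND_BLOB = 3
};

struct packed_thing {
	enum object_kind kind;
};

/* Legal: a struct may be declared again after its definition. */
struct packed_thing;

/*
 * Not valid ISO C, accepted only as a GNU extension:
 *
 *	enum object_kind;
 *
 * Including the header that defines the enum avoids needing it.
 */

static int is_commit(const struct packed_thing *thing)
{
	assert(thing != NULL);
	return thing->kind == KIND_COMMIT;
}
```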

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

banned.h: mark strncpy() as bannedJeff King Tue, 24 Jul 2018 09:28:28 +0000 (05:28 -0400)

banned.h: mark strncpy() as banned

The strncpy() function is less horrible than strcpy(), but
is still pretty easy to misuse because of its funny
termination semantics. Namely, that if it truncates it omits
the NUL terminator, and you must remember to add it
yourself. Even if you use it correctly, it's sometimes hard
for a reader to verify this without hunting through the
code. If you're thinking about using it, consider instead:

- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)

- xsnprintf() if you're sure that what you're copying
should fit

- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings
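
The termination pitfall can be shown in a few lines of standard C. This is a sketch for illustration: `copy_truncating()` is a hypothetical stand-in for the strlcpy() suggestion above, built on snprintf() so that it does not depend on Git's compat code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if buf[0..size-1] contains a NUL terminator, 0 otherwise. */
static int is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}

/*
 * Shows the pitfall: when src does not fit, strncpy() fills the whole
 * buffer and writes no terminator. Returns 1 if the destination ended
 * up NUL-terminated.
 */
static int strncpy_terminates(const char *src)
{
	char buf[4];

	memset(buf, 'x', sizeof(buf));
	strncpy(buf, src, sizeof(buf));
	return is_terminated(buf, sizeof(buf));
}

/*
 * Truncating copy that always NUL-terminates. Returns the length of
 * src so that callers can detect truncation (result >= size).
 */
static size_t copy_truncating(char *dst, size_t size, const char *src)
{
	assert(size > 0);
	snprintf(dst, size, "%s", src);
	return strlen(src);
}
```

With a four-byte buffer, `strncpy_terminates("abc")` reports a terminated result while `strncpy_terminates("abcdefgh")` does not, which is exactly the case a reader has to hunt for when auditing a strncpy() call.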

Note that there is one instance of strncpy in
compat/regex/regcomp.c, which is fine (it allocates a
sufficiently large string before copying). But this doesn't
trigger the ban-list even when compiling with NO_REGEX=1,
because:

1. we don't use git-compat-util.h when compiling it
(instead we rely on the system includes from the
upstream library); and

2. It's in an "#ifdef DEBUG" block

Since it doesn't trigger the banned.h code, we're better
off leaving it to keep our divergence from upstream minimal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

banned.h: mark sprintf() as bannedJeff King Tue, 24 Jul 2018 09:27:19 +0000 (05:27 -0400)

banned.h: mark sprintf() as banned

The sprintf() function (and its variadic form vsprintf) make
it easy to accidentally introduce a buffer overflow. If
you're thinking of using them, you're better off either
using a dynamic string (strbuf or xstrfmt), or xsnprintf if
you really know that you won't overflow. The last sprintf()
call went away quite a while ago in f0766bf94e (fsck: use
for_each_loose_file_in_objdir, 2015-09-24).

Note that we respect HAVE_VARIADIC_MACROS here, which some
ancient platforms lack. As a fallback, we can just "guess"
that the caller will provide 3 arguments. If they do, then
the macro will work as usual. If not, then they'll get a
slightly less useful error, like:

git.c:718:24: error: macro "sprintf" passed 3 arguments, but takes just 2

That's not ideal, but it at least alerts them to the problem
area. And anyway, we're primarily targeting people adding
new code. Most developers should be on modern enough
platforms to see the normal "good" error message.
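
A rough sketch of what the two macro forms could look like is given below. It is an assumption about the shape, not a copy of banned.h; `format_line_number()` is a hypothetical helper showing the bounded alternative, with a plain assert() standing in for xsnprintf's run-time check.

```c
#include <assert.h>
#include <stdio.h>

#define BANNED(func) sorry_##func##_is_a_banned_function

#undef sprintf
#ifdef HAVE_VARIADIC_MACROS
/* Matches any number of arguments. */
#define sprintf(...) BANNED(sprintf)
#else
/* Fallback: guess the common three-argument call shape. */
#define sprintf(buf,fmt,arg) BANNED(sprintf)
#endif

/*
 * Bounded formatting: snprintf() is not affected by the macro above.
 * Returns the number of characters written, excluding the NUL.
 */
static int format_line_number(char *buf, size_t size, int line)
{
	int len = snprintf(buf, size, "line %d", line);

	assert(len >= 0 && (size_t)len < size);
	return len;
}
```

With the fallback form, a call that passes a different number of arguments fails in the preprocessor rather than on the descriptive identifier, which is the less useful error mentioned above.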

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

banned.h: mark strcat() as bannedJeff King Tue, 24 Jul 2018 09:26:39 +0000 (05:26 -0400)

banned.h: mark strcat() as banned

The strcat() function has all of the same overflow problems
as strcpy(). And as a bonus, it's easy to end up
accidentally quadratic, as each subsequent call has to walk
through the existing string.

The last strcat() call went away in f063d38b80 (daemon: use
cld->env_array when re-spawning, 2015-09-24). In general,
strcat() can be replaced either with a dynamic string
(strbuf or xstrfmt), or with xsnprintf if you know the
length is bounded.
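
For the bounded case, one way to avoid both the overflow and the repeated walk is to remember where the string ends. The sketch below uses a hypothetical helper, `append_bounded()`, built on snprintf(); it is not a Git function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append s at offset *len inside buf, which holds "size" bytes in
 * total. Because the current length is tracked by the caller, each
 * append writes only the new bytes instead of rescanning the string.
 * Returns 0 on success, or -1 if s did not fit (buf is then left
 * truncated but still NUL-terminated, and *len is unchanged).
 */
static int append_bounded(char *buf, size_t size, size_t *len, const char *s)
{
	int n;

	assert(*len < size);
	n = snprintf(buf + *len, size - *len, "%s", s);
	if (n < 0 || (size_t)n >= size - *len)
		return -1;
	*len += (size_t)n;
	return 0;
}
```

A caller starts with `len = 0` and an empty string in `buf`, then appends piece by piece while checking the return value.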

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

automatically ban strcpy()Jeff King Thu, 26 Jul 2018 07:21:05 +0000 (03:21 -0400)

automatically ban strcpy()

There are a few standard C functions (like strcpy) which are
easy to misuse. E.g.:

char path[PATH_MAX];
strcpy(path, arg);

may overflow the "path" buffer. Sometimes there's an earlier
constraint on the size of "arg", but even in such a case
it's hard to verify that the code is correct. If the size
really is unbounded, you're better off using a dynamic
helper like strbuf:

struct strbuf path = STRBUF_INIT;
strbuf_addstr(&path, arg);

or if it really is bounded, then use xsnprintf to show your
expectation (and get a run-time assertion):

char path[PATH_MAX];
xsnprintf(path, sizeof(path), "%s", arg);

which makes further auditing easier.

We'd usually catch undesirable code like this in a review,
but there's no automated enforcement. Adding that
enforcement can help us be more consistent and save effort
(and a round-trip) during review.

This patch teaches the compiler to report an error when it
sees strcpy (and will become a model for banning a few other
functions). This has a few advantages over a separate
linting tool:

1. We know it's run as part of a build cycle, so it's
hard to ignore. Whereas an external linter is an extra
step the developer needs to remember to do.

2. Likewise, it's basically free since the compiler is
parsing the code anyway.

3. We know it's robust against false positives (unlike a
grep-based linter).

The two big disadvantages are:

1. We'll only check code that is actually compiled, so it
may miss code that isn't triggered on your particular
system. But since presumably people don't add new code
without compiling it (and if they do, the banned
function list is the least of their worries), we really
only care about failing to clean up old code when
adding new functions to the list. And that's easy
enough to address with a manual audit when adding a new
function (which is what I did for the functions here).

2. If this ends up generating false positives, it's going
to be harder to disable (as opposed to a separate
linter, which may have mechanisms for overriding a
particular case).

But the intent is to only ban functions which are
obviously bad, and for which we accept using an
alternative even when this particular use isn't buggy
(e.g., the xsnprintf alternative above).

The implementation here is simple: we'll define a macro for
the banned function which replaces it with a reference to a
descriptively named but undeclared identifier. Replacing it
with any invalid code would work (since we just want to
break compilation). But ideally we'd meet these goals:

- it should be portable; ideally this would trigger
everywhere, and does not need to be part of a DEVELOPER=1
setup (because unlike warnings which may depend on the
compiler or system, this is a clear indicator of
something wrong in the code).

- it should generate a readable error that gives the
developer a clue what happened

- it should avoid generating too much other cruft that
makes it hard to see the actual error

- it should mention the original callsite in the error

The output with this patch looks like this (using gcc 7, on
a checkout with 022d2ac1f3 reverted, which removed the final
strcpy from blame.c):

CC builtin/blame.o
In file included from ./git-compat-util.h:1246,
from ./cache.h:4,
from builtin/blame.c:8:
builtin/blame.c: In function ‘cmd_blame’:
./banned.h:11:22: error: ‘sorry_strcpy_is_a_banned_function’ undeclared (first use in this function)
#define BANNED(func) sorry_##func##_is_a_banned_function
^~~~~~
./banned.h:14:21: note: in expansion of macro ‘BANNED’
#define strcpy(x,y) BANNED(strcpy)
^~~~~~
builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
strcpy(repeated_meta_color, GIT_COLOR_CYAN);
^~~~~~
./banned.h:11:22: note: each undeclared identifier is reported only once for each function it appears in
#define BANNED(func) sorry_##func##_is_a_banned_function
^~~~~~
./banned.h:14:21: note: in expansion of macro ‘BANNED’
#define strcpy(x,y) BANNED(strcpy)
^~~~~~
builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
strcpy(repeated_meta_color, GIT_COLOR_CYAN);
^~~~~~

This prominently shows the phrase "strcpy is a banned
function", along with the original callsite in blame.c and
the location of the ban code in banned.h. Which should be
enough to get even a developer seeing this for the first
time pointed in the right direction.
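
As a minimal sketch of the technique, the snippet below reproduces the two macros visible in the compiler output above and pairs them with a bounded copy. `copy_bounded()` is a hypothetical stand-in for xsnprintf, and its assert-on-truncation behaviour is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Any use of strcpy() now expands to an undeclared identifier, which
 * stops the build. The defines come after <string.h> so that the
 * system header's own declaration is not rewritten.
 */
#define BANNED(func) sorry_##func##_is_a_banned_function
#undef strcpy
#define strcpy(x,y) BANNED(strcpy)

/*
 * Copy src into dst and assert that nothing was truncated. Returns
 * the number of bytes written, not counting the terminating NUL.
 */
static size_t copy_bounded(char *dst, size_t size, const char *src)
{
	int len = snprintf(dst, size, "%s", src);

	assert(len >= 0 && (size_t)len < size);
	return (size_t)len;
}
```

Adding a line such as `strcpy(buf, "x");` to a file that includes these defines should fail to compile with an error naming `sorry_strcpy_is_a_banned_function`.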

This doesn't match our ideals perfectly, but it's a pretty
good balance. A few alternatives I tried:

1. Instead of using an undeclared variable, using an
undeclared function. This shortens the message, because
the "each undeclared identifier" message is not needed
(and as you can see above, it triggers a separate
mention of each of the expansion points).

But it doesn't actually stop compilation unless you use
-Werror=implicit-function-declaration in your CFLAGS.
This is the case for DEVELOPER=1, but not for a default
build (on the other hand, we'd eventually produce a
link error pointing to the correct source line with the
descriptive name).

2. The Linux kernel uses a similar mechanism in its
BUILD_BUG_ON_MSG(), where they actually declare the
function but do so with gcc's error attribute. But
that's not portable to other compilers (and it also
runs afoul of our error() macro).

We could make a gcc-specific technique and fall back on
other compilers, but it's probably not worth the
complexity. It also isn't significantly shorter than
the error message shown above.

3. We could drop the BANNED() macro, which would shorten
the number of lines in the error. But curiously,
removing it (and just expanding strcpy directly to the
bogus identifier) causes gcc _not_ to report the
original line of code.

So this strategy seems to be an acceptable mix of
information, portability, simplicity, and robustness,
without _too_ much extra clutter. I also tested it with
clang, and it looks as good (actually, slightly less
cluttered than with gcc).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: --color-moved: rename "dimmed_zebra" to "dimmed... Eric Sunshine Tue, 24 Jul 2018 21:58:45 +0000 (17:58 -0400)

diff: --color-moved: rename "dimmed_zebra" to "dimmed-zebra"

The --color-moved "dimmed_zebra" mode (with an underscore) is an
anachronism. Most options and modes are hyphenated. It is more difficult
to type and somewhat more difficult to read than those which are
hyphenated. Therefore, rename it to "dimmed-zebra", and nominally
deprecate "dimmed_zebra".

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: add a DEVOPTS flag to get pedantic compilationBeat Bolli Tue, 24 Jul 2018 19:26:43 +0000 (21:26 +0200)

Makefile: add a DEVOPTS flag to get pedantic compilation

In the interest of code hygiene, make it easier to compile Git with the
flag -pedantic.

Pure pedantic compilation with GCC 7.3 results in one warning per use of
the translation macro `N_`:

warning: array initialized from parenthesized string constant [-Wpedantic]

Therefore also disable the parenthesising of i18n strings with
-DUSE_PARENS_AROUND_GETTEXT_N=0.
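
The warning comes from initializing a char array with a parenthesized string literal. The sketch below shows how a switch of this kind can select between the two forms; the macro body is an assumption about the general shape rather than a quote from gettext.h, and `usage_text` is a made-up example string.

```c
#include <assert.h>
#include <string.h>

/* Default to the pedantic-clean form unless the build says otherwise. */
#ifndef USE_PARENS_AROUND_GETTEXT_N
#define USE_PARENS_AROUND_GETTEXT_N 0
#endif

#if USE_PARENS_AROUND_GETTEXT_N
/* Parenthesized: catches accidental literal concatenation, but an
 * array initializer of this form draws a -Wpedantic warning. */
#define N_(msgid) (msgid)
#else
#define N_(msgid) msgid
#endif

/* An array initialized from the marked string, as in the warning. */
static const char usage_text[] = N_("usage: sketch [<options>]");

static size_t usage_length(void)
{
	assert(sizeof(usage_text) > 0);
	return strlen(usage_text);
}
```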

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Fourth batch for 2.19 cycleJunio C Hamano Tue, 24 Jul 2018 21:59:49 +0000 (14:59 -0700)

Fourth batch for 2.19 cycle

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'as/sequencer-customizable-comment-char'Junio C Hamano Tue, 24 Jul 2018 21:50:51 +0000 (14:50 -0700)

Merge branch 'as/sequencer-customizable-comment-char'

Honor core.commentchar when preparing the list of commits to replay
in "rebase -i".

* as/sequencer-customizable-comment-char:
sequencer: use configured comment character

Merge branch 'sb/blame-color'Junio C Hamano Tue, 24 Jul 2018 21:50:50 +0000 (14:50 -0700)

Merge branch 'sb/blame-color'

Code clean-up.

* sb/blame-color:
blame: prefer xsnprintf to strcpy for colors

Merge branch 'nd/command-list'Junio C Hamano Tue, 24 Jul 2018 21:50:50 +0000 (14:50 -0700)

Merge branch 'nd/command-list'

Build doc update for Windows.

* nd/command-list:
vcbuild/README: update to accommodate for missing common-cmds.h

Merge branch 'es/test-lint-one-shot-export'Junio C Hamano Tue, 24 Jul 2018 21:50:50 +0000 (14:50 -0700)

Merge branch 'es/test-lint-one-shot-export'

Look for broken use of "VAR=VAL shell_func" in test scripts as part
of test-lint.

* es/test-lint-one-shot-export:
t/check-non-portable-shell: detect "FOO=bar shell_func"
t/check-non-portable-shell: make error messages more compact
t/check-non-portable-shell: stop being so polite
t6046/t9833: fix use of "VAR=VAL cmd" with a shell function

Merge branch 'wc/find-commit-with-pattern-on-detached... Junio C Hamano Tue, 24 Jul 2018 21:50:49 +0000 (14:50 -0700)

Merge branch 'wc/find-commit-with-pattern-on-detached-head'

"git rev-parse ':/substring'" did not consider the history leading
only to HEAD when looking for a commit with the given substring,
when the HEAD is detached. This has been fixed.

* wc/find-commit-with-pattern-on-detached-head:
sha1-name.c: for ":/", find detached HEAD commits

Merge branch 'jc/t3404-one-shot-export-fix'Junio C Hamano Tue, 24 Jul 2018 21:50:49 +0000 (14:50 -0700)

Merge branch 'jc/t3404-one-shot-export-fix'

Correct a broken use of "VAR=VAL shell_func" in a test.

* jc/t3404-one-shot-export-fix:
t3404: fix use of "VAR=VAL cmd" with a shell function

Merge branch 'mk/merge-in-sparse-checkout'Junio C Hamano Tue, 24 Jul 2018 21:50:48 +0000 (14:50 -0700)

Merge branch 'mk/merge-in-sparse-checkout'

"git reset --merge" (hence "git merge --abort") and "git reset --hard"
had trouble working correctly in a sparsely checked out working
tree after a conflict, which has been corrected.

* mk/merge-in-sparse-checkout:
unpack-trees: do not fail reset because of unmerged skipped entry

Merge branch 'hs/push-cert-check-cleanup'Junio C Hamano Tue, 24 Jul 2018 21:50:48 +0000 (14:50 -0700)

Merge branch 'hs/push-cert-check-cleanup'

Code clean-up.

* hs/push-cert-check-cleanup:
gpg-interface: make parse_gpg_output static and remove from interface header
builtin/receive-pack: use check_signature from gpg-interface

Merge branch 'jk/empty-pick-fix'Junio C Hamano Tue, 24 Jul 2018 21:50:48 +0000 (14:50 -0700)

Merge branch 'jk/empty-pick-fix'

Handling of an empty range by "git cherry-pick" was inconsistent
depending on how the range ended up being empty, which has been
corrected.

* jk/empty-pick-fix:
sequencer: don't say BUG on bogus input
sequencer: handle empty-set cases consistently

Merge branch 'bp/log-ref-write-fd-with-strbuf'Junio C Hamano Tue, 24 Jul 2018 21:50:47 +0000 (14:50 -0700)

Merge branch 'bp/log-ref-write-fd-with-strbuf'

Code clean-up.

* bp/log-ref-write-fd-with-strbuf:
convert log_ref_write_fd() to use strbuf

Merge branch 'jt/partial-clone-fsck-connectivity'Junio C Hamano Tue, 24 Jul 2018 21:50:47 +0000 (14:50 -0700)

Merge branch 'jt/partial-clone-fsck-connectivity'

Partial clone support of "git clone" has been updated to correctly
validate the objects it receives from the other side. The server
side has been corrected to send objects that are directly
requested, even if they may match the filtering criteria (e.g. when
doing a "lazy blob" partial clone).

* jt/partial-clone-fsck-connectivity:
clone: check connectivity even if clone is partial
upload-pack: send refs' objects despite "filter"

Merge branch 'bc/send-email-auto-cte'Junio C Hamano Tue, 24 Jul 2018 21:50:47 +0000 (14:50 -0700)

Merge branch 'bc/send-email-auto-cte'

The content-transfer-encoding of the message "git send-email" sends
out by default was 8bit, which can cause trouble when there is an
overlong line that exceeds the RFC 5322/2822 limit. A new option 'auto' to
automatically switch to quoted-printable when there is such a line
in the payload has been introduced and is made the default.

* bc/send-email-auto-cte:
docs: correct RFC specifying email line length
send-email: automatically determine transfer-encoding
send-email: accept long lines with suitable transfer encoding
send-email: add an auto option for transfer encoding

Merge branch 'bb/unicode-11-width'Junio C Hamano Tue, 24 Jul 2018 21:50:47 +0000 (14:50 -0700)

Merge branch 'bb/unicode-11-width'

The character display width table has been updated to match the
latest Unicode standard.

* bb/unicode-11-width:
unicode: update the width tables to Unicode 11

Merge branch 'bb/pedantic'Junio C Hamano Tue, 24 Jul 2018 21:50:47 +0000 (14:50 -0700)

Merge branch 'bb/pedantic'

The codebase has been updated to compile cleanly with the -pedantic
option.

* bb/pedantic:
utf8.c: avoid char overflow
string-list.c: avoid conversion from void * to function pointer
sequencer.c: avoid empty statements at top level
convert.c: replace "\e" escapes with "\033".
fixup! refs/refs-internal.h: avoid forward declaration of an enum
refs/refs-internal.h: avoid forward declaration of an enum
fixup! connect.h: avoid forward declaration of an enum
connect.h: avoid forward declaration of an enum

Merge branch 'tb/config-default'Junio C Hamano Tue, 24 Jul 2018 21:50:46 +0000 (14:50 -0700)

Merge branch 'tb/config-default'

Compilation fix.

* tb/config-default:
builtin/config: work around an unsized array forward declaration

Merge branch 'mh/fast-import-no-diff-delta-empty'Junio C Hamano Tue, 24 Jul 2018 21:50:46 +0000 (14:50 -0700)

Merge branch 'mh/fast-import-no-diff-delta-empty'

"git fast-import" has been updated to avoid attempting to create a
delta against a zero-byte-long string, which is pointless.

* mh/fast-import-no-diff-delta-empty:
fast-import: do not call diff_delta() with empty buffer

Merge branch 'kn/userdiff-php'Junio C Hamano Tue, 24 Jul 2018 21:50:46 +0000 (14:50 -0700)

Merge branch 'kn/userdiff-php'

The userdiff pattern for .php has been updated.

* kn/userdiff-php:
userdiff: support new keywords in PHP hunk header
t4018: add missing test cases for PHP

Merge branch 'jk/fetch-all-peeled-fix'Junio C Hamano Tue, 24 Jul 2018 21:50:45 +0000 (14:50 -0700)

Merge branch 'jk/fetch-all-peeled-fix'

Test modernization.

* jk/fetch-all-peeled-fix:
t5500: prettify non-commit tag tests

Merge branch 'ag/rebase-p'Junio C Hamano Tue, 24 Jul 2018 21:50:44 +0000 (14:50 -0700)

Merge branch 'ag/rebase-p'

The help message shown in the editor to edit the todo list in
"rebase -p" has regressed recently, which has been corrected.

* ag/rebase-p:
git-rebase--preserve-merges: fix formatting of todo help message

Merge branch 'jt/connectivity-check-after-unshallow'Junio C Hamano Tue, 24 Jul 2018 21:50:44 +0000 (14:50 -0700)

Merge branch 'jt/connectivity-check-after-unshallow'

"git fetch" failed to correctly validate the set of objects it
received when making a shallow history deeper, which has been
corrected.

* jt/connectivity-check-after-unshallow:
fetch-pack: write shallow, then check connectivity
fetch-pack: implement ref-in-want
fetch-pack: put shallow info in output parameter
fetch: refactor to make function args narrower
fetch: refactor fetch_refs into two functions
fetch: refactor the population of peer ref OIDs
upload-pack: test negotiation with changing repository
upload-pack: implement ref-in-want
test-pkt-line: add unpack-sideband subcommand

Merge branch 'jk/for-each-ref-icase'Junio C Hamano Tue, 24 Jul 2018 21:50:44 +0000 (14:50 -0700)

Merge branch 'jk/for-each-ref-icase'

The "--ignore-case" option of "git for-each-ref" (and its friends)
did not work correctly, which has been fixed.

* jk/for-each-ref-icase:
ref-filter: avoid backend filtering with --ignore-case
for-each-ref: consistently pass WM_IGNORECASE flag
t6300: add a test for --ignore-case

Merge branch 'en/t5407-rebase-m-fix'Junio C Hamano Tue, 24 Jul 2018 21:50:43 +0000 (14:50 -0700)

Merge branch 'en/t5407-rebase-m-fix'

* en/t5407-rebase-m-fix:
t5407: fix test to cover intended arguments

Merge branch 'en/apply-comment-fix'Junio C Hamano Tue, 24 Jul 2018 21:50:43 +0000 (14:50 -0700)

Merge branch 'en/apply-comment-fix'

* en/apply-comment-fix:
apply: fix grammar error in comment

Merge branch 'en/rebase-consistency'Junio C Hamano Tue, 24 Jul 2018 21:50:43 +0000 (14:50 -0700)

Merge branch 'en/rebase-consistency'

"git rebase" behaved slightly differently depending on which one of
the three backends gets used; this has been documented and an
effort to make them more uniform has begun.

* en/rebase-consistency:
git-rebase: make --allow-empty-message the default
t3401: add directory rename testcases for rebase and am
git-rebase.txt: document behavioral differences between modes
directory-rename-detection.txt: technical docs on abilities and limitations
git-rebase.txt: address confusion between --no-ff vs --force-rebase
git-rebase: error out when incompatible options passed
t3422: new testcases for checking when incompatible options passed
git-rebase.sh: update help messages a bit
git-rebase.txt: document incompatible options

Merge branch 'sb/submodule-move-head-error-msg'Junio C Hamano Tue, 24 Jul 2018 21:50:43 +0000 (14:50 -0700)

Merge branch 'sb/submodule-move-head-error-msg'

"git checkout --recurse-submodules another-branch" did not report
in which submodule it failed to update the working tree, which
resulted in an unhelpful error message.

* sb/submodule-move-head-error-msg:
submodule.c: report the submodule that an error occurs in

Merge branch 'rj/submodule-fsck-skip'Junio C Hamano Tue, 24 Jul 2018 21:50:42 +0000 (14:50 -0700)

Merge branch 'rj/submodule-fsck-skip'

"fsck.skipList" did not prevent a blob object listed there from
being inspected for its contents (e.g. we recently started to
inspect the contents of ".gitmodules" for certain malicious
patterns), which has been corrected.

* rj/submodule-fsck-skip:
fsck: check skiplist for object in fsck_blob()

pack-protocol: mention and point to docs for protocol v2Brandon Williams Mon, 23 Jul 2018 17:48:07 +0000 (10:48 -0700)

pack-protocol: mention and point to docs for protocol v2

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_humanise: use unsigned variablesJeff King Tue, 24 Jul 2018 10:52:29 +0000 (06:52 -0400)

strbuf_humanise: use unsigned variables

All of the numeric formatting done by this function uses
"%u", but we pass in a signed "int". The actual range
doesn't matter here, since the conditional makes sure we're
always showing reasonably small numbers. And even gcc's
format-checker does not seem to mind. But it's potentially
confusing to a reader of the code to see the mismatch.
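
A minimal sketch of the idea, assuming a simplified formatter rather than git's actual strbuf_humanise code: every value handed to "%u" is an unsigned variable, so the format string and the argument types agree. The function name humanise_bytes() and its rounding are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical formatter: writes e.g. "1.50 KiB" into out and returns
 * what snprintf() returns. Assumes "unsigned" is at least 32 bits wide.
 */
static int humanise_bytes(char *out, size_t outsz, unsigned long long bytes)
{
	static const char *const units[] = { "KiB", "MiB", "GiB" };
	int i;

	assert(out != NULL && outsz > 0);
	assert(bytes < (1ULL << 62));	/* keeps the GiB count within 32 bits */

	for (i = 2; i >= 0; i--) {
		unsigned shift = 10u * (unsigned)(i + 1);

		if (bytes >> shift) {
			/* both values are unsigned, matching the "%u" conversions */
			unsigned whole = (unsigned)(bytes >> shift);
			unsigned long long rem = bytes & ((1ULL << shift) - 1);
			unsigned frac = (unsigned)((rem * 100) >> shift);

			return snprintf(out, outsz, "%u.%02u %s",
					whole, frac, units[i]);
		}
	}
	return snprintf(out, outsz, "%u bytes", (unsigned)bytes);
}
```

For example, 1536 bytes comes out as "1.50 KiB" and 512 bytes as "512 bytes".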

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pass st.st_size as hint for strbuf_readlink()Jeff King Tue, 24 Jul 2018 10:51:39 +0000 (06:51 -0400)

pass st.st_size as hint for strbuf_readlink()

When we initially added the strbuf_readlink() function in
b11b7e13f4 (Add generic 'strbuf_readlink()' helper function,
2008-12-17), the point was that we generally have a _guess_
as to the correct size based on the stat information, but we
can't necessarily trust it.

Over the years, a few callers have grown up that simply pass
in 0, even though they have the stat information. Let's have
them pass in their hint for consistency (and in theory
efficiency, since it may avoid an extra resize/syscall loop,
but neither location is probably performance critical).

Note that st.st_size is actually an off_t, so in theory we
need xsize_t() here. But none of the other callsites use it,
and since this is just a hint, it doesn't matter either way
(if we wrap we'll simply start with a too-small hint and
then eventually complain when we cannot allocate the
memory).
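
The "hint, but don't trust it" pattern can be sketched like this. It is an illustration under assumptions, not git's strbuf_readlink(): read_link_with_hint() is a hypothetical name, the starting size of 64 is arbitrary, and only the POSIX readlink() call is real.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the target of the symlink at path as a
 * malloc'd, NUL-terminated string, or NULL on error. hint is the caller's
 * guess at the target length (e.g. st.st_size from lstat()); 0 means
 * "no guess".
 */
static char *read_link_with_hint(const char *path, size_t hint)
{
	/* +1 so that a correct hint succeeds on the first readlink() call */
	size_t alloc = (hint && hint < (size_t)-1) ? hint + 1 : 64;
	char *buf = NULL;

	assert(path != NULL);
	for (;;) {
		char *tmp = realloc(buf, alloc);
		ssize_t len;

		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;

		len = readlink(path, buf, alloc);
		if (len < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t)len < alloc) {	/* whole target fits, with room for NUL */
			buf[len] = '\0';
			return buf;
		}

		/* The hint was too small (or stale): grow and try again. */
		if (alloc > (size_t)-1 / 2) {
			free(buf);
			return NULL;
		}
		alloc *= 2;
	}
}
```

A wrong hint only costs extra iterations of the loop; the result is the same whether the caller passes 0, the exact length, or an underestimate.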

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_readlink: use ssize_tJeff King Tue, 24 Jul 2018 10:51:25 +0000 (06:51 -0400)

strbuf_readlink: use ssize_t

The return type of readlink() is ssize_t, not int. This
probably doesn't matter in practice, as it would require a
2GB symlink destination, but it doesn't hurt to be careful.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf: use size_t for length in intermediate variablesJeff King Tue, 24 Jul 2018 10:51:08 +0000 (06:51 -0400)

strbuf: use size_t for length in intermediate variables

A few strbuf functions store the length of a strbuf in a
temporary variable. We should always use size_t for this, as
it's possible for a strbuf to exceed an "int" (e.g., a 2GB
string on a 64-bit system). This is unlikely in practice,
but we should try to behave sensibly on silly or malicious
input.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reencode_string: use size_t for string lengthsJeff King Tue, 24 Jul 2018 10:50:33 +0000 (06:50 -0400)

reencode_string: use size_t for string lengths

The iconv interface takes a size_t, which is the appropriate
type for an in-memory buffer. But our reencode_string_*
functions use integers, meaning we may get confusing results
when the sizes exceed INT_MAX. Let's use size_t
consistently.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reencode_string: use st_add/st_mult helpersJeff King Tue, 24 Jul 2018 10:50:10 +0000 (06:50 -0400)

reencode_string: use st_add/st_mult helpers

When converting a string with iconv, if the output buffer
isn't big enough, we grow it. But our growth is done without
any concern for integer overflow. So when we add:

    outalloc = sofar + insz * 2 + 32;

we may end up wrapping outalloc (which is a size_t), and
allocating a too-small buffer. We then manipulate it
further:

    outsz = outalloc - sofar - 1;

and feed outsz back to iconv. If outalloc is wrapped and
smaller than sofar, we'll end up with a small allocation but
feed a very large outsz to iconv, which could result in it
overflowing the buffer.
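
To illustrate the kind of check the fix relies on, here is a sketch of overflow-checked size arithmetic applied to the growth formula above. The helpers checked_add(), checked_mul() and grow_outalloc() are hypothetical stand-ins that return an error code; git's own st_add()/st_mult() helpers die on overflow instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: *out = a + b, or -1 if the sum would wrap. */
static int checked_add(size_t a, size_t b, size_t *out)
{
	if (a > SIZE_MAX - b)
		return -1;
	*out = a + b;
	return 0;
}

/* Hypothetical helper: *out = a * b, or -1 if the product would wrap. */
static int checked_mul(size_t a, size_t b, size_t *out)
{
	if (b != 0 && a > SIZE_MAX / b)
		return -1;
	*out = a * b;
	return 0;
}

/*
 * Overflow-checked version of: outalloc = sofar + insz * 2 + 32
 * Leaves *outalloc untouched and returns -1 if any step would wrap.
 */
static int grow_outalloc(size_t sofar, size_t insz, size_t *outalloc)
{
	size_t doubled, sum;

	assert(outalloc != NULL);
	if (checked_mul(insz, 2, &doubled) ||
	    checked_add(sofar, doubled, &sum) ||
	    checked_add(sum, 32, outalloc))
		return -1;
	return 0;
}
```

A caller that gets -1 back can refuse to allocate, so a wrapped outalloc never reaches the allocator and the later "outalloc - sofar - 1" subtraction never sees it.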

Can we use this to construct an attack wherein the victim
clones a repository with a very large commit object with an
encoding header, and running "git log" reencodes it into
utf8, causing an overflow?

An attack of this sort is likely impossible in practice.
"sofar" is how many output bytes we've written total, and
"insz" is the number of input bytes remaining. Imagine our
input doubles in size as we output it (which is easy to do
by converting latin1 to utf8, for example), and that we
start with N input bytes. Our initial output buffer also
starts at N bytes, so after the first call we'd have N/2
input bytes remaining (insz), and have written N bytes
(sofar). That means our next allocation will be
(N + N/2 * 2 + 32) bytes, or (2N + 32).

We can therefore overflow a 32-bit size_t with a commit
message that's just under 2^31 bytes, assuming it consists
mostly of "doubling" sequences (e.g., latin1 0xe1 which
becomes utf8 0xc3 0xa1).

But we'll never make it that far with such a message. We'll
be spending 2^31 bytes on the original string. And our
initial output buffer will also be 2^31 bytes. Which is not
going to succeed on a system with a 32-bit size_t, since
there will be other things using the address space, too. The
initial malloc will fail.

If we imagine instead that we can triple the size when
converting, then our second allocation becomes
(N + 2/3N * 2 + 32), or (7/3N + 32). That still requires two
allocations of 3/7 of our address space (6/7 of the total)
to succeed.

If we imagine we can quadruple, it becomes (5/2N + 32); we
need to be able to allocate 4/5 of the address space to
succeed.
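
The three cases above follow one pattern, which can be sketched as a small calculation. Under the message's model (N input bytes, an initial output buffer of N bytes, and a k-fold expansion), the first call leaves sofar = N and insz = N - N/k, so the second allocation is (3 - 2/k)N + 32. The struct and function names below are hypothetical.

```c
#include <assert.h>

/* A fraction num/den, kept unreduced. */
struct frac {
	unsigned num, den;
};

/*
 * Coefficient of N in the second allocation, sofar + insz * 2 (ignoring
 * the constant 32), for a k-fold expansion: (3k - 2) / k.
 */
static struct frac second_alloc_coeff(unsigned k)
{
	struct frac f;

	assert(k > 0);
	f.num = 3 * k - 2;
	f.den = k;
	return f;
}

/*
 * Share of the address space taken by the two N-byte buffers (the input
 * and the initial output) when N is just large enough for the second
 * allocation to wrap: 2k / (3k - 2).
 */
static struct frac address_space_needed(unsigned k)
{
	struct frac f;

	assert(k > 0);
	f.num = 2 * k;
	f.den = 3 * k - 2;
	return f;
}
```

For k = 2, 3 and 4 this gives coefficients 2, 7/3 and 5/2, and address-space shares of 1, 6/7 and 4/5, matching the figures in the text.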

This might start to get plausible. But is it possible to get
a 4-to-1 increase in size? Probably if you're converting to
some obscure encoding. But since git defaults to utf8 for
its output, that's the likely destination encoding for an
attack. And while there are 4-character utf8 sequences, it's
unlikely that you'd be able to find a single-byte source
sequence in any encoding.

So this is certainly buggy code which should be fixed, but
it is probably not a useful attack vector.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'sb/blame-color' into jk/banned-functionJunio C Hamano Fri, 20 Jul 2018 21:42:53 +0000 (14:42 -0700)

Merge branch 'sb/blame-color' into jk/banned-function

* sb/blame-color:
blame: prefer xsnprintf to strcpy for colors

fetch: send "refs/tags/" prefix upon CLI refspecsJonathan Tan Tue, 5 Jun 2018 21:40:36 +0000 (14:40 -0700)

fetch: send "refs/tags/" prefix upon CLI refspecs

When performing tag following, in addition to using the server's
"include-tag" capability to send tag objects (and emulating it if the
server does not support that capability), "git fetch" relies upon the
presence of refs/tags/* entries in the initial ref advertisement to
locally create refs pointing to the aforementioned tag objects. When
using protocol v2, refs/tags/* entries in the initial ref advertisement
may be suppressed by a ref-prefix argument, leading to the tag object
being downloaded, but the ref not being created.

Commit dcc73cf7ff ("fetch: generate ref-prefixes when using a configured
refspec", 2018-05-18) ensured that "refs/tags/" is always sent as a ref
prefix when "git fetch" is invoked with no refspecs, but not when "git
fetch" is invoked with refspecs. Extend that functionality to make it
work in both situations.
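
The effect of the ref-prefix arguments can be modelled with a short sketch. This is a simplified, hypothetical model of the server-side filtering, not code from upload-pack; ref_is_advertised() and the example ref names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: a ref is advertised when the client sent no
 * ref-prefix arguments at all, or when the ref name starts with at
 * least one of the prefixes it did send.
 */
static int ref_is_advertised(const char *refname,
			     const char *const *prefixes, size_t nr)
{
	size_t i;

	assert(refname != NULL);
	if (nr == 0)
		return 1;	/* no filtering requested */
	for (i = 0; i < nr; i++)
		if (!strncmp(refname, prefixes[i], strlen(prefixes[i])))
			return 1;
	return 0;
}
```

In this model, a fetch that sends only "refs/heads/topic" as a prefix never sees "refs/tags/v1.0" in the advertisement, so it has nothing to create a local tag ref from; adding "refs/tags/" to the prefix list makes the tag ref visible again.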

This also necessitates a change to another test which tested ref
advertisement filtering using tag refs - since tag refs are sent by
default now, the test has been switched to using branch refs instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5702: test fetch with multiple refspecs at a timeJonathan Tan Tue, 5 Jun 2018 21:40:35 +0000 (14:40 -0700)

t5702: test fetch with multiple refspecs at a time

Extend the protocol v2 tests to also test fetches with multiple refspecs
specified. This also covers the previously uncovered cases of fetching
with prefix matching and fetching by SHA-1.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch-pack: mark die strings for translationBrandon Williams Mon, 23 Jul 2018 17:56:35 +0000 (10:56 -0700)

fetch-pack: mark die strings for translation

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: extract dedicated make target to clean... SZEDER Gábor Mon, 23 Jul 2018 13:51:00 +0000 (15:51 +0200)

coccinelle: extract dedicated make target to clean Coccinelle's results

Sometimes I want to remove only Coccinelle's results, but keep all
other build artifacts left after my usual 'make all man' build. This
new 'cocciclean' make target will allow just that.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

coccinelle: put sane filenames into output patchesSZEDER Gábor Mon, 23 Jul 2018 13:50:59 +0000 (15:50 +0200)

coccinelle: put sane filenames into output patches

Coccinelle outputs its suggested transformations as patches, whose
header looks something like this:

    --- commit.c
    +++ /tmp/cocci-output-19250-7ae78a-commit.c

Note the lack of a 'diff --opts <old> <new>' line, the differing number
of path components on the --- and +++ lines, and the nonsensical
filename on the +++ line. 'patch -p0' can still apply these patches,
as it takes the filename to be modified from the --- line. Alas, 'git
apply' can't, because it takes the filename from the +++ line, and
then complains about the nonexisting file.

Pass the '--patch .' options to Coccinelle via the SPATCH_FLAGS 'make'
variable, as it seems to make it generate proper context diff patches,
with the header starting with a 'diff ...' line and containing sane
filenames. The resulting 'contrib/coccinelle/*.cocci.patch' files
then can be applied both with 'git apply' and 'patch' (even without
'-p0').

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>