The travis CI scripts have been corrected to build Git with the
compiler(s) of our choice.
* sg/travis-specific-cc:
travis-ci: build with the right compiler
travis-ci: switch to Xcode 10.1 macOS image
travis-ci: don't be '--quiet' when running the tests
.gitignore: ignore external debug symbols from GCC on macOS
"git pack-objects" learned another algorithm to compute the set of
objects to send, that trades the resulting packfile off to save
traversal cost to favor small pushes.
A new date format "--date=human" that morphs its output depending
on how far the time is from the current time has been introduced.
"--date=auto" can be used to use this new format when the output is
going to the pager or to the terminal and otherwise the default
format.
* lt/date-human:
Add `human` date format tests.
Add `human` format to test-tool
Add 'human' date format documentation
Replace the proposed 'auto' mode with 'auto:'
Add 'human' date format
* jk/unused-parameter-cleanup:
convert: drop path parameter from actual conversion functions
convert: drop len parameter from conversion checks
config: drop unused parameter from maybe_remove_section()
show_date_relative(): drop unused "tz" parameter
column: drop unused "opts" parameter in item_length()
create_bundle(): drop unused "header" parameter
apply: drop unused "def" parameter from find_name_gnu()
match-trees: drop unused path parameter from score functions
Instead of going through "git-rebase--am" scriptlet to use the "am"
backend, the built-in version of "git rebase" learned to drive the
"am" backend directly.
* js/rebase-am:
built-in rebase: call `git am` directly
rebase: teach `reset_head()` to optionally skip the worktree
rebase: avoid double reflog entry when switching branches
rebase: move `reset_head()` into a better spot
More code in "git bisect" has been rewritten in C.
* tt/bisect-in-c:
bisect--helper: `bisect_start` shell function partially in C
bisect--helper: `get_terms` & `bisect_terms` shell function in C
bisect--helper: `bisect_next_check` shell function in C
bisect--helper: `check_and_set_terms` shell function in C
wrapper: move is_empty_file() and rename it as is_empty_or_missing_file()
bisect--helper: `bisect_write` shell function in C
bisect--helper: `bisect_reset` shell function in C
A new encoding UTF-16LE-BOM has been invented to force encoding to
UTF-16 with BOM in little endian byte order, which cannot be directly
generated by using iconv.
* tb/utf-16-le-with-explicit-bom:
Support working-tree-encoding "UTF-16LE-BOM"
"git rebase --merge" as been reimplemented by reusing the internal
machinery used for "git rebase -i".
* en/rebase-merge-on-sequencer:
rebase: implement --merge via the interactive machinery
rebase: define linearization ordering and enforce it
git-legacy-rebase: simplify unnecessary triply-nested if
git-rebase, sequencer: extend --quiet option for the interactive machinery
am, rebase--merge: do not overlook --skip'ed commits with post-rewrite
t5407: add a test demonstrating how interactive handles --skip differently
rebase: fix incompatible options error message
rebase: make builtin and legacy script error messages the same
The commit-graph facility did not work when in-core objects that
are promoted from unknown type to commit (e.g. a commit that is
accessed via a tag that refers to it) were involved, which has been
corrected.
* sg/object-as-type-commit-graph-fix:
object_as_type: initialize commit-graph-related fields of 'struct commit'
When GIT_SEQUENCE_EDITOR is set, the command was incorrectly
started when modes of "git rebase" that implicitly uses the
machinery for the interactive rebase are run, which has been
corrected.
* pw/no-editor-in-rebase-i-implicit:
implicit interactive rebase: don't run sequence editor
"git diff --color-moved --cc --stat -p" did not work well due to
funny interaction between a bug in color-moved and the rest, which
has been fixed.
* jk/diff-cc-stat-fixes:
combine-diff: treat --dirstat like --stat
combine-diff: treat --summary like --stat
combine-diff: treat --shortstat like --stat
combine-diff: factor out stat-format mask
diff: clear emitted_symbols flag after use
t4006: resurrect commented-out tests
"git checkout -b <new> [HEAD]" to create a new branch from the
current commit and check it out ought to be a no-op in the index
and the working tree in normal cases, but there are corner cases
that do require updates to the index and the working tree. Running
it immediately after "git clone --no-checkout" is one of these
cases that an earlier optimization kicked in incorrectly, which has
been fixed.
* bp/checkout-new-branch-optim:
checkout: fix regression in checkout -b on intitial checkout
checkout: add test demonstrating regression with checkout -b on initial commit
Asking "git check-attr" about a macro (e.g. "binary") on a specific
path did not work correctly, even though "git check-attr -a" listed
such a macro correctly. This has been corrected.
* jk/attr-macro-fix:
attr: do not mark queried macros as unset
On a case-insensitive filesystem, we failed to compare the part of
the path that is above the worktree directory in an absolute
pathname, which has been corrected.
"git p4" failed to update a shelved change when there were moved
files, which has been corrected.
* ld/git-p4-shelve-update-fix:
git-p4: handle update of moved/copied files when updating a shelve
git-p4: add failing test for shelved CL update involving move/copy
Update the protocol message specification to allow only the limited
use of scaled quantities. This is ensure potential compatibility
issues will not go out of hand.
* js/filter-options-should-use-plain-int:
filter-options: expand scaled numbers
tree:<depth>: skip some trees even when collecting omits
list-objects-filter: teach tree:# how to handle >0
The in-core repository instances are passed through more codepaths.
* sb/more-repo-in-api: (23 commits)
t/helper/test-repository: celebrate independence from the_repository
path.h: make REPO_GIT_PATH_FUNC repository agnostic
commit: prepare free_commit_buffer and release_commit_memory for any repo
commit-graph: convert remaining functions to handle any repo
submodule: don't add submodule as odb for push
submodule: use submodule repos for object lookup
pretty: prepare format_commit_message to handle arbitrary repositories
commit: prepare logmsg_reencode to handle arbitrary repositories
commit: prepare repo_unuse_commit_buffer to handle any repo
commit: prepare get_commit_buffer to handle any repo
commit-reach: prepare in_merge_bases[_many] to handle any repo
commit-reach: prepare get_merge_bases to handle any repo
commit-reach.c: allow get_merge_bases_many_0 to handle any repo
commit-reach.c: allow remove_redundant to handle any repo
commit-reach.c: allow merge_bases_many to handle any repo
commit-reach.c: allow paint_down_to_common to handle any repo
commit: allow parse_commit* to handle any repo
object: parse_object to honor its repository argument
object-store: prepare has_{sha1, object}_file to handle any repo
object-store: prepare read_object_file to deal with any repo
...
t1512: test ambiguous cat-file --batch and --batch-output
Test the new "ambiguous" result from cat-file --batch and
--batch-check. This is in t1512 instead of t1006 since
we need a repo with ambiguous object_id names.
Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16
The unicode standard itself defines 3 allowed ways how to encode UTF-16.
The following 3 versions convert all back to 'g' 'i' 't' in UTF-8:
a) UTF-16, without BOM, big endian:
$ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c 0000000 g i t
b) UTF-16, with BOM, little endian:
$ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c 0000000 g i t
c) UTF-16, with BOM, big endian:
$ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c 0000000 g i t
Git uses libiconv to convert from UTF-8 in the index into ITF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.
iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:
d) UTF-16
$ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c 0000000 376 377 \0 g \0 i \0 t
e) UTF-16LE
$ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c 0000000 g \0 i \0 t \0
f) UTF-16BE
$ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c 0000000 \0 g \0 i \0 t
There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully unicode aware applications should be able to read all 3 variants,
but in practise we are not there yet).
When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).
iconv can produce UTF-16 files with little endianess by using "UTF-16LE"
as encoding, and that file does not have a BOM.
Not all users (especially under Windows) are happy with this.
Some tools are not fully unicode aware and can only handle version (b).
Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).
Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv can not handle the encoding, so Git pick it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency)
Rported-by: Adrián Gimeno Balaguer <adrigibal@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.
* bc/tree-walk-oid:
cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
tree-walk: store object_id in a separate member
match-trees: use hashcpy to splice trees
match-trees: compute buffer offset correctly when splicing
tree-walk: copy object ID before use
* ms/http-no-more-failonerror:
test: test GIT_CURL_VERBOSE=1 shows an error
remote-curl: unset CURLOPT_FAILONERROR
remote-curl: define struct for CURLOPT_WRITEFUNCTION
http: enable keep_error for HTTP requests
http: support file handles for HTTP_KEEP_ERROR
"git rebase" internally runs "checkout" to switch between branches,
and the command used to call the post-checkout hook, but the
reimplementation stopped doing so, which is getting fixed.
* os/rebase-runs-post-checkout-hook:
rebase: run post-checkout hook on checkout
t5403: simplify by using a single repository
Add sha-256 hash and plug it through the code to allow building Git
with the "NewHash".
* bc/sha-256:
hash: add an SHA-256 implementation using OpenSSL
sha256: add an SHA-256 implementation using libgcrypt
Add a base implementation of SHA-256 support
commit-graph: convert to using the_hash_algo
t/helper: add a test helper to compute hash speed
sha1-file: add a constant for hash block size
t: make the sha1 test-tool helper generic
t: add basic tests for our SHA-1 implementation
cache: make hashcmp and hasheq work with larger hashes
hex: introduce functions to print arbitrary hashes
sha1-file: provide functions to look up hash algorithms
sha1-file: rename algorithm to "sha1"
"git fetch --recurse-submodules" may not fetch the necessary commit
that is bound to the superproject, which is getting corrected.
* sb/submodule-recursive-fetch-gets-the-tip:
fetch: ensure submodule objects fetched
submodule.c: fetch in submodules git directory instead of in worktree
submodule: migrate get_next_submodule to use repository structs
repository: repo_submodule_init to take a submodule struct
submodule: store OIDs in changed_submodule_names
submodule.c: tighten scope of changed_submodule_names struct
submodule.c: sort changed_submodule_names before searching it
submodule.c: fix indentation
sha1-array: provide oid_array_filter
The v2 upload-pack protocol implementation failed to honor
hidden-ref configuration, which has been corrected.
An earlier attempt reverted out of 'next'.
* jk/proto-v2-hidden-refs-fix:
upload-pack: support hidden refs with protocol v2
There were many places the code relied on the string returned from
getenv() to be non-volatile, which is not true, that have been
corrected.
* jk/save-getenv-result:
builtin_diff(): read $GIT_DIFF_OPTS closer to use
merge-recursive: copy $GITHEAD strings
init: make a copy of $GIT_DIR string
config: make a copy of $GIT_CONFIG string
commit: copy saved getenv() result
get_super_prefix(): copy getenv() result
"git rebase -i" learned to re-execute a command given with 'exec'
to run after it failed the last time.
* js/rebase-i-redo-exec:
rebase: introduce a shortcut for --reschedule-failed-exec
rebase: add a config option to default to --reschedule-failed-exec
rebase: introduce --reschedule-failed-exec
The code to drive GIT_EXTERNAL_DIFF command relied on the string
returned from getenv() to be non-volatile, which is not true, that
has been corrected.
* kg/external-diff-save-env:
diff: ensure correct lifetime of external_diff_cmd
When using `human` several fields are suppressed depending on the time
difference between the reference date and the local computer date. In
cases where the difference is less than a year, the year field is
supppressed. If the time is less than a day; the month and year is
suppressed.
Use TEST_DATE_NOW environment variable when using the test-tool to
hold the expected output strings constant.
Signed-off-by: Stephen P. Smith <ischis2@cox.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the human format support to the test tool so that
GIT_TEST_DATE_NOW can be used to specify the current time.
The get_time() helper function was created and and checks the
GIT_TEST_DATE_NOW environment variable. If GIT_TEST_DATE_NOW is set,
then that date is used instead of the date returned by by
gettimeofday().
All calls to gettimeofday() were replaced by calls to get_time().
Renamed occurances of TEST_DATE_NOW to GIT_TEST_DATE_NOW since the
variable is now used in the get binary and not just in the test-tool.
Signed-off-by: Stephen P. Smith <ischis2@cox.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "property" is vague here. Let's spell out that we mean the path
must be marked with the text attribute.
While we're here, let's make the paragraph a little easier to read by
de-emphasizing the "when core.autocrlf is false" bit. Putting it in the
first sentence obscures the main content, and many readers won't care
about autocrlf (i.e., anyone who is just following the gitattributes(7)
advice, which mainly discusses "text" and "core.eol").
Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only override core.eol with core.autocrlf when the latter is set to
something besides "false". Let's make this more clear, and point the
reader to the git-config definitions, which discuss this in more detail.
Noticed-by: Sergey Lukashev <lukashev.s@ya.ru> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
test-lint: only use only sed [-n] [-e command] [-f command_file]
From `man sed` (on a Mac OS X box):
The -E, -a and -i options are non-standard FreeBSD extensions and may not be available
on other operating systems.
From `man sed` on a Linux box:
REGULAR EXPRESSIONS
POSIX.2 BREs should be supported, but they aren't completely because of
performance problems. The \n sequence in a regular expression matches the newline
character, and similarly for \a, \t, and other sequences.
The -E option switches to using extended regular expressions instead; the -E option
has been supported for years by GNU sed, and is now included in POSIX.
Well, there are still a lot of systems out there, which don't support it.
Beside that, IEEE Std 1003.1TM-2017, see
http://pubs.opengroup.org/onlinepubs/9699919799/
does not mention -E either.
To be on the safe side, don't allow -E (or -r, which is GNU).
Change check-non-portable-shell.pl to only accept the portable options:
sed [-n] [-e command] [-f command_file]
Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: merge read_lock and lock in packing_data struct
Rename the packing_data lock to obd_lock and upgrade it to a recursive
mutex to make it suitable for current read_lock usages. Additionally
remove the superfluous #ifndef NO_PTHREADS guard around mutex
initialization in prepare_packing_data as the mutex functions
themselves are already protected.
Signed-off-by: Patrick Hogg <phogg@novamoon.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: move read mutex to packing_data struct
ac77d0c37 ("pack-objects: shrink size field in struct object_entry",
2018-04-14) added an extra usage of read_lock/read_unlock in the newly
introduced oe_get_size_slow for thread safety in parallel calls to
try_delta(). Unfortunately oe_get_size_slow is also used in serial
code, some of which is called before the first invocation of
ll_find_deltas. As such the read mutex is not guaranteed to be
initialized.
Resolve this by moving the read mutex to packing_data and initializing
it in prepare_packing_data which is initialized in cmd_pack_objects.
Signed-off-by: Patrick Hogg <phogg@novamoon.net> Reviewed-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-instaweb: add Python builtin http.server support
With this patch it is possible to launch git-instaweb by using
Python http.server CGI handler via `-d python` option.
git-instaweb generates a small wrapper around the http.server
(in GIT_DIR/gitweb/) that address a limitation of the CGI handler
where CGI scripts have to be in a cgi-bin subdirectory and
directory index can't be easily changed. To keep the implementation
small, gitweb is running on url `/cgi-bin/gitweb.cgi` and an automatic
redirection is done when opening `/`.
The generated wrapper is compatible with both Python 2 and 3.
Python is by default installed on most modern Linux distributions
which enables running `git instaweb -d python` without needing
anything else.
Signed-off-by: Arti Zirk <arti.zirk@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
implicit interactive rebase: don't run sequence editor
If GIT_SEQUENCE_EDITOR is set then rebase runs it when executing
implicit interactive rebases which are supposed to appear
non-interactive to the user. Fix this by setting GIT_SEQUENCE_EDITOR=:
rather than GIT_EDITOR=:. A couple of tests relied on the old behavior
so they are updated to work with the new regime.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object_as_type: initialize commit-graph-related fields of 'struct commit'
When the commit graph and generation numbers were introduced in
commits 177722b344 (commit: integrate commit graph with commit
parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
struct commit, 2018-04-25), they tried to make sure that the
corresponding 'graph_pos' and 'generation' fields of 'struct commit'
are initialized conservatively, as if the commit were not included in
the commit-graph file.
Alas, initializing those fields only in alloc_commit_node() missed the
case when an object that happens to be a commit is first looked up via
lookup_unknown_object(), and is then later converted to a 'struct
commit' via the object_as_type() helper function (either calling it
directly, or as part of a subsequent lookup_commit() call).
Consequently, both of those fields incorrectly remain set to zero,
which means e.g. that the commit is present in and is the first entry
of the commit-graph file. This will result in wrong timestamp, parent
and root tree hashes, if such a 'struct commit' instance is later
filled from the commit-graph.
Extract the initialization of 'struct commit's fields from
alloc_commit_node() into a helper function, and call it from
object_as_type() as well, to make sure that it properly initializes
the two commit-graph-related fields, too. With this helper function
it is hopefully less likely that any new fields added to 'struct
commit' in the future would remain uninitialized.
With this change alloc_commit_index() won't have any remaining callers
outside of 'alloc.c', so mark it as static.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf.cocci: suggest strbuf_addbuf() to add one strbuf to an other
The best way to add one strbuf to an other is via:
strbuf_addbuf(&sb, &sb2);
This is a bit more idiomatic and efficient than:
strbuf_addstr(&sb, sb2.buf);
because the size of the second strbuf is known and thus it can spare a
strlen() call, and much more so than:
strbuf_addf(&sb, "%s", sb2.buf);
because it can spare the whole vsnprintf() formatting magic.
Add new semantic patches to 'contrib/coccinelle/strbuf.cocci' to catch
these undesired patterns and to suggest strbuf_addbuf() instead.
Luckily, our codebase is already clean from any such undesired
patterns (but one of the in-flight topics just tried to sneak in such
a strbuf_addf() call).
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Notice that the local side of branch jch starts with "*" instead of
ending with it like the rest. It's not exactly wrong. It just looks
weird.
This patch changes the find-and-replace code a bit to try finding prefix
first before falling back to strstr() which finds a substring from left
to right. Now we have something less OCD
The timestamp we receive is in epoch time, so there's no need for a
timezone parameter to interpret it. The matching show_date() uses "tz"
to show dates in author local time, but relative dates show only the
absolute time difference. The author's location is irrelevant, barring
relativistic effects from using Git close to the speed of light.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
column: drop unused "opts" parameter in item_length()
There are no column options which impact the length computation. In
theory there might be, but this is a file-local function, so it will be
trivial to re-add the parameter should it ever be useful.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to pass a header struct to create_bundle(); it writes
the header information directly to a descriptor (and does not report
back details to the caller).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply: drop unused "def" parameter from find_name_gnu()
We use the "def" parameter in find_name_common() for some heuristics,
but they are not necessary with the less-ambiguous GNU format. Let's
drop this unused parameter.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
match-trees: drop unused path parameter from score functions
The scores do not take the particular path into account at
all. It's possible they could, but these are all static file-local
functions. It won't be a big deal to re-add the parameter if they ever
need it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently "--cc --dirstat" will show nothing for a merge. Like
--shortstat and --summary in the previous two patches, it probably makes
sense to treat it like we do --stat, and show a stat against the
first-parent.
This case is less obviously correct than for --shortstat and --summary,
as those are basically variants of --stat themselves. It's possible we
could develop a multi-parent combined dirstat format, in which case we
might regret defining this first-parent behavior. But the same could be
said for --stat, and in the 12+ years of it showing first-parent stats,
nobody has complained.
So showing the first-parent dirstat is at least _useful_, and if we
later develop a clever multi-parent stat format, we'd probably have to
deal with --stat anyway.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently "--cc --summary" on a merge shows nothing. Since we show "--cc
--stat" as a stat against the first parent, and because --summary is
typically used in combination with --stat, it makes sense to treat them
both the same way.
Note that we have to tweak t4013's setup a bit to test this case, as the
existing merges do not have any --summary results against their first
parent. But since the merge at the tip of 'master' does add and remove
files with respect to the second parent, we can just make a reversed
doppelganger merge where the parents are swapped.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --stat of a combined diff is defined as the first-parent stat,
going all the way back to 965f803c32 (combine-diff: show diffstat with
the first parent., 2006-04-17).
Naturally, we gave --numstat the same treatment in 74e2abe5b7 (diff
--numstat, 2006-10-12).
But --shortstat, which is really just the final line of --stat, does
nothing, which produces confusing results:
$ git show --oneline --stat eab7584e37 eab7584e37 Merge branch 'en/show-ref-doc-fix'
There are several conditionals in the combine diff code that check if
we're doing --stat or --numstat output. Since these must all remain in
sync, let's pull them out into a separate bit-mask.
Arguably this could go into diff.h along with the other DIFF_FORMAT
macros, but it's not clear that the definition of "which formats are
stat" is universal (e.g., does --dirstat count? --summary?). So let's
keep this local to combine-diff.c, where the meaning is more clearly
"stat formats that combine-diff supports".
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's an odd bug when "log --color-moved" is used with the combination
of "--cc --stat -p": the stat for merge commits is erroneously shown
with the diff of the _next_ commit.
The included test demonstrates the issue. Our history looks something
like this:
A-B-M--D
\ /
C
When we run "git log --cc --stat -p --color-moved" starting at D, we get
this sequence of events:
1. The diff for D is using -p, so diff_flush() calls into
diff_flush_patch_all_file_pairs(). There we see that o->color_moved
is in effect, so we point o->emitted_symbols to a static local
struct, causing diff_flush_patch() to queue the symbols instead of
actually writing them out.
We then do our move detection, emit the symbols, and clear the
struct. But we leave o->emitted_symbols pointing to our struct.
2. Next we compute the diff for M. This is a merge, so we use the
combined diff code. In find_paths_generic(), we compute the
pairwise diff between each commit and its parent. Normally this is
done with DIFF_FORMAT_NO_OUTPUT, since we're just looking for
intersecting paths. But since "--stat --cc" shows the first-parent
stat, and since we're computing that diff anyway, we enable
DIFF_FORMAT_DIFFSTAT for the first parent. This outputs the stat
information immediately, saving us from running a separate
first-parent diff later.
But where does that output go? Normally it goes directly to stdout,
but because o->emitted_symbols is set, we queue it. As a result, we
don't actually print the diffstat for the merge commit (yet), which
is wrong.
3. Next we compute the diff for C. We're actually showing a patch
again, so we end up in diff_flush_patch_all_file_pairs(), but this
time we have the queued stat from step 2 waiting in our struct.
We add new elements to it for C's diff, and then flush the whole
thing. And we see the diffstat from M as part of C's diff, which is
wrong.
So triggering the bug really does require the combination of all of
those options.
To fix it, we can simply restore o->emitted_symbols to NULL after
flushing it, so that it does not affect anything outside of
diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
nobody outside of that function is going to bother flushing it, so we
would not want them to write to it either.
In fact, we could take this a step further and turn the local "esm"
struct into a non-static variable that goes away after the function
ends. However, since it contains a dynamically sized array, we benefit
from amortizing the cost of allocations over many calls. So we'll leave
it as static to retain that benefit.
But let's push the zero-ing of esm.nr into the conditional for "if
(o->emitted_symbols)" to make it clear that we do not expect esm to hold
any values if we did not just try to use it. With the code as it is
written now, if we did encounter such a case (which I think would be a
bug), we'd silently leak those values without even bothering to display
them. With this change, we'd at least eventually show them, and somebody
would notice.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This set of tests was added by 4434e6ba6c (tests: check --[short]stat
output after chmod, 2012-05-01), and is primarily about the handling of
binary versus text files.
Later, 74faaa16f0 (Fix "git diff --stat" for interesting - but empty -
file changes, 2012-10-17) changed the stat output so that the empty text
file is mentioned rather than omitted. That commit just comments out
these tests. There's no discussion in the commit message, but the
original email[1] says:
NOTE! This does break two of our tests, so we clearly did this on
purpose, or at least tested for it. I just uncommented the subtests
that this makes irrelevant, and changed the output of another one.
I don't think they're irrelevant, though. We should be testing this
"mode change only" case and making sure that it has the post-74faaa16f0
behavior. So this commit brings back those tests, with the current
expected output.
checkout: fix regression in checkout -b on intitial checkout
When doing a 'checkout -b' do a full checkout including updating the working
tree when doing the initial checkout. As the new test involves an filesystem
access, do it later in the sequence to give chance to other cheaper tests to
leave early. This fixes the regression in behavior caused by fa655d8411
(checkout: optimize "git checkout -b <new_branch>", 2018-08-16).
Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
checkout: add test demonstrating regression with checkout -b on initial commit
Commit fa655d8411 (checkout: optimize "git checkout -b <new_branch>",
2018-08-16) introduced an unintentional change in behavior for 'checkout -b'
after doing 'clone --no-checkout'. Add a test to demonstrate the changed
behavior to be used in a later patch to verify the fix.
Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph write: emit a percentage for all progress
Follow-up 01ca387774 ("commit-graph: split up close_reachable()
progress output", 2018-11-19) by making the progress bars in
close_reachable() report a completion percentage. This fixes the last
occurrence where in the commit graph writing where we didn't report
that.
The change in 01ca387774 split up the 1x progress bar in
close_reachable() into 3x, but left them as dumb counters without a
percentage completion. Fixing that is easy, and the only reason it
wasn't done already is because that commit was rushed in during the
v2.20.0 RC period to fix the unrelated issue of over-reporting commit
numbers. See [1] and follow-ups for ML activity at the time and [2]
for an alternative approach where the progress bars weren't split up.
Now for e.g. linux.git we'll emit:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (6529159/6529159), done.
Expanding reachable commits in commit graph: 100% (815990/815980), done.
Computing commit graph generation numbers: 100% (815983/815983), done.
Writing out commit graph in 4 passes: 100% (3263932/3263932), done.
Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.
On a test repository with I have with ~7 million commits and ~50
million objects we'll now emit:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (124763727/124763727), done.
Loading known commits in commit graph: 100% (18989461/18989461), done.
Expanding reachable commits in commit graph: 100% (18989507/18989461), done.
Clearing commit marks in commit graph: 100% (18989507/18989507), done.
Counting distinct commits in commit graph: 100% (18989507/18989507), done.
Finding extra edges in commit graph: 100% (18989507/18989507), done.
Computing commit graph generation numbers: 100% (7250302/7250302), done.
Writing out commit graph in 4 passes: 100% (29001208/29001208), done.
Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and as before and we'll still
emit output like:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (6529159/6529159), done.
Expanding reachable commits in commit graph: 815990, done.
Computing commit graph generation numbers: 100% (815983/815983), done.
Writing out commit graph in 4 passes: 100% (3263932/3263932), done.
The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore most of the
time.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph write: remove empty line for readability
Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph write: add more descriptive progress output
Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in 7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
Now, on linux.git, we'll emit this sort of output in the various modes
we support:
$ git commit-graph write
Finding commits for commit graph among packed objects: 100% (6529159/6529159), done.
[...]
# Actually we don't emit this since this takes almost no time at
# all. But if we did (s/_delayed//) we'd show:
$ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits
Finding commits for commit graph from 630 refs: 100% (630/630), done.
[...]
$ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
Finding commits for commit graph in 3 packs: 6529159, done.
[...]
The middle on of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:
Finding commits for commit graph from 165203 refs: 100% (165203/165203), done.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph write: show progress for object search
Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.
Before we'd emit on e.g. linux.git with "commit-graph write":
Finding commits for commit graph: 6529159, done.
[...]
And now:
Finding commits for commit graph: 100% (6529159/6529159), done.
[...]
Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)) the approximate_object_count()
returns the actual number of objects we're going to process.
Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.
The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph write: more descriptive "writing out" output
Make the "Writing out" part of the progress output more
descriptive. Depending on the shape of the graph we either make 3 or 4
passes over it.
Let's present this information to the user in case they're wondering
what this number, which is much larger than their number of commits,
has to do with writing out the commit graph. Now e.g. on linux.git we
emit:
$ ~/g/git/git --exec-path=$HOME/g/git -C ~/g/linux commit-graph write
Finding commits for commit graph: 6529159, done.
Expanding reachable commits in commit graph: 815990, done.
Computing commit graph generation numbers: 100% (815983/815983), done.
Writing out commit graph in 4 passes: 100% (3263932/3263932), done.
A note on i18n: Why are we using the Q_() function and passing a
number & English text for a singular which'll never be used? Because
the plural rules of translated languages may not match those of
English, and to use the plural function we need to use this format.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add progress output to be shown when we're writing out the
commit-graph, this adds to the output already added in 7b0f229222
("commit-graph write: add progress output", 2018-09-17).
As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.
Now we'll instead show output like this, and reduce the
human-observable times at which we're not producing progress output:
This "Writing out" number is 3x or 4x the number of commits, depending
on the graph we're processing. A later change will make this explicit
to the user.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The optional 'Extra Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents. Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Extra Edge List' chunk written, only the three
mandatory chunks.
However, when it later comes to writing actual chunk data,
write_commit_graph() unconditionally invokes
write_graph_chunk_extra_edges(), even when it was decided earlier that
that chunk won't be written. Strictly speaking there is no bug here,
because write_graph_chunk_extra_edges() won't write anything if it
doesn't find any commits with more than two parents, but then it
unnecessarily and in vain looks through all commits once again in
search for such commits.
Don't call write_graph_chunk_extra_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>