When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.
Add trace2 regions around the three main actions taken in this
command:
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add trace2_region_enter() and trace2_region_leave() calls around the
various phases of a status scan. This gives elapsed time for each
phase in the GIT_TR2_PERF and GIT_TR2_EVENT trace target.
Also, these Trace2 calls now use s->repo rather than the_repository.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new unified tracing facility for git. The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.
In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.
Trace2 defines 3 output targets. These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT". These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).
* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
summary data.
* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
It extends the output with columns for the command process, thread,
repo, absolute and relative elapsed times. It reports events for
child process start/stop, thread start/stop, and per-thread function
nesting.
* GIT_TR2_EVENT is a new structured format. It writes event data as a
series of JSON records.
Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is git-checy-racy command, added a long time ago [1] and was never
part of the default build. Naturally after some makefile changes [2],
git-check-racy was no longer recognized as a build target. Even if it
compiles to day, it will not link after the introduction of
common-main.c [3].
Racy index has not been a problem for a long time [4]. It's time to let
this code go. I briefly consider if check-racy should be part of
test-tool. But I don't think it's worth the effort.
[1] 42f774063d (Add check program "git-check-racy" - 2006-08-15)
[2] c373991375 (Makefile: list generated object files in OBJECTS -
2010-01-26)
[3] 3f2e2297b9 (add an extra level of indirection to main() -
2016-07-01)
[4] I pretend I don't remember anything about the recent split-index's
racy problem
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote-curl: refactor reading into rpc_state's buf
Currently, whenever remote-curl reads pkt-lines from its response file
descriptor, only the payload is written to its buf, not the 4 characters
denoting the length. A future patch will require the ability to also
write those 4 characters, so in preparation for that, refactor this read
into its own function.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change an unportable invocation of "dd" with count=0, that wanted to
truncate the commit-graph file. In POSIX it is unspecified what
happens when count=0 is provided[1]. The NetBSD "dd" behavior
differs from GNU (and seemingly other BSDs), which has left this test
broken since d2b86fbaa1 ("commit-graph: fix buffer read-overflow",
2019-01-15).
Copying from /dev/null would seek/truncate to seek=$zero_pos and
stop immediately after that (without being able to copy anything),
which is the right way to truncate the file.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix widely supported but non-POSIX basic regex syntax introduced in
[1] and [2]. On GNU, NetBSD and FreeBSD the following works:
$ echo xy >f
$ grep 'xy\?' f; echo $?
xy
0
The same goes for "\+". The "?" and "+" syntax is not in the BRE
syntax, just in ERE, but on some implementations it can be invoked by
prefixing the meta-operator with "\", but not on OpenBSD:
In 7171d8c15f ("upload-pack: send symbolic ref information as
capability"), we added a symref capability to the pack protocol, but it
was never documented. Adapt the patch notes from that commit and add
them to the capabilities documentation.
While we're at it, add a disclaimer to the top of
protocol-capabilities.txt noting that the doc only applies to v0/v1 of
the wire protocol.
Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-options.txt: correct wording of --no-commit option
The former wording implied that --no-commit would always cause the
merge operation to "pause" and allow the user to make further changes
and/or provide a special commit message for the merge commit. This
is not the case for fast-forward merges, as there is no merge commit
to create. Without a merge commit, there is no place where it makes
sense to "stop the merge and allow the user to tweak changes"; doing
that would require a full rebase of some sort.
Since users may be unaware of whether their branches have diverged or
not, modify the wording to correctly address fast-forward cases as well
and suggest using --no-ff with --no-commit if the point is to ensure
that the merge stops before completing.
Reported-by: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
mention use of "hooks.allownonascii" in "man githooks"
The default pre-commit script checks the config variable
"hooks.allownonascii" to determine whether to allow non-ASCII file
names -- mention this in "man githooks", just as the section on
"update" mentions the use of "hooks.allowunannotated".
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The resolve_ref_unsafe() function can, and sometimes will in the case
of this codepath, return the char * passed to it to the caller. In
this case we construct a strbuf, free it, and then continue using the
dst_name after that free().
The code being fixed dates back to da3efdb17b ("receive-pack: detect
aliased updates which can occur with symrefs", 2010-04-19). When it
was originally added it didn't have this bug, it was introduced when
it was subsequently modified to use strbuf in 6b01ecfe22 ("ref
namespaces: Support remote repositories via upload-pack and
receive-pack", 2011-07-08).
This is theoretically a security issue, the C standard makes no
guarantees that a value you use after free() hasn't been poked at or
changed by something else on the system, but in practice modern OSs
will have mapped the relevant page to this process, so nothing else
would have used it. We do no further allocations between the free()
and use-after-free, so we ourselves didn't corrupt or change the
value.
Jeff investigated that and found: "It probably would be an issue if
the allocation were larger. glibc at least will use mmap()/munmap()
after some cutoff[1], in which case we'd get a segfault from hitting
the unmapped page. But for small allocations, it just bumps brk() and
the memory is still available for further allocations after
free(). [...] If you had a sufficiently large refname you might be
able to trigger the bug [...]. I tried to push such a ref. I had to
manually make a packed-refs file with the long name to avoid
filesystem limits (though probably you could have a long a/b/c/ name
on ext4). But the result can't actually be pushed, because it all has
to fit into a 64k pkt-line as part of the push protocol.".
An a alternative and more succinct way of implementing this would have
been to do the strbuf_release() at the end of check_aliased_update()
and use "goto out" instead of the early "return" statements. Hopefully
this approach of using a helper instead makes it easier to follow.
1. Jeff: "Weirdly, the mmap() cutoff on my glibc system is 135168
bytes. Which is...2^17 + 2^12? 33 pages? I'm sure there's a good
reason for that, but I didn't dig into it."
Reported-by: 王健强 <jianqiang.wang@securitygossip.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds value completion for a couple more paramters. To make it
easier to maintain these hard coded lists, add a comment at the original
list/code to remind people to update git-completion.bash too.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mk/t5562-no-input-to-too-large-an-input-test:
t5562: do not depend on /dev/zero
Revert "t5562: replace /dev/zero with a pipe from generate_zero_bytes"
Some expected failures of git-http-backend leaves running its children
(receive-pack or upload-pack) which still hold opened descriptors
to act.err and with some probability they live long enough to write
there their failure messages after next test has already truncated
the files. This causes occasional failures of the test script.
Avoid the issue by using separated output and error file for each test,
apprending the test number to their name.
Reported-by: Carlo Arenas <carenas@gmail.com> Helped-by: Carlo Arenas <carenas@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Max Kirillov <max@max630.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: teach the test-tool to generate NUL bytes and use it
In cc95bc2025 (t5562: replace /dev/zero with a pipe from
generate_zero_bytes, 2019-02-09), we replaced usage of /dev/zero (which
is not available on NonStop, apparently) by a Perl script snippet to
generate NUL bytes.
Sadly, it does not seem to work on NonStop, as t5562 reportedly hangs.
Worse, this also hangs in the Ubuntu 16.04 agents of the CI builds on
Azure Pipelines: for some reason, the Perl script snippet that is run
via `generate_zero_bytes` in t5562's 'CONTENT_LENGTH overflow ssite_t'
test case tries to write out an infinite amount of NUL bytes unless a
broken pipe is encountered, that snippet never encounters the broken
pipe, and keeps going until the build times out.
Oddly enough, this does not reproduce on the Windows and macOS agents,
nor in a local Ubuntu 18.04.
This developer tried for a day to figure out the exact circumstances
under which this hang happens, to no avail, the details remain a
mystery.
In the end, though, what counts is that this here change incidentally
fixes that hang (maybe also on NonStop?). Even more positively, it gets
rid of yet another unnecessary Perl invocation.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported [1] that NonStop platform does not have /dev/zero.
The test uses /dev/zero as a dummy input. Passing case (http-backed
failed because of too big input size) should not be reading anything
from it. If http-backend would erroneously try to read any data
returning EOF probably would be even safer than providing some
meaningless data.
Replace /dev/zero with /dev/null to avoid issues with platforms which do
not have /dev/zero.
Revert "t5562: replace /dev/zero with a pipe from generate_zero_bytes"
Revert cc95bc20 ("t5562: replace /dev/zero with a pipe from
generate_zero_bytes", 2019-02-09), as not feeding anything to the
command is a better way to test it.
l10n: de.po: consistent translation of 'root commit'
'root commit' is usually translated as 'Root-Commit'. But in one
occasion it‘s translated as 'Basis-Commit' which is the translation
for 'base commit'.
Signed-off-by: Sebastian Staudt <koraktor@gmail.com>
mingw: safe-guard a bit more against getenv() problems
Running up to v2.21.0, we fixed two bugs that were made prominent by the
Windows-specific change to retain copies of only the 30 latest getenv()
calls' returned strings, invalidating any copies of previous getenv()
calls' return values.
While this really shines a light onto bugs of the form where we hold
onto getenv()'s return values without copying them, it is also a real
problem for users.
And even if Jeff King's patches merged via 773e408881 (Merge branch
'jk/save-getenv-result', 2019-01-29) provide further work on that front,
we are far from done. Just one example: on Windows, we unset environment
variables when spawning new processes, which potentially invalidates
strings that were previously obtained via getenv(), and therefore we
have to duplicate environment values that are somehow involved in
spawning new processes (e.g. GIT_MAN_VIEWER in show_man_page()).
We do not have a chance to investigate, let address, all of those issues
in time for v2.21.0, so let's at least help Windows users by increasing
the number of getenv() calls' return values that are kept valid. The
number 64 was determined by looking at the average number of getenv()
calls per process in the entire test suite run on Windows (which is
around 40) and then adding a bit for good measure. And it is a power of
two (which would have hit yesterday's theme perfectly).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule's default behavior wasn't documented in both git-submodule.txt
and in the usage text of git-submodule. Document the default behavior
similar to how git-remote does it.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our grab_* functions, which parse the object content, take a
buf/sz pair of the object bytes. However, the functions which actually
parse the buffers (like find_wholine() and find_subpos()) never look at
"sz", and instead use functions like strchr() and strchrnul() that
assume the result is NUL-terminated.
This is OK in practice (and common for Git's parsing code), since we
always allocate an extra NUL when loading an object into memory (and
likewise, we are OK with stopping parsing if a commit or tag contains an
embedded NUL).
Let's drop these extra "sz" parameters, as they are misleading about how
the functions intend to access the buffer. We can drop from both the
functions mentioned above, which in turn lets us drop from their
callers, cascading all the way up to the top-level grab_values().
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The grab_person() and grab_sub_body_contents() functions take both an
object struct and a buf/sz pair of the object bytes. However, they use
only the latter, since "struct object" does not contain the parsed ident
(nor the whole commit message, of course).
Let's get rid of these misleading "struct object" parameters. It's
possible we may want them in the future (e.g., to generate error
messages that mention the object id), but since these are static
functions, we can easily add them back in later (and if we do want that
information, it's likely we'd pass it through a more generalized
"parsing context" struct anyway).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The grab_tag_values() and grab_commit_values() functions take both the
"struct object" as well as the buf/sz pair for the actual object bytes.
However, neither function uses the latter, as they pull the data
directly from the parsed object struct.
Let's get rid of these misleading parameters.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
files-backend: drop refs parameter from split_symref_update()
This parameter was added in fcc42ea0c9 (split_symref_update(): add a
files_ref_store argument, 2016-09-04) without comment, but never used.
The splitting is purely mechanical, and doesn't depend on the particular
ref-store. Let's drop this parameter in the name of simplicity.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: drop unused parameter from oe_map_new_pack()
Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry,
2018-04-14), we store the source pack for each object as a small index
rather than as a pointer. When we see a new pack that has no allocated
index, we fall back to generating an array of pointers by calling
oe_map_new_pack().
Perhaps counter-intuitively, that function does not need to actually see
our new index-less pack. It only allocates and populates the array with
the existing packs, after which oe_set_in_pack() actually adds the new
pack to the array.
Let's drop the unused "struct packed_git" argument to oe_map_new_pack()
to avoid confusion.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few functions related to directory renames that have unused
parameters. After consulting with the author in [1], these seem to be
leftover cruft from the development process, and not signs of any bug.
Let's drop them.
diff: drop complete_rewrite parameter from run_external_diff()
Our builtin_diff() wants to know whether break-detection found a
complete rewrite, because it changes how the diff is shown. However,
when calling out to an external diff, we don't pass this information
along (and doing so would require designing a new interface to the
user-provided program).
Let's drop the unused parameter to make this fact more clear.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff: drop unused emit data parameter from sane_truncate_line()
We pass the "struct emit_callback" (which contains all of the context
for our diff) into sane_truncate_line(), but that function doesn't
actually use it. In theory we might eventually develop a diff option
that impacts this, but in the meantime let's not mislead anybody reading
the code. Since the function is static, it would be easy to pass it
again if it should ever become useful.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several of the emit_* functions take a "reset" color parameter, but
never actually look at it (instead, they call into emit_diff_symbol,
which handles the colors itself). Let's drop these unused parameters.
Note that emit_line() does still take a color/reset pair, and actually
uses it. It cannot be refactored to match these other functions because
it's the thing that emit_diff_symbol eventually calls into (i.e., it
does not by itself know which colors to use, and must be told by the
caller).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff: drop options parameter from diffcore_fix_diff_index()
The sole purpose of this function is to fix the sorting order of the
queued diff entries. It doesn't need to know about any diff options, so
we can drop the unused parameter.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-prune command checks reachability by doing a traversal, and then
checking whether a given object exists in the global object hash. This
can yield false positives if any other part of the code had to create an
object struct for some reason. It's not clear whether this is even
possible, but it's more robust to rely on something a little more
concrete: the SEEN flag set by our traversal.
Note that there is a slight possibility of regression here, as we're
relying on mark_reachable_objects() to consistently set the flag.
However, it has always done so, and we're already relying on that fact
in prune_shallow(), which is called as part of git-prune. So this is
making these two parts of the prune operation more consistent.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pruning generally has to traverse the whole commit graph in order to
see which objects are reachable. This is the exact problem that
reachability bitmaps were meant to solve, so let's use them (if they're
available, of course).
Here are timings on git.git:
Test HEAD^ HEAD
------------------------------------------------------------------------
5304.6: prune with bitmaps 3.65(3.56+0.09) 1.01(0.92+0.08) -72.3%
And on linux.git:
Test HEAD^ HEAD
--------------------------------------------------------------------------
5304.6: prune with bitmaps 35.05(34.79+0.23) 3.00(2.78+0.21) -91.4%
The tests show a pretty optimal case, as we'll have just repacked and
should have pretty good coverage of all refs with our bitmaps. But
that's actually pretty realistic: normally prune is run via "gc" right
after repacking.
A few notes on the implementation:
- the change is actually in reachable.c, so it would improve
reachability traversals by "reflog expire --stale-fix", as well.
Those aren't performed regularly, though (a normal "git gc" doesn't
use --stale-fix), so they're not really worth measuring. There's a
low chance of regressing that caller, since the use of bitmaps is
totally transparent from the caller's perspective.
- The bitmap case could actually get away without creating a "struct
object", and instead the caller could just look up each object id in
the bitmap result. However, this would be a marginal improvement in
runtime, and it would make the callers much more complicated. They'd
have to handle both the bitmap and non-bitmap cases separately, and
in the case of git-prune, we'd also have to tweak prune_shallow(),
which relies on our SEEN flags.
- Because we do create real object structs, we go through a few
contortions to create ones of the right type. This isn't strictly
necessary (lookup_unknown_object() would suffice), but it's more
memory efficient to use the correct types, since we already know
them.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The general strategy of "git prune" is to do a full reachability walk,
then for each loose object see if we found it in our walk. But if we
don't have any loose objects, we don't need to do the expensive walk in
the first place.
This patch postpones that walk until the first time we need to see its
results.
Note that this is really a specific case of a more general optimization,
which is that we could traverse only far enough to find the object under
consideration (i.e., stop the traversal when we find it, then pick up
again when asked about the next object, etc). That could save us in some
instances from having to do a full walk. But it's actually a bit tricky
to do with our traversal code, and you'd need to do a full walk anyway
if you have even a single unreachable object (which you generally do, if
any objects are actually left after running git-repack).
So in practice this lazy-load of the full walk catches one easy but
common case (i.e., you've just repacked via git-gc, and there's nothing
unreachable).
The perf script is fairly contrived, but it does show off the
improvement:
Test HEAD^ HEAD
-------------------------------------------------------------------------
5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0%
and would let us know if we accidentally regress this optimization.
Note also that we need to take special care with prune_shallow(), which
relies on us having performed the traversal. So this optimization can
only kick in for a non-shallow repository. Since this is easy to get
wrong and is not covered by existing tests, let's add an extra test to
t5304 that covers this case explicitly.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-rebase.txt: update to reflect merge now implemented on sequencer
Since commit 8fe9c3f21dff (Merge branch 'en/rebase-merge-on-sequencer',
2019-02-06), --merge now uses the interactive backend (and matches its
behavior) so there is no separate merge backend anymore. Fix an
oversight in the docs that should have been updated with the previous
change.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/lib-httpd: pass GIT_TEST_SIDEBAND_ALL through Apache
07c3c2aa16 ("tests: define GIT_TEST_SIDEBAND_ALL", 2019-01-16) added
GIT_TEST_SIDEBAND_ALL to the apache.conf PassEnv list. Avoid warnings
from Apache when the variable is unset, as we do for GIT_VALGRIND* and
GIT_TRACE, from f628825481 ("t/lib-httpd: handle running under
--valgrind", 2012-07-24) and 89c57ab3f0 ("t: pass GIT_TRACE through
Apache", 2015-03-13), respectively.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The result field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote-curl: reduce scope of rpc_state.stdin_preamble
The stdin_preamble field in struct rpc_state is only used in
rpc_service(), and not in any functions it directly or indirectly calls.
Refactor it to become an argument of rpc_service() instead.
An observation of all callers of rpc_service() shows that the preamble
is always set, so we no longer need the "if (preamble)" check.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>