strbuf_setlen(., 0) writes '\0' to sb.buf[0], where buf is either
allocated and unique to sb, or the global slopbuf. The slopbuf is meant
to provide a guarantee that buf is not NULL and that a freshly
initialized buffer contains the empty string, but it is not supposed to
be written to. That strbuf_setlen writes to slopbuf has at least two
implications:
First, it's wrong in principle. Second, it might be hiding misuses which
are just waiting to wreak havoc. Third, ThreadSanitizer detects a race
when multiple threads write to slopbuf at roughly the same time, thus
potentially making any more critical races harder to spot.
Avoid writing to strbuf_slopbuf in strbuf_setlen. Let's instead assert
on the first byte of slopbuf being '\0', since it helps ensure the
promised invariant of buf[len] == '\0'. (We know that "len" was already
0, or someone has messed with "alloc". If someone has fiddled with the
fields that much beyond the correct interface, they're on their own.)
This is a function which is used in many places, possibly also in hot
code paths. There are two branches in strbuf_setlen already, and we are
adding a third and possibly a fourth (in the assert). In hot code paths,
we hopefully reuse the buffer in order to avoid continous reallocations.
Thus, after a start-up phase, we should always take the same path,
which might help branch prediction, and we would never make the assert.
If a hot code path continuously reallocates, we probably have bigger
performance problems than this new safety-check.
Simple measurements do not contradict this reasoning. 100000000 times
resetting a buffer and adding the empty string takes 5.29/5.26 seconds
with/without this patch (best of three). Releasing at every iteration
yields 18.01/17.87. Adding a 30-character string instead of the empty
string yields 5.61/5.58 and 17.28/17.28(!).
This patch causes the git binary emitted by gcc 5.4.0 -O2 on my machine
to grow from 11389848 bytes to 11497184 bytes, an increase of 0.9%.
I also tried to piggy-back on the fact that we already check alloc,
which should already tell us whether we are using the slopbuf:
if (sb->alloc) {
if (len > sb->alloc - 1)
die("BUG: strbuf_setlen() beyond buffer");
sb->buf[len] = '\0';
} else {
if (len)
die("BUG: strbuf_setlen() beyond buffer");
assert(!strbuf_slopbuf[0]);
}
sb->len = len;
That didn't seem to be much slower (5.38, 18.02, 5.70, 17.32 seconds),
but it does introduce some minor code duplication. The resulting git
binary was 11510528 bytes large (another 0.1% increase).
Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: take lock before accessing `remaining`
When checking the conditional of "while (me->remaining)", we did not
hold the lock. Calling find_deltas would still be safe, since it checks
"remaining" (after taking the lock) and is able to handle all values. In
fact, this could (currently) not trigger any bug: a bug could happen if
`remaining` transitioning from zero to non-zero races with the evaluation
of the while-condition, but these are always separated by the
data_ready-mechanism.
Make sure we have the lock when we read `remaining`. This does mean we
release it just so that find_deltas can take it immediately again. We
could tweak the contract so that the lock should be taken before calling
find_deltas, but let's defer that until someone can actually show that
"unlock+lock" has a measurable negative impact.
Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert: always initialize attr_action in convert_attrs
convert_attrs contains an "if-else". In the "if", we set attr_action
twice, and the first assignment has no effect. In the "else", we do not
set it at all. Since git_check_attr always returns the same value, we'll
always end up in the "if", so there is no problem right now. But
convert_attrs is obviously trying not to rely on such an
implementation-detail of another component.
Make the initialization of attr_action after the if-else. Remove the
earlier assignments.
Suggested-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'pw/unquote-path-in-git-pm' into maint
Code refactoring.
* pw/unquote-path-in-git-pm:
t9700: add tests for Git::unquote_path()
Git::unquote_path(): throw an exception on bad path
Git::unquote_path(): handle '\a'
add -i: move unquote_path() to Git.pm
Merge branch 'jk/gc-pre-detach-under-hook' into maint
We run an early part of "git gc" that deals with refs before
daemonising (and not under lock) even when running a background
auto-gc, which caused multiple gc processes attempting to run the
early part at the same time. This is now prevented by running the
early part also under the GC lock.
* jk/gc-pre-detach-under-hook:
gc: run pre-detach operations under lock
Merge branch 'rs/progress-overall-throughput-at-the-end' into maint
The progress meter did not give a useful output when we haven't had
0.5 seconds to measure the throughput during the interval. Instead
show the overall throughput rate at the end, which is a much more
useful number.
* rs/progress-overall-throughput-at-the-end:
progress: show overall rate in last update
Merge branch 'tb/push-to-cygwin-unc-path' into maint
On Cygwin, similar to Windows, "git push //server/share/repository"
ought to mean a repository on a network share that can be accessed
locally, but this did not work correctly due to stripping the double
slashes at the beginning.
This may need to be heavily tested before it gets unleashed to the
wild, as the change is at a fairly low-level code and would affect
not just the code to decide if the push destination is local. There
may be unexpected fallouts in the path normalization.
* tb/push-to-cygwin-unc-path:
cygwin: allow pushing to UNC paths
connect: reject paths that look like command line options
If we get a repo path like "-repo.git", we may try to invoke
"git-upload-pack -repo.git". This is going to fail, since
upload-pack will interpret it as a set of bogus options. But
let's reject this before we even run the sub-program, since
we would not want to allow any mischief with repo names that
actually are real command-line options.
You can still ask for such a path via git-daemon, but there's no
security problem there, because git-daemon enters the repo itself
and then passes "." on the command line.
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
connect: reject dashed arguments for proxy commands
If you have a GIT_PROXY_COMMAND configured, we will run it
with the host/port on the command-line. If a URL contains a
mischievous host like "--foo", we don't know how the proxy
command may handle it. It's likely to break, but it may also
do something dangerous and unwanted (technically it could
even do something useful, but that seems unlikely).
We should err on the side of caution and reject this before
we even run the command.
The hostname check matches the one we do in a similar
circumstance for ssh. The port check is not present for ssh,
but there it's not necessary because the syntax is "-p
<port>", and there's no ambiguity on the parsing side.
It's not clear whether you can actually get a negative port
to the proxy here or not. Doing:
git fetch git://remote:-1234/repo.git
keeps the "-1234" as part of the hostname, with the default
port of 9418. But it's a good idea to keep this check close
to the point of running the command to make it clear that
there's no way to circumvent it (and at worst it serves as a
belt-and-suspenders check).
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
connect: factor out "looks like command line option" check
We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
connect: reject ssh hostname that begins with a dash
When commands like "git fetch" talk with ssh://$rest_of_URL/, the
code splits $rest_of_URL into components like host, port, etc., and
then spawns the underlying "ssh" program by formulating argv[] array
that has:
- the path to ssh command taken from GIT_SSH_COMMAND, etc.
- dashed options like '-batch' (for Tortoise), '-p <port>' as
needed.
- ssh_host, which is supposed to be the hostname parsed out of
$rest_of_URL.
- then the command to be run on the other side, e.g. git
upload-pack.
If the ssh_host ends up getting '-<anything>', the argv[] that is
used to spawn the command becomes something like:
which obviously is bogus, but depending on the actual value of
"<anything>", will make "ssh" parse and use it as an option.
Prevent this by forbidding ssh_host that begins with a "-".
Noticed-by: Joern Schneeweisz of Recurity Labs Reported-by: Brian at GitLab Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/lib-proto-disable: restore protocol.allow after config tests
The tests for protocol.allow actually set that variable in
the on-disk config, run a series of tests, and then never
clean up after themselves. This means that whatever tests we
run after have protocol.allow=never, which may influence
their results.
In most cases we either exit after running these tests, or
do another round of test_proto(). In the latter case, this happens to
work because:
1. Tests of the GIT_ALLOW_PROTOCOL environment variable
override the config.
2. Tests of the specific config "protocol.foo.allow"
override the protocol.allow config.
3. The next round of protocol.allow tests start off by
setting the config to a known value.
However, it's a land-mine waiting to trap somebody adding
new tests to one of the t581x test scripts. Let's make sure
we clean up after ourselves.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
blame: fix memory corruption scrambling revision name in error message
When attempting to blame a non-existing path, git should show an error
message like this:
$ git blame e83c51633 -- nonexisting-file
fatal: no such path nonexisting-file in e83c51633
Since the recent commit 835c49f7d (blame: rework methods that
determine 'final' commit, 2017-05-24) the revision name is either
missing or some scrambled characters are shown instead. The reason is
that the revision name must be duplicated, because it is invalidated
when the pending objects array is cleared in the meantime, but this
commit dropped the duplication.
Restore the duplication of the revision name in the affected functions
(find_single_final() and find_single_initial()).
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We started using "%" PRItime, imitating "%" PRIuMAX and friends, as
a way to format the internal timestamp value, but this does not
play well with gettext(1) i18n framework, and causes "make pot"
that is run by the l10n coordinator to create a broken po/git.pot
file. This is a possible workaround for that problem.
* jc/po-pritime-fix:
Makefile: help gettext tools to cope with our custom PRItime format
A recent update made it easier to use "-fsanitize=" option while
compiling but supported only one sanitize option. Allow more than
one to be combined, joined with a comma, like "make SANITIZE=foo,bar".
* jk/build-with-asan:
Makefile: allow combining UBSan with other sanitizers
RelNotes: mention "sha1dc: optionally use sha1collisiondetection as a submodule"
To note that merely cloning git.git without --recurse-submodules
doesn't get you a full copy of the code anymore. See 5f6482d642 ("RelNotes: mention "log: make --regexp-ignore-case work
with --perl-regexp"", 2017-07-20).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
RelNotes: mention "log: make --regexp-ignore-case work with --perl-regexp"
To inform users that they can use --regexp-ignore-case now, and that
existing scripts which relied on that + PCRE may be buggy. See 9e3cbc59d5 ("log: make --regexp-ignore-case work with --perl-regexp",
2017-05-20).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: help gettext tools to cope with our custom PRItime format
We started using our own timestamp_t type and PRItime format
specifier to go along with it, so that we can later change the
underlying type and output format more easily, but this does not
play well with gettext tools.
Because gettext tools need to keep the *.po file portable across
platforms, they have to special-case the format specifiers like
PRIuMAX that are known types in inttypes.h, instead of letting CPP
handle strings like
"%" PRIuMAX " seconds ago"
as an ordinary string concatenation. They fundamentally cannot do
the same for our own custom type/format.
Given that po/git.pot needs to be generated only once every release
and by only one person, i.e. the l10n coordinator, let's update the
Makefile rule to generate po/git.pot so that gettext tools are run
on a munged set of sources in which all mentions of PRItime are
replaced with PRIuMAX, which is what we happen to use right now.
This way, developers do not have to care that PRItime does not play
well with gettext, and translators do not have to care that we use
our own PRItime.
The credit for the idea to munge the source files goes to Dscho.
Possible bugs are mine.
Helped-by: Jiang Xin <worldhello.net@gmail.com> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: reformat the paragraph containing the 'cut-line'
The paragraph that describes the 'scissors' cleanup mode of
'commit' had the 'cut-line' in the middle of a sentence. This
made it possible for the line to get wrapped on smaler windows.
This shouldn't be the case as it makes it hard for the user to
understand the structure of the cut-line.
Reformat the pragraph to make the 'cut-line' stand on a line of
it's own thus distinguishing it from the rest of the paragraph.
This further prevents it from getting wrapped to some extent.
Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We run an early part of "git gc" that deals with refs before
daemonising (and not under lock) even when running a background
auto-gc, which caused multiple gc processes attempting to run the
early part at the same time. This is now prevented by running the
early part also under the GC lock.
* jk/gc-pre-detach-under-hook:
gc: run pre-detach operations under lock
The progress meter did not give a useful output when we haven't had
0.5 seconds to measure the throughput during the interval. Instead
show the overall throughput rate at the end, which is a much more
useful number.
* rs/progress-overall-throughput-at-the-end:
progress: show overall rate in last update
On Cygwin, similar to Windows, "git push //server/share/repository"
ought to mean a repository on a network share that can be accessed
locally, but this did not work correctly due to stripping the double
slashes at the beginning.
This may need to be heavily tested before it gets unleashed to the
wild, as the change is at a fairly low-level code and would affect
not just the code to decide if the push destination is local. There
may be unexpected fallouts in the path normalization.
* tb/push-to-cygwin-unc-path:
cygwin: allow pushing to UNC paths