Currently, we only accept the flag indicating whether the HEAD should be
detached not. In the next commit, we want to introduce another flag: to
toggle between emulating `reset --hard` vs `checkout -q`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
`--exclude` from rev-list and rev-parse fails to exclude references if
the next `--branches`, `--tags` or `--remotes` use the optional
inclusive glob because those options are implemented as particular cases
of `--glob=`, which itself requires that exclude patterns begin with
'refs/'.
But it makes sense for `--branches=glob` and friends to be aware that
exclusions patterns for them shouldn't be 'refs/<type>/' prefixed, the
same way exclude patterns for `--branches` and friends (without the
optional glob) already are.
Let's record in 'refs.c:struct ref_filter' which context the exclude
pattern is tied to, so refs.c:filter_refs() can decide if it should
ignore the prefix when trying to match.
Signed-off-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: show --exclude failure with --branches/tags/remotes=glob
The documentation of `--exclude=` option from rev-list and rev-parse
explicitly states that exclude patterns *should not* start with 'refs/'
when used with `--branches`, `--tags` or `--remotes`.
However, following this advice results in refereces not being excluded
if the next `--branches`, `--tags`, `--remotes` use the optional
inclusive glob.
Demonstrate this failure.
Signed-off-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
loose objects we don't have.
Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
custom solution.
Note that this might perform slightly differently, as the original code
stopped calling readdir() when we saw more loose objects than there were
refs. So:
1. The old code might have spent work on readdir() to fill the cache,
but then decided there were too many loose objects, wasting that
effort.
2. The new code might spend a lot of time on readdir() if you have a
lot of loose objects, even though there are very few objects to
ask about.
In practice it probably won't matter either way; see the previous commit
for some discussion of the tradeoff.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1-file: use loose object cache for quick existence check
In cases where we expect to ask has_sha1_file() about a lot of objects
that we are not likely to have (e.g., during fetch negotiation), we
already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
a simultaneous write or repack) for speed (we avoid re-scanning the pack
directory).
However, even checking for loose objects can be expensive, as we will
stat() each one. On many systems this cost isn't too noticeable, but
stat() can be particularly slow on some operating systems, or due to
network filesystems.
Since the QUICK flag already tells us that we're OK with a slightly
stale answer, we can use that as a cue to look in our in-memory cache of
each object directory. That basically trades an in-memory binary search
for a stat() call.
Note that it is possible for this to actually be _slower_. We'll do a
full readdir() to fill the cache, so if you have a very large number of
loose objects and a very small number of lookups, that readdir() may end
up more expensive.
This shouldn't be a big deal in practice. If you have a large number of
reachable loose objects, you'll already run into performance problems
(which you should remedy by repacking). You may have unreachable objects
which wouldn't otherwise impact performance. Usually these would go away
with the prune step of "git gc", but they may be held for up to 2 weeks
in the default configuration.
So it comes down to how many such objects you might reasonably expect to
have, how much slower is readdir() on N entries versus M stat() calls
(and here we really care about the syscall backing readdir(), like
getdents() on Linux, but I'll just call this readdir() below).
If N is much smaller than M (a typical packed repo), we know this is a
big win (few readdirs() followed by many uses of the resulting cache).
When N and M are similar in size, it's also a win. We care about the
latency of making a syscall, and readdir() should be giving us many
values in a single call. How many?
On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
entries per call (which is 64 bytes per entry; the name itself is 38
bytes, plus there are some other fields). So we can imagine that this is
always a win as long as the number of loose objects in the repository is
a factor of 500 less than the number of lookups you make. It's hard to
auto-tune this because we don't generally know up front how many lookups
we're going to do. But it's unlikely for this to perform significantly
worse.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-store: provide helpers for loose_objects_cache
Our object_directory struct has a loose objects cache that all users of
the struct can see. But the only one that knows how to load the cache is
find_short_object_filename(). Let's extract that logic in to a reusable
function.
While we're at it, let's also reset the cache when we re-read the object
directories. This shouldn't have an impact on performance, as re-reads
are meant to be rare (and are already expensive, so we avoid them with
things like OBJECT_INFO_QUICK).
Since the cache is already meant to be an approximation, it's tempting
to skip even this bit of safety. But it's necessary to allow more code
to use it. For instance, fetch-pack explicitly re-reads the object
directory after performing its fetch, and would be confused if we didn't
clear the cache.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1-file: use an object_directory for the main object dir
Our handling of alternate object directories is needlessly different
from the main object directory. As a result, many places in the code
basically look like this:
do_something(r->objects->objdir);
for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
do_something(odb->path);
That gets annoying when do_something() is non-trivial, and we've
resorted to gross hacks like creating fake alternates (see
find_short_object_filename()).
Instead, let's give each raw_object_store a unified list of
object_directory structs. The first will be the main store, and
everything after is an alternate. Very few callers even care about the
distinction, and can just loop over the whole list (and those who care
can just treat the first element differently).
A few observations:
- we don't need r->objects->objectdir anymore, and can just
mechanically convert that to r->objects->odb->path
- object_directory's path field needs to become a real pointer rather
than a FLEX_ARRAY, in order to fill it with expand_base_dir()
- we'll call prepare_alt_odb() earlier in many functions (i.e.,
outside of the loop). This may result in us calling it even when our
function would be satisfied looking only at the main odb.
But this doesn't matter in practice. It's not a very expensive
operation in the first place, and in the majority of cases it will
be a noop. We call it already (and cache its results) in
prepare_packed_git(), and we'll generally check packs before loose
objects. So essentially every program is going to call it
immediately once per program.
Arguably we should just prepare_alt_odb() immediately upon setting
up the repository's object directory, which would save us sprinkling
calls throughout the code base (and forgetting to do so has been a
source of subtle bugs in the past). But I've stopped short of that
here, since there are already a lot of other moving parts in this
patch.
- Most call sites just get shorter. The check_and_freshen() functions
are an exception, because they have entry points to handle local and
nonlocal directories separately.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
handle alternates paths the same as the main object dir
When we generate loose file paths for the main object directory, the
caller provides a buffer to loose_object_path (formerly sha1_file_name).
The callers generally keep their own static buffer to avoid excessive
reallocations.
But for alternate directories, each struct carries its own scratch
buffer. This is needlessly different; let's unify them.
We could go either direction here, but this patch moves the alternates
struct over to the main directory style (rather than vice-versa).
Technically the alternates style is more efficient, as it avoids
rewriting the object directory name on each call. But this is unlikely
to matter in practice, as we avoid reallocations either way (and nobody
has ever noticed or complained that the main object directory is copying
a few extra bytes before making a much more expensive system call).
And this has the advantage that the reusable buffers are tied to
particular calls, which makes the invalidation rules simpler (for
example, the return value from stat_sha1_file() used to be invalidated
by basically any other object call, but now it is affected only by other
calls to stat_sha1_file()).
We do steal the trick from alt_sha1_path() of returning a pointer to the
filled buffer, which makes a few conversions more convenient.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_file_name(): overwrite buffer instead of appending
The sha1_file_name() function is used to generate the path to a loose
object in the object directory. It doesn't make much sense for it to
append, since the the path we write may be absolute (i.e., you cannot
reliably build up a path with it). Because many callers use it with a
static buffer, they have to strbuf_reset() manually before each call
(and the other callers always use an empty buffer, so they don't care
either way). Let's handle this automatically.
Since we're changing the semantics, let's take the opportunity to give
it a more hash-neutral name (which will also catch any callers from
topics in flight).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rename "alternate_object_database" to "object_directory"
In preparation for unifying the handling of alt odb's and the normal
repo object directory, let's use a more neutral name. This patch is
purely mechanical, swapping the type name, and converting any variables
named "alt" to "odb". There should be no functional change, but it will
reduce the noise in subsequent diffs.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: prefer strip_suffix() to ends_with()
Using strip_suffix() lets us avoid repeating ourselves. It also makes
the handling of "/" a bit less subtle (we strip one less character than
we matched in order to leave it in place, but we can just as easily
include the "/" when we add more path components).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The run-command API makes no promises about what is left in a struct
child_process after a command finishes, and it's not safe to simply
reuse it again for a similar command. In particular:
- if you use child->args or child->env_array, they are cleared after
finish_command()
- likewise, start_command() may point child->argv at child->args->argv;
reusing that would lead to accessing freed memory
- the in/out/err may hold pipe descriptors from the previous run
These two calls are _probably_ OK because they do not use any of those
features. But it's only by chance, and may break in the future; let's
reinitialize our struct for each program we run.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When editing patches e.g. in `git add -e`, it is quite common that a
hunk ends up having no -/+ lines, i.e. it is now supposed to do nothing.
This use case was broken by ad6e8ed37bc1 (apply: reject a hunk that does
not do anything, 2015-06-01) with the good intention of catching a very
real, different issue in hand-edited patches.
So let's use the `--recount` option as the tell-tale whether the user
would actually be okay with no-op hunks.
Add a test case to make sure that this use case does not regress again.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
status: rebase and merge can be in progress at the same time
Since `git rebase -r` was introduced, that is possible. But our
machinery did not think that possible, and failed to say anything about
the rebase in progress when in the middle of a merge.
Let's work around that in the minimal fashion.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
built-in rebase --skip/--abort: clean up stale .git/<name> files
The scripted version of the rebase used to execute `git reset --hard`
when skipping or aborting. When we ported this to C, we did update the
worktree and some reflogs, but we failed to imitate `git reset --hard`'s
behavior regarding files in .git/ such as MERGE_HEAD.
Let's address this oversight.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase -i: include MERGE_HEAD into files to clean up
Every once in a while, the interactive rebase makes sure that no stale
files are lying around. These days, we need to include MERGE_HEAD into
that set of files, as the `merge` command will generate them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we detect that a `merge` can be skipped because the merged commit
is already an ancestor of HEAD, we do not need to commit, therefore
writing the MERGE_HEAD file is useless.
It is actually worse than useless: a subsequent `git commit` will pick
it up and think that we want to merge that commit, still.
To avoid that, move the code that writes the MERGE_HEAD file to a
location where we already know that the `merge` cannot be skipped.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase -r: demonstrate bug with conflicting merges
When calling `merge` on a branch that has already been merged, that
`merge` is skipped quietly, but currently a MERGE_HEAD file is being
left behind and will then be grabbed by the next `pick` (that did
not want to create a *merge* commit).
Demonstrate this.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
511726e4b1 ("builtin/notes: fix premature failure when trying to add
the empty blob", 2014-11-09) removed the check for !len but left a
call to free the buffer that will be otherwise NULL
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Acked-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote-curl.c: xcurl_off_t is not portable (on 32 bit platfoms)
When setting
DEVELOPER = 1
DEVOPTS = extra-all
"gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with
"comparison is always false due to limited range of data type"
"[-Werror=type-limits]"
It turns out that the function xcurl_off_t() has 2 flavours:
- It gives a warning 32 bit systems, like Linux
- It takes the signed ssize_t as a paramter, but the only caller is using
a size_t (which is typically unsigned these days)
The original motivation of this function is to make sure that sizes > 2GiB
are handled correctly. The curl documentation says:
"For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit
wide signed integral data type"
On a 32 bit system "size_t" can be promoted into a 64 bit signed value
without loss of data, and therefore we may see the
"comparison is always false" warning.
On a 64 bit system it may happen, at least in theory, that size_t is > 2^63,
and then the promotion from an unsigned "size_t" into a signed "curl_off_t"
may be a problem.
One solution to suppress a possible compiler warning could be to remove
the function xcurl_off_t().
However, to be on the very safe side, we keep it and improve it:
- The len parameter is changed from ssize_t to size_t
- A temporally variable "size" is used, promoted int uintmax_t and the compared
with "maximum_signed_value_of_type(curl_off_t)".
Thanks to Junio C Hamano for this hint.
Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upcast size_t variables to uintmax_t when printing
When printing variables which contain a size, today "unsigned long"
is used at many places.
In order to be able to change the type from "unsigned long" into size_t
some day in the future, we need to have a way to print 64 bit variables
on a system that has "unsigned long" defined to be 32 bit, like Win64.
Upcast all those variables into uintmax_t before they are printed.
This is to prepare for a bigger change, when "unsigned long"
will be converted into size_t for variables which may be > 4Gib.
Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
p3400: replace calls to `git checkout -b' by `git checkout -B'
p3400 makes a copy of the current repository to test git-rebase
performance, and creates new branches in the copy with `git checkout
-b'. If the original repository has branches with the same name as the
script is trying to create, this operation will fail.
This replaces these calls by `git checkout -B' to force the creation and
update of these branches.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
build: fix broken command-list.h generation with core.autocrlf
The script generate-cmdlist.sh needs input text files in UNIX line
ending to work correctly. It's been fine even with core.autocrlf set
because Documentation/git-*.txt is forced LF conversion.
But this leaves out gitk.txt and also Documentation/*config.txt that
recently becomes new input for this script. Update the attribute file
to force LF on all *.txt files to be on the safe side.
For more details, please see 00ddc9d13c (Fix build with
core.autocrlf=true - 2017-05-09)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This case is more interesting than other boring "remove the_repo"
commits because while we need access to the object database, we cannot
simply use r->index because unpack-trees.c can operate on a temporary
index, not $GIT_DIR/index. Ideally we should be able to pass an object
database to lookup_tree() but that ship has sailed.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sequencer.c: remove implicit dependency on the_repository
Note that the_hash_algo stays, even if we can easily replace it with
repo->hash_algo. My reason is I still believe tying hash_algo to a
struct repository is a wrong move. But if I'm wrong, we can always go
for another round of conversion.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sequencer.c: remove implicit dependency on the_index
Since we're going to pass 'struct repository *' around most of the
time instead of 'struct index_state *' because most sequencer.c
operations need more than just the index, the_repository is replaced
as well in the functions that now take 'struct repository
*'. the_repository is still present in this file, but total clean up
will be done later. It's not the main focus of this patch.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
transport.c: remove implicit dependency on the_index
note, there's still another hidden dependency related to this: even
though we pass a repo to transport_push() we still use
is_bare_repository() which pretty much assumes the_repository (and
some other global state).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
These messages will be marked for translation later. Reduce word legos
and give translators almost full phrases. describe_object() is updated
so that it can be called from printf() twice.
While at there, remove \n from the strings to reduce a bit of work
from translators.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce optname() that does the early half of original opterror() to
come up with the name of the option reported back to the user, and use
it to kill opterror(). The callers of opterror() now directly call
error() using the string returned by opterror() instead.
There are a few issues with opterror()
- it tries to assemble an English sentence from pieces. This is not
great for translators because we give them pieces instead of a full
sentence.
- It's a wrapper around error() and needs some hack to let the
compiler know it always returns -1.
- Since it takes a string instead of printf format, one call site has
to assemble the string manually before passing to it.
Using error() directly solves the second and third problems.
It kind helps the first problem as well because "%s does foo" does
give a translator a full sentence in a sense and let them reorder if
needed. But it has limitations, if the subject part has to change
based on the rest of the sentence, that language is screwed. This is
also why I try to avoid calling optname() when 'flags' is known in
advance.
Mark of these strings for translation as well while at there.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first error, "internal error", is clearly a BUG(). The second two
are meant to catch calls with invalid parameters and should never
happen outside the test suite.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
typechange_fmt and added_fmt should have a colon before "needs
update". Align the statements to make it easier to read and see. Also
drop the unnecessary ().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
alias.c: mark split_cmdline_strerror() strings for translation
This function can be part of translated messages. To make sure we
don't have a sentence with mixed languages, mark the strings for
translation, but only use translated strings in places we know we will
output translated strings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch makes the output of `git shortlog -nse v2.10.0..master`
duplicate-free by taking/guessing the current and preferred
addresses for authors that appear with more than one address.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
range-diff: fix regression in passing along diff options
In 73a834e9e2 ("range-diff: relieve callers of low-level configuration
burden", 2018-07-22) we broke passing down options like --no-patch,
--stat etc.
Fix that regression, and add a test asserting the pre-73a834e9e2
behavior for some of these diff options.
As noted in a change leading up to this ("range-diff doc: add a
section about output stability", 2018-11-07) the output is not meant
to be stable. So this regression test will likely need to be tweaked
once we get a "proper" --stat option.
See
https://public-inbox.org/git/nycvar.QRO.7.76.6.1811071202480.39@tvgsbejvaqbjf.bet/
for a further explanation of the regression. The fix here is not the
same as in Johannes's on-list patch, for reasons that'll be explained
in a follow-up commit.
The quoting of "EOF" here mirrors that of an earlier test. Perhaps
that should be fixed, but let's leave that up to a later cleanup
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
range-diff doc: add a section about output stability
The range-diff command is already advertised as porcelain, but let's
make it really clear that the output is completely subject to change,
particularly when it comes to diff options such as --stat. Right now
that option doesn't work, but fixing that is the subject of a later
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually we don't need to set libcurl to choose which version of the
HTTP protocol to use to communicate with a server.
But different versions of libcurl, the default value is not the same.
Earlier we made the entire build to fail when GETTEXT_POISON=Yes is
given to make, to notify those who did not notice that text poisoning
is now a runtime behaviour.
It turns out that this is too irritating for those who need to build
and test different versions of Git that cross the boundary between
history with and without this topic to switch between two
environment variables. Demote the error to a warning, so that you
can say something like
make GETTEXT_POISON=Yes GIT_TEST_GETTEXT_POISON=Yes test
during the transition period, without having to worry about whether
exact version you are testing has or does not have this topic.
Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON
test parameter to only be a GIT_TEST_GETTEXT_POISON=<non-empty?>
runtime parameter, to be consistent with other parameters documented
in "Running tests with special setups" in t/README.
When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON
to simulate unfriendly translator", 2011-02-22) I was concerned with
ensuring that the _() function would get constant folded if NO_GETTEXT
was defined, and likewise that GETTEXT_POISON would be compiled out
unless it was defined.
But as the benchmark in my [1] shows doing a one-off runtime
getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was
originally added the GIT_TEST_* env variables have become the common
idiom for turning on special test setups.
So change GETTEXT_POISON to work the same way. Now the
GETTEXT_POISON=YesPlease compile-time option is gone, and running the
tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off
without recompiling.
This allows for conditionally amending tests to test with/without
poison, similar to what 859fdc0c3c ("commit-graph: define
GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do
some of that, now we e.g. always run the t0205-gettext-poison.sh test.
I did enough there to remove the GETTEXT_POISON prerequisite, but its
inverse C_LOCALE_OUTPUT is still around, and surely some tests using
it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=.
Notes on the implementation:
* We still compile a dedicated GETTEXT_POISON build in Travis
CI. Perhaps this should be revisited and integrated into the
"linux-gcc" build, see ae59a4e44f ("travis: run tests with
GIT_TEST_SPLIT_INDEX", 2018-01-07) for prior art in that area. Then
again maybe not, see [2].
* We now skip a test in t0000-basic.sh under
GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This
test relies on C locale output, but due to an edge case in how the
previous implementation of GETTEXT_POISON worked (reading it from
GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does,
and needs to be skipped.
* The getenv() function is not reentrant, so out of paranoia about
code of the form:
printf(_("%s"), getenv("some-env"));
call use_gettext_poison() in our early setup in git_setup_gettext()
so we populate the "poison_requested" variable in a codepath that's
won't suffer from that race condition.
* We error out in the Makefile if you're still saying
GETTEXT_POISON=YesPlease to prompt users to change their
invocation.
* We should not print out poisoned messages during the test
initialization itself to keep it more readable, so the test library
hides the variable if set in $GIT_TEST_GETTEXT_POISON_ORIG during
setup. See [3].
See also [4] for more on the motivation behind this patch, and the
history of the GETTEXT_POISON facility.
In handle_rename_rename_1to2(), we have duplicated error handling
around colliding paths. Specifically, when we want to write out
the file and there is a directory or untracked file in the way,
we need to create a temporary file to hold the contents. This has
some special output to alert the user, and this output is
duplicated for each side of the conflict.
Simplify the call by generating this new path in a helper
function.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6036, t6043: increase code coverage for file collision handling
Stolee's coverage reports found a few code blocks for file collision
conflicts that had not previously been covered by testcases; add a few
more testcases to cover those too.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a rename/rename(1to2) conflict, each of the renames can
collide with a file addition. Each of these rename/add conflicts suffered
from the same kinds of problems that normal rename/add suffered from.
Make the code use handle_file_conflicts() as well so that we get all the
same fixes and consistent behavior between the different conflict types.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: use handle_file_collision for add/add conflicts
This results in no-net change of behavior, it simply ensures that all
file-collision conflict handling types are being handled the same by
calling the same function.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: improve handling for rename/rename(2to1) conflicts
This makes the rename/rename(2to1) conflicts use the new
handle_file_collision() function. Since that function was based
originally on the rename/rename(2to1) handling code, the main
differences here are in what was added. In particular:
* Instead of storing files at collide_path~HEAD and collide_path~MERGE,
the files are two-way merged and recorded at collide_path.
* Instead of recording the version of the renamed file that existed
on the renamed side in the index (thus ignoring any changes that
were made to the file on the side of history without the rename),
we do a three-way content merge on the renamed path, then store
that at either stage 2 or stage 3.
* Note that since the content merge for each rename may have conflicts,
and then we have to merge the two renamed files, we can end up with
nested conflict markers.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the rename/add conflict handling make use of the new
handle_file_collision() function, which fixes several bugs and improves
things for the rename/add case significantly. Previously, rename/add
would:
* Not leave any higher order stage entries in the index, making it
appear as if there were no conflict.
* Would place the rename file at the colliding path, and move the
added file elsewhere, which combined with the lack of higher order
stage entries felt really odd. It's not clear to me why the
rename should take precedence over the add; if one should be moved
out of the way, they both probably should.
* In the recursive case, it would do a two way merge of the added
file and the version of the renamed file on the renamed side,
completely excluding modifications to the renamed file on the
unrenamed side of history.
Use the new handle_file_collision() to fix all of these issues.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: new function for better colliding conflict resolutions
There are three conflict types that represent two (possibly entirely
unrelated) files colliding at the same location:
* add/add
* rename/add
* rename/rename(2to1)
These three conflict types already share more similarity than might be
immediately apparent from their description: (1) the handling of the
rename variants already involves removing any entries from the index
corresponding to the original file names[*], thus only leaving entries
in the index for the colliding path; (2) likewise, any trace of the
original file name in the working tree is also removed. So, in all
three cases we're left with how to represent two colliding files in both
the index and the working copy.
[*] Technically, this isn't quite true because rename/rename(2to1)
conflicts in the recursive (o->call_depth > 0) case do an "unrename"
since about seven years ago. But even in that case, Junio felt
compelled to explain that my decision to "unrename" wasn't necessarily
the only or right answer -- search for "Comment from Junio" in t6036 for
details.
My initial motivation for looking at these three conflict types was that
if the handling of these three conflict types is the same, at least in
the limited set of cases where a renamed file is unmodified on the side
of history where the file is not renamed, then a significant performance
improvement for rename detection during merges is possible. However,
while that served as motivation to look at these three types of
conflicts, the actual goal of this new function is to try to improve the
handling for all three cases, not to merely make them the same as each
other in that special circumstance.
=== Handling the working tree ===
The previous behavior for these conflict types in regards to the
working tree (assuming the file collision occurs at 'foo') was:
* add/add does a two-way merge of the two files and records it as 'foo'.
* rename/rename(2to1) records the two different files into two new
uniquely named files (foo~HEAD and foo~$MERGE), while removing 'foo'
from the working tree.
* rename/add records the two different files into two different
locations, recording the add at foo~$SIDE and, oddly, recording
the rename at foo (why is the rename more important than the add?)
So, the question for what to write to the working tree boils down to
whether the two colliding files should be two-way merged and recorded in
place, or recorded into separate files. As per discussion on the git
mailing lit, two-way merging was deemed to always be preferred, as that
makes these cases all more like content conflicts that users can handle
from within their favorite editor, IDE, or merge tool. Note that since
renames already involve a content merge, rename/add and
rename/rename(2to1) conflicts could result in nested conflict markers.
=== Handling of the index ===
For a typical rename, unpack_trees() would set up the index in the
following fashion:
old_path new_path
stage1: 5ca1ab1e00000000
stage2: f005ba1100000000
stage3: 00000000b0a710ad
And merge-recursive would rewrite this to
new_path
stage1: 5ca1ab1e
stage2: f005ba11
stage3: b0a710ad
Removing old_path from the index means the user won't have to `git rm
old_path` manually every time a renamed path has a content conflict.
It also means they can use `git checkout [--ours|--theirs|--conflict|-m]
new_path`, `git diff [--ours|--theirs]` and various other commands that
would be difficult otherwise.
This strategy becomes a problem when we have a rename/add or
rename/rename(2to1) conflict, however, because then we have only three
slots to store blob sha1s and we need either four or six. Previously,
this was handled by continuing to delete old_path from the index, and
just outright ignoring any blob shas from old_path. That had the
downside of deleting any trace of changes made to old_path on the other
side of history. This function instead does a three-way content merge of
the renamed file, and stores the blob sha1 for that at either stage2 or
stage3 for new_path (depending on which side the rename came from). That
has the advantage of bringing information about changes on both sides and
still allows for easy resolution (no need to git rm old_path, etc.), but
does have the downside that if the content merge had conflict markers,
then what we store in the index is the sha1 of a blob with conflict
markers. While that is a downside, it seems less problematic than the
downsides of any obvious alternatives, and certainly makes more sense
than the previous handling. Further, it has a precedent in that when we
do recursive merges, we may accept a file with conflict markers as the
resolution for the merge of the merge-bases, which will then show up in
the index of the outer merge at stage 1 if a conflict exists at the outer
level.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive: increase marker length with depth of recursion
Later patches in this series will modify file collision conflict
handling (e.g. from rename/add and rename/rename(2to1) conflicts) so
that multiply nested conflict markers can arise even before considering
conflicts in the virtual merge base. Including the virtual merge base
will provide a way to get triply (or higher) nested conflict markers.
This new way to get nested conflict markers will force the need for a
more general mechanism to extend the length of conflict markers in order
to differentiate between different nestings.
Along with this change to conflict marker length handling, we want to
make sure that we don't regress handling for other types of conflicts
with nested conflict markers. Add a more involved testcase using
merge.conflictstyle=diff3, where not only does the virtual merge base
contain conflicts, but its virtual merge base does as well (i.e. a case
with triply nested conflict markers). While there are multiple
reasonable ways to handle nested conflict markers in the virtual merge
base for this type of situation, the easiest approach that dovetails
well with the new needs for the file collision conflict handling is to
require that the length of the conflict markers increase with each
subsequent nesting.
Subsequent patches which change the rename/add and rename/rename(2to1)
conflict handling will modify the extra_marker_size flag appropriately
for their new needs.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6036, t6042: testcases for rename collision of already conflicting files
When a single file is renamed, it can also be modified, yielding the
possibility of that renamed file having content conflicts. If two
different such files are renamed into the same location, then two-way
merging those files may result in nested conflicts. Add a testcase that
makes sure we get this case correct, and uses different lengths of
conflict markers to differentiate between the different nestings.
Also add another case with an extra (i.e. third) level of conflict
markers due to using merge.conflictstyle=diff3 and the virtual merge
base also having conflicts present.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6042: add tests for consistency in file collision conflict handling
Add testcases dealing with file collisions for the following types of
conflicts:
* add/add
* rename/add
* rename/rename(2to1)
All these conflict types simplify down to two files "colliding"
and should thus be handled similarly. This means that rename/add and
rename/rename(2to1) conflicts need to be modified to behave the same as
add/add conflicts currently do: the colliding files should be two-way
merged (instead of the current behavior of writing the two colliding
files out to separate temporary unique pathnames). Add testcases which
check this; subsequent commits will fix the conflict handling to make
these tests pass.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
built-in rebase --autostash: leave the current branch alone if possible
When we converted a `git reset --hard` call in the original Unix shell
script to built-in code, we asked to reset the worktree and the index
and explicitly *not* to detach the HEAD. By mistake, though, we still
did. Let's fix this.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
built-in rebase: demonstrate regression with --autostash
An unnamed colleague of Ævar Arnfjörð Bjarmason reported a breakage
where a `pull --rebase` (which did not really need to do anything but
stash, see that nothing was changed, and apply the stash again) also
detached the HEAD.
This patch adds a minimal reproducer for this regression.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Windows: force-recompile git.res for differing architectures
When git.rc is compiled into git.res, the result is actually dependent
on the architecture. That is, you cannot simply link a 32-bit git.res
into a 64-bit git.exe.
Therefore, to allow 32-bit and 64-bit builds in the same directory, we
let git.res depend on GIT-PREFIX so that it gets recompiled when
compiling for a different architecture (this works because the exec path
changes based on the architecture: /mingw32/libexec/git-core for 32-bit
and /mingw64/libexec/git-core for 64-bit).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we see a time like "noon", we pass "12" to our date_time() helper,
which sets the hour to 12pm. If the current time is before noon, then we
wrap around to yesterday using date_yesterday(). But unlike the normal
calls to date_yesterday() from approxidate_alpha(), we pass a NULL "num"
parameter. Since c27cc94fad (approxidate: handle pending number for
"specials", 2018-11-02), that causes a segfault.
One way to fix this is by checking for NULL. But arguably date_time() is
abusing our helper by passing NULL in the first place (and this is the
only case where one of these "special" parsers is used this way). So
instead, let's have it just do the 1-day subtraction itself. It's still
just a one-liner due to our update_tm() helper.
Note that the test added here is a little funny, as we say "10am noon",
which makes the "10am" seem pointless. But this bug can only be
triggered when it the currently-parsed hour is before the special time.
The latest special time is "tea" at 1700, but t0006 uses a hard-coded
TEST_DATE_NOW of 1900. We could reset TEST_DATE_NOW, but that may lead
to confusion in other tests. Just saying "10am noon" makes this test
self-contained.
Reported-by: Carlo Arenas <carenas@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A git push process runs several processes during its run, but one
includes git send-pack which calls git pack-objects and passes
the known have/wants into stdin using object ids. However, the
default setting for core.warnAmbiguousRefs requires git pack-objects
to check for ref names matching the ref_rev_parse_rules array in
refs.c. This means that every object is triggering at least six
"file exists?" queries. When there are a lot of refs, this can
add up significantly! I observed a simple push spending three
seconds checking these paths.
The fix here is similar to 4c30d50 "rev-list: disable object/refname
ambiguity check with --stdin". While the get_object_list() method
reads the objects from stdin, turn warn_on_object_refname_ambiguity
flag (which is usually true) to false. Just for code hygiene, save
away the original at the beginning and restore it once we are done.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pull: handle --verify-signatures for unborn branch
We usually just forward the --verify-signatures option along to
git-merge, and trust it to do the right thing. However, when we are on
an unborn branch (i.e., there is no HEAD yet), we handle this case
ourselves without even calling git-merge. And in this code path, we do
not respect the verification option at all.
It may be more maintainable in the long run to call git-merge for the
unborn case. That would fix this bug, as well as prevent similar ones in
the future. But unfortunately it's not easy to do. As t5520.3
demonstrates, there are some special cases that git-merge does not
handle, like "git pull .. master:master" (by the time git-merge is
invoked, we've overwritten the unborn HEAD).
So for now let's just teach git-pull to handle this feature.
Reported-by: Felix Eckhofer <felix@eckhofer.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge: handle --verify-signatures for unborn branch
When git-merge sees that we are on an unborn branch (i.e., there is no
HEAD), it follows a totally separate code path than the usual merge
logic. This code path does not know about verify_signatures, and so we
fail to notice bad or missing signatures.
This has been broken since --verify-signatures was added in efed002249
(merge/pull: verify GPG signatures of commits being merged, 2013-03-31).
In an ideal world, we'd unify the flow for this case with the regular
merge logic, which would fix this bug and avoid introducing similar
ones. But because the unborn case is so different, it would be a burden
on the rest of the function to continually handle the missing HEAD. So
let's just port the verification check to this special case.
Reported-by: Felix Eckhofer <felix@eckhofer.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to implement "merge --verify-signatures" is inline in
cmd_merge(), but this site misses some cases. Let's extract the logic
into a function so we can call it from more places.
We'll move it to commit.[ch], since one of the callers (git-pull) is
outside our source file. This function isn't all that general (after
all, its main function is to exit the program) but it's not worth trying
to fix that. The heavy lifting is done by check_commit_signature(), and
our purpose here is just sharing the die() logic. We'll mark it with a
comment to make that clear.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Windows port learned to use nano-second resolution file timestamps.
* js/mingw-ns-filetime:
mingw: implement nanosecond-precision file times
mingw: replace MSVCRT's fstat() with a Win32-based implementation
mingw: factor out code to set stat() data
Operations on promisor objects make sense in the context of only a
small subset of the commands that internally use the revisions
machinery, but the "--exclude-promisor-objects" option were taken
and led to nonsense results by commands like "log", to which it
didn't make much sense. This has been corrected.
* md/exclude-promisor-objects-fix:
exclude-promisor-objects: declare when option is allowed
Documentation/git-log.txt: do not show --exclude-promisor-objects
"git send-email" learned to disable SMTP authentication via the
"--smtp-auth=none" option, even when the smtp username is given
(which turns the authentication on by default).
The command line completion machinery (in contrib/) has been
updated to allow the completion script to tweak the list of options
that are reported by the parse-options machinery correctly.
* nd/completion-negation:
completion: fix __gitcomp_builtin no longer consider extra options
"git fetch" over protocol v2 into a shallow repository failed to
fetch full history behind a new tip of history that was diverged
before the cut-off point of the history that was previously fetched
shallowly.
* jt/upload-pack-v2-fix-shallow:
upload-pack: clear flags before each v2 request
upload-pack: make want_obj not global
upload-pack: make have_obj not global