The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:
1. The performance is quadratic in worst-case.
2. A single in_merge_bases() call requires walking beyond the target
commit in order to find the full set of boundary commits that may be
merge-bases.
The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.
Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
naturally found cutoffs to avoid walking the entire commit graph. Since
we want to always return the correct result, we cannot use the
min_commit_date cutoff in can_all_from_reach. We then rely on generation
numbers to provide the cutoff.
Since not all repos will have a commit-graph file, nor will we always
have generation numbers computed for a commit-graph file, create a new
method, generation_numbers_enabled(), that checks for a commit-graph
file and sees if the first commit in the file has a non-zero generation
number. In the case that we do not have generation numbers, use the old
logic for is_descendant_of().
Performance was meausured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:
Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.
Before: 0.26 s
After: 0.21 s
Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer' with this input:
A:v4.9
B:v3.19
Before: 0.10 s
After: 0.08 s
By adding a new commit with parent v3.19, we test the non-reachable case
of ref_newer:
Before: 0.09 s
After: 0.08 s
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.
Rewrite the algorithm to walk each commit a constant number of times.
We also add some optimizations that should work for the main consumer of
this method: fetch negotitation (haves/wants).
The first step includes using a depth-first-search (DFS) from each
'from' commit, sorted by ascending generation number. We do not walk
beyond the minimum generation number or the minimum commit date. This
DFS is likely to be faster than the existing reachable() method because
we expect previous ref values to be along the first-parent history.
If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other 'from' commits.
We also mark the visited commits using 'assign_flag' to prevent re-
walking the same commits.
We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.
Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.
Small Case:
Before: 1.52 s
After: 0.26 s
Large Case:
Before: 3.50 s
After: 0.27 s
Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref_newer method is used by 'git push' to check if a force-push is
required. This method does not use any kind of cutoff when walking, so
in the case of a force-push will walk all reachable commits.
The is_descendant_of method already uses paint_down_to_common along with
cutoffs. By translating the ref_newer arguments into the commit and
commit_list required by is_descendant_of, we can have one fewer commit
walk and also improve our performance!
For a copy of the Linux repository, 'test-tool reach ref_newer' presents
the following improvements with the specified input. In the case that
ref_newer returns 1, there is no improvement. The improvement is in the
second case where ref_newer returns 0.
Input:
A:v4.9
B:v3.19
Before: 0.09 s
After: 0.09 s
To test the negative case, add a new commit with parent v3.19,
regenerate the commit-graph, and then run with B pointing at that
commit.
Before: 0.43 s
After: 0.09 s
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit_contains method has two modes which depend on the given
ref_filter struct. We have the "normal" algorithm (which is also the
typically-slow operation) and the "tag" algorithm. This difference is
essentially what changes performance for 'git branch --contains' versus
'git tag --contains'. There are thoughts that the data shapes used by
these two applications justify the different implementations.
Create tests using 'test-tool reach commit_contains [--tag]' to cover
both methods.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The can_all_from_reach_with_flags method is used by ok_to_give_up in
upload-pack.c to see if we have done enough negotiation during a fetch.
This method is intentionally created to preserve state between calls to
assist with stateful negotiation, such as over SSH.
To make this method testable, add a new can_all_from_reach method that
does the initial setup and final tear-down. We will later use this
method in production code. Call the method from 'test-tool reach' for
now.
Since this is a many-to-many reachability query, add a new type of input
to the 'test-tool reach' input format. Lines "Y:<committish>" create a
list of commits to be the reachability targets from the commits in the
'X' list. In the context of fetch negotiation, the 'X' commits are the
'want' commits and the 'Y' commits are the 'have' commits.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_merge_bases_many method returns a list of merge bases for a
single commit (A) against a list of commits (X). Some care is needed in
constructing the expected behavior because the result is not the
expected merge-base for an octopus merge with those parents but instead
the set of maximal commits that are reachable from A and at least one of
the commits in X.
Add get_merge_bases_many to 'test-tool reach' and create a test that
demonstrates that this output returns multiple results. Specifically, we
select a list of three commits such that we output two commits that are
reachable from one of the first two, respectively, and none are
reachable from the third.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The is_descendant_of method takes a single commit as its first parameter
and a list of commits as its second parameter. Extend the input of the
'test-tool reach' command to take multiple lines of the form
"X:<committish>" to construct a list of commits. Pass these to
is_descendant_of and create tests that check each result.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we prepare to change the behavior of the algorithms in
commit-reach.c, create a new test-tool subcommand 'reach' to test these
methods on interesting commit-graph shapes.
To use the new test-tool, use 'test-tool reach <method>' and provide
input to stdin that describes the inputs to the method. Currently, we
only implement the ref_newer method, which requires two commits. Use
lines "A:<committish>" and "B:<committish>" for the two inputs. We will
expand this input later to accommodate methods that take lists of
commits.
The test t6600-test-reach.sh creates a repo whose commits form a
two-dimensional grid. This grid makes it easy for us to determine
reachability because commit-A-B can reach commit-X-Y if and only if A is
at least X and B is at least Y. This helps create interesting test cases
for each result of the methods in commit-reach.c.
We test all methods in three different states of the commit-graph file:
Non-existent (no generation numbers), fully computed, and mixed (some
commits have generation numbers and others do not).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.
The can_all_from_reach_with_flags method is used in a stateful way by
upload-pack.c. The parameters are very flexible, so we will be able to
use its commit walking logic for many other callers.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ok_to_give_up() method uses the commit date as a cutoff to avoid
walking the entire reachble set of commits. Before moving the
reachable() method to commit-reach.c, pull out the dependence on the
global constant 'oldest_have' with a 'min_commit_date' parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of consolidating all commit reachability algorithms,
refactor ok_to_give_up() in order to allow splitting its logic into
an external method.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of moving the reachable() method to commit-reach.c,
modify the prototype to be more generic to flags known outside of
upload-pack.c. Also rename 'want' to 'from' to make the statement
more clear outside of the context of haves/wants negotiation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach: move commit_contains from ref-filter
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.
All methods are direct moves, except we also make the commit_contains()
method public so its consumers in ref-filter.c can still call it. We can
also test this method in a test-tool in a later commit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.
The ref_newer() method is used by 'git push -f' to check if a force-push
is necessary. By making the method public, we make it possible to test
the method directly without setting up an envieronment where a 'git
push' call makes sense.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
These methods are now declared in commit-reach.h. Remove them from
commit.h and add new include statements in all files that require these
declarations.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.
The method declarations in commit.h are not touched by this commit and
will be moved in a following commit. Many consumers need to point to
commit-reach.h and that would bloat this commit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach clone to send a list of ref-prefixes, when using protocol v2, to
allow the server to filter out irrelevant references from the
ref-advertisement. This reduces wasted time and bandwidth when cloning
repositories with a larger number of references.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index, when present, tracks the existence of objects and
their offsets within a list of packfiles. This allows us to use the
multi-pack-index for object lookups, abbreviations, and object counts.
When the multi-pack-index tracks a packfile, then we do not need to add
that packfile to the packed_git linked list or the MRU list.
We still need to load the packfiles that are not tracked by the
multi-pack-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Due to how Windows handles replacing a lockfile when there is an open
handle, create the close_midx() method to close the existing midx before
writing the new one.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.multiPackIndex config setting controls the multi-pack-
index (MIDX) feature. If false, the setting will disable all reads
from the multi-pack-index file.
Read this config setting in the new prepare_multi_pack_index_one()
which is called during prepare_packed_git(). This check is run once
per repository.
Add comparison commands in t5319-multi-pack-index.sh to check
typical Git behavior remains the same as the config setting is turned
on and off. This currently includes 'git rev-list' and 'git log'
commands to trigger several object database reads. Currently, these
would only catch an error in the prepare_multi_pack_index_one(), but
with later commits will catch errors in object lookups, abbreviations,
and approximate object counts.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.
We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.
It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.
Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.
Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.
Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When constructing a multi-pack-index file for a given object directory,
read the files within the enclosed pack directory and find matches that
end with ".idx" and find the correct paired packfile using
add_packed_git().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we build the multi-pack-index file format, we want to test the format
on real repositories. Add tests that create repository data including
multiple packfiles with both version 1 and version 2 formats.
The current 'git multi-pack-index write' command will always write the
same file with no "real" data. This will be expanded in future commits,
along with the test expectations.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new multi_pack_index struct for loading multi-pack-indexes into
memory. Create a test-tool builtin for reading basic information about
that multi-pack-index to verify the correct data is written.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we begin writing the multi-pack-index format to disk, start with
the basics: the 12-byte header and the 20-byte checksum footer. Start
with these basics so we can add the rest of the format in small
increments.
As we implement the format, we will use a technique to check that our
computed offsets within the multi-pack-index file match what we are
actually writing. Each method that writes to the hashfile will return
the number of bytes written, and we will track that those values match
our expectations.
Currently, write_midx_header() returns 12, but is not checked. We will
check the return value in a later commit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of writing multi-pack-indexes, add a skeleton
'git multi-pack-index write' subcommand and send the options to a
write_midx_file() method. Also create a skeleton test script that
tests the 'write' subcommand.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new 'git multi-pack-index' builtin will be the plumbing access
for writing, reading, and checking multi-pack-index files. The
initial implementation is a no-op.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t9300: wait for background fast-import process to die after killing it
The five new tests added to 't9300-fast-import.sh' in 30e215a65c
(fast-import: checkpoint: dump branches/tags/marks even if
object_count==0, 2017-09-28), all with the prefix "V:" in their test
description, run 'git fast-import' in the background and then 'kill'
it as part of a 'test_when_finished' cleanup command. When this test
script is executed with Bash, some or even all of these tests tend to
pollute the test script's stderr, and messages about terminated
processes end up on the terminal:
$ bash ./t9300-fast-import.sh
<... snip ...>
ok 179 - V: checkpoint helper does not get stuck with extra output
/<...>/test-lib-functions.sh: line 388: 28383 Terminated git fast-import $options 0<&8 1>&9
ok 180 - V: checkpoint updates refs after reset
./t9300-fast-import.sh: line 3210: 28401 Terminated git fast-import $options 0<&8 1>&9
ok 181 - V: checkpoint updates refs and marks after commit
ok 182 - V: checkpoint updates refs and marks after commit (no new objects)
./test-lib.sh: line 634: line 3250: 28485 Terminated git fast-import $options 0<&8 1>&9
ok 183 - V: checkpoint updates tags after tag
./t9300-fast-import.sh: line 3264: 28510 Terminated git fast-import $options 0<&8 1>&9
After a background child process terminates, its parent Bash process
always outputs a message like those above to stderr, even when in
non-interactive mode.
But how do some of these messages end up on the test script's stderr,
why don't we get them from all five tests, and why do they come from
different file/line locations? Well, after sending the TERM signal to
the background child process, it takes a little while until that
process receives the signal and terminates, and then it takes another
while until the parent process notices it. During this time the
parent Bash process is continuing execution, and by the time it
notices that its child terminated it might have already left
'test_eval_inner_' and its stderr is not redirected to /dev/null
anymore. That's why such a message can appear on the test script's
stderr, while other times, when the child terminates fast and/or the
parent shell is slow enough, the message ends up in /dev/null, just
like any other output of the test does. Bash always adds the file
name and line number of the code location it was about to execute when
it notices the termination of its child process as a prefix to that
message, hence the varying and sometimes totally unrelated location
prefixes in those messages (e.g. line 388 in 'test-lib-functions.sh'
is 'test_verify_prereq', and I saw such a message pointing to
'say_color' as well).
Prevent these messages from appearing on the test script's stderr by
'wait'-ing on the pid of the background 'git fast-import' process
after sending it the TERM signal. This ensures that the executing
shell's stderr is still redirected when the shell notices the
termination of its child process in the background, and that these
messages get a consistent file/line location prefix.
Note that this is not an issue when the test script is run with Bash
and '-v', because then these messages are supposed to go to the test
script's stderr anyway, and indeed all of them do; though the
sometimes seemingly random file/line prefixes could be confusing
still. Similarly, it's not an issue with Bash and '--verbose-log'
either, because then all messages go to the log file as they should.
Finally, it's not an issue with some other shells (I tried dash, ksh,
ksh93 and mksh) even without any of the verbose options, because they
don't print messages like these in non-interactive mode in the first
place.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg-interface t: extend the existing GPG tests with GPGSM
Add test cases to cover the new X509/gpgsm support. Most of them
resemble existing ones. They just switch the format to x509 and set the
signingkey when creating signatures. Validation of signatures does not
need any configuration of git, it does need gpgsm to be configured to
trust the key(-chain).
Several of the testcases build on top of existing gpg testcases.
The commit ships a self-signed key for committer@example.com and
configures gpgsm to trust it.
Signed-off-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff/xhistogram: move index allocation into find_lcs
This fixes a memory issue when recursing a lot, which can be reproduced as
seq 1 100000 >one
seq 1 4 100000 >two
git diff --no-index --histogram one two
Before this patch, histogram_diff would call itself recursively before
calling free_index, which would mean a lot of memory is allocated during
the recursion and only freed afterwards. By moving the memory allocation
(and its free call) into find_lcs, the memory is free'd before we recurse,
such that memory is reused in the next step of the recursion instead of
using new memory.
This addresses only the memory pressure, not the run time complexity,
that is also awful for the corner case outlined above.
Helpful in understanding the code (in addition to the sparse history of
this file), was https://stackoverflow.com/a/32367597 which reproduces
most of the code comments of the JGit implementation.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff
By passing the 'xpp' and 'env' argument directly to the function
'fall_back_to_classic_diff', we eliminate an occurrence of the 'index'
in histogram_diff, which will prove useful in a bit.
While at it, move it up in the file. This will make the diff of
one of the next patches more legible.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.c: add white space mode to move detection that allows indent changes
The option of --color-moved has proven to be useful as observed on the
mailing list. However when refactoring sometimes the indentation changes,
for example when partitioning a functions into smaller helper functions
the code usually mostly moved around except for a decrease in indentation.
To just review the moved code ignoring the change in indentation, a mode
to ignore spaces in the move detection as implemented in a previous patch
would be enough. However the whole move coloring as motivated in commit 2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
up the notion of the reviewer being able to trust the move of a "block".
As there are languages such as python, which depend on proper relative
indentation for the control flow of the program, ignoring any white space
change in a block would not uphold the promises of 2e2d5ac that allows
reviewers to pay less attention to the inside of a block, as inside
the reviewer wants to assume the same program flow.
This new mode of white space ignorance will take this into account and will
only allow the same white space changes per line in each block. This patch
even allows only for the same change at the beginning of the lines.
As this is a white space mode, it is made exclusive to other white space
modes in the move detection.
This patch brings some challenges, related to the detection of blocks.
We need a wide net to catch the possible moved lines, but then need to
narrow down to check if the blocks are still intact. Consider this
example (ignoring block sizes):
- A
- B
- C
+ A
+ B
+ C
At the beginning of a block when checking if there is a counterpart
for A, we have to ignore all space changes. However at the following
lines we have to check if the indent change stayed the same.
Checking if the indentation change did stay the same, is done by computing
the indentation change by the difference in line length, and then assume
the change is only in the beginning of the longer line, the common tail
is the same. That is why the test contains lines like:
- <TAB> A
...
+ A <TAB>
...
As the first line starting a block is caught using a compare function that
ignores white spaces unlike the rest of the block, where the white space
delta is taken into account for the comparison, we also have to think about
the following situation:
- A
- B
- A
- B
+ A
+ B
+ A
+ B
When checking if the first A (both in the + and - lines) is a start of
a block, we have to check all 'A' and record all the white space deltas
such that we can find the example above to be just one block that is
indented.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can already disable replace refs using a command line
option or environment variable, but those are awkward to
apply universally. Let's add a config option to do the same
thing.
That raises the question of why one might want to do so
universally. The answer is that replace refs violate the
immutability of objects. For instance, if you wanted to
cache the diff between commit XYZ and its parent, then in
theory that never changes; the hash XYZ represents the total
state. But replace refs violate that; pushing up a new ref
may create a completely new diff.
The obvious "if it hurts, don't do it" answer is not to
create replace refs if you're doing this kind of caching.
But for a site hosting arbitrary repositories, they may want
to allow users to share replace refs with each other, but
not actually respect them on the site (because the caching
is more important than the replace feature).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was added as a NEEDSWORK in c3c36d7de2 (replace-object:
check_replace_refs is safe in multi repo environment, 2018-04-11),
waiting for a calmer period. Since doing so now doesn't conflict
with anything in 'pu', it seems as good a time as any.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit afc711b8e1 (rename read_replace_refs to
check_replace_refs, 2014-02-18) added a comment mentioning
that check_replace_refs is set in two ways:
- from user intent via --no-replace-objects, etc
- after seeing there are no replace refs to respect
Since c3c36d7de2 (replace-object: check_replace_refs is safe
in multi repo environment, 2018-04-11) the second is no
longer true. Let's drop that part of the comment.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* en/rebase-i-microfixes:
git-rebase--merge: modernize "git-$cmd" to "git $cmd"
Fix use of strategy options with interactive rebases
t3418: add testcase showing problems with rebase -i and strategy options
"git filter-branch" when used with the "--state-branch" option
still attempted to rewrite the commits whose filtered result is
known from the previous attempt (which is recorded on the state
branch); the command has been corrected not to waste cycles doing
so.
* mb/filter-branch-optim:
filter-branch: skip commits present on --state-branch
Tighten the API to make it harder to misuse in-tree .gitmodules
file, even though it shares the same syntax with configuration
files, to read random configuration items from it.
* ao/config-from-gitmodules:
submodule-config: reuse config_from_gitmodules in repo_read_gitmodules
submodule-config: pass repository as argument to config_from_gitmodules
submodule-config: make 'config_from_gitmodules' private
submodule-config: add helper to get 'update-clone' config from .gitmodules
submodule-config: add helper function to get 'fetch' config from .gitmodules
config: move config_from_gitmodules to submodule-config.c
The "-l" option in "git branch -l" is an unfortunate short-hand for
"--create-reflog", but many users, both old and new, somehow expect
it to be something else, perhaps "--list". This step warns when "-l"
is used as a short-hand for "--create-reflog" and warns about the
future repurposing of the it when it is used.
* jk/branch-l-0-deprecation:
branch: deprecate "-l" option
t: switch "branch -l" to "branch --create-reflog"
t3200: unset core.logallrefupdates when testing reflog creation
"git grep" learned the "--column" option that gives not just the
line number but the column number of the hit.
* tb/grep-column:
contrib/git-jump/git-jump: jump to exact location
grep.c: add configuration variables to show matched option
builtin/grep.c: add '--column' option to 'git-grep(1)'
grep.c: display column number of first match
grep.[ch]: extend grep_opt to allow showing matched column
grep.c: expose {,inverted} match column in match_line()
Documentation/config.txt: camel-case lineNumber for consistency
"git submodule" did not correctly adjust core.worktree setting that
indicates whether/where a submodule repository has its associated
working tree across various state transitions, which has been
corrected.
* sb/submodule-core-worktree:
submodule deinit: unset core.worktree
submodule: ensure core.worktree is set after update
submodule: unset core.worktree if no working tree is present
gpg-interface: do not hardcode the key string len anymore
gnupg does print the keyid followed by a space and the signer comes
next. The same pattern is also used in gpgsm, but there the key length
would be 40 instead of 16. Instead of hardcoding the expected length,
find the first space and calculate it.
Input that does not match the expected format will be ignored now,
before we jumped to found+17 which might have been behind the end of an
unexpected string.
Signed-off-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg-interface: introduce an abstraction for multiple gpg formats
Create a struct that holds the format details for the supported formats.
At the moment that is still just "openpgp". This commit prepares for the
introduction of more formats, that might use other programs and match
other signatures.
Signed-off-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a struct repository argument to the functions in commit-graph.h that
read the commit graph. (This commit does not affect functions that write
commit graphs.)
Because the commit graph functions can now read the commit graph of any
repository, the global variable core_commit_graph has been removed.
Instead, the config option core.commitGraph is now read on the first
time in a repository that a commit is attempted to be parsed using its
commit graph.
This commit includes a test that exercises the functionality on an
arbitrary repository that is not the_repository.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of storing commit graphs in static variables, store them in
struct object_store. There are no changes to the signatures of existing
functions - they all still only support the_repository, and support for
other instances of struct repository will be added in a subsequent
commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two functions in the code (1) check if the repository is configured for
commit graphs, (2) call prepare_commit_graph(), and (3) check if the
graph exists. Move (1) and (3) into prepare_commit_graph(), reducing
duplication of code.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use oid_object_info_extended() to get object info instead of
read_object_file().
It will help to handle some requests faster (e.g., we do not need to
parse whole object if we need to know %(objectsize)).
It could also help us to add new atoms such as %(objectsize:disk)
and %(deltabase).
Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Atoms like "align" or "end" do not have string representation.
Earlier we had to go and parse whole object with a hope that we
could fill their string representations. It's easier to fill them
with an empty string before we start to work with whole object.
It is important to mention that we fill only these atoms that must
contain nothing. So, if we could not fill the atom because, for example,
the object is missing, we leave it with NULL.
Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the source of object data to prevent parsing of unneeded data.
The goal is to improve performance by avoiding calling expensive
functions when we don't need the information they provide
or when we could get it by using a cheaper function.
It is stored in valid_atoms because it depends on the atoms we are
interested in.
Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg-interface: add new config to select how to sign a commit
Add "gpg.format" where the user can specify which type of signature to
use for commits. At the moment only "openpgp" is supported and the value is
not even used. This commit prepares for a new types of signatures.
Signed-off-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.c: factor advance_or_nullify out of mark_color_as_moved
This moves the part of code that checks if we're still in a block
into its own function. We'll need a different approach on advancing
the blocks in a later patch, so having it as a separate function will
prove useful.
While at it rename the variable `p` to `prev` to indicate that it refers
to the previous line. This is as pmb[i] was assigned in the last iteration
of the outmost for loop.
Further rename `pnext` to `cur` to indicate that this should match up with
the current line of the outmost for loop.
Also replace the advancement of pmb[i] to reuse `cur` instead of
using `p->next` (which is how the name for pnext could be explained.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.c: decouple white space treatment from move detection algorithm
In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff. Some cases came up where different treatment would
have been nice.
Allow the user to specify that white space should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options by introducing the option --color-moved-ws=<modes>
with the modes named "ignore-space-at-eol", "ignore-space-change" and
"ignore-all-space", which is used only during the move detection phase.
As we change the default, we'll adjust the tests.
For now we do not infer any options to treat white spaces in the move
detection from the generic white space options given to diff.
This can be tuned later to reasonable default.
As we plan on adding more white space related options in a later patch,
that interferes with the current white space options, use a flag field
and clamp it down to XDF_WHITESPACE_FLAGS, as that (a) allows to easily
check at parse time if we give invalid combinations and (b) can reuse
parts of this patch.
By having the white space treatment in its own option, we'll also
make it easier for a later patch to have an config option for
spaces in the move detection.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.c: add a blocks mode for moved code detection
The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
(https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/) Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff.c: do not pass diff options as keydata to hashmap
When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.
This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.
Explanation-by: Jeff King <peff@peff.net> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>