When verifying a multi-pack-index, the only action that takes
significant time is checking the object offsets. For example,
to verify a multi-pack-index containing 6.2 million objects in
the Linux kernel repository takes 1.3 seconds on my machine.
99% of that time is spent looking up object offsets in each of
the packfiles and comparing them to the multi-pack-index offset.
Add a progress indicator for that section of the 'verify' verb.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git multi-pack-index verify' command must verify the object
offsets stored in the multi-pack-index are correct. There are two
ways the offset chunk can be incorrect: the pack-int-id and the
object offset.
Replace the BUG() statement with a die() statement, now that we
may hit a bad pack-int-id during a 'verify' command on a corrupt
multi-pack-index, and it is covered by a test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When loading a 64-bit offset, we intend to check that off_t can store
the resulting offset. However, the condition accidentally checks the
32-bit offset to see if it is smaller than a 64-bit value. Fix it,
and this will be covered by a test in the 'git multi-pack-index verify'
command in a later commit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final check we make while loading a multi-pack-index is that
the packfile names are in lexicographical order. Make this error
be a die() instead.
In order to test this condition, we need multiple packfiles.
Earlier in t5319-multi-pack-index.sh, we tested the interaction with
'git repack' but this limits us to one packfile in our object dir.
Move these repack tests until after the 'verify' tests.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When verifying if a multi-pack-index file is valid, we want the
command to fail to signal an invalid file. Previously, we wrote
an error to stderr and continued as if we had no multi-pack-index.
Now, die() instead of error().
Add tests that check corrupted headers in a few ways:
* Bad signature
* Bad file version
* Bad hash version
* Truncated hash count
* Extended hash count
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index builtin writes multi-pack-index files, and
uses a 'write' verb to do so. Add a 'verify' verb that checks this
file matches the contents of the pack-indexes it replaces.
The current implementation is a no-op, but will be extended in
small increments in later commits.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running 'git pack-objects --local', we want to avoid packing
objects that are in an alternate. Currently, we check for these
objects using the packed_git_mru list, which excludes the pack-files
covered by a multi-pack-index.
Add a new iteration over the multi-pack-indexes to find these
copies and mark them as unwanted.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new get_all_packs() method exposed the packfiles coverede by
a multi-pack-index. Before, the 'git cat-file --batch' and
'git count-objects' commands would skip objects in an environment
with a multi-pack-index.
Further, a reachability bitmap would be ignored if its pack-file
was covered by a multi-pack-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a repo contains a multi-pack-index, then the packed_git list
does not contain the packfiles that are covered by the multi-pack-index.
This is important for doing object lookups, abbreviations, and
approximating object count. However, there are many operations that
really want to iterate over all packfiles.
Create a new 'all_packs' linked list that contains this list, starting
with the packfiles in the multi-pack-index and then continuing along
the packed_git linked list.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic for constructing the linked list of multi-pack-indexes
in the object store is incorrect. If the local object store has
a multi-pack-index, but an alternate does not, then the list is
dropped.
Add tests that would have revealed this bug.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When prepare_packed_git is called with the report_garbage method
initialized, we report unexpected files in the objects directory
as garbage. Stop reporting the multi-pack-index and the pack-files
it covers as garbage.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object fails to decompress from a pack-file, we mark the object
as 'bad' so we can retry with a different copy of the object (if such a
copy exists).
Before now, the multi-pack-index did not update the bad objects list for
the pack-files it contains, and we did not check the bad objects list
when reading an object. Now, do both.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pack-file is 'local' if it is stored within the usual object
directory. If it is stored in an alternate, it is non-local.
Pack-files are stored using a 'pack_local' member in the packed_git
struct. Add a similar 'local' member to the multi_pack_index struct
and 'local' parameters to the methods that load and prepare multi-
pack-indexes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index builtin has a very simple command-line
interface. Instead of simply reporting usage, give the user a
hint to why the arguments failed.
Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ds/multi-pack-index: (23 commits)
midx: clear midx on repack
packfile: skip loading index if in multi-pack-index
midx: prevent duplicate packfile loads
midx: use midx in approximate_object_count
midx: use existing midx when writing new one
midx: use midx in abbreviation calculations
midx: read objects from multi-pack-index
config: create core.multiPackIndex setting
midx: write object offsets
midx: write object id fanout chunk
midx: write object ids in a chunk
midx: sort and deduplicate objects from packfiles
midx: read pack names into array
multi-pack-index: write pack names in chunk
multi-pack-index: read packfile list
packfile: generalize pack directory list
t5319: expand test data
multi-pack-index: load into memory
midx: write header information to lockfile
multi-pack-index: add 'write' verb
...
"git cherry-pick --quit" failed to remove CHERRY_PICK_HEAD even
though we won't be in a cherry-pick session after it returns, which
has been corrected.
* nd/cherry-pick-quit-fix:
cherry-pick: fix --quit not deleting CHERRY_PICK_HEAD
When "git rebase -i" is told to squash two or more commits into
one, it labeled the log message for each commit with its number.
It correctly called the first one "1st commit", but the next one
was "commit #1", which was off-by-one. This has been corrected.
* pw/rebase-i-squash-number-fix:
rebase -i: fix numbering in squash message
Recent update to "git config" broke updating variable in a
subsection, which has been corrected.
* sb/config-write-fix:
git-config: document accidental multi-line setting in deprecated syntax
config: fix case sensitive subsection names on writing
t1300: document current behavior of setting options
* en/incl-forward-decl:
Remove forward declaration of an enum
compat/precompose_utf8.h: use more common include guard style
urlmatch.h: fix include guard
Move definition of enum branch_track from cache.h to branch.h
alloc: make allocate_alloc_state and clear_alloc_state more consistent
Add missing includes and forward declarations
After a partial clone, repeated fetches from promisor remote would
have accumulated many packfiles marked with .promisor bit without
getting them coalesced into fewer packfiles, hurting performance.
"git repack" now learned to repack them.
* jt/repack-promisor-packs:
repack: repack promisor objects if -a or -A is set
repack: refactor setup of pack-objects cmd
A test prerequisite defined by various test scripts with slightly
different semantics has been consolidated into a single copy and
made into a lazily defined one.
* wc/make-funnynames-shared-lazy-prereq:
t: factor out FUNNYNAMES as shared lazy prereq
"git pull --rebase -v" in a repository with a submodule barfed as
an intermediate process did not understand what "-v(erbose)" flag
meant, which has been fixed.
* sb/pull-rebase-submodule:
git-submodule.sh: accept verbose flag in cmd_update to be non-quiet
"git tbdiff" that lets us compare individual patches in two
iterations of a topic has been rewritten and made into a built-in
command.
* js/range-diff: (21 commits)
range-diff: use dim/bold cues to improve dual color mode
range-diff: make --dual-color the default mode
range-diff: left-pad patch numbers
completion: support `git range-diff`
range-diff: populate the man page
range-diff --dual-color: skip white-space warnings
range-diff: offer to dual-color the diffs
diff: add an internal option to dual-color diffs of diffs
color: add the meta color GIT_COLOR_REVERSE
range-diff: use color for the commit pairs
range-diff: add tests
range-diff: do not show "function names" in hunk headers
range-diff: adjust the output of the commit pairs
range-diff: suppress the diff headers
range-diff: indent the diffs just like tbdiff
range-diff: right-trim commit messages
range-diff: also show the diff between patches
range-diff: improve the order of the shown commits
range-diff: first rudimentary implementation
Introduce `range-diff` to compare iterations of a topic branch
...
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.
* nd/no-the-index: (24 commits)
blame.c: remove implicit dependency on the_index
apply.c: remove implicit dependency on the_index
apply.c: make init_apply_state() take a struct repository
apply.c: pass struct apply_state to more functions
resolve-undo.c: use the right index instead of the_index
archive-*.c: use the right repository
archive.c: avoid access to the_index
grep: use the right index instead of the_index
attr: remove index from git_attr_set_direction()
entry.c: use the right index instead of the_index
submodule.c: use the right index instead of the_index
pathspec.c: use the right index instead of the_index
unpack-trees: avoid the_index in verify_absent()
unpack-trees: convert clear_ce_flags* to avoid the_index
unpack-trees: don't shadow global var the_index
unpack-trees: add a note about path invalidation
unpack-trees: remove 'extern' on function declaration
ls-files: correct index argument to get_convert_attr_ascii()
preload-index.c: use the right index instead of the_index
dir.c: remove an implicit dependency on the_index in pathspec code
...
Improve built-in facility to catch broken &&-chain in the tests.
* es/chain-lint-more:
chainlint: add test of pathological case which triggered false positive
chainlint: recognize multi-line quoted strings more robustly
chainlint: let here-doc and multi-line string commence on same line
chainlint: recognize multi-line $(...) when command cuddled with "$("
chainlint: match 'quoted' here-doc tags
chainlint: match arbitrary here-docs tags rather than hard-coded names
Among the three codepaths we use O_APPEND to open a file for
appending, one used for writing GIT_TRACE output requires O_APPEND
implementation that behaves sensibly when multiple processes are
writing to the same file. POSIX emulation used in the Windows port
has been updated to improve in this area.
The API to iterate over all objects learned to optionally list
objects in the order they appear in packfiles, which helps locality
of access if the caller accesses these objects while as objects are
enumerated.
* jk/for-each-object-iteration:
for_each_*_object: move declarations to object-store.h
cat-file: use a single strbuf for all output
cat-file: split batch "buf" into two variables
cat-file: use oidset check-and-insert
cat-file: support "unordered" output for --batch-all-objects
cat-file: rename batch_{loose,packed}_object callbacks
t1006: test cat-file --batch-all-objects with duplicates
for_each_packed_object: support iterating in pack-order
for_each_*_object: give more comprehensive docstrings
for_each_*_object: take flag arguments as enum
for_each_*_object: store flag definitions in a single location
"git verify-tag" and "git verify-commit" have been taught to use
the exit status of underlying "gpg --verify" to signal bad or
untrusted signature they found.
* jc/gpg-status:
gpg-interface: propagate exit status from gpg back to the callers
* en/t7406-fixes:
t7406: avoid using test_must_fail for commands other than git
t7406: prefer test_* helper functions to test -[feds]
t7406: avoid having git commands upstream of a pipe
t7406: simplify by using diff --name-only instead of diff --raw
t7406: fix call that was failing for the wrong reason
t2024: mark test using "checkout -p" with PERL prerequisite
Checkout with the -p switch uses the "add interactive" framework which
is written in Perl.
One test added in 8d7b558bae ("checkout & worktree: introduce
checkout.defaultRemote", 2018-06-05) didn't declare the PERL
prerequisite, breaking the test when built with NO_PERL.
Reported-by: CB Bailey <cb@hashpling.org> Signed-off-by: CB Bailey <cb@hashpling.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The caller of maybe_colorize_sideband() gives a counted buffer
<src, n>, but the callee checked src[] as if it were a NUL terminated
buffer. If src[] had all isspace() bytes in it, we would have made
n negative, and then
(1) made number of strncasecmp() calls to see if the remaining
bytes in src[] matched keywords, reading beyond the end of the
array (this actually happens even if n does not go negative),
and/or
(2) called strbuf_add() with negative count, most likely triggering
the "you want to use way too much memory" error due to unsigned
integer overflow.
Fix both issues by making sure we do not go beyond &src[n].
In the longer term we may want to accept size_t as parameter for
clarity (even though we know that a sideband message we are painting
typically would fit on a line on a terminal and int is sufficient).
Write it down as a NEEDSWORK comment.
Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge --abort" etc. did not clean things up properly when
there were conflicted entries in the index in certain order that
are involved in D/F conflicts. This has been corrected.
The http-backend (used for smart-http transport) used to slurp the
whole input until EOF, without paying attention to CONTENT_LENGTH
that is supplied in the environment and instead expecting the Web
server to close the input stream. This has been fixed.
* mk/http-backend-content-length:
t5562: avoid non-portable "export FOO=bar" construct
http-backend: respect CONTENT_LENGTH for receive-pack
http-backend: respect CONTENT_LENGTH as specified by rfc3875
http-backend: cleanup writing to child process
A few atoms like %(objecttype) and %(objectsize) in the format
specifier of "for-each-ref --format=<format>" can be filled without
getting the full contents of the object, but just with the object
header. These cases have been optimized by calling
oid_object_info() API (instead of reading and inspecting the data).
* ot/ref-filter-object-info:
ref-filter: use oid_object_info() to get object
ref-filter: merge get_obj and get_object
ref-filter: initialize eaten variable
ref-filter: fill empty fields with empty values
ref-filter: add info_source to valid_atom
Noiseword "extern" has been removed from function decls in the
header files.
* nd/no-extern:
submodule.h: drop extern from function declaration
revision.h: drop extern from function declaration
repository.h: drop extern from function declaration
rerere.h: drop extern from function declaration
line-range.h: drop extern from function declaration
diff.h: remove extern from function declaration
diffcore.h: drop extern from function declaration
convert.h: drop 'extern' from function declaration
cache-tree.h: drop extern from function declaration
blame.h: drop extern on func declaration
attr.h: drop extern from function declaration
apply.h: drop extern on func declaration
The parse-options machinery learned to refrain from enclosing
placeholder string inside a "<bra" and "ket>" pair automatically
without PARSE_OPT_LITERAL_ARGHELP. Existing help text for option
arguments that are not formatted correctly have been identified and
fixed.
* rs/parse-opt-lithelp:
parse-options: automatically infer PARSE_OPT_LITERAL_ARGHELP
shortlog: correct option help for -w
send-pack: specify --force-with-lease argument help explicitly
pack-objects: specify --index-version argument help explicitly
difftool: remove angular brackets from argument help
add, update-index: fix --chmod argument help
push: use PARSE_OPT_LITERAL_ARGHELP instead of unbalanced brackets
"git fetch $there refs/heads/s" ought to fetch the tip of the
branch 's', but when "refs/heads/refs/heads/s", i.e. a branch whose
name is "refs/heads/s" exists at the same time, fetched that one
instead by mistake. This has been corrected to honor the usual
disambiguation rules for abbreviated refnames.
* jt/refspec-dwim-precedence-fix:
remote: make refspec follow the same disambiguation rule as local refs
The test performed at the receiving end of "git push" to prevent
bad objects from entering repository can be customized via
receive.fsck.* configuration variables; we now have gained a
counterpart to do the same on the "git fetch" side, with
fetch.fsck.* configuration variables.
* ab/fsck-transfer-updates:
fsck: test and document unknown fsck.<msg-id> values
fsck: add stress tests for fsck.skipList
fsck: test & document {fetch,receive}.fsck.* config fallback
fetch: implement fetch.fsck.*
transfer.fsckObjects tests: untangle confusing setup
config doc: elaborate on fetch.fsckObjects security
config doc: elaborate on what transfer.fsckObjects does
config doc: unify the description of fsck.* and receive.fsck.*
config doc: don't describe *.fetchObjects twice
receive.fsck.<msg-id> tests: remove dead code
cherry-pick: fix --quit not deleting CHERRY_PICK_HEAD
--quit is supposed to be --abort but without restoring HEAD. Leaving
CHERRY_PICK_HEAD behind could make other commands mistake that
cherry-pick is still ongoing (e.g. "git commit --amend" will refuse to
work). Clean it too.
For --abort, this job of deleting CHERRY_PICK_HEAD is on "git reset"
so we don't need to do anything else. But let's add extra checks in
--abort tests to confirm.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase -i: fix SIGSEGV when 'merge <branch>' fails
If a merge command in the todo list specifies just a branch to merge
with no -C/-c argument then item->commit is NULL. This means that if
there are merge conflicts error_with_patch() is passed a NULL commit
which causes a segmentation fault when make_patch() tries to look it up.
This commit implements a minimal fix which fixes the crash and allows
the user to successfully commit a conflict resolution with 'git rebase
--continue'. It does not write .git/rebase-merge/patch,
.git/rebase-merge/stopped-sha or update REBASE_HEAD. To sensibly get the
hashes of the merge parents would require refactoring do_merge() to
extract the code that parses the merge parents into a separate function
which error_with_patch() could then use to write the parents into the
stopped-sha file. To create meaningful output make_patch() and 'git
rebase --show-current-patch' would also need to be modified to diff the
merge parent and merge base in this case.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the creation of conflicting-G from a test to the setup so that it
can be used in subsequent tests without creating the kind of implicit
dependencies that plague t3404. While we're at it simplify the
arguments to the test_commit() call the creates the conflicting commit.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a script (in contrib/) to help users of VSCode work better with
our codebase.
* js/vscode:
vscode: let cSpell work on commit messages, too
vscode: add a dictionary for cSpell
vscode: use 8-space tabs, no trailing ws, etc for Git's source code
vscode: wrap commit messages at column 72 by default
vscode: only overwrite C/C++ settings
mingw: define WIN32 explicitly
cache.h: extract enum declaration from inside a struct declaration
vscode: hard-code a couple defines
contrib: add a script to initialize VS Code configuration
It is too easy to misuse system API functions such as strcat();
these selected functions are now forbidden in this codebase and
will cause a compilation failure.
* jk/banned-function:
banned.h: mark strncpy() as banned
banned.h: mark sprintf() as banned
banned.h: mark strcat() as banned
automatically ban strcpy()
When the sparse checkout feature is in use, "git cherry-pick" and
other mergy operations lost the skip_worktree bit when a path that
is excluded from checkout requires content level merge, which is
resolved as the same as the HEAD version, without materializing the
merge result in the working tree, which made the path appear as
deleted. This has been corrected by preserving the skip_worktree
bit (and not materializing the file in the working tree).
* en/merge-recursive-skip-fix:
merge-recursive: preserve skip_worktree bit when necessary
t3507: add a testcase showing failure with sparse checkout
The wire-protocol v2 relies on the client to send "ref prefixes" to
limit the bandwidth spent on the initial ref advertisement. "git
fetch $remote branch:branch" that asks tags that point into the
history leading to the "branch" automatically followed sent to
narrow prefix and broke the tag following, which has been fixed.
* jt/tag-following-with-proto-v2-fix:
fetch: send "refs/tags/" prefix upon CLI refspecs
t5702: test fetch with multiple refspecs at a time
Code clean-up to use size_t/ssize_t when they are the right type.
* jk/size-t:
strbuf_humanise: use unsigned variables
pass st.st_size as hint for strbuf_readlink()
strbuf_readlink: use ssize_t
strbuf: use size_t for length in intermediate variables
reencode_string: use size_t for string lengths
reencode_string: use st_add/st_mult helpers
Update the way we use Coccinelle to find out-of-style code that
need to be modernised.
* sg/coccicheck-updates:
coccinelle: extract dedicated make target to clean Coccinelle's results
coccinelle: put sane filenames into output patches
coccinelle: exclude sha1dc source files from static analysis
coccinelle: use $(addsuffix) in 'coccicheck' make target
coccinelle: mark the 'coccicheck' make target as .PHONY