pack-objects: shrink delta_size field in struct object_entry
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.
If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: shrink size field in struct object_entry
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.
Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like
make test GIT_TEST_OE_SIZE=4
which forces this code on all objects larger than 3 bytes.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: clarify the use of object_entry::size
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.
"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: don't check size when the object is bad
sha1_object_info() in check_objects() may fail to locate an object in
the pack and return type OBJ_BAD. In that case, it will likely leave
the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch "size" field.
This field should contain the default value zero, but we can't say
sha1_object_info() cannot damage it. This becomes more important later
when the object size may have to be retrieved back from the
(non-existing) pack.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: shrink z_delta_size field in struct object_entry
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: refer to delta objects by index instead of pointer
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: move in_pack out of struct object_entry
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.
This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.
If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.
The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running
make test GIT_TEST_FULL_IN_PACK_ARRAY=1
to take advantage of other special case tests.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: move in_pack_pos out of struct object_entry
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: use bitfield for object_entry::depth
Because of struct packing from now on we can only handle max depth
4095 (or even lower when new booleans are added in this struct). This
should be ok since long delta chain will cause significant slow down
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: turn type and in_pack_type to bitfields
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: a bit of document about struct object_entry
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'svn/authors-prog-2' of git://bogomips.org/git-svn
* 'svn/authors-prog-2' of git://bogomips.org/git-svn:
git-svn: allow empty email-address using authors-prog and authors-file
git-svn: search --authors-prog in PATH too
The topic appears to inflict severe regression in renaming merges,
even though the promise of it was that it would improve them.
We do not yet know which exact change in the topic was wrong, but in
the meantime, let's play it safe and revert it out of 'master'
before real Git-using projects are harmed.
* ab/doc-hash-brokenness:
doc hash-function-transition: clarify what SHAttered means
doc hash-function-transition: clarify how older gits die on NewHash
The mechanism to use parse-options API to automate the command line
completion continues to get extended and polished.
* nd/parseopt-completion-more:
completion: use __gitcomp_builtin in _git_cherry
completion: use __gitcomp_builtin in _git_ls_tree
completion: delete option-only completion commands
completion: add --option completion for most builtin commands
completion: factor out _git_xxx calling code
completion: mention the oldest version we need to support
git.c: add hidden option --list-parseopt-builtins
git.c: move cmd_struct declaration up
Code to find the length to uniquely abbreviate object names based
on packfile content, which is a relatively recent addtion, has been
optimized to use the same fan-out table.
* ds/bsearch-hash:
sha1_name: use bsearch_pack() in unique_in_pack()
sha1_name: use bsearch_pack() for abbreviations
packfile: define and use bsearch_pack()
sha1_name: convert struct min_abbrev_data to object_id
* ws/rebase-p:
rebase: remove merges_option and a blank line
rebase: remove unused code paths from git_rebase__interactive__preserve_merges
rebase: remove unused code paths from git_rebase__interactive
rebase: add and use git_rebase__interactive__preserve_merges
rebase: extract functions out of git_rebase__interactive
rebase: reindent function git_rebase__interactive
rebase: update invocation of rebase dot-sourced scripts
rebase-interactive: simplify pick_on_preserving_merges
"diff-highlight" filter (in contrib/) learned to undertand "git log
--graph" output better.
* jk/diff-highlight-graph-fix:
diff-highlight: detect --graph by indent
diff-highlight: use flush() helper consistently
diff-highlight: test graphs with --color
diff-highlight: test interleaved parallel lines of history
diff-highlight: prefer "echo" to "cat" in tests
diff-highlight: use test_tick in graph test
diff-highlight: correct test graph diagram
* nd/remove-ignore-env-field:
repository.h: add comment and clarify repo_set_gitdir
repository: delete ignore_env member
sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
repository.c: delete dead functions
repository.c: move env-related setup code back to environment.c
repository: initialize the_repository in main()
"git stash push -u -- <pathspec>" gave an unnecessary and confusing
error message when there was no tracked files that match the
<pathspec>, which has been fixed.
The way "git worktree prune" worked internally has been simplified,
by assuming how "git worktree move" moves an existing worktree to a
different place.
* nd/worktree-prune:
worktree prune: improve prune logic when worktree is moved
worktree: delete dead code
gc.txt: more details about what gc does
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (36 commits)
convert: convert to struct object_id
sha1_file: introduce a constant for max header length
Convert lookup_replace_object to struct object_id
sha1_file: convert read_sha1_file to struct object_id
sha1_file: convert read_object_with_reference to object_id
tree-walk: convert tree entry functions to object_id
streaming: convert istream internals to struct object_id
tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
builtin/notes: convert static functions to object_id
builtin/fmt-merge-msg: convert remaining code to object_id
sha1_file: convert sha1_object_info* to object_id
Convert remaining callers of sha1_object_info_extended to object_id
packfile: convert unpack_entry to struct object_id
sha1_file: convert retry_bad_packed_offset to struct object_id
sha1_file: convert assert_sha1_type to object_id
builtin/mktree: convert to struct object_id
streaming: convert open_istream to use struct object_id
sha1_file: convert check_sha1_signature to struct object_id
sha1_file: convert read_loose_object to use struct object_id
builtin/index-pack: convert struct ref_delta_entry to object_id
...
"git shortlog cruft" aborted with a BUG message when run outside a
Git repository. The command has been taught to complain about
extra and unwanted arguments on its command line instead in such a
case.
The build procedure learned to optionally use symbolic links
(instead of hardlinks and copies) to install "git-foo" for built-in
commands, whose binaries are all identical.
* ab/install-symlinks:
Makefile: optionally symlink libexec/git-core binaries to bin/git
Makefile: add a gitexecdir_relative variable
Makefile: fix broken bindir_relative variable
"git filter-branch" learned to use a different exit code to allow
the callers to tell the case where there was no new commits to
rewrite from other error cases.
* ml/filter-branch-no-op-error:
filter-branch: return 2 when nothing to rewrite
Git can be built to use either v1 or v2 of the PCRE library, and so
far, the build-time configuration USE_LIBPCRE=YesPlease instructed
the build procedure to use v1, but now it means v2. USE_LIBPCRE1
and USE_LIBPCRE2 can be used to explicitly choose which version to
use, as before.
* ab/pcre-v2:
Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1
configure: detect redundant --with-libpcre & --with-libpcre1
configure: fix a regression in PCRE v1 detection
A "git fetch" from a repository with insane number of refs into a
repository that is already up-to-date still wasted too many cycles
making many lstat(2) calls to see if these objects at the tips
exist as loose objects locally. These lstat(2) calls are optimized
away by enumerating all loose objects beforehand.
It is unknown if the new strategy negatively affects existing use
cases, fetching into a repository with many loose objects from a
repository with small number of refs.
* ti/fetch-everything-local-optim:
fetch-pack.c: use oidset to check existence of loose object
Rename detection logic in "diff" family that is used in "merge" has
learned to guess when all of x/a, x/b and x/c have moved to z/a,
z/b and z/c, it is likely that x/d added in the meantime would also
want to move to z/d by taking the hint that the entire directory
'x' moved to 'z'. A bug causing dirty files involved in a rename
to be overwritten during merge has also been fixed as part of this
work.
* en/rename-directory-detection: (29 commits)
merge-recursive: ensure we write updates for directory-renamed file
merge-recursive: avoid spurious rename/rename conflict from dir renames
directory rename detection: new testcases showcasing a pair of bugs
merge-recursive: fix remaining directory rename + dirty overwrite cases
merge-recursive: fix overwriting dirty files involved in renames
merge-recursive: avoid clobbering untracked files with directory renames
merge-recursive: apply necessary modifications for directory renames
merge-recursive: when comparing files, don't include trees
merge-recursive: check for file level conflicts then get new name
merge-recursive: add computation of collisions due to dir rename & merging
merge-recursive: check for directory level conflicts
merge-recursive: add get_directory_renames()
merge-recursive: make a helper function for cleanup for handle_renames
merge-recursive: split out code for determining diff_filepairs
merge-recursive: make !o->detect_rename codepath more obvious
merge-recursive: fix leaks of allocated renames and diff_filepairs
merge-recursive: introduce new functions to handle rename logic
merge-recursive: move the get_renames() function
directory rename detection: tests for handling overwriting dirty files
directory rename detection: tests for handling overwriting untracked files
...
containing the SVN repository UUID. Now git-svn behaves like git-commit:
If the email is *explicitly* set to the empty string using '<>', the
commit does not contain an email address, only the name:
jondoe <>
Allowing to remove the email address *intentionally* prevents automatic
systems from sending emails to those fictional addresses and avoids
cluttering the log output with unnecessary stuff.
Signed-off-by: Andreas Heiduk <asheiduk@gmail.com> Signed-off-by: Eric Wong <e@80x24.org>
In 36db1eddf9 ("git-svn: add --authors-prog option", 2009-05-14) the path
to authors-prog was made absolute because git-svn changes the current
directory in some situations. This makes sense if the program is part of
the repository but prevents searching via $PATH.
The old behaviour is still retained, but if the file does not exists, then
authors-prog is searched for in $PATH as any other command.
Signed-off-by: Andreas Heiduk <asheiduk@gmail.com> Signed-off-by: Eric Wong <e@80x24.org>
add -p: fix 2.17.0-rc* regression due to moved code
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.
In Perl, regex variables like $1 always refer to the last regex
match. When the aforementioned change added a new regex match between
the old match and the corresponding code that was expecting $1, the $1
variable would always be undef, since the newly inserted regex match
doesn't have any captures.
As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:
$ perl -pi -e 's/Git/Tig/g' README.md
$ ./git --exec-path=$PWD add -p
[..]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
Split into 4 hunks.
[...]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
Use of uninitialized value $1 in string eq at /home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
search for regex? Many
I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex, now it works as intended again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: disable object filtering when disabled by config
When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
2017-12-08), it was guarded by the uploadpack.allowFilter config item
to allow server operators to control when they start supporting it.
That config item didn't go far enough, though: it controls whether the
'filter' capability is advertised, but if a (custom) client ignores
the capability advertisement and passes a filter specification anyway,
the server would handle that despite allowFilter being false.
This is particularly significant if a security bug is discovered in
this new experimental partial clone code. Installations without
uploadpack.allowFilter ought not to be affected since they don't
intend to support partial clone, but they would be swept up into being
vulnerable.
Simplify and limit the attack surface by making uploadpack.allowFilter
disable the feature, not just the advertisement of it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
credential: ignore SIGPIPE when writing to credential helpers
The credential subsystem can trigger SIGPIPE when writing to an
external helper if that helper closes its stdin before reading the
whole input. Normally this is rare, since helpers would need to read
that input to make a decision about how to respond, but:
1. It's reasonable to configure a helper which only handles "get"
while ignoring "store". Such a handler might not read stdin
for "store", thereby rapidly closing stdin upon helper exit.
2. A broken or misbehaving helper might exit immediately. That's an
error, but it's not reasonable for it to take down the parent Git
process with SIGPIPE.
Even with such a helper, seeing this problem should be rare. Getting
SIGPIPE requires the helper racily exiting before we've written the
fairly small credential output.
Signed-off-by: Erik E Brady <brady@cisco.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule: check for NULL return of get_submodule_ref_store()
If we can't find a ref store for a submodule then assume the latter
is not initialized (or was removed). Print a status line accordingly
instead of causing a segmentation fault by passing NULL as the first
parameter of refs_head_ref().
Reported-by: Jeremy Feusi <jeremy@feusi.co> Reviewed-by: Stefan Beller <sbeller@google.com> Initial-Test-By: Stefan Beller <sbeller@google.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for recently graduated topic that give help to completion
scripts from the Git subcommands that are being completed
* nd/parseopt-completion:
t9902: disable test on the list of merge-strategies under GETTEXT_POISON
completion: clear cached --options when sourcing the completion script
test: avoid pipes in git related commands for test
Avoid using pipes downstream of Git commands since the exit codes
of commands upstream of pipes get swallowed, thus potentially
hiding failure of those commands. Instead, capture Git command
output to a file and apply the downstream command(s) to that file.
Signed-off-by: Pratik Karki <predatoramigo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule deinit: handle non existing pathspecs gracefully
This fixes a regression introduced in 2e612731b5 (submodule: port
submodule subcommand 'deinit' from shell to C, 2018-01-15), when
handling pathspecs that do not exist gracefully. This restores the
historic behavior of reporting the pathspec as unknown and returning
instead of reporting a bug.
Reported-by: Peter Oberndorfer <kumbayo84@arcor.de> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1ada5020b3 ("stash: use stash_push for no verb form", 2017-02-28),
when the pathspec argument was introduced in 'git stash', that was also
documented. However I forgot to remove an extra square bracket after
the '--message' argument, even though the square bracket should have
been after the pathspec argument (where it was also added).
Remove the extra square bracket after the '--message' argument, to show
that the pathspec argument should be used with the 'push' verb.
While the pathspec argument can be used without the push verb, that's a
special case described later in the man page, and removing the first extra
square bracket instead of the second one makes the synopis easier to
understand.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc hash-function-transition: clarify what SHAttered means
Attempt to clarify what the SHAttered attack means in practice for
Git. The previous version of the text made no mention whatsoever of
Git already having a mitigation for this specific attack, which the
SHAttered researchers claim will detect cryptanalytic collision
attacks.
I may have gotten some of the nuances wrong, but as far as I know this
new text accurately summarizes the current situation with SHA-1 in
git. I.e. git doesn't really use SHA-1 anymore, it uses
Hardened-SHA-1 (they just so happen to produce the same outputs
99.99999999999...% of the time).
Thus the previous text was incorrect in asserting that:
[...]As a result [of SHAttered], SHA-1 cannot be considered
cryptographically secure any more[...]
That's not the case. We have a mitigation against SHAttered, *however*
we consider it prudent to move to work towards a NewHash should future
vulnerabilities in either SHA-1 or Hardened-SHA-1 emerge.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc hash-function-transition: clarify how older gits die on NewHash
Change the "Repository format extension" to accurately describe what
happens with different versions of Git when they encounter NewHash
repositories, instead of only saying what happens with versions v2.7.0
and later.
See ab9cb76f66 ("Repository format version check.", 2005-11-25) and 00a09d57eb ("introduce "extensions" form of
core.repositoryformatversion", 2015-06-23) for the relevant changes to
the setup code where these variables are checked.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 11395a3b4b (test_must_be_empty: make sure the file exists, not
just empty, 2018-02-27) basically duplicated the 'test_path_is_file'
helper function in 'test_must_be_empty'.
Just call 'test_path_is_file' to avoid this code duplication.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the most interesting thing one can be interested in when
looking at performance test results is possible performance
regressions.
This new option makes it easy to spot such possible regressions.
This new option is named '--sort-by=regression' to make it
possible and easy to add other ways to sort the results, like for
example '--sort-by=utime'.
If we would like to sort according to how much the stime regressed
we could also add a new option called '--sort-by=regression:stime'.
Then '--sort-by=regression' could become a synonym for
'--sort-by=regression:rtime'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>