gc --auto: exclude base pack if not enough mem to "repack -ad"
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.
Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.
gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).
This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This config allows us to keep <N> packs back if their size is larger
than a limit. But if this N >= gc.autoPackLimit, we may have a
problem. We are supposed to reduce the number of packs after a
threshold because it affects performance.
We could tell the user that they have incompatible gc.bigPackThreshold
and gc.autoPackLimit, but it's kinda hard when 'git gc --auto' runs in
background. Instead let's fall back to the next best stategy: try to
reduce the number of packs anyway, but keep the base pack out. This
reduces the number of packs to two and hopefully won't take up too
much resources to repack (the assumption still is the base pack takes
most resources to handle).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --keep-largest-pack option is not very convenient to use because
you need to tell gc to do this explicitly (and probably on just a few
large repos).
Add a config key that enables this mode when packs larger than a limit
are found. Note that there's a slight behavior difference compared to
--keep-largest-pack: all packs larger than the threshold are kept, not
just the largest one.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a new repack mode that combines everything into a secondary
pack, leaving the largest pack alone.
This could help reduce memory pressure. On linux-2.6.git, valgrind
massif reports 1.6GB heap in "pack all" case, and 535MB in "pack
all except the base pack" case. We save roughly 1GB memory by
excluding the base pack.
This should also lower I/O because we don't have to rewrite a giant
pack every time (e.g. for linux-2.6.git that's a 1.4GB pack file)..
PS. The use of string_list here seems overkill, but we'll need it in
the next patch...
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allow to keep existing packs by having companion .keep files. This
is helpful when a pack is permanently kept. In the next patch, git-gc
just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.
A note about why the pack_keep field cannot be reused and
pack_keep_in_core has to be added. This is about the case when
--keep-pack is specified together with either --keep-unreachable or
--unpack-unreachable, but --honor-pack-keep is NOT specified.
In this case, we want to exclude objects from the packs specified on
command line, not from ones with .keep files. If only one bit flag is
used, we have to clear pack_keep on pack files with the .keep file.
But we can't make any assumption about unreachable objects in .keep
packs. If "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is ignore all packs with either
.keep file or --keep-pack.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'svn/authors-prog-2' of git://bogomips.org/git-svn
* 'svn/authors-prog-2' of git://bogomips.org/git-svn:
git-svn: allow empty email-address using authors-prog and authors-file
git-svn: search --authors-prog in PATH too
The topic appears to inflict severe regression in renaming merges,
even though the promise of it was that it would improve them.
We do not yet know which exact change in the topic was wrong, but in
the meantime, let's play it safe and revert it out of 'master'
before real Git-using projects are harmed.
* ab/doc-hash-brokenness:
doc hash-function-transition: clarify what SHAttered means
doc hash-function-transition: clarify how older gits die on NewHash
The mechanism to use parse-options API to automate the command line
completion continues to get extended and polished.
* nd/parseopt-completion-more:
completion: use __gitcomp_builtin in _git_cherry
completion: use __gitcomp_builtin in _git_ls_tree
completion: delete option-only completion commands
completion: add --option completion for most builtin commands
completion: factor out _git_xxx calling code
completion: mention the oldest version we need to support
git.c: add hidden option --list-parseopt-builtins
git.c: move cmd_struct declaration up
Code to find the length to uniquely abbreviate object names based
on packfile content, which is a relatively recent addtion, has been
optimized to use the same fan-out table.
* ds/bsearch-hash:
sha1_name: use bsearch_pack() in unique_in_pack()
sha1_name: use bsearch_pack() for abbreviations
packfile: define and use bsearch_pack()
sha1_name: convert struct min_abbrev_data to object_id
* ws/rebase-p:
rebase: remove merges_option and a blank line
rebase: remove unused code paths from git_rebase__interactive__preserve_merges
rebase: remove unused code paths from git_rebase__interactive
rebase: add and use git_rebase__interactive__preserve_merges
rebase: extract functions out of git_rebase__interactive
rebase: reindent function git_rebase__interactive
rebase: update invocation of rebase dot-sourced scripts
rebase-interactive: simplify pick_on_preserving_merges
"diff-highlight" filter (in contrib/) learned to undertand "git log
--graph" output better.
* jk/diff-highlight-graph-fix:
diff-highlight: detect --graph by indent
diff-highlight: use flush() helper consistently
diff-highlight: test graphs with --color
diff-highlight: test interleaved parallel lines of history
diff-highlight: prefer "echo" to "cat" in tests
diff-highlight: use test_tick in graph test
diff-highlight: correct test graph diagram
* nd/remove-ignore-env-field:
repository.h: add comment and clarify repo_set_gitdir
repository: delete ignore_env member
sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
repository.c: delete dead functions
repository.c: move env-related setup code back to environment.c
repository: initialize the_repository in main()
"git stash push -u -- <pathspec>" gave an unnecessary and confusing
error message when there was no tracked files that match the
<pathspec>, which has been fixed.
The way "git worktree prune" worked internally has been simplified,
by assuming how "git worktree move" moves an existing worktree to a
different place.
* nd/worktree-prune:
worktree prune: improve prune logic when worktree is moved
worktree: delete dead code
gc.txt: more details about what gc does
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (36 commits)
convert: convert to struct object_id
sha1_file: introduce a constant for max header length
Convert lookup_replace_object to struct object_id
sha1_file: convert read_sha1_file to struct object_id
sha1_file: convert read_object_with_reference to object_id
tree-walk: convert tree entry functions to object_id
streaming: convert istream internals to struct object_id
tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
builtin/notes: convert static functions to object_id
builtin/fmt-merge-msg: convert remaining code to object_id
sha1_file: convert sha1_object_info* to object_id
Convert remaining callers of sha1_object_info_extended to object_id
packfile: convert unpack_entry to struct object_id
sha1_file: convert retry_bad_packed_offset to struct object_id
sha1_file: convert assert_sha1_type to object_id
builtin/mktree: convert to struct object_id
streaming: convert open_istream to use struct object_id
sha1_file: convert check_sha1_signature to struct object_id
sha1_file: convert read_loose_object to use struct object_id
builtin/index-pack: convert struct ref_delta_entry to object_id
...
"git shortlog cruft" aborted with a BUG message when run outside a
Git repository. The command has been taught to complain about
extra and unwanted arguments on its command line instead in such a
case.
The build procedure learned to optionally use symbolic links
(instead of hardlinks and copies) to install "git-foo" for built-in
commands, whose binaries are all identical.
* ab/install-symlinks:
Makefile: optionally symlink libexec/git-core binaries to bin/git
Makefile: add a gitexecdir_relative variable
Makefile: fix broken bindir_relative variable
"git filter-branch" learned to use a different exit code to allow
the callers to tell the case where there was no new commits to
rewrite from other error cases.
* ml/filter-branch-no-op-error:
filter-branch: return 2 when nothing to rewrite
Git can be built to use either v1 or v2 of the PCRE library, and so
far, the build-time configuration USE_LIBPCRE=YesPlease instructed
the build procedure to use v1, but now it means v2. USE_LIBPCRE1
and USE_LIBPCRE2 can be used to explicitly choose which version to
use, as before.
* ab/pcre-v2:
Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1
configure: detect redundant --with-libpcre & --with-libpcre1
configure: fix a regression in PCRE v1 detection
A "git fetch" from a repository with insane number of refs into a
repository that is already up-to-date still wasted too many cycles
making many lstat(2) calls to see if these objects at the tips
exist as loose objects locally. These lstat(2) calls are optimized
away by enumerating all loose objects beforehand.
It is unknown if the new strategy negatively affects existing use
cases, fetching into a repository with many loose objects from a
repository with small number of refs.
* ti/fetch-everything-local-optim:
fetch-pack.c: use oidset to check existence of loose object
Rename detection logic in "diff" family that is used in "merge" has
learned to guess when all of x/a, x/b and x/c have moved to z/a,
z/b and z/c, it is likely that x/d added in the meantime would also
want to move to z/d by taking the hint that the entire directory
'x' moved to 'z'. A bug causing dirty files involved in a rename
to be overwritten during merge has also been fixed as part of this
work.
* en/rename-directory-detection: (29 commits)
merge-recursive: ensure we write updates for directory-renamed file
merge-recursive: avoid spurious rename/rename conflict from dir renames
directory rename detection: new testcases showcasing a pair of bugs
merge-recursive: fix remaining directory rename + dirty overwrite cases
merge-recursive: fix overwriting dirty files involved in renames
merge-recursive: avoid clobbering untracked files with directory renames
merge-recursive: apply necessary modifications for directory renames
merge-recursive: when comparing files, don't include trees
merge-recursive: check for file level conflicts then get new name
merge-recursive: add computation of collisions due to dir rename & merging
merge-recursive: check for directory level conflicts
merge-recursive: add get_directory_renames()
merge-recursive: make a helper function for cleanup for handle_renames
merge-recursive: split out code for determining diff_filepairs
merge-recursive: make !o->detect_rename codepath more obvious
merge-recursive: fix leaks of allocated renames and diff_filepairs
merge-recursive: introduce new functions to handle rename logic
merge-recursive: move the get_renames() function
directory rename detection: tests for handling overwriting dirty files
directory rename detection: tests for handling overwriting untracked files
...
containing the SVN repository UUID. Now git-svn behaves like git-commit:
If the email is *explicitly* set to the empty string using '<>', the
commit does not contain an email address, only the name:
jondoe <>
Allowing to remove the email address *intentionally* prevents automatic
systems from sending emails to those fictional addresses and avoids
cluttering the log output with unnecessary stuff.
Signed-off-by: Andreas Heiduk <asheiduk@gmail.com> Signed-off-by: Eric Wong <e@80x24.org>
In 36db1eddf9 ("git-svn: add --authors-prog option", 2009-05-14) the path
to authors-prog was made absolute because git-svn changes the current
directory in some situations. This makes sense if the program is part of
the repository but prevents searching via $PATH.
The old behaviour is still retained, but if the file does not exists, then
authors-prog is searched for in $PATH as any other command.
Signed-off-by: Andreas Heiduk <asheiduk@gmail.com> Signed-off-by: Eric Wong <e@80x24.org>
add -p: fix 2.17.0-rc* regression due to moved code
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.
In Perl, regex variables like $1 always refer to the last regex
match. When the aforementioned change added a new regex match between
the old match and the corresponding code that was expecting $1, the $1
variable would always be undef, since the newly inserted regex match
doesn't have any captures.
As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:
$ perl -pi -e 's/Git/Tig/g' README.md
$ ./git --exec-path=$PWD add -p
[..]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
Split into 4 hunks.
[...]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
Use of uninitialized value $1 in string eq at /home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
search for regex? Many
I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex, now it works as intended again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: disable object filtering when disabled by config
When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
2017-12-08), it was guarded by the uploadpack.allowFilter config item
to allow server operators to control when they start supporting it.
That config item didn't go far enough, though: it controls whether the
'filter' capability is advertised, but if a (custom) client ignores
the capability advertisement and passes a filter specification anyway,
the server would handle that despite allowFilter being false.
This is particularly significant if a security bug is discovered in
this new experimental partial clone code. Installations without
uploadpack.allowFilter ought not to be affected since they don't
intend to support partial clone, but they would be swept up into being
vulnerable.
Simplify and limit the attack surface by making uploadpack.allowFilter
disable the feature, not just the advertisement of it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
credential: ignore SIGPIPE when writing to credential helpers
The credential subsystem can trigger SIGPIPE when writing to an
external helper if that helper closes its stdin before reading the
whole input. Normally this is rare, since helpers would need to read
that input to make a decision about how to respond, but:
1. It's reasonable to configure a helper which only handles "get"
while ignoring "store". Such a handler might not read stdin
for "store", thereby rapidly closing stdin upon helper exit.
2. A broken or misbehaving helper might exit immediately. That's an
error, but it's not reasonable for it to take down the parent Git
process with SIGPIPE.
Even with such a helper, seeing this problem should be rare. Getting
SIGPIPE requires the helper racily exiting before we've written the
fairly small credential output.
Signed-off-by: Erik E Brady <brady@cisco.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule: check for NULL return of get_submodule_ref_store()
If we can't find a ref store for a submodule then assume the latter
is not initialized (or was removed). Print a status line accordingly
instead of causing a segmentation fault by passing NULL as the first
parameter of refs_head_ref().
Reported-by: Jeremy Feusi <jeremy@feusi.co> Reviewed-by: Stefan Beller <sbeller@google.com> Initial-Test-By: Stefan Beller <sbeller@google.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for recently graduated topic that give help to completion
scripts from the Git subcommands that are being completed
* nd/parseopt-completion:
t9902: disable test on the list of merge-strategies under GETTEXT_POISON
completion: clear cached --options when sourcing the completion script
test: avoid pipes in git related commands for test
Avoid using pipes downstream of Git commands since the exit codes
of commands upstream of pipes get swallowed, thus potentially
hiding failure of those commands. Instead, capture Git command
output to a file and apply the downstream command(s) to that file.
Signed-off-by: Pratik Karki <predatoramigo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule deinit: handle non existing pathspecs gracefully
This fixes a regression introduced in 2e612731b5 (submodule: port
submodule subcommand 'deinit' from shell to C, 2018-01-15), when
handling pathspecs that do not exist gracefully. This restores the
historic behavior of reporting the pathspec as unknown and returning
instead of reporting a bug.
Reported-by: Peter Oberndorfer <kumbayo84@arcor.de> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1ada5020b3 ("stash: use stash_push for no verb form", 2017-02-28),
when the pathspec argument was introduced in 'git stash', that was also
documented. However I forgot to remove an extra square bracket after
the '--message' argument, even though the square bracket should have
been after the pathspec argument (where it was also added).
Remove the extra square bracket after the '--message' argument, to show
that the pathspec argument should be used with the 'push' verb.
While the pathspec argument can be used without the push verb, that's a
special case described later in the man page, and removing the first extra
square bracket instead of the second one makes the synopis easier to
understand.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc hash-function-transition: clarify what SHAttered means
Attempt to clarify what the SHAttered attack means in practice for
Git. The previous version of the text made no mention whatsoever of
Git already having a mitigation for this specific attack, which the
SHAttered researchers claim will detect cryptanalytic collision
attacks.
I may have gotten some of the nuances wrong, but as far as I know this
new text accurately summarizes the current situation with SHA-1 in
git. I.e. git doesn't really use SHA-1 anymore, it uses
Hardened-SHA-1 (they just so happen to produce the same outputs
99.99999999999...% of the time).
Thus the previous text was incorrect in asserting that:
[...]As a result [of SHAttered], SHA-1 cannot be considered
cryptographically secure any more[...]
That's not the case. We have a mitigation against SHAttered, *however*
we consider it prudent to move to work towards a NewHash should future
vulnerabilities in either SHA-1 or Hardened-SHA-1 emerge.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc hash-function-transition: clarify how older gits die on NewHash
Change the "Repository format extension" to accurately describe what
happens with different versions of Git when they encounter NewHash
repositories, instead of only saying what happens with versions v2.7.0
and later.
See ab9cb76f66 ("Repository format version check.", 2005-11-25) and 00a09d57eb ("introduce "extensions" form of
core.repositoryformatversion", 2015-06-23) for the relevant changes to
the setup code where these variables are checked.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 11395a3b4b (test_must_be_empty: make sure the file exists, not
just empty, 2018-02-27) basically duplicated the 'test_path_is_file'
helper function in 'test_must_be_empty'.
Just call 'test_path_is_file' to avoid this code duplication.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the most interesting thing one can be interested in when
looking at performance test results is possible performance
regressions.
This new option makes it easy to spot such possible regressions.
This new option is named '--sort-by=regression' to make it
possible and easy to add other ways to sort the results, like for
example '--sort-by=utime'.
If we would like to sort according to how much the stime regressed
we could also add a new option called '--sort-by=regression:stime'.
Then '--sort-by=regression' could become a synonym for
'--sort-by=regression:rtime'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>