fetch-pack.c: use oidset to check existence of loose object
When fetching from a repository with large number of refs, because to
check existence of each refs in local repository to packed and loose
objects, 'git fetch' ends up doing a lot of lstat(2) to non-existing
loose form, which makes it slow.
Instead of making as many lstat(2) calls as the refs the remote side
advertised to see if these objects exist in the loose form, first
enumerate all the existing loose objects in hashmap beforehand and use
it to check existence of them if the number of refs is larger than the
number of loose objects.
With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for chromium repository, it has more than 480000
remote refs.
I took time stat of `git fetch` when fetch-pack happens for chromium
repository 3 times on linux with SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s
* master
12.287s
11.175s
12.227s
avg: 11.896s
On my MacBook Air which has slower lstat(2).
* with this patch
14.501s
* master
1m16.027s
`git fetch` on slow disk will be improved largely.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Writing out the index file when the only thing that changed in it
is the untracked cache information is often wasteful, and this has
been optimized out.
* bp/untracked-cache-noflush:
untracked cache: use git_env_bool() not getenv() for customization
dir.c: don't flag the index as dirty for changes to the untracked cache
"git status" can spend a lot of cycles to compute the relation
between the current branch and its upstream, which can now be
disabled with "--no-ahead-behind" option.
* jh/status-no-ahead-behind:
status: support --no-ahead-behind in long format
status: update short status to respect --no-ahead-behind
status: add --[no-]ahead-behind to status and commit for V2 format.
stat_tracking_info: return +1 when branches not equal
Build the executable in 'script' phase in Travis CI integration, to
follow the established practice, rather than during 'before_script'
phase. This allows the CI categorize the failures better ('failed'
is project's fault, 'errored' is build environment's).
* sg/travis-build-during-script-phase:
travis-ci: build Git during the 'script' phase
Avoid using identifiers that clash with C++ keywords. Even though
it is not a goal to compile Git with C++ compilers, changes like
this help use of code analysis tools that targets C++ on our
codebase.
Since Git 1.7.9, "git merge" defaulted to --no-ff (i.e. even when
the side branch being merged is a descendant of the current commit,
create a merge commit instead of fast-forwarding) when merging a
tag object. This was appropriate default for integrators who pull
signed tags from their downstream contributors, but caused an
unnecessary merges when used by downstream contributors who
habitually "catch up" their topic branches with tagged releases
from the upstream. Update "git merge" to default to --no-ff only
when merging a tag object that does *not* sit at its usual place in
refs/tags/ hierarchy, and allow fast-forwarding otherwise, to
mitigate the problem.
* jc/allow-ff-merging-kept-tags:
merge: allow fast-forward when merging a tracked tag
"git add -p" used to offer "/" (look for a matching hunk) as a
choice, even there was only one hunk, which has been corrected.
Also the single-key help is now given only for keys that are
enabled (e.g. help for '/' won't be shown when there is only one
hunk).
* pw/add-p-single:
add -p: improve error messages
add -p: only bind search key if there's more than one hunk
add -p: only display help for active keys
The new "--show-current-patch" option gives an end-user facing way
to get the diff being applied when "git rebase" (and "git am")
stops with a conflict.
* nd/rebase-show-current-patch:
rebase: introduce and use pseudo-ref REBASE_HEAD
rebase: add --show-current-patch
am: add --show-current-patch
"git send-email" learned to complain when the batch-size option is
not defined when the relogin-delay option is, since these two are
mutually required.
* xz/send-email-batch-size:
send-email: error out when relogin delay is missing
Clarify how configured fetch refspecs interact with the "--prune"
option of "git fetch", and also add a handy short-hand for getting
rid of stale tags that are locally held.
* ab/fetch-prune:
fetch: make the --prune-tags work with <url>
fetch: add a --prune-tags option and fetch.pruneTags config
fetch tests: add scaffolding for the new fetch.pruneTags
git-fetch & config doc: link to the new PRUNING section
git remote doc: correct dangerous lies about what prune does
git fetch doc: add a new section to explain the ins & outs of pruning
fetch tests: fetch <url> <spec> as well as fetch [<remote>]
fetch tests: expand case/esac for later change
fetch tests: double quote a variable for interpolation
fetch tests: test --prune and refspec interaction
fetch tests: add a tag to be deleted to the pruning tests
fetch tests: re-arrange arguments for future readability
fetch tests: refactor in preparation for testing tag pruning
remote: add a macro for "refs/tags/*:refs/tags/*"
fetch: stop accessing "remote" variable indirectly
fetch: trivially refactor assignment to ref_nr
fetch: don't redundantly NULL something calloc() gave us
This adds xfuncname and word_regex patterns for golang, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.
The xfuncname regex finds functions, structs and interfaces. Although
the Go language prohibits the opening brace from being on its own
line, the regex does not makes it mandatory, to be able to match
`func` statements like this:
func foo(bar int,
baz int) {
}
This is covered by the test case t4018/golang-long-func.
The word_regex pattern finds identifiers, integers, floats, complex
numbers and operators, according to the go specification.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit: run git gc --auto just before the post-commit hook
Change the behavior of git-commit back to what it was back in d4bb43ee27 ("Invoke "git gc --auto" from commit, merge, am and
rebase.", 2007-09-05) when it was git-commit.sh.
Shortly afterwards in f5bbc3225c ("Port git commit to C.", 2007-11-08)
when it was ported to C, the "git gc --auto" invocation went away.
Since that unintended regression, git gc --auto only ran for git-am,
git-merge, git-fetch, and git-receive-pack. It was possible to
write a script that would "git commit" a lot of data locally, and gc
would never run.
One such repository that was locally committing generated zone file
changes had grown to a size of ~60GB before a daily cronjob was added
to "git gc", bringing it down to less than 1GB. This will make such
cases work without intervention.
I think fixing such pathological cases where the repository will grow
forever is a worthwhile trade-off for spending a couple of
milliseconds calling "git gc --auto" (in the common cases where it
doesn't do anything).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many places in "git apply" knew that "/dev/null" that signals
"there is no such file on this side of the diff" can be followed by
whitespace and garbage when parsing a patch, except for one, which
made an otherwise valid patch (e.g. ones from subversion) rejected.
* tk/apply-dev-null-verify-name-fix:
apply: handle Subversion diffs with /dev/null gracefully
apply: demonstrate a problem applying svn diffs
"git am" has learned the "--quit" option, in addition to the existing
"--abort" option; having the pair mirrors a few other commands like
"rebase" and "cherry-pick".
untracked cache: use git_env_bool() not getenv() for customization
GIT_DISABLE_UNTRACKED_CACHE and GIT_TEST_UNTRACKED_CACHE are only
sensed for their presense by using getenv(); use git_env_bool()
instead so that GIT_DISABLE_UNTRACKED_CACHE=false would work as
naïvely expected.
Also rename GIT_TEST_UNTRACKED_CACHE to GIT_FORCE_UNTRACKED_CACHE
to express what it does more honestly. Forcing its use may be one
useful thing to do while testing the feature, but testing does not
have to be the only use of the knob.
While at it, avoid repeated calls to git_env_bool() by capturing the
return value from the first call in a static variable.
In mark_parents_uninteresting(), we check for the existence of an
object file to see if we should treat a commit as parsed. The result
is to set the "parsed" bit on the commit.
Modify the condition to only check has_object_file() if the result
would change the parsed bit.
When a local branch is different from its upstream ref, "git status"
will compute ahead/behind counts. This uses paint_down_to_common()
and hits mark_parents_uninteresting(). On a copy of the Linux repo
with a local instance of "master" behind the remote branch
"origin/master" by ~60,000 commits, we find the performance of
"git status" went from 1.42 seconds to 1.32 seconds, for a relative
difference of -7.0%.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
perf: use GIT_PERF_REPEAT_COUNT=3 by default even without config file
9ba95ed23c (perf/run: update get_var_from_env_or_config() for
subsections) stopped setting a default value for GIT_PERF_REPEAT_COUNT
if no perf config file is present, because get_var_from_env_or_config
returns early in that case.
Fix it by setting the default value after calling this function. Its
fifth parameter is not used for any other variable, so remove the
associated code.
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
During abbreviation checks, we navigate to the position within a
pack-index that an OID would be inserted and check surrounding OIDs
for the maximum matching prefix. This position may be beyond the
last position, because the given OID is lexicographically larger
than every OID in the pack. Then nth_packed_object_oid() does not
initialize "oid".
Use the return value of nth_packed_object_oid() to prevent these
errors.
Also the comment about checking near-by objects miscounts the
neighbours. If we have a hit at "first", we check "first-1" and
"first+1" to make sure we have sufficiently long abbreviation not to
match either. If we do not have a hit, "first" is the smallest
among the objects that are larger than what we want to name, so we
check that and "first-1" to make sure we have sufficiently long
abbreviation not to match either. In either case, we only check up
to two near-by objects.
Reported-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove erroneous space between % and < in '% <(<N>)'.
Signed-off-by: Mårten Kongstad <marten.kongstad@gmail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_must_be_empty: make sure the file exists, not just empty
The helper function test_must_be_empty is meant to make sure the
given file is empty, but its implementation is:
if test -s "$1"
then
... not empty, we detected a failure ...
fi
Surely, the file having non-zero size is a sign that the condition
"the file must be empty" is violated, but it misses the case where
the file does not even exist. It is an accident waiting to happen
with a buggy test like this:
Merge branch 'sb/submodule-update-reset-fix' into maint
When resetting the working tree files recursively, the working tree
of submodules are now also reset to match.
* sb/submodule-update-reset-fix:
submodule: submodule_move_head omits old argument in forced case
unpack-trees: oneway_merge to update submodules
t/lib-submodule-update.sh: fix test ignoring ignored files in submodules
t/lib-submodule-update.sh: clarify test
Merge branch 'nd/ita-wt-renames-in-status' into maint
"git status" after moving a path in the working tree (hence making
it appear "removed") and then adding with the -N option (hence
making that appear "added") detected it as a rename, but did not
report the old and new pathnames correctly.
* nd/ita-wt-renames-in-status:
wt-status.c: handle worktree renames
wt-status.c: rename rename-related fields in wt_status_change_data
wt-status.c: catch unhandled diff status codes
wt-status.c: coding style fix
Use DIFF_DETECT_RENAME for detect_rename assignments
t2203: test status output with porcelain v2 format
* jk/test-hashmap-updates:
test-hashmap: use "unsigned int" for hash storage
test-hashmap: simplify alloc_test_entry
test-hashmap: use strbuf_getline rather than fgets
test-hashmap: use xsnprintf rather than snprintf
test-hashmap: check allocation computation for overflow
test-hashmap: use ALLOC_ARRAY rather than bare malloc
Code to unquote single-quoted string (used in the parser for
configuration files, etc.) did not diagnose bogus input correctly
and produced bogus results instead.
* jk/sq-dequote-on-bogus-input:
sq_dequote: fix extra consumption of source string
"git add" files in the same directory, but spelling the directory
path in different cases on case insensitive filesystem, corrupted
the name hash data structure and led to unexpected results. This
has been corrected.
* bp/name-hash-dirname-fix:
name-hash: properly fold directory names in adjust_dirname_case()
Doc update to warn against remaining bugs in untracked cache.
* ab/untracked-cache-invalidation-docs:
update-index doc: note the caveat with "could not open..."
update-index doc: note a fixed bug in the untracked cache
Some bugs around "untracked cache" feature have been fixed.
* nd/fix-untracked-cache-invalidation:
dir.c: ignore paths containing .git when invalidating untracked cache
dir.c: stop ignoring opendir() error in open_cached_dir()
dir.c: fix missing dir invalidation in untracked code
dir.c: avoid stat() in valid_cached_dir()
status: add a failing test showing a core.untrackedCache bug
Amazingly, timegm(gmtime(0)) is only 0 before 2020 because perl's
timegm deviates from GNU timegm(3) in how it handles years.
man Time::Local says
Whenever possible, use an absolute four digit year instead.
with a detailed explanation about ambiguity of 2-digit years above that.
Even though this ambiguity is error-prone with >50% of users getting it
wrong, it has been like this for 20+ years, so we just use 4-digit years
everywhere to be on the safe side.
We add some extra logic to cvsimport because it allows 2-digit year
input and interpreting an 18 as 1918 can be avoided easily and safely.
Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
If log.showsignature is true (or --show-signature is passed) while
performing a `subtree add` or `subtree pull`, the command fails.
toptree_for_commit() calls `log` and passes the output to `commit-tree`.
If this output shows the GPG signature data, `commit-tree` throws a
fatal error.
This commit fixes the issue by adding --no-show-signature to `log` calls
in a few places, as well as using the more appropriate `rev-parse`
instead where possible.
Signed-off-by: Stephen R Guglielmo <srg@guglielmo.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_read_file(): preserve errno across close() call
If we encounter a read error, the user may want to report it
by looking at errno. However, our close() call may clobber
errno, leading to confusing results. Let's save and restore
it in the error case.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the NO_PTHREADS or !num_threads case, this doesn't change
anything. In the threaded case, note that grep_source_init duplicates
its third argument, so there is no need to keep [path]buf.buf alive
across the call of add_work().
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep_source_init typically does three strdup()s, and in the threaded
case, the call from add_work() happens while holding grep_mutex.
We can thus reduce the time we hold grep_mutex by moving the
grep_source_init() call out of add_work(), and simply have add_work()
copy the initialized structure to the available slot in the todo
array.
This also simplifies the prototype of add_work(), since it no longer
needs to duplicate all the parameters of grep_source_init(). In the
callers of add_work(), we get to reduce the amount of code duplicated in
the threaded and non-threaded cases slightly (avoiding repeating the
long "GREP_SOURCE_OID, pathbuf.buf, path, oid" argument list); a
subsequent cleanup patch will make that even more so.
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In gitsubmodules.txt, a few non-ASCII apostrophes are used to spell
possessive, e.g. "submodule's". These unfortunately are not
rendered at https://git-scm.com/docs/gitsubmodules correctly by the
renderer used there.
Use ASCII apostrophes instead to work around the problem. It also
is good to be consistent, as there are possessives spelled with
ASCII apostrophes.
Signed-off-by: Motoki Seki <marmot.motoki@gmail.com> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test helper functions like test_must_fail may produce
messages to stderr when they see a problem. When the tests
are run with "--verbose", this ends up on the test script's
stderr, and the user can read it.
But there's a problem. Some tests record stderr as part of
the test, like:
In this case the error text goes into "output". This makes
the --verbose output less useful (it also means we might
accidentally match it in the second, though in practice we
tend to produce these messages only on error, so we'd abort
the test when the first command fails).
Let's instead send this user-facing output directly to
descriptor 4, which always points to the original stderr (or
/dev/null in non-verbose mode). And it's already forbidden
to redirect descriptor 4, since we use it for BASH_XTRACEFD,
as explained in 9be795fbce (t5615: avoid re-using descriptor
4, 2017-12-08).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was an undocumented debugging aid that does not seem to
have come in handy in the past decade, judging from its lack
of mentions on the mailing list.
Let's drop it in the name of simplicity. This is morally a
revert of 3131b71301 (Add "--show-all" revision walker flag
for debugging, 2008-02-09), but note that I did leave in the
mapping of UNINTERESTING to "^" in get_revision_mark(). I
don't think this would be possible to trigger with the
current code, but it's the only sensible marker.
We'll skip the usual deprecation period because this was
explicitly a debugging aid that was never documented.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--show-all" revision option shows UNINTERESTING
commits. Some of these commits may be unparsed when we try
to show them (since we may or may not need to walk their
parents to fulfill the request).
Commit 3131b71301 (Add "--show-all" revision walker flag for
debugging, 2008-02-09) resolved this by just skipping
pretty-printing for commits without their object contents
cached, saying:
Because we now end up listing commits we may not even have been parsed
at all "show_log" and "show_commit" need to protect against commits
that don't have a commit buffer entry.
That was the easy fix to avoid the pretty-printer segfaulting,
but:
1. It doesn't work for all formats. E.g., --oneline
prints the oid for each such commit but not a trailing
newline, leading to jumbled output.
2. It only affects some commits, depending on whether we
happened to parse them or not (so if they were at the
tip of an UNINTERESTING starting point, or if we
happened to traverse over them, you'd see more data).
3. It unncessarily ties the decision to show the verbose
header to whether the commit buffer was cached. That
makes it harder to change the logic around caching
(e.g., if we could traverse without actually loading
the full commit objects).
These days it's safe to feed such a commit to the
pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily
load missing commit buffers, 2013-01-26), we'll load it on
demand in such a case. So let's just always show the verbose
headers.
This does change the behavior of plumbing, but:
a. The --show-all option was explicitly introduced as a
debugging aid, and was never documented (and has rarely
even been mentioned on the list by git devs).
b. Avoiding the commits was already not deterministic due
to (2) above. So the caller might have seen full
headers for these commits anyway, and would need to be
prepared for it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>