"diff-highlight" (in contrib/) used to show byte-by-byte
differences, which meant that multi-byte characters can be chopped
in the middle. It learned to pay attention to character boundaries
(assuming the UTF-8 payload).
* jk/colors:
diff-highlight: do not split multibyte characters
* jk/test-annoyances:
t5551: make EXPENSIVE test cheaper
t5541: move run_with_cmdline_limit to test-lib.sh
t: pass GIT_TRACE through Apache
t: redirect stderr GIT_TRACE to descriptor 4
t: translate SIGINT to an exit
An earlier update to the parser that disects an address broke an
address, followed by a colon, followed by an empty string (instead
of the port number).
* tb/connect-ipv6-parse-fix:
connect.c: ignore extra colon after hostname
* va/fix-git-p4-tests:
t9814: guarantee only one source exists in git-p4 copy tests
git-p4: fix copy detection test
t9814: fix broken shell syntax in git-p4 rename test
The "git push --signed" protocol extension did not limit what the
"nonce" that is a server-chosen string can contain or how long it
can be, which was unnecessarily lax. Limit both the length and the
alphabet to a reasonably small space that can still have enough
entropy.
* jc/push-cert:
push --signed: tighten what the receiving end can ask to sign
A broken or badly formatted commit might not record author or
committer lines or we may not find a valid name on them. The
function record_person() returned after calling get_commit_buffer()
without calling unuse_commit_buffer() on the memory it obtained in
such cases, potentially leaking it.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 33d4221 (write_sha1_file: freshen existing objects,
2014-10-15), we update the mtime of existing objects that we
would have written out (had they not existed). For the
common case in which many objects are packed, we may update
the mtime on a single packfile repeatedly. This can result
in a noticeable performance problem if calling utime() is
expensive (e.g., because your storage is on NFS).
We can fix this by keeping a per-pack flag that lets us
freshen only once per program invocation.
An alternative would be to keep the packed_git.mtime flag up
to date as we freshen, and freshen only once every N
seconds. In practice, it's not worth the complexity. We are
racing against prune expiration times here, which inherently
must be set to accomodate reasonable program running times
(because they really care about the time between an object
being written and it becoming referenced, and the latter is
typically the last step a program takes).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing out an object file, we first check whether it
already exists and if so optimize out the write. Prior to 33d4221, we did this by calling has_sha1_file(), which will
check for packed objects followed by loose. Since that
commit, we check loose objects first.
For the common case of a repository whose objects are mostly
packed, this means we will make a lot of extra access()
system calls checking for loose objects. We should follow
the same packed-then-loose order that all of our other
lookups use.
Reported-by: Stefan Saasen <ssaasen@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pruning and repacking a repository that has an
alternate object store configured, we may traverse a large
number of objects in the alternate. This serves no purpose,
and may be expensive to do. A longer explanation is below.
Commits d3038d2 and abcb865 taught prune and pack-objects
(respectively) to treat "recent" objects as tips for
reachability, so that we keep whole chunks of history. They
built on the object traversal in 660c889 (sha1_file: add
for_each iterators for loose and packed objects,
2014-10-15), which covers both local and alternate objects.
In both cases, covering alternate objects is unnecessary, as
both commands can only drop objects from the local
repository. In the case of prune, we traverse only the local
object directory. And in the case of repacking, while we may
or may not include local objects in our pack, we will never
reach into the alternate with "repack -d". The "-l" option
is only a question of whether we are migrating objects from
the alternate into our repository, or leaving them
untouched.
It is possible that we may drop an object that is depended
upon by another object in the alternate. For example,
imagine two repositories, A and B, with A pointing to B as
an alternate. Now imagine a commit that is in B which
references a tree that is only in A. Traversing from recent
objects in B might prevent A from dropping that tree. But
this case isn't worth covering. Repo B should take
responsibility for its own objects. It would never have had
the commit in the first place if it did not also have the
tree, and assuming it is using the same "keep recent chunks
of history" scheme, then it would itself keep the tree, as
well.
So checking the alternate objects is not worth doing, and
come with a significant performance impact. In both cases,
we skip any recent objects that have already been marked
SEEN (i.e., that we know are already reachable for prune, or
included in the pack for a repack). So there is a slight
waste of time in opening the alternate packs at all, only to
notice that we have already considered each object. But much
worse, the alternate repository may have a large number of
objects that are not reachable from the local repository at
all, and we end up adding them to the traversal.
We can fix this by considering only local unseen objects.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old wording was somehow implying that <start> and <end> were not
regular expressions. Also, the common case is to use a plain function
name here so <funcname> makes sense (the fact that it is a regular
expression is documented in line-range-format.txt).
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tag 'gitgui-0.20.0' of http://repo.or.cz/r/git-gui:
git-gui: set version 0.20
git-gui: sv.po: Update Swedish translation (547t0f0u)
git-gui i18n: Updated Bulgarian translation (547t,0f,0u)
git-gui: Makes chooser set 'gitdir' to the resolved path
git-gui: Fixes chooser not accepting gitfiles
git-gui: reinstate support for Tcl 8.4
git-gui: fix problem with gui.maxfilesdisplayed
git-gui: fix verbose loading when git path contains spaces.
git-gui/gitk: Do not depend on Cygwin's "kill" command on Windows
git-gui: add configurable tab size to the diff view
git-gui: Make git-gui lib dir configurable at runime
git-gui i18n: Updated Bulgarian translation (520t,0f,0u)
L10n: vi.po (543t): Init translation for Vietnamese
git-gui: align the new recursive checkbox with the radiobuttons.
git-gui: Add a 'recursive' checkbox in the clone menu.
limit_list: avoid quadratic behavior from still_interesting
When we are limiting a rev-list traversal due to
UNINTERESTING refs, we have to walk down the tips (both
interesting and uninteresting) to find where they intersect.
We keep a queue of commits to examine, pop commits off
the queue one by one, and potentially add their parents. The
size of the queue will naturally fluctuate based on the
"width" of the history graph; i.e., the number of
simultaneous lines of development. But for the most part it
will stay in the same ballpark as the initial number of tips
we fed, shrinking over time (as we hit common ancestors of
the tips). So roughly speaking, if we start with `N` tips,
we'll spend much of the time with a queue around `N` items.
For each UNINTERESTING commit we pop, we call
still_interesting to check whether marking its parents as
UNINTERESTING has made the whole queue uninteresting (in
which case we can quit early). Because the queue is stored
as a linked list, this is `O(N)`, where `N` is the number of
items in the queue. So processing a queue with `N` commits
marked UNINTERESTING (and one or more interesting commits)
will take `O(N^2)`.
If you feed a lot of positive tips, this isn't a problem.
They aren't UNINTERESTING, so they don't incur the
still_interesting check. It also isn't a problem if you
traverse from an interesting tip to some UNINTERESTING
bases. We order the queue by recency, so the interesting
commits stay at the front of the queue as we walk down them.
The linear check can exit early as soon as it sees one
interesting commit left in the queue.
But if you want to know whether an older commit is reachable
from a set of newer tips, we end up processing in the
opposite direction: from the UNINTERESTING ones down to the
interesting one. This may happen when we call:
git rev-list $commits --not --all
in check_everything_connected after a fetch. If we fetched
something much older than most of our refs, and if we have a
large number of refs, the traversal cost is dominated by the
quadratic behavior.
These commands simulate the connectivity check of such a
fetch, when you have `$n` distinct refs in the receiver:
# positive ref is 100,000 commits deep
git rev-list --all | head -100000 | tail -1 >input
# huge number of more recent negative refs
git rev-list --all | head -$n | sed s/^/^/ >>input
time git rev-list --stdin <input
Here are timings for various `n` on the linux.git
repository. The `n=1` case provides a baseline for just
walking the commits, which lets us see the still_interesting
overhead. The times marked with `+` subtract that baseline
to show just the extra time growth due to the large number
of refs. The `x` numbers show the slowdown of the adjusted
time versus the prior trial.
Each trial doubles `n`, so you can see the quadratic (`4x`)
behavior before this patch. Afterwards, we have a roughly
linear relationship.
The implementation is fairly straightforward. Whenever we do
the linear search, we cache the interesting commit we find,
and next time check it before doing another linear search.
If that commit is removed from the list or becomes
UNINTERESTING itself, then we fall back to the linear
search. This is very similar to the trick used by fce87ae
(Fix quadratic performance in rewrite_one., 2008-07-12).
I considered and rejected several possible alternatives:
1. Keep a count of UNINTERESTING commits in the queue.
This requires managing the count not only when removing
an item from the queue, but also when marking an item
as UNINTERESTING. That requires touching the other
functions which mark commits, and would require knowing
quickly which commits are in the queue (lookup in the
queue is linear, so we would need an auxiliary
structure or to also maintain an IN_QUEUE flag in each
commit object).
2. Keep a separate list of interesting commits. Drop items
from it when they are dropped from the queue, or if
they become UNINTERESTING. This again suffers from
extra complexity to maintain the list, not to mention
CPU and memory.
3. Use a better data structure for the queue. This is
something that could help the fix in fce87ae, because
we order the queue by recency, and it is about
inserting quickly in recency order. So a normal
priority queue would help there. But here, we cannot
disturb the order of the queue, which makes things
harder. We really do need an auxiliary index to track
the flag we care about, which is basically option (2)
above.
The "cache" trick is simple, and the numbers above show that
it works well in practice. This is because the length of
time it takes to find an interesting commit is proportional
to the length of time it will remain cached (i.e., if we
have to walk a long way to find it, it also means we have to
pop a lot of elements in the queue until we get rid of it
and have to find another interesting commit).
The worst case is still quadratic, though. We could have `N`
uninteresting commits at the front of the queue, followed by
`N` interesting commits, where commit `i` has parent `i+N`.
When we pop commit `i`, we will notice that the parent of
the next commit, `i+1+N` is still interesting and cache it.
But then handling commit `i+1`, we will mark its parent
`i+1+N` uninteresting, and immediately invalidate our cache.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit fe8e3b7 refactored type_from_string to allow
input that was not NUL-terminated, it switched to using
strncmp instead of strcmp. But this means we check only the
first "len" bytes of the strings, and ignore any remaining
bytes in the object_type_string. We should make sure that it
is also "len" bytes, or else we would accept "comm" as
"commit", and so forth.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
config: use utf8_bom[] from utf.[ch] in git_parse_source()
Because the function reads one character at the time, unfortunately
we cannot use the easier skip_utf8_bom() helper, but at least we do
not have to duplicate the constant string this way.
With the recent change to ignore the UTF8 BOM at the beginning of
.gitignore files, we now have two codepaths that do such a skipping
(the other one is for reading the configuration files).
Introduce utf8_bom[] constant string and skip_utf8_bom() helper
and teach .gitignore code how to use it.
add_excludes_from_file: clarify the bom skipping logic
Even though the previous step shifts where the "entry" begins, we
still iterate over the original buf[], which may begin with the
UTF-8 BOM we are supposed to be skipping. At the end of the first
line, the code grabs the contents of it starting at "entry", so
there is nothing wrong per-se, but the logic looks really confused.
Instead, move the buf pointer and shrink its size, to truly
pretend that UTF-8 BOM did not exist in the input.
dir: allow a BOM at the beginning of exclude files
Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
order to indicate that the file is Unicode text rather than whatever the
current locale would indicate.
If someone uses such an editor to edit a gitignore file, we are left
with those three bytes at the beginning of the file. If we do not skip
them, we will attempt to match a filename with the BOM as prefix, which
won't match the files the user is expecting.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Revert "merge: pass verbosity flag down to merge-recursive"
This reverts commit 2bf15a3330a26183adc8563dbeeacc11294b8a01, whose
intention was good, but the verbosity levels used in merge-recursive
turns out to be rather uneven. For example, a merge of two branches
with conflicting submodule updates used to report CONFLICT: output
with --quiet but no longer (which *is* desired), while the final
"Automatic merge failed; fix conflicts and then commit" message is
still shown even with --quiet (which *is* inconsistent).
Originally reported by Bryan Turner; it is too early to declare what
the concensus is, but it seems that we would need to level the
verbosity levels used in merge strategy backends before we can go
forward. In the meantime, we'd revert to the old behaviour until
that happens.
parse_date_basic(): let the system handle DST conversion
The function parses the input to compute the broken-down time in
"struct tm", and the GMT timezone offset. If the timezone offset
does not exist in the input, the broken-down time is turned into the
number of seconds since epoch both in the current timezone and in
GMT and the offset is computed as their difference.
However, we forgot to make sure tm.tm_isdst is set to -1 (i.e. let
the system figure out if DST is in effect in the current timezone
when turning the broken-down time to the number of seconds since
epoch); it is done so at the beginning of the function, but a call
to match_digit() in the function can lead to a call to gmtime_r() to
clobber the field.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Diagnosed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_date_basic(): return early when given a bogus timestamp
When the input does not have GMT timezone offset, the code computes
it by computing the local and GMT time for the given timestamp. But
there is no point doing so if the given timestamp is known to be a
bogus one.
"diff-highlight" (in contrib/) used to show byte-by-byte
differences, which meant that multi-byte characters can be chopped
in the middle. It learned to pay attention to character boundaries
(assuming the UTF-8 payload).
* jk/colors:
diff-highlight: do not split multibyte characters
Changed inaccurate count of "rough rules" from three to the more
generic 'a few'.
Signed-off-by: Julian Gindi <juliangindi@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "help-all" option is being initialized with a wrong value.
While being semantically wrong this can also cause a segmentation
fault in gcc on ARMv7 hardfloat platforms with a hardened
toolchain. Fix this by initializing with a NULL value.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Reviewed-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t9814: guarantee only one source exists in git-p4 copy tests
By using a tree with multiple identical files and allowing copy detection to
choose any one of them, the check in the test is unnecessarily complex. We can
simplify by:
* Modify source file (file2) before copying the file.
* Check that only file2 is the source in the output of "p4 filelog".
* Remove all "case" statements and replace them with simple tests to check
that source is "file2".
Signed-off-by: Vitor Antunes <vitor.hda@gmail.com> Acked-by: Luke Diamand <luke@diamand.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ignore an extra ':' at the end of the hostname in URL's like
"ssh://example.com:/path/to/repo"
The colon is meant to separate a port number from the hostname.
If the port is empty, the colon should be ignored, see RFC 3986.
It had been working for URLs with ssh:// scheme, but was unintentionally
broken in 86ceb3, "allow ssh://user@[2001:db8::1]/repo.git"
Reported-by: Reid Woodbury Jr. <reidw@rawsound.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the input is UTF-8 and Perl is operating on bytes instead of
characters, a diff that changes one multibyte character to another
that shares an initial byte sequence will result in a broken diff
display as the common byte sequence prefix will be separated from
the rest of the bytes in the multibyte character.
For example, if a single line contains only the unicode character
U+C9C4 (encoded as UTF-8 0xEC, 0xA7, 0x84) and that line is then
changed to the unicode character U+C9C0 (encoded as UTF-8 0xEC,
0xA7, 0x80), when operating on bytes diff-highlight will show only
the single byte change from 0x84 to 0x80 thus creating invalid UTF-8
and a broken diff display.
Fix this by putting Perl into character mode when splitting the line
and then back into byte mode after the split is finished.
The utf8::xxx functions require Perl 5.8 so we require that as well.
Also, since we are mucking with code in the split_line function, we
change a '*' quantifier to a '+' quantifier when matching the $COLOR
expression which has the side effect of speeding everything up while
eliminating useless '' elements in the returned array.
Reported-by: Yi EungJun <semtlenori@gmail.com> Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge: pass verbosity flag down to merge-recursive
This makes "git merge --quiet" really quiet when we call
into merge-recursive.
Note that we can't just pass our flag down as-is; the two
parts of the code use different scales. We center at "0" as
normal for git-merge (with "--quiet" giving a negative
value), but merge-recursive uses "2" as its center. This
patch passes a negative value to merge-recursive rather than
"1", though, as otherwise the user would have to use "-qqq"
to squelch all messages (but the downside is that the user
cannot distinguish between levels 0-2 if without resorting
to the GIT_MERGE_VERBOSITY variable).
We may want to review and renormalize the message severities
in merge-recursive, but that does not have to happen now.
This is at least in improvement in the sense that we are
respecting "--quiet" at all.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
init: don't set core.worktree when initializing /.git
If you create a git repository in the root directory with
"git init /", we erroneously write a core.worktree entry.
This isn't _wrong_, in the sense that it's OK to set
core.worktree when we don't need to. But it is unnecessarily
surprising if you later move the .git directory to another
path (which usually moves the relative working tree, but is
foiled if there is an explicit worktree set).
The problem is that we check whether core.worktree is
necessary by seeing if we can make the git_dir by
concatenating "/.git" onto the working tree. That would lead
to "//.git" in this instance, but we actually have "/.git"
(without the doubled slash).
We can fix this by special-casing the root directory. I also
split the logic out into its own function to make the
conditional a bit more readable (and used skip_prefix, which
I think makes it a little more obvious what is going on).
No tests, as we would need to be able to write to "/" to do
so. I did manually confirm that:
push --signed: tighten what the receiving end can ask to sign
Instead of blindly trusting the receiving side to give us a sensible
nonce to sign, limit the length (max 256 bytes) and the alphabet
(alnum and a few selected punctuations, enough to encode in base64)
that can be used in nonce.
send-pack: unify error messages for unsupported capabilities
If --signed is not supported, the error message names the remote
"receiving end". If --atomic is not supported, the error message
names the remote "server". Unify the naming to "receiving end"
as we're in the context of "push".
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
howto: document more tools for recovery corruption
Long ago, I documented a corruption recovery I did and gave
some C code that I used to help find a flipped bit. I had
to fix a similar case recently, and I ended up writing a few
more tools. I hope nobody ever has to use these, but it
does not hurt to share them, just in case.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
push-to-deploy: allow pushing into an unborn branch and updating it
Setting receive.denycurrentbranch to updateinstead and pushing into
the current branch, when the working tree and the index is truly
clean, is supposed to reset the working tree and the index to match
the tree of the pushed commit. This did not work when pushing into
an unborn branch.
The code that drives push-to-checkout hook needs no change, as the
interface is defined so that hook can decide what to do when the
push is coming to an unborn branch and take an appropriate action
since the beginning.
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merges with an absurd number of parents are still a bad idea because
they do not render well in tools like gitk, but if they are present
in the repository being imported into git then there's no need to
avoid reproducing them faithfully.
In olden times, before v1.6.0-rc0~194 (2008-06-27), git commit-tree
and higher-level tools built on top of it were limited to writing 16
parents for a commit. Nowadays normal git operations are happy to
write more parents when asked, so the motivation for this note in the
fast-import documentation is gone and we can remove it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb.conf.txt: say "build-time", not "built-time"
"build-time" is used everywhere else.
Signed-off-by: Jérôme Zago <git-patch@agt-the-walker.net> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When stream_blob_to_fd() opens an input stream with a filter, the
filter gets discarded upon calling close_istream() before the
function returns in the normal case. However, when we fail to open
the stream, we failed to discard the filter.
By discarding the filter in the failure case, give a consistent
life-time rule of the filter to the callers; otherwise the callers
need to conditionally discard the filter themselves, and this
function does not give enough hint for the caller to do so
correctly.
Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b3256eb (standardize and improve lookup rules for external local
repos), enter_repo() was modified to use a different precedence
ordering of suffixes for DWIM of the repository path, and to ensure
that the repository path is actually valid instead of just testing
for existence.
However, the documentation was not modified to reflect these
changes. Fix the documentation to match the code.
Documentation contributed by Jeff King.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Paul Tan <pyokagan@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
wt_shortstatus_print_tracking() calls shorten_unambiguous_ref(),
which returns a newly allocated memory the caller takes ownership
of; it is necessary to free `base` when the function is done with
it.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cherry-pick: fix docs describing handling of empty commits
Commit b27cfb0 (git-cherry-pick: Add keep-redundant-commits
option, 2012-04-20), added the --keep-redundant-commits
option, and switched the default behavior (without that
option) to silently ignore empty commits. Later, the second
half of that commit was reverted in ac2b0e8 (cherry-pick:
regression fix for empty commits, 2012-05-29), but the
documentation added for --keep-redundant-commits was never
updated to match. Let's do so now.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_file: squelch "packfile cannot be accessed" warnings
When we find an object in a packfile index, we make sure we
can still open the packfile itself (or that it is already
open), as it might have been deleted by a simultaneous
repack. If we can't access the packfile, we print a warning
for the user and tell the caller that we don't have the
object (we can then look in other packfiles, or find a loose
version, before giving up).
The warning we print to the user isn't really accomplishing
anything, and it is potentially confusing to users. In the
normal case, it is complete noise; we find the object
elsewhere, and the user does not have to care that we racily
saw a packfile index that became stale. It didn't affect the
operation at all.
A possibly more interesting case is when we later can't find
the object, and report failure to the user. In this case the
warning could be considered a clue toward that ultimate
failure. But it's not really a useful clue in practice. We
wouldn't even print it consistently (since we are racing
with another process, we might not even see the .idx file,
or we might win the race and open the packfile, completing
the operation).
This patch drops the warning entirely (not only from the
fill_pack_entry site, but also from an identical use in
pack-objects). If we did find the warning interesting in the
error case, we could stuff it away and reveal it to the user
when we later die() due to the broken object. But that
complexity just isn't worth it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>