Improve merge performance by avoiding in-index merges.
In the early days of Git we performed a 3-way read-tree based merge
before attempting any specific merge strategy, as our core merge
strategies of merge-one-file and merge-recursive were slower script
based programs which took far longer to execute. This was a good
performance optimization in the past, as most merges were able to
be handled strictly by `read-tree -m -u`.
However now that merge-recursive is a C based program which performs
a full 3-way read-tree before it starts running we need to pay the
cost of the 3-way read-tree twice if we have to do any sort of file
level merging. This slows down some classes of simple merges which
`read-tree -m -u` could not handle but which merge-recursive does
automatically.
For a really trivial merge which can be handled entirely by
`read-tree -m -u`, skipping the read-tree and just going directly
into merge-recursive saves on average 50 ms on my PowerPC G4 system.
May sound odd, but it does appear to be true.
In a really simple merge which needs to use merge-recursive to handle
a file that was modified on both branches, skipping the read-tree
in git-merge saves on average almost 100 ms (on the same PowerPC G4)
as we avoid doing some work twice.
We only avoid `read-tree -m -u` if the only strategy to use is
merge-recursive, as not all merge strategies perform as well as
merge-recursive does.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Since wt-status is already computing all the information needed go the whole
way and actually track the (non-)emptiness of all three sections separately,
unify the code, and provide useful messages for each individual case.
Thanks to Junio and Michael Loeffler for suggestions.
Makefile: remove $foo when $foo.exe is built/installed.
On Cygwin, newly builtins are not recognized, because there exist both
the executable binaries (with .exe extension) _and_ the now-obsolete
scripts (without extension), but the script is executed.
send-email: work around double encoding of in-body From field.
git-send-email sends out the message taken from format-patch
output without quoting nor encoding. When copying the From:
line to form in-body From: field, it should not copy it
verbatim, because the From: for the header is quoted according
to RFC 2047 when not ASCII.
The original came from Jürgen Rühle, but I moved the
string munging into a separate function so that later other
people can tweak it more easily. Bugs introduced during the
translation are mine.
Since c869753e, core.filemode is hardwired to false on Cygwin.
So this test had no chance to succeed, since an early commit
(changing just the filemode) failed, and therefore all subsequent
tests.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
My sp/mmap changes to pack-check.c modified the function such that
it expects packed_git.pack_size to be populated with the total
bytecount of the packfile by the caller.
But that isn't the case for packs obtained by git-http-fetch as
pack_size was not initialized before being accessed. This caused
verify_pack to think it had 2^32-21 bytes available when the
downloaded pack perhaps was only 305 bytes in length. The use_pack
function then later dies with "offset beyond end of packfile"
when computing the overall file checksum.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Replacing the system call pread() with lseek()/xread()/lseek() sequence.
Using cygwin with cygwin.dll before 1.5.22 the system call pread() is buggy.
This patch introduces NO_PREAD. If NO_PREAD is set git uses a sequence of
lseek()/xread()/lseek() to emulate pread.
Signed-off-by: Stefan-W. Hahn <stefan.hahn@s-hahn.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
It used to ignore the return value of the helper function; now, it
expects it to return 0, and stops iteration upon non-zero return
values; this value is then passed on as the return value of
for_each_reflog_ent().
Further, it makes no sense to force the parsing upon the helper
functions; for_each_reflog_ent() now calls the helper function with
old and new sha1, the email, the timestamp & timezone, and the message.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
find_header() function is used to read and parse the patchfile
and it detects errors in the patch, but one place ignored the
error and went ahead, which was quite bad.
Auto-quote config values in config.c:store_write_pair()
Suggested by Jakub Narebski <jnareb@gmail.com> on the list.
When we send a value to store_write_pair(), make sure that the value
that gets read out matches the one passed in. This means that for any
value that contains leading or trailing whitespace or any comment
character (# and ;), we need to surround it in quotes.
Signed-off-by: Brian Gernhardt <benji@silverinsanity.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
With this patch, cvs add / cvs commit echoes back to the client
the correct file version (1.1) so that the file in the checkout
is recognised as up-to-date.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz> Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/reflog:
reflog --fix-stale: do not check the same trees and commits repeatedly.
reflog expire --fix-stale
Move traversal of reachable objects into a separate library.
builtin-prune: separate ref walking from reflog walking.
builtin-prune: make file-scope static struct to an argument.
short i/o: fix calls to write to use xwrite or write_in_full
We have a number of badly checked write() calls. Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes. Switch to using
the new write_in_full(). Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().
Note, the changes to config handling are much larger and handled
in the next patch in the sequence.
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
short i/o: fix calls to read to use xread or read_in_full
We have a number of badly checked read() calls. Often we are
expecting read() to read exactly the size we requested or fail, this
fails to handle interrupts or short reads. Add a read_in_full()
providing those semantics. Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xread().
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
short i/o: clean up the naming for the write_{in,or}_xxx family
We recently introduced a write_in_full() which would either write
the specified object or emit an error message and fail. In order
to fix the read side we now want to introduce a read_in_full()
but without an error emit. This patch cleans up the naming
of this family of calls:
1) convert the existing write_or_whine() to write_or_whine_pipe()
to better indicate its pipe specific nature,
2) convert the existing write_in_full() calls to write_or_whine()
to better indicate its nature,
3) introduce a write_in_full() providing a write or fail semantic,
and
4) convert write_or_whine() and write_or_whine_pipe() to use
write_in_full().
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
With this patch, cvsimport will skip commits made
in the last 10 minutes. The recent-ness test is of
5 minutes + cvsps fuzz window (5 minutes default).
When working with a CVS repository that is in use,
importing commits that are too recent can lead to
partially incorrect trees. This is mainly due to
- Commits that are within the cvsps fuzz window may later
be found to have affected more files.
- When performing incremental imports, clock drift between
the systems may lead to skipped commits.
This commit helps keep incremental imports of in-use
CVS repositories sane.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz> Signed-off-by: Junio C Hamano <junkio@cox.net>
Remove unnecessary git-rm --cached reference from status output
Since git-reset has learned restoring the absence of paths git-rm --cached is
no longer necessary. Therefore remove it from the cached content header hint.
Also remove the unfortunate wording 'Cached' from the header itself.
Signed-off-by: Jürgen Rühle <j-r@online.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/mmap: (27 commits)
Spell default packedgitlimit slightly differently
Increase packedGit{Limit,WindowSize} on 64 bit systems.
Update packedGit config option documentation.
mmap: set FD_CLOEXEC for file descriptors we keep open for mmap()
pack-objects: fix use of use_pack().
Fix random segfaults in pack-objects.
Cleanup read_cache_from error handling.
Replace mmap with xmmap, better handling MAP_FAILED.
Release pack windows before reporting out of memory.
Default core.packdGitWindowSize to 1 MiB if NO_MMAP.
Test suite for sliding window mmap implementation.
Create pack_report() as a debugging aid.
Support unmapping windows on 'temporary' packfiles.
Improve error message when packfile mmap fails.
Ensure core.packedGitWindowSize cannot be less than 2 pages.
Load core configuration in git-verify-pack.
Fully activate the sliding window pack access.
Unmap individual windows rather than entire files.
Document why header parsing won't exceed a window.
Loop over pack_windows when inflating/accessing data.
...
* jr/status:
Improve cached content header of status output
Support --amend on initial commit in status output
Improve "nothing to commit" part of status output
Clarify syntax and role of git-add in status output
git-reset <tree> -- <path> restores absense of <path> in <tree>
When <path> exists in the index (either merged or unmerged), and
<tree> does not have it, git-reset should be usable to restore
the absense of it from the tree. This implements it.
diff-index --cached --raw: show tree entry on the LHS for unmerged entries.
This updates the way diffcore represents an unmerged pair
somewhat. It used to be that entries with mode=0 on both sides
were used to represent an unmerged pair, but now it has an
explicit flag. This is to allow diff-index --cached to report
the entry from the tree when the path is unmerged in the index.
This is used in updating "git reset <tree> -- <path>" to restore
absense of the path in the index from the tree.
reflog --fix-stale: do not check the same trees and commits repeatedly.
Since we use the reachability tracking machinery now, we should
keep the already checked trees and commits whose completeness is
known, to avoid checking the same thing over and over again.
The logic in an earlier round to detect reflog entries that
point at a broken commit was not sufficient. Just like we do
not trust presense of a commit during pack transfer (we trust
only our refs), we should not trust a commit's presense, even if
the tree of that commit is complete.
A repository that had reflog enabled on some of the refs that
was rewound and then run git-repack or git-prune from older
versions of git can have reflog entries that point at a commit
that still exist but lack commits (or trees and blobs needed for
that commit) between it and some commit that is reachable from
one of the refs.
This revamps the logic -- the definition of "broken commit"
becomes: a commit that is not reachable from any of the refs and
there is a missing object among the commit, tree, or blob
objects reachable from it that is not reachable from any of the
refs. Entries in the reflog that refer to such a commit are
expired.
Since this computation involves traversing all the reachable
objects, i.e. it has the same cost as 'git prune', it is enabled
only when a new option --fix-stale. Fortunately, once this is
run, we should not have to ever worry about missing objects,
because the current prune and pack-objects know about reflogs
and protect objects referred by them.
Unfortunately, this will be absolutely necessary to help people
migrate to the newer prune and repack.
Move traversal of reachable objects into a separate library.
This moves major part of builtin-prune into a separate file,
reachable.c. It is used to mark the objects that are reachable
from refs, and optionally from reflogs.
The patch looks very large, but if you look at it with diff -C,
which this message is formatted in, most of them are copied
lines and there are very little additions.
builtin-prune: separate ref walking from reflog walking.
This is necessary for the next step, because the reason I am
making the connectivity walker into a library is because I want
to use it for cleaning up stale reflog entries.
builtin-prune: make file-scope static struct to an argument.
I want to make the first part of 'git prune' that marks the
reachable objects callable as a library, so this starts the
first step toward the goal by making the callchain to pass
rev_info structure as an argument.
gitweb: Fix split patches output (e.g. file to symlink)
Do not replace /dev/null in two-line from-file/to-file diff header for
split patches ("split" patch mean more than one patch per one
diff-tree raw line) by a/file or b/file link.
Split patches differ from pair of deletion/creation patch in git diff
header: both a/file and b/file are hyperlinks, in all patches in a
split.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Fix errors in git_patchset_body for empty patches
We now do not skip over empty patches in git_patchset_body (where
empty means that they consist only of git diff header, and of extended
diff header, for example "pure rename" patch). This means that after
extended diff header there can be next patch (i.e. /^diff /) or end of
patchset, and not necessary patch body (i.e. /^--- /).
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Fix error in git_patchest_body for file creation/deletion patch
$from_id, $to_id variables should be local per PATCH.
Fix error in git_patchset_body for file creation (deletion) patches,
where instead of /dev/null as from-file (to-file) diff header line, it
had link to previous file with current file name. This error occured
only if there was another patch before file creation (deletion) patch.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
Documentation/git-svn: clarify dcommit, rebase vs pull/merge
Clarify that dcommit creates a revision in SVN for every commit
in git. Also, add 'merge' to the rebase vs pull section because
git-merge is now a first-class UI.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds ability to do import "in chunks" (default 1000 revisions),
after each chunk git repo will be repacked. The option -R is used to
change default value of chunk size (or how often repository will
repacked).
Signed-off-by: Sasha Khapyorsky <sashak@voltaire.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
Describe git-clone's actual behavior in the summary
If a branch other than "master" is checked out in the origin repository,
git-clone makes a local copy of that branch rather than the origin's
"master"
branch. This patch describes the actual behavior.
Signed-off-by: Steven Grimm <koreth@midwinter.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
Set default "tar" umask to 002 and owner.group to root.root
In order to make the generated tar files more friendly to users who
extract them as root using GNU tar and its implied -p option, change
the default umask to 002 and change the owner name and group name to
root. This ensures that a) the extracted files and directories are
not world-writable and b) that they belong to user and group root.
Before they would have been assigned to a user and/or group named
git if it existed. This also answers the question in the removed
comment: uid=0, gid=0, uname=root, gname=root is exactly what we
want.
Normal users who let tar apply their umask while extracting are
only affected if their umask allowed the world to change their
files (e.g. a umask of zero). This case is so unlikely and strange
that we don't need to support it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
Increase packedGit{Limit,WindowSize} on 64 bit systems.
If we have a 64 bit address space we can easily afford to commit
a larger amount of virtual address space to pack file access.
So on these platforms we should increase the default settings of
core.packedGit{Limit,WindowSize} to something that will better
handle very large projects.
Thanks to Andy Whitcroft for pointing out that we can safely
increase these defaults on such systems.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The earlier test timestamp was too old; I forgot that the bare
unixtime integer had to be after Jan 1, 2000. This changes
test_tick to use the git-epoch timestamp.
Somehow we forgot to turn save_commit_buffer off while walking
the reachable objects. Releasing the memory for commit object
data that we do not use matters for large projects (for example,
about 90MB is saved while traversing linux-2.6 history).
It might be handy to have a single command that helps you manage
your configuration that relates to downloading from remote
repositories. This currently does only about 20% of what I want
it to do.
$ git remote
shows the list of 'remotes' you have defined somewhere, and
$ git remote origin
shows the details about the named remote (in this case
"origin"). How the branches are tracked, if you have a
tracking branch that is stale, etc.
$ git add another git://git.kernel.org/pub/...
defines the default remote.another.url and remote.another.fetch
entries just like a clone does; you can say "git fetch another"
afterwards.
For it to be useful, I think it should be enhanced to:
- check overlaps of tracking branches and warn;
- offer to remove stale tracking branches in one go;
- offer ways to remove or rename remote;
- offer ways to update an existing remote, perhaps have an
interactive mode;
Other enhancements might be also possible, but I do not think of
anything that is absolutely necessary other than the above right
now.
Blame "linenr" link jumps to previous state at "orig_lineno"
Blame currently displays the commit id which introduced a
block of one or more lines, the line numbers wrt the current
listing of the file and the file's line contents.
The commit id displayed is hyperlinked to the commit.
Currently the linenr links are hyperlinked to the same
commit id displayed to the left, which is _no_ different
than the block of lines displayed, since it is the _same
commit_ that is hyperlinked. And thus clicking on it leads
to the same state of the file for that chunk of
lines. I.e. data mining is not currently possible with
gitweb given a chunk of lines introduced by a commit.
This patch makes such data mining possible.
The line numbers are now hyperlinked to the parent of the
commit id of the block of lines. Furthermore they are
linked to the line where that block was introduced.
Thus clicking on a linenr link will show you the file's
line(s) state prior to the commit id you were viewing.
So clicking continually on a linenr link shows you how this
line and its line number changed over time, leading to the
initial commit where it was first introduced.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Fix "Use of uninitialized value" warning in git_tags_body
Fix "Use of uninitialized value" warning in git_tags_body generated
for lightweight tags of tree and blob object; those don't have age
($tag{'age'}) defined.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-svn: make --repack work consistently between fetch and multi-fetch
Since fetch reforks itself at most every 1000 revisions, we
need to update the counter in the parent process to have a
working count if we set our repack interval to be > ~1000
revisions. multi-fetch has always done this correctly
because of an extra process; now fetch uses the extra process;
as well.
While we're at it, only compile the $sha1 regex that checks for
repacking once.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
It now requires at least one of the (trunk|branch|tags) arguments
(either from the command-line or in .git/config). Also we make
sure that anything that is passed as a URL ('help') in David's
case is actually a URL.
Thanks to David Kågedal for reporting this issue.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
The variable named entry is allocated using malloc() and then
forgotten, it being shadowed by an automatic variable of the
same name. Fixing the array size at 3 worked so far because
the only caller of traverse_trees() needed only as much
entries. Simply remove the shadowing varaible and we're able
to traverse more than three trees and save stack space at the
same time!
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
pack-check.c::verify_packfile(): don't run SHA-1 update on huge data
Running the SHA1_Update() on the whole packfile in a single call
revealed an overflow problem we had in the SHA-1 implementation
on POWER architecture some time ago, which was fixed with commit b47f509b (June 19, 2006). Other SHA-1 implementations may have
a similar problem.
The sliding mmap() series already makes chunked calls to
SHA1_Update(), so this patch itself will become moot when it
graduates to "master", but in the meantime, run the hash
function in smaller chunks to prevent possible future problems.
My change in 190d7fdcf325bb444fa806f09ebbb403a4ae4ee6 had a small bug
found by Michael Krufky which caused the passed in hash value to be
ignored, so shortlog would only show the HEAD revision.
Signed-off-by: Robert Fitzsimons <robfitz@273k.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: There can be empty patches (in git_patchset_body)
We now do not skip over empty patches in git_patchset_body
(where empty means that they consist only of git diff header,
and of extended diff header), so uncomment branch of code dealing
with empty patches (patches which do not have even two-line
from/to header)
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>