This uses a simplified libxdiff setup to generate unified diffs _without_
doing fork/execve of GNU "diff".
This has several huge advantages, for example:
Before:
[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
real 0m24.818s
user 0m13.332s
sys 0m8.664s
After:
[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
real 0m4.563s
user 0m2.944s
sys 0m1.580s
and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).
Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).
NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.
But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.
Now, in the interest of full disclosure, I should also point out a few
downsides:
- the libxdiff algorithm is different, and I bet GNU diff has gotten a
lot more testing. And the thing is, generating a diff is not an exact
science - you can get two different diffs (and you will), and they can
both be perfectly valid. So it's not possible to "validate" the
libxdiff output by just comparing it against GNU diff.
- GNU diff does some nice eye-candy, like trying to figure out what the
last function was, and adding that information to the "@@ .." line.
libxdiff doesn't do that.
- The libxdiff thing has some known deficiencies. In particular, it gets
the "\No newline at end of file" case wrong. So this is currently for
the experimental branch only. I hope Davide will help fix it.
That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.
Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.
That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.
Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do
which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).
[ Btw, as any hawk-eye can see from the diff, this was actually generated
with itself, so it is "self-hosting". That's about all the testing it
has gotten, along with the above kernel diff, which eye-balls correctly,
but shows the newline issue when you double-check it with "git-apply" ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
This was triggered by me testing the "@@" numbering shorthand by GNU
patch, which not only showed that git-apply thought it meant the number
was duplicated (when it means that the second number is 1), but my tests
showed than when git-apply mis-understood the number, it would then not
raise an alarm about it if the patch ended early.
Now, this doesn't actually _matter_, since with a three-line context, the
only case that "x,1" will be shorthanded to "x" is when x itself is 1 (in
which case git-apply got it right), but the fact that git-apply would also
silently accept truncated patches was a missed opportunity for additional
sanity-checking.
So make git-apply refuse to look at a patch fragment that ends early.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
send-email: Identify author at the top when sending e-mail
git-send-email did not check if the sender is the same as the
patch author. Follow the "From: at the beginning" convention to
propagate the patch author correctly.
Some documentation "options" were followed by independent preformatted
paragraphs. Now they are associated plain text paragraphs. The
difference is clear in the generated html.
git-pull: further safety while on tracking branch.
Running 'git pull' while on the tracking branch has a built-in
safety valve to fast-forward the index and working tree to match
the branch head, but it errs on the safe side too cautiously.
contrib/git-svn: allow rebuild to work on non-linear remote heads
Because committing back to an SVN repository from different
machines can result in different lineages, two different
repositories running git-svn can result in different commit
SHA1s (but of the same tree). Sometimes trees that are tracked
independently are merged together (usually via children),
resulting in non-unique git-svn-id: lines in rev-list.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
http-push: add support for deleting remote branches
Processes new command-line arguments -d and -D to remove a remote branch
if the following conditions are met:
- one branch name is present on the command line
- the specified branch name matches exactly one remote branch name
- the remote HEAD is a symref
- the specified branch is not the remote HEAD
- the remote HEAD resolves to an object that exists locally (-d only)
- the specified branch resolves to an object that exists locally (-d only)
- the specified branch is an ancestor of the remote HEAD (-d only)
Signed-off-by: Nick Hengeveld <nickh@reactrix.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
verbose option in git-commit.sh lead us to run git-diff-index, which
needs a commit-ish we are making diff against. When we are commiting
the fist set, we obviously don't have any commit-ish in the repo. So
we just skip the git-diff-index run.
It might be possible to produce diff against empty but do we need
that?
Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch makes the html docs right, makes the asciidoc docs a bit odd
but consistent with what is there already, and makes the manpages look
OK using docbook-xsl 1.68, but miss a paragraph separator when using 1.69.
For the manpages, current is like
-A <author_file>
Read a file with lines on the form
username = User's Full Name <email@addr.es>
and use "User's Full Name <email@addr.es>" as the GIT
With this patch, docbook-xsl v1.68 looks like
-A <author_file>
Read a file with lines on the form
username = User's Full Name <email@addr.es>
and use "User's Full Name <email@addr.es>" as the GIT author and
while docbook-xsl v1.69 becomes
-A <author_file>
Read a file with lines on the form
username = User's Full Name <email@addr.es>
and use "User's Full Name <email@addr.es>" as the GIT author and
The extra indentation is to keep the v1.69 manpage looking sane.
http-fetch: nicer warning for a server with unreliable 404 status
When a repository otherwise properly prepared is served by a
dumb HTTP server that sends "No such page" output with 200
status for human consumption to a request for a page that does
not exist, the users will get an alarming "File X corrupt" error
message. Hint that they might be dealing with such a server at
the end and suggest running fsck-objects to check if the result
is OK (the pack-fallback code does the right thing in this case
so unless a loose object file was actually corrupt the result
should check OK).
Currently we unpack the delta data from the pack and then unpack
the base object to apply that delta data to it. When getting an
object that is deeply deltified, we can reduce memory footprint
by unpacking the base object first and then unpacking the delta
data, because we will need to keep at most one delta data in
memory that way.
git.el: Get the default user name and email from the repository config.
If user name or email are not set explicitly, get them from the
user.name and user.email configuration values before falling back to
the Emacs defaults.
Signed-off-by: Alexandre Julliard <julliard@winehq.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The updated code reads the tip of the current branch before and
after the import runs, but forgot to chomp what we read from the
command. The read-tree command did not them with the trailing
LF.
cvsimport: honor -i and non -i upon subsequent imports
Documentation says -i is "import only", so without it,
subsequent import should update the current branch and working
tree files in a sensible way.
"A sensible way" defined by this commit is "act as if it is a
git pull from foreign repository which happens to be CVS not
git". So:
- If importing into the current branch (note that cvsimport
requires the tracking branch is pristine -- you checked out
the tracking branch but it is your responsibility not to make
your own commits there), fast forward the branch head and
match the index and working tree using two-way merge, just
like "git pull" does.
- If importing into a separate tracking branch, update that
branch head, and merge it into your current branch, again,
just like "git pull" does.
Before this patch git-blame <directory> gave non-sensible output. (It
assigned blame to some random file in <directory>) Abort with an error
message instead.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
As pointed out by Junio, it may be dangerous to cut off people's names
after 15 bytes. If the name is encoded in an encoding which uses more
than one byte per code point we may end up with outputting garbage.
Instead of trying to do something smart, just output the entire name.
We don't gain much screen space by chopping it off anyway.
Furthermore, only output the file name if we actually found any
renames.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
When fetching alternates, http-fetch may reuse the slot to fetch non-http
alternates if http-alternates does not exist. When doing so, it now needs
to update the slot's finished status so run_active_slot waits for the
non-http alternates request to finish.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* fk/blame:
blame: Rename detection (take 2)
rev-lib: Make it easy to do rename tracking (take 2)
Make it possible to not clobber object.util in sort_in_topological_order (take 2)
Marco Costalba reports that --remove-empty omits the commit that
created paths we are interested in. try_to_simplify_commit()
logic was dropping a parent we introduced those paths against,
which I think is not what we meant. Instead, this makes such
parent parentless.
Marco Costalba reports that --remove-empty omits the commit that
created paths we are interested in. try_to_simplify_commit()
logic was dropping a parent we introduced those paths against,
which I think is not what we meant. Instead, this marks such
parent uninteresting. The traversal does not go beyond that
parent as advertised, but we still say that the current commit
changed things from that parent.
annotate-tests: override VISUAL when running tests.
The tests hang for me waiting for Emacs with its output directed
somewhere strage, because I hedged my bets and set both EDITOR and
VISUAL to run Emacs.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>
imap-send: cleanup execl() call to use NULL sentinel instead of 0
Some versions of gcc check that calls to the exec() family have the proper
sentinel for variadic calls. This should be (char *) NULL according to the
man page. Although for all other purposes the 0 is equivalent, gcc
nevertheless does emit a warning for 0 and not for NULL. This also makes the
usage consistent throughout git.
The whitespace in function calls throughout imap-send.c has its own style,
so I left it that way.
RPM, at least on Fedora boxes, automatically creates a
dependency for any perl "use" lines, and one of the help text
lines unfortunately begins like this:
-S, --rev-file revs-file
use revs from revs-file instead of calling git-rev-list
RPM gets confused and creates a false dependecy for the
nonexistent perl package "revs". Obviously this creates a
problem when someone goes to install the git-core rpm.
Since other help sentences all start with capital letter, make
this one match them by upcasing "Use". As a side effect, RPM
stops getting confused.
If info/refs exists on the remote, get a lock on info/refs, make sure that
there is a local copy of the object referenced in each remote ref (in case
someone else added a tag we don't have locally), do all the refspec updates,
and generate and send an updated info/refs file.
Replace single-use functions with one that can get a list of remote
collections and pass file/directory information to user-defined functions
for processing.
Incorporate into http-push a fix related to accessing slot results after
the slot was reused, and fix a case in run_active_slot where a
finished slot wasn't detected if the slot was reused.
rev-lib: Make it easy to do rename tracking (take 2)
prune_fn in the rev_info structure is called in place of
try_to_simplify_commit. This makes it possible to do rename tracking
with a custom try_to_simplify_commit-like function.
This commit also introduces init_revisions which initialises the rev_info
structure with default values.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-imap-send drops a patch series generated by git-format-patch into an
IMAP folder. This allows patch submitters to send patches through their
own mail program.
git-imap-send uses the following values from the GIT repository
configuration:
The target IMAP folder:
[imap]
Folder = "INBOX.Drafts"
A command to open an ssh tunnel to the imap mail server.
try_to_simplify_commit(): do not skip inspecting tree change at boundary.
When git-rev-list (and git-log) collapsed ancestry chain to
commits that touch specified paths, we failed to inspect and
notice tree changes when we are about to hit uninteresting
parent. This resulted in "git rev-list since.. -- file" to
always show the child commit after the lower bound, even if it
does not touch the file. This commit fixes it.
The fsck-objects command (back then it was called fsck-cache)
used to complain if objects referred to by files in .git/refs/
or objects stored in files under .git/objects/??/ were not found
as stand-alone SHA1 files (i.e. found in alternate object pools
or packed archives stored under .git/objects/pack). Back then,
packs and alternates were new curiosity and having everything as
loose objects were the norm.
When we adjusted the behaviour of fsck-cache to consider objects
found in packs are OK, we introduced the --standalone flag as a
backward compatibility measure.
It still correctly checks if your repository is complete and
consists only of loose objects, so in that sense it is doing the
"right" thing, but checking that is pointless these days. This
commit removes --standalone flag.
contrib/git-svn: fix a harmless warning on rebuild (with old repos)
It's only for repositories that were imported with very early
versions of git-svn. Unfortunately, some of those repos are out
in the wild already, so fix this warning.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
'svn info' doesn't work with URLs in svn <= 1.1. Now we
only run svn info in local directories.
As a side effect, this should also work better for 'init' off
directories that are no longer in the latest revision of the
repository.
svn checkout -r<revision> arguments are fixed.
Newer versions of svn (1.2.x) seem to need URL@REV as well as
-rREV to checkout a particular revision...
Add an example in the manpage of how to track directory that has
been moved since its initial revision.
A huge thanks to Yann Dirson for the bug reporting and testing
my original patch. Thanks also to Junio C Hamano for suggesting
a safer way to use git-rev-parse.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
revision.c:make_parents_uninteresting() is exponential with the number
of merges in the tree. That's fine -- unless some other part of git
already has pulled the whole commit tree into memory ...
diff-delta: bound hash list length to avoid O(m*n) behavior
The diff-delta code can exhibit O(m*n) behavior with some patological
data set where most hash entries end up in the same hash bucket.
To prevent this, a limit is imposed to the number of entries that can
exist in the same hash bucket.
Because of the above the code is a tiny bit more expensive on average,
even if some small optimizations were added as well to atenuate the
overhead. But the problematic samples used to diagnoze the issue are now
orders of magnitude less expensive to process with only a slight loss in
compression.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Since I've started using the "merge.summary" flag in my repo, my merge
messages look nicer, but I dislike how I get notifications of merges
within merges.
So I'd suggest this trivial change..
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
This brings http-push functionality more in line with the ssh/git version,
by borrowing bits from send-pack and rev-list to process refspecs and
revision history in more standard ways. Also, the status of remote objects
is determined using PROPFIND requests for the object directory rather than
HEAD requests for each object - while it may be less efficient for small
numbers of objects, this approach is able to get the status of all remote
loose objects in a maximum of 256 requests.
There was a misguided logic to overly prefer using objects that
we are not going to pack as the base object. This was
unnecessary. It does not matter to the unpacking side where the
base object is -- it matters more to make the resulting delta
smaller.