This uses a simplified libxdiff setup to generate unified diffs _without_
doing fork/execve of GNU "diff".
This has several huge advantages, for example:
Before:
[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
real 0m24.818s
user 0m13.332s
sys 0m8.664s
After:
[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
real 0m4.563s
user 0m2.944s
sys 0m1.580s
and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).
Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).
NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.
But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.
Now, in the interest of full disclosure, I should also point out a few
downsides:
- the libxdiff algorithm is different, and I bet GNU diff has gotten a
lot more testing. And the thing is, generating a diff is not an exact
science - you can get two different diffs (and you will), and they can
both be perfectly valid. So it's not possible to "validate" the
libxdiff output by just comparing it against GNU diff.
- GNU diff does some nice eye-candy, like trying to figure out what the
last function was, and adding that information to the "@@ .." line.
libxdiff doesn't do that.
- The libxdiff thing has some known deficiencies. In particular, it gets
the "\No newline at end of file" case wrong. So this is currently for
the experimental branch only. I hope Davide will help fix it.
That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.
Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.
That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.
Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do
which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).
[ Btw, as any hawk-eye can see from the diff, this was actually generated
with itself, so it is "self-hosting". That's about all the testing it
has gotten, along with the above kernel diff, which eye-balls correctly,
but shows the newline issue when you double-check it with "git-apply" ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
... to store parts of the path, if possible. This allows us to avoid
writing extended headers in certain cases (long pathes can only be
split at '/' chars).
Also adds a file to the test repo with a 100 chars long directory name.
Even old versions of tar that don't understand POSIX extended headers
should be able to handle this testcase.
Btw.: The longest path in the kernel tree currently has 70 chars.
Together with a 30 chars long prefix this would already cross the
field limit of 100 chars.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
... and use it initially to write global extended header records.
Improvements compared to the old write_header():
- Uses a struct ustar_header instead of hardcoded offsets.
- Takes one struct strbuf as path argument instead of a (basedir,
prefix, name) tuple.
- Not only writes the tar header, but also the contents of the
file, if any.
- Does not write directly into the ring buffer. This allows the
code to be layed out more naturally, because there is no more
ordering constraint. Before we had to first finish writing the
extended header, now we can construct the extended and normal
headers in parallel.
- The typeflag parameter has been replaced by (reasonable) magic
values. path == NULL indicates an extended header, additionally
sha1 == NULL means it is a global extended header.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
This was triggered by me testing the "@@" numbering shorthand by GNU
patch, which not only showed that git-apply thought it meant the number
was duplicated (when it means that the second number is 1), but my tests
showed than when git-apply mis-understood the number, it would then not
raise an alarm about it if the patch ended early.
Now, this doesn't actually _matter_, since with a three-line context, the
only case that "x,1" will be shorthanded to "x" is when x itself is 1 (in
which case git-apply got it right), but the fact that git-apply would also
silently accept truncated patches was a missed opportunity for additional
sanity-checking.
So make git-apply refuse to look at a patch fragment that ends early.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
send-email: Identify author at the top when sending e-mail
git-send-email did not check if the sender is the same as the
patch author. Follow the "From: at the beginning" convention to
propagate the patch author correctly.
This makes sure that many commands that take refs on the command
line to honor core.warnambiguousrefs configuration. Earlier,
the commands affected by this patch did not read the
configuration file.
Some documentation "options" were followed by independent preformatted
paragraphs. Now they are associated plain text paragraphs. The
difference is clear in the generated html.
git-pull: further safety while on tracking branch.
Running 'git pull' while on the tracking branch has a built-in
safety valve to fast-forward the index and working tree to match
the branch head, but it errs on the safe side too cautiously.
* jc/revlist:
rev-list --timestamp
git-apply: do not barf when updating an originally empty file.
http-push.c: squelch C90 warnings.
fix field width/precision warnings in blame.c
The traditional one created refs/origin by mistake, not
refs/heads/origin. Also it mistakenly failed to prevent
$origin from being listed twice in remotes/origin file.
The first was a simple typo where I put $yc instead of [yc $row].
The second was that I broke the logic for keeping up with fast
movement through the commits, e.g. when you select a commit and then
press down-arrow and let it autorepeat. That got broken when I
changed the merge diff display to use git-diff-tree --cc.
clone: record the remote primary branch with remotes/$origin/HEAD
This matches c51d13692d4e451c755dd7da3521c5db395df192 commit to
record the primary branch of the remote with a symbolic ref
remotes/$origin/HEAD. The user can later change it to point at
different branch to change the meaning of "$origin" shorthand.
get_sha1_basic(): try refs/... and finally refs/remotes/$foo/HEAD
This implements the suggestion by Jeff King to use
refs/remotes/$foo/HEAD to interpret a shorthand "$foo" to mean
the primary branch head of a tracked remote. clone needs to be
told about this convention as well.
* jc/name:
core.warnambiguousrefs: warns when "name" is used and both "name" branch and tag exists.
contrib/git-svn: allow rebuild to work on non-linear remote heads
http-push: don't assume char is signed
http-push: add support for deleting remote branches
Be verbose when !initial commit
Fix multi-paragraph list items in OPTIONS section
http-fetch: nicer warning for a server with unreliable 404 status
* --use-separate-remote uses .git/refs/remotes/$origin/
directory to keep track of the upstream branches.
* The $origin above defaults to "origin" as usual, but the
existing "-o $origin" option can be used to override it.
I am not yet convinced if we should make "$origin" the synonym to
"refs/remotes/$origin/$name" where $name is the primary branch
name of $origin upstream, nor if so how we should decide which
upstream branch is the primary one, but that is more or less
orthogonal to what the clone does here.
contrib/git-svn: allow rebuild to work on non-linear remote heads
Because committing back to an SVN repository from different
machines can result in different lineages, two different
repositories running git-svn can result in different commit
SHA1s (but of the same tree). Sometimes trees that are tracked
independently are merged together (usually via children),
resulting in non-unique git-svn-id: lines in rev-list.
Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
http-push: add support for deleting remote branches
Processes new command-line arguments -d and -D to remove a remote branch
if the following conditions are met:
- one branch name is present on the command line
- the specified branch name matches exactly one remote branch name
- the remote HEAD is a symref
- the specified branch is not the remote HEAD
- the remote HEAD resolves to an object that exists locally (-d only)
- the specified branch resolves to an object that exists locally (-d only)
- the specified branch is an ancestor of the remote HEAD (-d only)
Signed-off-by: Nick Hengeveld <nickh@reactrix.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
verbose option in git-commit.sh lead us to run git-diff-index, which
needs a commit-ish we are making diff against. When we are commiting
the fist set, we obviously don't have any commit-ish in the repo. So
we just skip the git-diff-index run.
It might be possible to produce diff against empty but do we need
that?
Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch makes the html docs right, makes the asciidoc docs a bit odd
but consistent with what is there already, and makes the manpages look
OK using docbook-xsl 1.68, but miss a paragraph separator when using 1.69.
For the manpages, current is like
-A <author_file>
Read a file with lines on the form
username = User's Full Name <email@addr.es>
and use "User's Full Name <email@addr.es>" as the GIT
With this patch, docbook-xsl v1.68 looks like
-A <author_file>
Read a file with lines on the form
username = User's Full Name <email@addr.es>
and use "User's Full Name <email@addr.es>" as the GIT author and
while docbook-xsl v1.69 becomes
-A <author_file>
Read a file with lines on the form
username = User's Full Name <email@addr.es>
and use "User's Full Name <email@addr.es>" as the GIT author and
The extra indentation is to keep the v1.69 manpage looking sane.
http-fetch: nicer warning for a server with unreliable 404 status
When a repository otherwise properly prepared is served by a
dumb HTTP server that sends "No such page" output with 200
status for human consumption to a request for a page that does
not exist, the users will get an alarming "File X corrupt" error
message. Hint that they might be dealing with such a server at
the end and suggest running fsck-objects to check if the result
is OK (the pack-fallback code does the right thing in this case
so unless a loose object file was actually corrupt the result
should check OK).
* jc/merge:
git-merge knows some strategies want to skip trivial merges
generate-cmdlist: style cleanups.
Add missing semicolon to sed command.
unpack_delta_entry(): reduce memory footprint.
git.el: Added a function to diff against the other heads in a merge.
git.el: Get the default user name and email from the repository config.
git.el: More robust handling of subprocess errors when returning strings.
* A new flag --reference can be used to name a local repository
that is to be used as an alternate. This is in response to
an inquiry by James Cloos in the message on the list
<m3r74ykue7.fsf@lugabout.cloos.reno.nv.us>.
* A new flag --use-separate-remote stops contaminating local
branch namespace by upstream branch names. The upstream
branch heads are copied in .git/refs/remotes/ instead of
.git/refs/heads/ and .git/remotes/origin file is set up to
reflect this as well. It requires to have fetch/pull update
to understand .git/refs/remotes by Eric Wong to further
update the repository cloned this way.
For the former change, git-fetch-pack is taught a new flag --all
to fetch from all the remote heads. Nobody uses the git-clone-pack
with this change, so we could deprecate the command, but removal
of the command will be left to a separate round.
Currently we unpack the delta data from the pack and then unpack
the base object to apply that delta data to it. When getting an
object that is deeply deltified, we can reduce memory footprint
by unpacking the base object first and then unpacking the delta
data, because we will need to keep at most one delta data in
memory that way.
git.el: Get the default user name and email from the repository config.
If user name or email are not set explicitly, get them from the
user.name and user.email configuration values before falling back to
the Emacs defaults.
Signed-off-by: Alexandre Julliard <julliard@winehq.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The point where the line for a parent joins to the first child
shown is visually different from the lines to the other children,
because the line doesn't branch, but terminates at the child.
Because of this, we now treat the first child a little differently
in the optimizer, and we draw its link in drawlineseg rather
than drawparentlinks. This improves the appearance of the graph.
The updated code reads the tip of the current branch before and
after the import runs, but forgot to chomp what we read from the
command. The read-tree command did not them with the trailing
LF.
gitk: Make downward-pointing arrows end in vertical line segment
It seems Tk 8.4 can't draw arrows on diagonal line segments. This
adds code to the optimizer to make the last bit of a line go vertically
before being terminated with an arrow pointing downwards, so that
it will be drawn correctly by Tk 8.4.
gitk: Don't change cursor at end of layout if find in progress
If the user is doing a find in files or patches, which changed the
cursor to a watch, don't change it back to a pointer when we reach
the end of laying out the graph.
* master:
3% tighter packs for free
Rewrite synopsis to clarify the two primary uses of git-checkout.
Fix minor typo.
Reference git-commit-tree for env vars.
Clarify git-rebase example commands.
Document the default source of template files.
Call out the two different uses of git-branch and fix a typo.
Add git-show reference
cvsimport: honor -i and non -i upon subsequent imports
Documentation says -i is "import only", so without it,
subsequent import should update the current branch and working
tree files in a sensible way.
"A sensible way" defined by this commit is "act as if it is a
git pull from foreign repository which happens to be CVS not
git". So:
- If importing into the current branch (note that cvsimport
requires the tracking branch is pristine -- you checked out
the tracking branch but it is your responsibility not to make
your own commits there), fast forward the branch head and
match the index and working tree using two-way merge, just
like "git pull" does.
- If importing into a separate tracking branch, update that
branch head, and merge it into your current branch, again,
just like "git pull" does.
Before this patch git-blame <directory> gave non-sensible output. (It
assigned blame to some random file in <directory>) Abort with an error
message instead.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
As pointed out by Junio, it may be dangerous to cut off people's names
after 15 bytes. If the name is encoded in an encoding which uses more
than one byte per code point we may end up with outputting garbage.
Instead of trying to do something smart, just output the entire name.
We don't gain much screen space by chopping it off anyway.
Furthermore, only output the file name if we actually found any
renames.
Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
When fetching alternates, http-fetch may reuse the slot to fetch non-http
alternates if http-alternates does not exist. When doing so, it now needs
to update the slot's finished status so run_active_slot waits for the
non-http alternates request to finish.
Signed-off-by: Nick Hengeveld <nickh@reactrix.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* fk/blame:
blame: Rename detection (take 2)
rev-lib: Make it easy to do rename tracking (take 2)
Make it possible to not clobber object.util in sort_in_topological_order (take 2)
The "score" calculation for diffcore-rename was totally broken.
It scaled "score" as
score = src_copied * MAX_SCORE / dst->size;
which means that you got a 100% similarity score even if src and dest were
different, if just every byte of dst was copied from src, even if source
was much larger than dst (eg we had copied 85% of the bytes, but _deleted_
the remaining 15%).
That's clearly bogus. We should do the score calculation relative not to
the destination size, but to the max size of the two.
This seems to fix it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>