Invalidate cache-tree entries for touched paths in git-apply.
This updates git-apply to maintain cache-tree information. With
this and the previous write-tree patch, repeated "apply --index"
followed by "write-tree" on a huge tree will hopefully become
faster.
The updated write-tree reads from $GIT_DIR/index.aux to pick up
subtree objects information, updates the cache-tree with the
index, and updates index.aux file after writing a tree out of
the index file.
Until update-index and other programs that modify the index are
updated to maintain index.aux file, the index.aux file written
by the last write-tree will become stale immediately after they
update the index, which will result in the whole tree
recomputation just like the original write-tree.
The idea is to convert those commands to invalidate cache-tree
whenever they touch the index entries, and write updated
index.aux out. After the index is updated with them, write-tree
will be able to reuse the parts of the cache-tree that have not
been touched.
The cache_tree data structure is to cache tree object names that
would result from the current index file.
The idea is to have an optional file to record each tree object
name that corresponds to a directory path in the cache when we
run write_cache(), and read it back when we run read_cache().
During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy.
We could perhaps make the cache-tree data an optional part of
the index file, but that would involve the index format updates,
so unless we need it for performance reasons, the current plan
is to use a separate file, $GIT_DIR/index.aux to store this
information and link it with the index file with the checksum
that is already used for index file integrity check.
Split up builtin commands into separate files from git.c
Right now it split it into "builtin-log.c" for log-related commands
("log", "show" and "whatchanged"), and "builtin-help.c" for the
informational commands (usage printing and "help" and "version").
This just makes things easier to read, I find.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* fix:
fix pack-object buffer size
mailinfo: decode underscore used in "Q" encoding properly.
Reintroduce svn pools to solve the memory leak.
pack-objects: do not stop at object that is "too small"
mailinfo: decode underscore used in "Q" encoding properly.
Quoted-Printable (RFC 2045) and the "Q" encoding (RFC 2047) are
subtly different; the latter is used on the mail header and an
underscore needs to be decoded to 0x20.
pack-objects: do not stop at object that is "too small"
Because we sort the delta window by name-hash and then size,
hitting an object that is too small to consider as a delta base
for the current object does not mean we do not have better
candidate in the window beyond it.
Noticed by Shawn Pearce, analyzed by Nico, Linus and me.
When running "git commit --amend" only to fix the commit log
message without any content change, we mistakenly showed the
git-status output that says "nothing to commit" without
commenting it out.
If you have already run update-index but you want to amend the
top commit, "git commit --amend --only" without any paths should
have worked, because --only means "starting from the base
commit, update-index these paths only to prepare the index to
commit, and perform the commit". However, we refused -o without
paths.
When a verbatim rename or copy is detected, we did not show
anything on the "diff --stat" for the filepair. This makes it
to show the rename information.
* lt/xsha1:
get_tree_entry(): make it available from tree-walk
sha1_name.c: no need to include diff.h; tree-walk.h will do.
sha1_name.c: prepare to make get_tree_entry() reusable from others.
get_sha1() shorthands for blob/tree objects
Several <<< or === or >>> characters at the beginning of a line
is very likely to be leftover conflict markers from a failed
automerge the user resolved incorrectly, so detect them.
As usual, this can be defeated with "git commit --no-verify" if
you really do want to have those files, just like changes that
introduce trailing whitespaces.
We said "fix up by hand" after failed automerge, which was a big
"Huh? Now what?". Be a bit more explicit without being too
verbose. Suggested by Carl Worth.
I personally prefer "ignore_merges" to be on by default, because quite
often the merge diff is distracting and not interesting. That's true both
with "-p" and with "--stat" output.
If you want output from merges, you can trivially use the "-m", "-c" or
"--cc" flags to tell that you're interested in merges, which also tells
the diff generator what kind of diff to do (for --stat, any of the three
will do, of course, but they differ for plain patches or for
--patch-with-stat).
This trivial patch just removes the two lines that tells "git log" not to
ignore merges. It will still show the commit log message, of course, due
to the "always_show_header" part.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Allow "git repack" users to specify repacking window/depth
.. but don't even bother documenting it. I don't think any normal person
is supposed to ever really care, but it simplifies testing when you want
to use the "git repack" wrapper rather than forcing you to use the core
programs (which already do support the window/depth arguments, of course).
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a fairly straightforward patch to allow "get_sha1()" to also have
shorthands for tree and blob objects.
The syntax is very simple and intuitive: you can specify a tree or a blob
by simply specifying <revision>:<path>, and get_sha1() will do the SHA1
lookup from the tree for you.
You can currently do it with "git ls-tree <rev> <path>" and parsing the
output, but that's actually pretty awkward.
With this, you can do something like
git cat-file blob v1.2.4:Makefile
to get the contents of "Makefile" at revision v1.2.4.
Now, this isn't necessarily something you really need all that often, but
the concept itself is actually pretty powerful. We could, for example,
allow things like
to see the difference between two arbitrary files in two arbitrary
revisions. To do that, the only thing we'd have to do is to make
git-diff-tree accept two blobs to diff, in addition to the two trees it
now expects.
When I unified the revision argument parsing, I introduced a simple bug
wrt tags that had been marked uninteresting. When it was preparing for the
revision walk, it would mark all the parent commits of an uninteresting
tag correctly uninteresting, but it would forget about the commit itself.
This means that when I just did my 2.6.17-rc2 release, and my scripts
generated the log for "v2.6.17-rc1..v2.6.17-rc2", everything was fine,
except the commit pointed to by 2.6.17-rc1 (which shouldn't have been
there) was included. Even though it should obviously have been marked as
being uninteresting.
Not a huge deal, and the fix is trivial.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/logopt:
Fix "git log --stat": make sure to set recursive with --stat.
combine-diff: show diffstat with the first parent.
git.c: LOGSIZE is unused after log printing cleanup.
Log message printout cleanups (#3): fix --pretty=oneline
Log message printout cleanups (#2)
Log message printout cleanups
rev-list --header: output format fix
Fixes for option parsing
log/whatchanged/show - log formatting cleanup.
Simplify common default options setup for built-in log family.
Tentative built-in "git show"
Built-in git-whatchanged.
rev-list option parser fix.
Split init_revisions() out of setup_revisions()
Fix up rev-list option parsing.
Fix up default abbrev in setup_revisions() argument parser.
Common option parsing for "git log --diff" and friends
combine-diff: show diffstat with the first parent.
Asking for stat (either with --stat or --patch-with-stat) gives
you diffstat for the first parent, even under combine-diff.
While the combined patch is useful to highlight the complexity
and interaction of the parts touched by all branches when
reviewing a merge commit, diffstat is a tool to assess the
extent of damage the merge brings in, and showing stat with the
first parent is more sensible than clever per-parent diffstat.
This option is very special, since pretty_print_commit() will _remove_
the newline at the end of it, so we want to have an extra separator
between the things.
I added a honking big comment this time, so that (a) I don't forget this
_again_ (I broke "oneline" several times during this printout cleanup),
and so that people can understand _why_ the code does what it does.
Now, arguably the alternate fix is to always have the '\n' at the end in
pretty-print-commit, but git-rev-list depends on the current behaviour
(but we could have git-rev-list remove it, whatever).
With the big comment, the code hopefully doesn't get broken again. And now
things like
git log --pretty=oneline --cc --patch-with-stat
works (even if that is admittedly a totally insane combination: if you
want the patch, having the "oneline" log format is just crazy, but hey,
it _works_. Even insane people are people).
Here's a further patch on top of the previous one with cosmetic
improvements (no "real" code changes, just trivial updates):
- it gets the "---" before a diffstat right, including for the combined
merge case. Righ now the logic is that we always use "---" when we have
a diffstat, and an empty line otherwise. That's how I visually prefer
it, but hey, it can be tweaked later.
- I made "diff --cc/combined" add the "---/+++" header lines too. The
thing won't be mistaken for a valid diff, since the "@@" lines have too
many "@" characters (three or more), but it just makes it visually
match a real diff, which at least to me makes a big difference in
readability. Without them, it just looks very "wrong".
I guess I should have taken the filename from each individual entry
(and had one "---" file per parent), but I didn't even bother to try to
see how that works, so this was the simple thing.
With this, doing a
git log --cc --patch-with-stat
looks quite readable, I think. The only nagging issue - as far as I'm
concerned - is that diffstats for merges are pretty questionable the way
they are done now. I suspect it would be better to just have the _first_
diffstat, and always make the merge diffstat be the one for "result
against first parent".
On Sun, 16 Apr 2006, Junio C Hamano wrote:
>
> In the mid-term, I am hoping we can drop the generate_header()
> callchain _and_ the custom code that formats commit log in-core,
> found in cmd_log_wc().
Ok, this was nastier than expected, just because the dependencies between
the different log-printing stuff were absolutely _everywhere_, but here's
a patch that does exactly that.
The patch is not very easy to read, and the "--patch-with-stat" thing is
still broken (it does not call the "show_log()" thing properly for
merges). That's not a new bug. In the new world order it _should_ do
something like
if (rev->logopt)
show_log(rev, rev->logopt, "---\n");
but it doesn't. I haven't looked at the --with-stat logic, so I left it
alone.
That said, this patch removes more lines than it adds, and in particular,
the "cmd_log_wc()" loop is now a very clean:
so it doesn't get much prettier than this. All the complexity is entirely
hidden in log-tree.c, and any code that needs to flush the log literally
just needs to do the "if (rev->logopt) show_log(...)" incantation.
I had to make the combined_diff() logic take a "struct rev_info" instead
of just a "struct diff_options", but that part is pretty clean.
This does change "git whatchanged" from using "diff-tree" as the commit
descriptor to "commit", and I changed one of the tests to reflect that new
reality. Otherwise everything still passes, and my other tests look fine
too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The switch is inside an if statement which is false if
the character is ' '. Either the if should be <=' '
instead of <' ', or the case should be removed as it could
be misleading.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
The strncmp for ACK was ACK does not include the final space.
Presumably either we should either remove the trailing space,
or compare 4 chars (as this patch does).
'path' is sometimes strdup'ed, but never freed.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
rev-list --boundary: show boundary commits even when limited otherwise.
The boundary commits are shown for UI like gitk to draw them as
soon as topo-order sorting allows, and should not be omitted by
get_revision() filtering logic. As long as their immediate
child commits are shown, we should not filter them out.
gitk: Fix bug caused by missing commitlisted elements
This bug was reported by Yann Dirson, and results in an 'Error:
expected boolean value but got ""' dialog when scrolling to the bottom
of the graph under some circumstances. The issue is that git-rev-list
isn't outputting all the boundary commits when it is asked for commits
affecting only certain files. We already cope with that by adding the
missing boundary commits in addextraid, but there we weren't adding a
0 to the end of the commitlisted list when we added the extra id to
the end of the displayorder list.
This fixes it by appending 0 to commitlisted in addextraid, thus keeping
commitlisted and displayorder in sync.
* master:
pager: do not fork a pager if PAGER is set to empty.
diff-options: add --patch-with-stat
diff-files --stat: do not dump core with unmerged index.
Support "git cmd --help" syntax
diff --stat: do not do its own three-dashes.
diff-tree: typefix.
GIT v1.3.0-rc4
xdiff: post-process hunks to make them consistent.
This uses the "--no-walk" flag that I never actually implemented (but I'm
sure I mentioned it) to make "git show" be essentially the same thing as
"git whatchanged --no-walk".
It just refuses to add more interesting parents to the revision walking
history, so you don't actually get any history, you just get the commit
you asked for.
I was going to add "--no-walk" as a real argument flag to git-rev-list
too, but I'm not sure anybody actually needs it. Although it might be
useful for porcelain, so I left the door open.
[jc: ported to the unified option structure by Linus]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Merging all three option parsers related to whatchanged is
unarguably the right thing, but the fallout was too big to scare
me away. Let's try it once again, but once step at time.
This splits out init_revisions() call from setup_revisions(), so
that the callers can set different defaults to match the
traditional benaviour.
The rev-list command is still broken in a big way, which is the
topic of next step.
The "--help" argument is special, in that it is (along with "--version")
in that is taken by the "git" program itself rather than the sub-command,
and thus we've had the syntax "git --help cmd".
However, as anybody who has ever used CVS or some similar devil-spawn
program, it's confusing as h*ll when options before the sub-command act
differently from options after the sub-command, so this quick hack just
makes it acceptable to do "git cmd --help" instead, and get the exact same
result.
It may be hacky, but it's simple and does the trick.
Of course, this does not help if you use one of the non-builtin commands
without using the "git" helper. Ie you won't be getting a man-page just
because you do "git-rev-list --help". Don't expect us to be quite _that_
helpful.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
I missed that "git-diff-* --stat" spits out three-dash separator
on its own without being asked. Remove it.
When we output commit log followed by diff, perhaps --patch-with-stat,
for downstream consumer, we _would_ want the three-dash between
the message and the diff material, but that logic belongs to the
caller, not diff generator.
Recent diff_tree_setup_paths() update made it take a second
argument of type "struct diff_options", but we passed another
struct that happenes to have that type at the beginning by
mistake.
I've merged everything I think is ready for 1.3.0, so this is
the final round -- hopefully I can release this with minimum
last-minute fixup as v1.3.0 early next week.
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The $initial commit is the very first commit you made. The
first midpoint bisects things evenly as designed, but the latter
does not.
The reason I got interested in this was because I was wondering
if something like the following would help people converting a
huge repository from foreign SCM, or preparing a repository to
be fetched over plain dumb HTTP only:
#!/bin/sh
N=4
P=.git/objects/pack
bottom=
while test 0 \< $N
do
N=$((N-1))
if test -z "$bottom"
then
newbottom=`git rev-list --bisect --all`
else
newbottom=`git rev-list --bisect ^$bottom --all`
fi
if test -z "$bottom"
then
rev_list="$newbottom"
elif test 0 = $N
then
rev_list="^$bottom --all"
else
rev_list="^$bottom $newbottom"
fi
p=$(git rev-list --unpacked --objects $rev_list |
git pack-objects $P/pack)
git show-index <$P/pack-$p.idx | wc -l
bottom=$newbottom
done
The idea is to pack older half of the history to one pack, then
older half of the remaining history to another, to continue a
few times, using finer granularity as we get closer to the tip.
This may not matter, since for a truly huge history, running
bisect number of times could be quite time consuming, and we
might be better off running "git rev-list --all" once into a
temporary file, and manually pick cut-off points from the
resulting list of commits. After all we are talking about
"approximately half" for such an usage, and older history does
not matter much.
Clean up trailing whitespace when pretty-printing commits
Partly because we've messed up and now have some commits with trailing
whitespace, but partly because this also just simplifies the code, let's
remove trailing whitespace from the end when pretty-printing commits.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Relying on eye-candy progress bar was fragile to begin with.
Run fetch-pack with -k option, and count the objects that are in
the pack that were transferred from the other end.
Shell utilities: Guard against expr' magic tokens.
Some words, e.g., `match', are special to expr(1), and cause strange
parsing effects. Track down all uses of expr and mangle the arguments
so that this isn't a problem.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>
t3600-rm: skip failed-remove test when we cannot make an unremovable file.
When running t3600-rm test under fakeroot (or as root), we
cannot make a file unremovable with "chmod a-w .". Detect this
case early and skip that test.
This trivially avoids keeping the commit message data around after we
don't need it any more, avoiding a continually growing "git log" memory
footprint.
It's not a huge deal, but it's somewhat noticeable. For the current kernel
tree, doing a full "git log" I got
ie the touched pages dropped from 8851 to 5039. For the historic kernel
archive, the numbers are 18357->11037 minor page faults.
We could/should in theory free the commits themselves, but that's really a
lot harder, since during revision traversal we may hit the same commit
twice through different children having it as a parent, even after we've
shown it once (when that happens, we'll silently ignore it next time, but
we still need the "struct commit" to know).
And as the commit message data is clearly the biggest part of the commit,
this is the really easy 60% solution.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
This target lists undocumented commands, and/or whose document
is not referenced from the main git documentation.
For now, there are some exceptions I added primarily because I
lack the energy to document them myself:
- merge backends (we should really document them)
- ssh-push/ssh-pull (does anybody still use them?)
- annotate and blame (maybe after one of them eats the other ;-)
* jc/combine:
stripspace: make sure not to leave an incomplete line.
git-commit: do not muck with commit message when no_edit is set.
When showing a commit message, do not lose an incomplete line.
Retire t5501-old-fetch-and-upload test.
combine-diff: type fix.