Since the delta data format is not tied to any actual git object
anymore, now is the time to add a small improvement to the delta data
header as it is been done for packed object header. This patch allows
for reducing the delta header of about 2 bytes and makes for simpler
code.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Emit base objects of a delta chain when the delta is output.
Deltas are useless by themselves and when you use them you need to get
to their base objects. A base object should inherit recency from the
most recent deltified object that is based on it and that is what this
patch teaches git-pack-objects.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It gets a bit more complicated to unpack in a streaming environment, but
here it is. The rewrite is actually a lot cleaner in other ways, it's
just a bit more subtle.
The fsck-cache complains if objects referred to by files in .git/refs/
or objects stored in files under .git/objects/??/ are not found as
stand-alone SHA1 files (i.e. found in alternate object pools
GIT_ALTERNATE_OBJECT_DIRECTORIES or packed archives stored under
.git/objects/pack).
Although this is a good semantics to maintain consistency of a single
.git/objects directory as a self contained set of objects, it sometimes
is useful to consider it is OK as long as these "outside" objects are
available.
This commit introduces a new flag, --standalone, to git-fsck-cache.
When it is not specified, connectivity checks and .git/refs pointer
checks are taught that it is OK when expected objects do not exist under
.git/objects/?? hierarchy but are available from an packed archive or in
an alternate object pool.
Another new flag, --full, makes git-fsck-cache to check not only the
current GIT_OBJECT_DIRECTORY but also objects found in alternate object
pools and packed GIT archives.a
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The commands git-fsck-cache and probably git-*-pull needs to have a way
to enumerate objects contained in packed GIT archives and alternate
object pools. This commit exposes the data structure used to keep track
of them from sha1_file.c, and adds a couple of accessor interface
functions for use by the enhanced git-fsck-cache command.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Adjust fsck-cache to packed GIT and alternate object pool.
The fsck-cache complains if objects referred to by files in .git/refs/
or objects stored in files under .git/objects/??/ are not found as
stand-alone SHA1 files (i.e. found in alternate object pools
GIT_ALTERNATE_OBJECT_DIRECTORIES or packed archives stored under
.git/objects/pack).
Although this is a good semantics to maintain consistency of a single
.git/objects directory as a self contained set of objects, it sometimes
is useful to consider it is OK as long as these "outside" objects are
available.
This commit introduces a new flag, --standalone, to git-fsck-cache.
When it is not specified, connectivity checks and .git/refs pointer
checks are taught that it is OK when expected objects do not exist under
.git/objects/?? hierarchy but are available from an packed archive or in
an alternate object pool.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change pack file format. Hopefully for the last time.
This also adds a header with a signature, version info, and the number
of objects to the pack file. It also encodes the file length and type
more efficiently.
[PATCH] Obtain sha1_file_info() for deltified pack entry properly.
The initial one was not doing enough to figure things out
without uncompressing too much. It also fixes a potential
segfault resulting from missing use_packed_git() call.
We would need to introduce unuse_packed_git() call and do proper
use counting to figure out when it is safe to unmap, but
currently we do not unmap packed file yet.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Skip writing out sha1 files for objects in packed git.
Now, there's still a misfeature there, which is that when you
create a new object, it doesn't check whether that object
already exists in the pack-file, so you'll end up with a few
recent objects that you really don't need (notably tree
objects), and this patch fixes it.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces sha1sum(1) with sum(1) in t/t1002. GNU sum(1) runs in
"BSD compatibility" mode by default, and not all systems have GNU
coreutils. On any system without GNU coreutils (or sha1sum) t1002 will
fail. This patch should make t1002 complete successfully everywhere
that sum(1) runs.
I've tested this on Darwin and Linux; it works on both platforms.
Signed-off-by: Mark Allen <mrallen1@yahoo.com> Acked-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Teach read_sha1_file() and friends about packed git object store.
GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES can
have the "pack" subdirectory that houses "packed GIT" files
produced by git-pack-objects (e.g. .git/objects/pack/foo.pack
and .git/objects/pack/foo.idx; always store them as pairs). The
following functions in sha1_file.c can then read object contents
from such packed file:
Packed delta files created by git-pack-objects seems to be the
way to go, and existing "delta" object handling code has exposed
the object representation details to too many places. Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug where we would corrupt the stuff read from git-rev-list.
If we have a very long commit message, and we end up getting a
bufferfull of data from git-rev-list that all belongs to one commit,
we ended up throwing away the data from a previous read that should
have been included. The result was a error message about not being
able to parse the output of git-rev-list.
Also, if the git-rev-list output that we can't parse is long, only put
the first 80 chars in the error message. Otherwise we end up with an
enormous error window.
Also, make the writing of the SHA1 as a end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we migh end up using the same helper
functions for the object files themselves, that don't do this).
This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1). That way you can
validate the pairing of the two if you want to.
git-pack-objects: write the pack files with a SHA1 csum
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing. The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.
Update unpack-objects to know about the extra 20 bytes at the end
of the index.
Here is a script to simplify validating the gpg signature created by
git-tag-script. Might be useful to add to the git tree so that people
don't have to search for the right post in the git mailinglist archives
Check for the existence of the git directory on startup.
Check that $GIT_DIR (or .git, if GIT_DIR is not set) is a directory.
This means we can give a more informative error message if the user
runs gitk somewhere that isn't a git repository.
git-pack-objects: do the delta search in reverse size order
Starting from big objects and going backwards means that we end up
picking a delta that goes from a bigger object to a smaller one. That's
advantageous for two reasons: the bigger object is likely the newer one
(since things tend to grow, rather than shrink), and doing a delete
tends to be smaller than doing an add.
So the deltas don't tend to be top-of-tree, and the packed end result is
just slightly smaller.
[PATCH] Add git-relink-script to fix up missing hardlinks
This will scan 2 or more object repositories and look for common objects, check
if they are hardlinked, and replace one with a hardlink to the other if not.
This version warns when skipping files because of size differences, and
handle more than 2 repositories automatically.
Signed-off-by: Ryan Anderson <ryan@michonline.com> Cheered-on-by: Jeff Garzik <jgarzik@pobox.com> Acked-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
git-rev-parse: add "--not" flag to mark subsequent heads negative
If you have two lists of heads, and you want to see ones reachable from
list $a but not from list $b, just do
git-rev-list $(git-rev-parse $a --not $b)
which is useful for both bisecting (where "b" would be the list of known
good revisions, and "a" would be the latest found bad head) and for just
seeing what the difference between two sets of heads are if you want to
generate a pack-file for the difference.
[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair.
This finishes the initial round of git-pack-object /
git-unpack-object pair. They are now good enough to be used as
a transport medium:
- Fix delta direction in pack-objects; the original was
computing delta to create the base object from the object to
be squashed, which was quite unfriendly for unpacker ;-).
- Add a script to test the very basics.
- Implement unpacker for both regular and deltified objects.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
git-pack-objects: make "--window=x" semantics more logical.
A zero disables delta generation (like before), but we make the window
be one bigger than specified, since we use one entry for the one to be
tested (it used to be that "--window=1" was meaningless, since we'd have
used up the single-entry window with the entry to be tested, and had no
chance of actually ever finding a delta).
The default window remains at 10, but now it really means "test the 10
closest objects", not "test the 9 closest objects".
Anything that generates a delta to see if two objects are close usually
isn't interested in the delta ends up being bigger than some specified
size, and this allows us to stop delta generation early when that
happens.
Return value of try_delta is checked for negativeness, but the
success path does not return anything, letting compiler warn and
presumably return garbage.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Fix oversimplified optimization for add_cache_entry().
An earlier change to optimize directory-file conflict check
broke what "read-tree --emu23" expects. This is fixed by this
commit.
(1) Introduces an explicit flag to tell add_cache_entry() not to
check for conflicts and use it when reading an existing tree
into an empty stage --- by definition this case can never
introduce such conflicts.
(2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name()
aware of the cache stages, and flag conflict only with paths
in the same stage.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] git-merge-one-file-script: do not misinterpret rm failure.
When a merge adds a file DF and removes a directory there by
deleting a path DF/DF, git-merge-one-file-script can be called
for the removal of DF/DF when the path DF is already created by
"git-read-tree -m -u". When this happens, we get confused by a
failure return from 'rm -f -- "$4"' (where $4 is DF/DF); finding
file DF there the "rm -f" command complains that DF is not a
directory.
What we want to ensure is that there is no file DF/DF in this
case. Avoid getting ourselves confused by first checking if
there is a file, and only then try to remove it (and check for
failure from the "rm" command).
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds more tests for --emu23. One is to show how it can
carry forward more local changes than the straightforward
two-way fast forward, and another is to show the recent
overeager optimization of directory/file conflict check broke
things, which will be fixed in the next commit.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] git-cherry: find commits not merged upstream.
The git-cherry command helps the git-rebase script by finding
commits that have not been merged upstream. Commits already
included in upstream are prefixed with '-' (meaning "drop from
my local pull"), while commits missing from upstream are
prefixed with '+' (meaning "add to the updated upstream").
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] fix date parsing for GIT raw commit timestamp format.
Usually all of the match_xxx routines in date.c fill tm
structure assuming that the parsed string talks about local
time, and parse_date routine compensates for it by adjusting the
value with tz offset parsed out separately. However, this logic
does not work well when we feed GIT raw commit timestamp to it,
because what match_digits gets is already in GMT.
A good testcase is:
$ make test-date
$ ./test-date 'Fri Jun 24 16:55:27 2005 -0700' '1119657327 -0700'
These two timestamps represent the same time, but the second one
without the fix this commit introduces gives you 7 hours off.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So far it just reads the header and generates the list of objects.
It also sorts them by the order they are written in the pack file,
since that ends up being the same order we got them originally, and
is thus "most recent first".
git-pack-objects: create a packed object representation.
This is kind of like a tar-ball for a set of objects, ready to be
shipped off to another end. Alternatively, you could use is as a packed
representation of the object database directly, if you changed
"read_sha1_file()" to read these kinds of packs.
The latter is partiularly useful to generate a "packed history", ie you
could pack up your old history efficiently, but still have it available
(at a performance hit, of course).
I haven't actually written an unpacker yet, so the end result has not
been verified in any way yet. I obviously always write bug-free code,
so it just has to work, no?
it now lists not only the "commit difference" between the parent of HEAD
and HEAD itself (which is normally just the parent, but in the case of a
merge will be all the newly merged commits), but also all the new tree
and blob objects that weren't in the original.
NOTE! It doesn't walk all the way to the root, so it doesn't do a full
object search in the full old history. Instead, it will only look as
far back in the history as it needs to resolve the commits. Thus, if
the commit reverts a blob (or tree) back to a state much further back in
history, we may end up listing some blobs (or trees) as "new" even
though they exist further back.
Regardless, the list of objects will be a superset (usually exact) list
of objects needed to go from the beginning commit to ending commit.
As a particularly obvious special case,
git-rev-list --objects HEAD
will end up listing every single object that is reachable from the HEAD
commit.
Side note: the objects are sorted by "recency", with commits first.
Add commit row context menu and handle left-click on graph lines
Right-click on a context row now brings up a menu allowing the user to
generate a diff between that row and the selected row. Left-click on
a graph line shows the parent and children connected by the line in
the details pane. Left-click on a circle in the graph selects that
commit. Left-click elsewhere in the graph does nothing.
When displaying a diff, the bottom-right file list box behaves
slightly differently now; instead of eliding all other files' diffs,
it now just scrolls the details pane so that the selected file's diff
starts at the top of the pane.
Since the diffs can be rather large, arrange for an update to be done
every 100ms while reading diffs.
Also removed the CVS revision keywords and bumped the version number
to 1.2.
Add "git-patch-id" program to generate patch ID's.
A "patch ID" is nothing but a SHA1 of the diff associated with a patch,
with whitespace and line numbers ignored. As such, it's "reasonably
stable", but at the same time also reasonably unique, ie two patches
that have the same "patch ID" are almost guaranteed to be the same
thing.
IOW, you can use this thing to look for likely duplicate commits.
[PATCH] Fix to how --merge-order handles multiple roots
This patch addresses the problem reported by Paul Mackerras such that --merge-order
did not report the last root of a graph with merge of two independent roots.
Signed-off-by: Jon Seymour <jon.seymour@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We used to ignore unreachable tags, which just causes problems: it makes
"git prune" leave them around, but since we'll have prune everything
that tag points to, the tag object really should be removed too.
So remove the code that made us think tags were always reachable.
The sensible cleanup of the in-memory storage order of commit parents broke the --merge-order
code which was dependent on the previous behaviour of parse_commit().
This patch restores the correctness --merge-order behaviour by taking account of the
new behaviour of parse_commit.
Signed-off-by: Jon Seymour <jon.seymour@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do a cross-project merge of Paul Mackerras' gitk visualizer
gitk is really quite incredibly cool, and is great for visualizing what
is going on in a git repository. It's especially useful when you are
looking at what has changed since a particular version, since it
gracefully handles partial trees (and this also avoids the expense of
looking at _all_ changes in a big project).
For example, to see what changed in a merge after a "git pull", do
gitk ORIG_HEAD..
to see only the new things. Or you can simply do "gitk v2.6.12.." to
see what has changed since the v2.6.12 tag etc.
This merge itself is pretty interesting too, since it shows off a
feature of git itself that is incredibly cool: you can merge a
_separate_ git project into another git project. Not only does this
keep all the history of the original project, it also makes it possible
to continue to merge with the original project and the union of the two
projects.
Typical expected usage is "git-apply --stat --summary" to show
diffstat plus dense description of information available in git
extended headers, such as creations, renames, and mode changes.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] git-apply --stat: show new filename for rename/copy patch.
When a patch is a git extended rename/copy patch, "git-apply
--stat" showed the old filename. Change it to show the new
filename, because most of the time we are interested in looking
at the resulting tree.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
git-apply: create subdirectories leading up to a new file
Applying Andrew's latest patch-bomb showed us failing miserably if a new
subdirectory needed to be created.. That said, it's uncommon enough
that it's worth optimistically assuming it won't be needed, and then
creating the subdirectories only on failure.
Make pull fetch whatever is specified, parse it to figure out what it is, and
then process it appropriately. This also supports getting tag objects, and
getting whatever they tag.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Handle parsing a tag for a non-present object. This adds a function to lookup
an object with lookup_* for * in a string, so that it can get the right storage
based on the "type" line in the tag.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With --header, git-rev-list gives us the contents of the commit
in-line, so we don't need to exec a git-cat-file to get it, and we
don't need the readobj command either.
Also fixed a residual problem with handling the commit that
has a parent listed twice.
Here is a patch that fixes several gcc4 warnings about different signedness,
all between char and unsigned char. I tried to keep the patch minimal
so resertod to casts in three places.
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Signed-off-by: Linus Torvalds <torvalds@osdl.org>