If that pack is big, it takes significant time to write and might
benefit from some more eye candy as well. This is, however, disabled
when the pack is written to stdout since in that case the output is
usually piped into unpack_objects which already does its own progress
reporting.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This provides a stable and simpler progress reporting mechanism that
updates progress as often as possible, but reliably no more than once
a second. The deltification phase is also made more
interesting to watch (since repacking a big repository and only seeing a
dot appear once every many seconds is rather boring and doesn't provide
much food for anticipation).
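The "at most once a second" rule can be illustrated with a small C sketch. This is not the mechanism used by the patch; it is a simplified stand-in that compares timestamps, and the names `struct progress` and `progress_update` are hypothetical. The caller passes the current time explicitly so the behaviour is easy to check.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical throttle state; not the structure used in pack-objects. */
struct progress {
    time_t last_shown;  /* second in which a line was last printed */
    unsigned shown;     /* number of lines actually printed */
};

/*
 * Print "done/total" unless a line was already printed during this
 * second. The final update (done == total) is always printed.
 * Returns 1 if something was printed, 0 if the update was skipped.
 */
static int progress_update(struct progress *p, time_t now,
                           unsigned done, unsigned total)
{
    if (p->shown && now == p->last_shown && done != total)
        return 0;
    p->last_shown = now;
    p->shown++;
    fprintf(stderr, "\r%u/%u", done, total);
    return 1;
}
```

Calling `progress_update()` from the inner loop as often as desired keeps the call sites simple; the throttle alone decides what reaches the terminal.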
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Keep Porcelainish from failing by broken ident after making changes.
"empty ident not allowed" error makes commit-tree fail, so we
are already safer in that we would not end up with commit
objects that have bogus names on the author or committer fields.
However, before commit-tree is called there are already changes
made to the index file and the working tree. The operation can
be resumed after fixing the environment problem, but when this
triggers for a newcomer with unusable gecos, the first question
becomes "what did I lose and how would I recover".
This patch modifies some Porcelainish commands to verify
GIT_COMMITTER_IDENT as soon as we know we are going to make some
commits before doing much damage to prevent confusion.
Delay "empty ident" errors until they really matter.
Previous one warned people upfront to encourage fixing their
environment early, but some people just use repositories and git
tools read-only without making any changes, and in such a case
there is not much point insisting on them having a usable ident.
This round attempts to move the error until either "git-var"
asks for the ident explicitly or "commit-tree" wants to use it.
Make "empty ident" error message a bit more helpful.
It appears that some people who did not care about having bogus
names in their own commit messages are bitten by the recent
change to require a sane environment [*1*].
While it was a good idea to prevent people from using bogus
names to create commits and doing sign-offs, the error message
is not very informative. This patch attempts to warn about the
problem upfront and hint at how people can fix their environments.
pack-objects: avoid delta chains that are too long.
This tries to rework the solution for the excess delta chain
problem. An earlier commit worked around it ``cheaply'', but
repeated repacking risks unbounded growth of delta chains.
This version counts the length of delta chain we are reusing
from the existing pack, and makes sure a base object that has
sufficiently long delta chain does not get deltified.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s
user 11m1.450s
sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s
user 0m56.670s
sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s
user 9m45.240s
sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
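The two rules discussed above (the precise one and the depth/4 shortcut) can be sketched in C. This is only an illustration under stated assumptions: the function names `fits_precise` and `fits_cheap` and their parameters are hypothetical and are not taken from pack-objects; only the depth/4 reduction comes from the description.

```c
/*
 * Precise rule: an object about to be stored as a delta against a base
 * of depth `base_depth` ends up at depth base_depth + 1. If a reused
 * chain of length `chain_above` already sits on top of it, the deepest
 * object lands at base_depth + 1 + chain_above.
 */
static int fits_precise(unsigned base_depth, unsigned chain_above,
                        unsigned max_depth)
{
    return base_depth + 1 + chain_above <= max_depth;
}

/*
 * Cheap rule: do not track chain lengths at all. An object that is the
 * base of any reused delta is simply limited to max_depth / 4.
 */
static int fits_cheap(unsigned base_depth, int is_reused_base,
                      unsigned max_depth)
{
    unsigned limit = is_reused_base ? max_depth / 4 : max_depth;
    return base_depth + 1 <= limit;
}
```

In the E<--F<--G<--A example, G has depth 2 and D sits 3 levels above A, so the precise rule puts D at depth 6. The cheap rule is more conservative, which is the price of not computing the chain length.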
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly matches
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
The real problem that triggered an earlier fix was that an alternate
entry was pointing at a removed directory. Complaining on
object/pack directory that cannot be opendir-ed produces noise
in an ancient repository that does not have object/pack
directory and has never been packed.
Detect the real user error and report it. Also if opendir
failed for other reasons (e.g. no read permissions), report that
as well.
Spotted by Andrew Vasquez <andrew.vasquez@qlogic.com>.
git-push: Update documentation to describe the no-refspec behavior.
It turns out that the git-push documentation didn't describe what it
would do when not given a refspec (neither on the command line nor in a
remotes file). This is fairly important for the user who is trying to
understand operations such as:
I tracked the mystery behavior down to git-send-pack and lifted the
relevant portion of its documentation up to git-push, (namely that all
refs existing both locally and remotely are updated).
Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
git-add: Add support for --, documentation, and test.
This adds support to git-add to allow the common -- to separate
command-line options and file names. It adds documentation and a new
git-add test case as well.
[jc: this should apply to 1.2.X maintenance series, so I reworked
git-ls-files --error-unmatch test. ]
When git-reset --hard is used and a subdirectory becomes
empty (as it contains no tracked files in the target tree)
the empty subdirectory should be removed. This matches
the behavior of git-checkout-index and git-read-tree -m
which would not have created the subdirectory or would
have deleted it when updating the working directory.
Subdirectories which are not empty will be left behind.
This may happen if the subdirectory still contains object
files from the user's build process (for example).
[jc: simplified the logic a bit, while keeping the test script.]
Print an error if cloning a http repo and NO_CURL is set
If Git is compiled with NO_CURL=YesPlease and one tries to
clone a http repository, git-clone tries to call the curl
binary. This trivial patch prints an error instead in such
situation.
Signed-off-by: Fernando J. Pereda <ferdy@gentoo.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When showing a conflicted merge from index stages and working
tree file, we did not fetch the mode from the working tree,
and mistook that as a deleted file. Also if the manual
resolution (or automated resolution by git rerere) ended up
taking either parent's version, we did not show _anything_ for
that path. Either was quite bad and confusing.
Documentation: git-commit in 1.2.X series defaults to --include.
The documentation was mistakenly describing the --only semantics to
be default. The 1.2.0 release and its maintenance series 1.2.X will
keep the traditional --include semantics as the default. Clarify the
situation.
Earlier, when we switched a branch we used diff-files to show
paths that are dirty in the working tree. But we allow switching
branches with updated index ("read-tree -m -u $old $new" works that
way), and only showing paths that have differences in the working
tree but not paths that are different in index was confusing.
This shows both as modified from the top commit of the branch we
just have switched to.
avoid echo -e, there are systems where it does not work
FreeBSD 4.11 being one example: the built-in echo doesn't have -e,
and the installed /bin/echo does not do "-e" either.
"printf" works, lacking just "\e" and "\xAB".
The hashed object lookup had a subtle bug in re-hashing: it did
    for (i = 0; i < count; i++)
        if (objs[i]) {
            .. rehash ..
        }
where "count" was the old hash count. On the face of it, that is obvious, since
it clearly re-hashes all the old objects.
However, it's wrong.
If the last old hash entry before re-hashing was in use (or became in use
by the re-hashing), then when re-hashing could have inserted an object
into the hash entries with idx >= count due to overflow. When we then
rehash the last old entry, that old entry might become empty, which means
that the overflow entries should be re-hashed again.
In other words, the loop has to be fixed to traverse the whole
array, rather than just the old count.
(There's room for a slight optimization: instead of counting all the way
up, we can break when we see the first empty slot that is above the old
"count". At that point we know we don't have any collisions that we might
have to fix up any more. This patch only does the trivial fix)
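The failure mode described above can be reproduced with a toy table. The following C sketch is an assumption-laden illustration, not the code from object.c: it stores positive integers instead of objects, uses `key % size` as the hash, and all names (`struct table`, `probe`, `grow`) are hypothetical. The `whole_array` flag switches between the buggy loop bound (old count) and the fixed one (new size).

```c
#include <stdlib.h>
#include <string.h>

/* Toy open-addressing table of positive ints; 0 marks an empty slot. */
struct table {
    unsigned *slot;
    unsigned size;
};

/* Return the slot holding key, or the empty slot where it would go. */
static unsigned probe(const struct table *t, unsigned key)
{
    unsigned i = key % t->size;
    while (t->slot[i] && t->slot[i] != key)
        i = (i + 1) % t->size;  /* linear probing with wraparound */
    return i;
}

static int contains(const struct table *t, unsigned key)
{
    return t->slot[probe(t, key)] == key;
}

static void insert(struct table *t, unsigned key)
{
    t->slot[probe(t, key)] = key;
}

/* Double the table and rehash in place (table must never be full). */
static void grow(struct table *t, int whole_array)
{
    unsigned old = t->size, i, end;

    t->slot = realloc(t->slot, 2 * old * sizeof(*t->slot));
    if (!t->slot)
        abort();
    memset(t->slot + old, 0, old * sizeof(*t->slot));
    t->size = 2 * old;

    /* The bug: stopping at the old count. The fix: walk the new size. */
    end = whole_array ? t->size : old;
    for (i = 0; i < end; i++) {
        unsigned key = t->slot[i], j;
        if (!key)
            continue;
        j = probe(t, key);
        if (j != i) {
            t->slot[j] = key;
            t->slot[i] = 0;
        }
    }
}
```

With an old size of 4 and keys 7 and 3 inserted in that order, the old-count loop first pushes 3 into slot 4 (because 7 still occupies slot 3) and then moves 7 away, leaving 3 unreachable behind an empty slot. Walking the whole array revisits slot 4 and moves 3 back to slot 3.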
[jc: with trivial fix on trivial fix]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is to be nicer to people with unusable GECOS field.
"git-var -l" is currently broken in that when used by a user who
does not have a usable GECOS field and has not corrected it by
exporting GIT_COMMITTER_NAME environment variable it dies when
it tries to output GIT_COMMITTER_IDENT (same thing for AUTHOR).
"git-pull" used "git-var -l" only because it needed to get a
configuration variable before "git-repo-config --get" was
introduced. Use the latter tool designed exactly for this
purpose.
"git-sh-setup" used "git-var GIT_AUTHOR_IDENT" without actually
wanting to use its value. The only purpose was to cause the
command to check and barf if the repository format version
recorded in the $GIT_DIR/config file is too new for us to deal
with correctly. Instead, use "repo-config --get" on a random
property and see if it die()s, and check if the exit status is
128 (comes from die -- missing variable is reported with exit
status 1, so we can tell that case apart).
Add support for explicit type specifiers when calling git-repo-config
Currently, git-repo-config will just return the raw value of option
as specified in the config file; this makes things difficult for scripts
calling it, especially if the value is supposed to be boolean.
This patch makes it possible to ask git-repo-config to check if the option
is of the given type (int or bool) and write out the value in its
canonical form. If you do not pass --int or --bool, the behaviour stays
unchanged and the raw value is emitted.
This also incidentally fixes the segfault when an option with no value is
encountered.
[jc: tweaked the option parsing a bit to make it easier to see
that the patch does not change anything but the type stuff in
the diff output. Also changed to avoid "foo ? : bar" construct. ]
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The absolute path (with the leading slash) breaks SVN importing,
because it then looks for /trunk/... instead of /svn/trunk/...
(in my case, the repository URL was https://servername/svn/)
Signed-off-by: Christian Biesinger <cbiesinger@web.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't fail a fetch just because a signal might have interrupted
the read.
Normally, we don't install any signal handlers, so EINTR really shouldn't
happen. That said, really old versions of Linux will interrupt an
interruptible system call even for signals that turn out to be ignored
(SIGWINCH is the classic example - resizing your xterm would cause it).
The same might well be true elsewhere too.
Also, since receive_keep_pack() doesn't control the caller, it can't know
that no signal handlers exist.
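A minimal C sketch of the retry idea follows. The helper name `read_retry` is hypothetical and is not claimed to be what receive_keep_pack() calls; it only shows the usual pattern of treating EINTR as "try again" rather than as a failure.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Like read(), but restart the call when a signal interrupted it
 * before any data was transferred. Other errors are returned as-is.
 */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

The caller still has to handle short reads and end-of-file (a return value of 0); only the spurious interruption is hidden.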
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Make "git clone" less of a deathly quiet experience
It used to be that "git-unpack-objects" would give nice percentages, but
now that we don't unpack the initial clone pack any more, it doesn't. And
I'd love to do that nice percentage view in the pack objects downloader
too, but the thing doesn't even read the pack header, much less know how
much it's going to get, so I was lazy and didn't.
Instead, it at least prints out how much data it's gotten, and what the
packing speed is. Which makes the user realize that it's actually doing
something useful instead of sitting there silently (and if the recipient
knows how large the final result is, he can at least make a guess about
when it might be done).
So with this patch, I get something like this on my DSL line:
where even the speed approximation seems to be roughly correct (even
though my algorithm is a truly stupid one, and only really gives "speed in
the last half second or so").
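One way to get a "speed in the last half second or so" figure is sketched below in C. This is an assumption about the general shape of such an estimator, not the code from the patch; `struct rate`, `rate_update`, and the 0.5 second window are illustrative names and values.

```c
/* Hypothetical throughput estimator state. */
struct rate {
    double last_time;           /* time of the last sample, in seconds */
    unsigned long last_bytes;   /* total bytes seen at that sample */
    double kib_per_sec;         /* most recent estimate */
};

/*
 * Feed the running byte total. The estimate is refreshed only when at
 * least half a second has passed since the previous sample, so it
 * reflects just the most recent window and nothing older.
 */
static void rate_update(struct rate *r, double now,
                        unsigned long total_bytes)
{
    double dt = now - r->last_time;

    if (dt < 0.5)
        return;
    r->kib_per_sec = (total_bytes - r->last_bytes) / 1024.0 / dt;
    r->last_time = now;
    r->last_bytes = total_bytes;
}
```

Because nothing is averaged over a longer period, the figure jumps around on a bursty link, which matches the caveat in the description.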
Anyway, _something_ like this is definitely needed. It could certainly be
better (if it showed the same kind of thing that git-unpack-objects did,
that would be much nicer, but would require parsing the object stream as
it comes in). But this is a big step forward, I think.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/diff-tree:
combine-diff: Record diff status a bit more faithfully
find_unique_abbrev() simplification.
combine-diff: move formatting logic to show_combined_diff()
combined-diff: use diffcore before intersecting paths.
diff-tree -c raw output
rev-list: default to abbreviate merge parent names under --pretty.
When we prettyprint commit log messages, merge parent names were
often very long and there was no way to abbreviate them.
This changes them to be abbreviated by default, and non-default
abbreviations can be specified with --no-abbrev or --abbrev=<n>
options.
Note that this affects only the prettyprinted parent names. The
output from --show-parents is meant for machine consumption and
is not affected by this flag.
There was a stale comment that explains why the old code could
undercount when delta data copied things around inside the destination
buffer. We do not use that kind of delta, so the comment does
not apply.
combine-diff: Record diff status a bit more faithfully
This shows "new file mode XXXX" and "deleted file mode XXXX"
lines like two-way diff-patch output does, by checking the
status from each parent.
The diff-raw output for combined diff is made a bit uglier by
showing diff status letters with each parent. While in most
cases you would see "MM" in the output, an Evil Merge that
touches a path that was added by inheriting from one parent is
possible and it would be shown like these:
Earlier it did not grok the 0{40} SHA1 very well, but what it
needed to do was to find the shortest 0{N} that is not used as a
valid object name to be consistent with the way names of valid
objects are abbreviated. This makes some callers simpler.
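The search for the shortest unused 0{N} can be sketched in C as follows. This is a simplified illustration that scans an in-memory list of hex names; the real function consults the object database, and the name `shortest_unused_zero_prefix` and its parameters are hypothetical.

```c
#include <string.h>

/*
 * Return the smallest length N >= minlen such that no name in `names`
 * begins with N zeros, i.e. "0" repeated N times cannot be mistaken
 * for an abbreviation of a valid object. Falls back to the full length.
 */
static int shortest_unused_zero_prefix(const char **names, int nr,
                                       int minlen, int fulllen)
{
    int len;

    for (len = minlen; len < fulllen; len++) {
        int i, used = 0;

        for (i = 0; i < nr && !used; i++)
            used = strspn(names[i], "0") >= (size_t)len;
        if (!used)
            return len;
    }
    return fulllen;
}
```

For example, with names beginning "0000a1" and "00b2c3" and a minimum length of 2, lengths 2 through 4 all collide with the first name, so the result is 5.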
This revamps the git-status command to take the same set of
parameters as git commit. It gives a preview of what is being
committed with that command. With -v flag, it shows the diff
output between the HEAD commit and the index that would be
committed if these flags were given to git-commit command.
git-commit also acquires -v flag (it used to mean "verify" but
that is the default anyway and there is --no-verify to turn it
off, so not much is lost), which uses the updated git-status -v
to seed the commit log buffer. This is handy for writing a log
message while reviewing the changes one last time.
Now git-commit and git-status internally share the same
implementation.
Unlike previous git-commit change, this uses a temporary index
to prepare the index file that would become the real index file
after a successful commit, and moves it to the real index file
once the commit is actually made. This makes it safer than the
previous scheme, which stashed away the original index file and
restored it after an aborted commit.
After experimenting with code to add the ability to encode a delta
against part of the deltified file, it turns out that resulting packs
are _bigger_ than when this ability is not used. The raw delta output
might be smaller, but it doesn't compress as well using gzip, for a
negative net saving on average.
Said bit would in fact be more useful to allow for encoding the copying
of chunks larger than 64KB providing more savings with large files.
This will correspond to pack version 3.
While the current code still produces version 2 packs, it is made future
proof so that pack versions 2 and 3 are both accepted. Any version 2 pack is
compatible with version 3 since the redefined bit was never used before.
When enough time has passed, code to use that bit to produce version 3
packs could be added.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
stat() for existence in safe_create_leading_directories()
Use stat() to explicitly check for existence rather than
relying on the non-portable EEXIST error in sha1_file.c's
safe_create_leading_directories(). There certainly are
optimizations possible, but then the code becomes almost
the same as that in coreutils' lib/mkdir-p.c.
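The stat()-first approach can be sketched in C like this. It is an illustration of the idea only, not the body of safe_create_leading_directories(); the name `create_leading_dirs` and the 0777 mode are assumptions.

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create every missing directory leading up to the last component of
 * `path`. The buffer is modified temporarily and restored before
 * returning. Returns 0 on success, -1 on failure.
 */
static int create_leading_dirs(char *path)
{
    char *pos = path;
    struct stat st;

    if (*pos == '/')
        pos++;
    while ((pos = strchr(pos, '/')) != NULL) {
        int failed = 0;

        *pos = '\0';
        if (stat(path, &st) == 0) {
            /* Something exists here; it must be a directory. */
            if (!S_ISDIR(st.st_mode))
                failed = 1;
        } else if (mkdir(path, 0777) != 0) {
            failed = 1;
        }
        *pos++ = '/';
        if (failed)
            return -1;
    }
    return 0;
}
```

Checking with stat() first means the code never has to interpret the errno left behind by mkdir(), at the cost of one extra system call per component.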
Other uses of EEXIST seem ok. Tested on Solaris 8, AIX 5.2L,
and a few Linux versions. AIX has some unrelated (I think)
failures right now; I haven't tried many recent gits there.
Anyone have an old Ultrix box to break everything? ;)
Also remove extraneous #includes. Everything's already in
git-compat-util.h, included through cache.h.
Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
combine-diff: move formatting logic to show_combined_diff()
This way, diff-files can make use of it. Also implement the
full suite of what diff_flush_raw() supports just for
consistency. With this, 'diff-tree -c -r --name-status' would
show what is expected.
There is no way to get the historical output (useful for
debugging and low-level Plumbing work) anymore, so tentatively
it makes '-m' mean "do not combine and show individual diffs
with parents".
diff-files matches diff-tree to produce raw output for -c. For
textual combined diff, use -p -c.
If you call setup_git_directory() to work from a subdirectory,
that should be run first before running git_config(). Otherwise
you would not read the configuration file from the correct place.
NOTE! This makes "-c" be the default, which effectively means that merges
are never ignored any more, and "-m" is a no-op. So it changes semantics.
I would also like to make "--cc" the default if you do patches, but didn't
actually do that.
The raw output format is not wonderfully pretty, but it's distinguishable
from a "normal patch" in that a normal patch with just one parent has just
one colon at the beginning, while a multi-parent raw diff has <n> colons
for <n> parents.
ls-files: honour per-directory ignore file from higher directories.
When git-ls-files -o --exclude-per-directory=.gitignore is run
from a subdirectory, it did not read from .gitignore from its
parent directory. Reading from them makes output from these two
commands consistent:
It tried to "restore" GIT_AUTHOR_EMAIL environment variable but
the variable started out as unset, so ended up setting it to an
empty string. This is now caught as an error.
http-fetch: Abort requests for objects which arrived in packs
In fetch_object, there's a call to release an object request if the
object mysteriously arrived, say in a pack. Unfortunately, the fetch
attempt for this object might already be in progress, and we'll leak the
descriptor. Instead, try to tidy away the request.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
format-patch: Remove last vestiges of --mbox option
Don't mention it in docs or --help output.
Remove mbox, date and author variables from git-format-patch.sh.
Use DESCRIPTION text from man-page to update LONG_USAGE output. It's
a bit silly to have two texts saying the same thing in different words,
and I'm too lazy to update both.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduce --only flag to allow the new "partial commit"
semantics when paths are specified. The default is still the
traditional --include semantics. Once people's fingers and
scripts that want the traditional behaviour are updated to
explicitly say --include, we could change it to either default
to --only, or refuse to operate without either --only/--include
when paths are specified.
This also fixes a couple of bugs in the previous round. Namely:
- forgot to save/restore index in some cases.
- forgot to use the temporary index to show status when '--only
paths...' semantics was used.
- --author did not take precedence when reusing an existing
commit.
- "git commit" without _any_ parameter keeps the traditional
behaviour. It commits the current index.
We commit the whole index even when this form is run from a
subdirectory.
- "git commit --include paths..." (or "git commit -i paths...")
is equivalent to:
git update-index --remove paths...
git commit
- "git commit paths..." acquires a new semantics. This is an
incompatible change that needs user training, which I am
still a bit reluctant to swallow, but enough people seem to
have complained that it is confusing to them. It
1. refuses to run if $GIT_DIR/MERGE_HEAD exists, and reminds
trained git users that the traditional semantics now needs
-i flag.
2. refuses to run if named paths... are different in HEAD and
the index (ditto about reminding). Added paths are OK.
3. reads HEAD commit into a temporary index file.
4. updates named paths... from the working tree in this
temporary index.
5. does the same updates of the paths... from the working
tree to the real index.
6. makes a commit using the temporary index that has the
current HEAD as the parent, and updates the HEAD with this
new commit.
- "git commit --all" can run from a subdirectory, but it updates
the index with all the modified files and does a whole tree
commit.
- In all cases, when the command decides not to create a new
commit, the index is left as it was before the command is
run. This means that the two "git diff" in the following
sequence:
$ git diff
$ git commit -a
$ git diff
would show the same diff if you abort the commit process by
making the commit log message empty.
This commit also introduces the much-requested --author option.
$ git commit --author 'A U Thor <author@example.com>'
In a workflow that employs relatively long lived topic branches,
the developer sometimes needs to resolve the same conflict over
and over again until the topic branches are done (either merged
to the "release" branch, or sent out and accepted upstream).
This commit introduces a new command, "git rerere", to help this
process by recording the conflicted automerge results and
corresponding hand-resolve results on the initial manual merge,
and later by noticing the same conflicted automerge and applying
the previously recorded hand resolution using three-way merge.
If the first part uses quoted-printable to protect iso8859-1
name in the commit log, and the second part was plain ascii text
patchfile without even Content-Transfer-Encoding subheader, we
incorrectly tried to decode the patch as quoted printable.
Docs: move git url and remotes text to separate sections
The sections on git urls and remotes files in the git-fetch,
git-pull, and git-push manpages seem long enough to be worth a
manpage section of their own.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The push and pull man pages include a bunch of shared text from
pull-fetch-param.txt. This simplifies maintenance somewhat, but
there's actually quite a bit of text that applies only to one or the
other.
So, separate out the push- and pull/fetch-specific text into
pull-fetch-param.txt and git-push.txt, then include the largest chunk
of common stuff (the description of protocols and url's) from
urls.txt. That cuts some irrelevant stuff from the man pages without
making us duplicate too much.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
combine-diff: do not punt on removed or added files.
When we remove a file, the parents' contents are all removed so
it is not that interesting to show all of them, but the fact it
was removed when all parents had it *is* unusual. When we add a
file, similarly the fact it was added when no parent wanted it
*is* unusual, and in addition the result matters, so show it.
gitk: Use git-diff-tree --cc for showing the diffs for merges
This replaces a lot of code that used the result from several 2-way
diffs to generate a combined diff for a merge. Now we just use
git-diff-tree --cc and colorize the output a bit, which is a lot
simpler, and has the enormous advantage that if the diff doesn't
show quite what someone thinks it should show, I can deflect the
blame to someone else. :)
Apparently this simplifies things for the parser/compiler and makes
it go slightly faster (since without the braces, it potentially has
to do two levels of substitutions rather than one).
* jc/daemon:
daemon: extend user-relative path notation.
daemon: Set SO_REUSEADDR on listening sockets.
daemon: do not forbid user relative paths unconditionally under --base-path
* mw/http:
http-fetch: Tidy control flow in process_alternate_response
http: Turn on verbose Curl messages if GIT_CURL_VERBOSE set in environment
http-fetch: Fix message reporting rename of object file.
http-fetch: Fix object list corruption in fill_active_slots().