gitweb.git
J. Bruce Fields Mon, 16 Apr 2007 04:37:11 +0000 (00:37 -0400)

Documentation: clarify git-checkout -f, minor editing

"Force a re-read of everything" doesn't mean much to me.

Also some minor grammar fixes.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>

J. Bruce Fields Mon, 16 Apr 2007 04:37:10 +0000 (00:37 -0400)

Documentation: minor edits of git-lost-found manpage

Minor improvements to grammar and clarity of lost-found manpage.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Steven Grimm Mon, 16 Apr 2007 07:46:48 +0000 (00:46 -0700)

Add --quiet option to suppress output of "rm" commands for removed files.

Signed-off-by: Steven Grimm <koreth@midwinter.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Michael S. Tsirkin Mon, 16 Apr 2007 05:51:11 +0000 (08:51 +0300)

Display the subject of the commit just made.

Useful e.g. to figure out what I did from screen history,
or to make sure subject line is short enough and makes sense
on its own.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Andrew Ruder Mon, 16 Apr 2007 05:35:25 +0000 (00:35 -0500)

Add policy on user-interface changes

Documentation/SubmittingPatches: Add note that all user interface changes
should include associated documentation updates.

Signed-off-by: Andrew Ruder <andy@aeruder.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Mon, 16 Apr 2007 00:52:07 +0000 (17:52 -0700)

Merge branch 'maint'

* maint:
Document -g (--walk-reflogs) option of git-log
sscanf/strtoul: parse integers robustly
git-blame: Fix overrun in fake_working_tree_commit()
[PATCH] Improve look-and-feel of the gitk tool.
[PATCH] Teach gitk to use the user-defined UI font everywhere.

Alex Riesen Sun, 15 Apr 2007 22:36:06 +0000 (00:36 +0200)

Document -g (--walk-reflogs) option of git-log

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>

James Bowes Sun, 15 Apr 2007 01:27:20 +0000 (21:27 -0400)

Document git-check-attr

Signed-off-by: James Bowes <jbowes@dangerouslyinc.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Luiz Fernando N. Capitulino Sun, 15 Apr 2007 18:51:29 +0000 (15:51 -0300)

ident.c: Use size_t (instead of int) to store sizes

Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Luiz Fernando N. Capitulino Sun, 15 Apr 2007 21:40:31 +0000 (18:40 -0300)

ident.c: Use const qualifier for 'struct passwd' parameters

Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Sun, 15 Apr 2007 21:56:09 +0000 (14:56 -0700)

Change attribute negation marker from '!' to '-'.

At the same time, we do not want to allow arbitrary strings for
attribute names, as we are likely to want to extend the syntax
later. Allow only alnum, dash, underscore and dot for now.

Signed-off-by: Junio C Hamano <junkio@cox.net>
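
The naming rule in the commit above can be sketched in Python. This is a minimal illustration, not git's C implementation: the names `ALLOWED`, `is_valid_attr_name` and `parse_attr_token` are hypothetical, and treating any leading '-' as the negation marker is an assumption of the sketch.

```python
import string

# Characters the commit message allows in an attribute name:
# alphanumerics, dash, underscore and dot.
ALLOWED = set(string.ascii_letters + string.digits + "-_.")


def is_valid_attr_name(name):
    """Return True if `name` is non-empty and uses only allowed characters."""
    return bool(name) and all(ch in ALLOWED for ch in name)


def parse_attr_token(token):
    """Split a rule token into (name, is_set).

    Assumption of this sketch: a leading '-' always means "unset".
    """
    if token.startswith("-"):
        name, is_set = token[1:], False
    else:
        name, is_set = token, True
    if not is_valid_attr_name(name):
        raise ValueError("invalid attribute name: %r" % name)
    return name, is_set
```

With this sketch, `parse_attr_token("-diff")` yields `("diff", False)`, while a token using the old marker, such as `"!diff"`, is rejected because '!' is not an allowed character.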

Junio C Hamano Sat, 14 Apr 2007 15:56:35 +0000 (08:56 -0700)

Define a built-in attribute macro "binary".

For binary files we would want to disable textual diff
generation and automatic crlf conversion.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Sat, 14 Apr 2007 15:54:37 +0000 (08:54 -0700)

attribute macro support

This adds "attribute macros" (for lack of better name). So far,
we have low-level attributes such as crlf and diff, which are
defined in operational terms --- setting or unsetting them on a
particular path directly affects what is done to the path. For
example, in order to decline diffs or crlf conversions on a
binary blob, no diffs on PostScript files, and treat all other
files normally, you would have something like these:

* diff crlf
*.ps !diff
proprietary.o !diff !crlf

That is fine as the operation goes, but gets unwieldy rather
rapidly, when we start adding more low-level attributes that are
defined in operational terms. A near-term example of such an
attribute would be 'merge-3way' which would control if git
should attempt the usual 3-way file-level merge internally, or
leave merging to a specialized external program of user's
choice. When it is added, we do _not_ want to force the users
to update the above to:

* diff crlf merge-3way
*.ps !diff
proprietary.o !diff !crlf !merge-3way

The way this patch solves this issue is to realize that the
attributes the user is assigning to paths are not defined in
terms of operations but in terms of what they are.

All of the three low-level attributes usually make sense for
most of the files that sane SCM users have git operate on (these
files are typically called "text"). Only a few cases, such as
binary blobs, need an exception to decline the "usual treatment
given to text files" -- and people mark them as "binary".

So this allows $GIT_DIR/info/attributes and the .gitattributes
at the toplevel of the project to also specify attributes that
assign other attributes. The syntax is '[attr]' followed by an
attribute name followed by a list of attribute names:

[attr] binary !diff !crlf !merge-3way

When "binary" attribute is set to a path, if the path has not
got diff/crlf/merge-3way attribute set or unset by other rules,
this rule unsets the three low-level attributes.

It is expected that the user level .gitattributes will be
expressed mostly in terms of attributes based on what the files
are, and the above sample would become like this:

(built-in attribute configuration)
[attr] binary !diff !crlf !merge-3way
* diff crlf merge-3way

(project specific .gitattributes)
proprietary.o binary

(user preference $GIT_DIR/info/attributes)
*.ps !diff

There are a few caveats.

* As described above, you can define these macros only in
$GIT_DIR/info/attributes and toplevel .gitattributes.

* There is no attempt to detect circular definition of macro
attributes, and definitions are evaluated from bottom to top
as usual to fill in other attributes that have not yet got
values. The following would work as expected:

[attr] text diff crlf
[attr] ps text !diff
*.ps ps

while this would most likely not (I haven't tried):

[attr] ps text !diff
[attr] text diff crlf
*.ps ps

* When a macro says "[attr] A B !C", saying that a path does
not have attribute A does not let you tell anything about
attributes B or C. That is, given this:

[attr] text diff crlf
[attr] ps text !diff
*.txt !ps

path hello.txt, which would match "*.txt" pattern, would have
"ps" attribute set to zero, but that does not make text
attribute of hello.txt set to false (nor diff attribute set to
true).

Signed-off-by: Junio C Hamano <junkio@cox.net>
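
The fill-in behaviour described in the commit above can be sketched in Python. This is a simplified model under stated assumptions, not git's implementation: `expand_macros` and its dictionary inputs are hypothetical, the rules are assumed to be already parsed, and the file-order sensitivity mentioned in the caveats is not modelled.

```python
def expand_macros(explicit, macros):
    """Fill in attributes implied by macros.

    explicit: dict of attribute name -> bool decided by matching rules.
    macros:   dict of macro name -> list of (attribute name, bool).
    A macro only contributes values for attributes that have no value
    yet, and only when the macro itself is set (True) on the path.
    """
    result = dict(explicit)
    pending = [name for name, value in explicit.items() if value]
    while pending:
        name = pending.pop()
        for sub, value in macros.get(name, []):
            if sub not in result:
                result[sub] = value
                if value:
                    # A macro that was set by another macro is expanded too.
                    pending.append(sub)
    return result


macros = {
    "binary": [("diff", False), ("crlf", False), ("merge-3way", False)],
    "text": [("diff", True), ("crlf", True)],
    "ps": [("text", True), ("diff", False)],
}
binary_path = expand_macros({"binary": True}, macros)
not_ps_path = expand_macros({"ps": False}, macros)
```

Here `binary_path` has all three low-level attributes unset, while `not_ps_path` stays `{"ps": False}`: unsetting a macro says nothing about the attributes it would assign, matching the last caveat. Because an attribute is filled in at most once, a circular definition cannot loop forever in this sketch.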

Junio C Hamano Sun, 15 Apr 2007 20:39:32 +0000 (13:39 -0700)

Makefile: add patch-ids.h back in.

I lost it by mistake while shuffling the gitattributes series which
originally was on top of the subproject topic onto the master branch.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Sun, 15 Apr 2007 21:35:11 +0000 (14:35 -0700)

Fix 'diff' attribute semantics.

This is in the same spirit as the previous one. Earlier 'diff'
meant 'do the built-in binary heuristics and disable patch text
generation based on it' while '!diff' meant 'do not guess, do
not generate patch text'. There was no way to say 'do generate
patch text even when the heuristics say it has NUL in it'.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Sun, 15 Apr 2007 20:35:45 +0000 (13:35 -0700)

Fix 'crlf' attribute semantics.

Earlier we said 'crlf lets the path go through core.autocrlf
process while !crlf disables it altogether'. This fixes the
semantics to:

- Lack of 'crlf' attribute makes core.autocrlf apply
(i.e. we guess based on the contents and if platform
expresses its desire to have CRLF line endings via
core.autocrlf, we do so).

- Setting 'crlf' attribute to true forces CRLF line endings in
working tree files, even if blob does not look like text
(e.g. contains NUL or other bytes we consider binary).

- Setting 'crlf' attribute to false disables conversion.

Signed-off-by: Junio C Hamano <junkio@cox.net>
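
The three cases listed in the commit above amount to a small decision table. A minimal Python sketch follows. The function name `should_convert_crlf` and its parameters are hypothetical, and the attribute state is modelled as True (set), False (unset) or None (not specified).

```python
def should_convert_crlf(crlf_attr, autocrlf, looks_like_text):
    """Decide whether line-ending conversion applies to a path.

    crlf_attr:       True, False, or None when no rule mentions 'crlf'.
    autocrlf:        whether core.autocrlf asks for CRLF line endings.
    looks_like_text: result of the content-based guess.
    """
    if crlf_attr is False:
        # Explicitly unset: conversion is disabled.
        return False
    if crlf_attr is True:
        # Explicitly set: convert even if the content looks binary.
        return True
    # No attribute: fall back to core.autocrlf plus the content guess.
    return bool(autocrlf and looks_like_text)
```

Using three distinct states rather than a boolean is the point of the fix: "not specified" and "unset" now lead to different behaviour.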

Linus Torvalds Sun, 15 Apr 2007 18:14:28 +0000 (11:14 -0700)

Expose subprojects as special files to "git diff" machinery

The same way we generate diffs on symlinks as the diff of text of the
symlink, we can generate subproject diffs (when not recursing into them!)
as the diff of the text that describes the subproject.

Of course, since what describes a subproject is just the SHA1, that's what
we'll use. Add some pretty-printing to make it a bit more obvious what is
going on, and we're done.

So with this, we can get both raw diffs and "textual" diffs of subproject
changes:

- git diff --raw:

:160000 160000 2de597b5ad348b7db04bd10cdd38cd81cbc93ab5 0000000... M sub-A

- git diff:

diff --git a/sub-A b/sub-A
index 2de597b..e8f11a4 160000
--- a/sub-A
+++ b/sub-A
@@ -1 +1 @@
-Subproject commit 2de597b5ad348b7db04bd10cdd38cd81cbc93ab5
+Subproject commit e8f11a45c5c6b9e2fec6d136d3fb5aff75393d42

NOTE! We'll also want to have the ability to recurse into the subproject
and actually diff it recursively, but that will involve a new command line
option (I'd suggest "--subproject" and "-S", but the latter is in use by
pickaxe), and some very different code.

But regardless of any future recursive behaviour, we need the non-recursive
version too (and it should be the default, at least in the absence of
config options, so that large superprojects don't default to something
extremely expensive).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
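
The textual form shown in the commit above can be reproduced with a short Python sketch. It is illustrative only: `subproject_text` and `subproject_diff` are hypothetical helpers, and abbreviating each SHA1 to seven characters on the index line is an assumption that happens to match the sample output.

```python
def subproject_text(sha1):
    """The one-line text that stands in for a subproject's content."""
    return "Subproject commit %s\n" % sha1


def subproject_diff(path, old_sha1, new_sha1):
    """Build a non-recursive textual diff for a changed subproject entry."""
    lines = [
        "diff --git a/%s b/%s" % (path, path),
        # 160000 is the mode shown for subproject entries in the sample.
        "index %s..%s 160000" % (old_sha1[:7], new_sha1[:7]),
        "--- a/%s" % path,
        "+++ b/%s" % path,
        "@@ -1 +1 @@",
        "-" + subproject_text(old_sha1).rstrip("\n"),
        "+" + subproject_text(new_sha1).rstrip("\n"),
    ]
    return "\n".join(lines) + "\n"


sample = subproject_diff(
    "sub-A",
    "2de597b5ad348b7db04bd10cdd38cd81cbc93ab5",
    "e8f11a45c5c6b9e2fec6d136d3fb5aff75393d42",
)
```

The design mirrors the symlink case in the message: the diff machinery only ever sees a one-line text per side, so no subproject object has to be read.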

Shawn O. Pearce Sat, 14 Apr 2007 19:10:48 +0000 (15:10 -0400)

git-gui: Display the directory basename in the title

By showing the basename of the directory very early in the
title bar I can more easily locate a particular git-gui
session when I have 8 open at once and my Windows taskbar
is overflowing with items.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

Shawn O. Pearce Sun, 15 Apr 2007 04:34:28 +0000 (00:34 -0400)

Merge branch 'er/ui'

* er/ui:
Always bind the return key to the default button
Do not break git-gui messages into multiple lines.
Improve look-and-feel of the git-gui tool.
Teach git-gui to use the user-defined UI font everywhere.
Allow wish interpreter to be defined with TCLTK_PATH

Jim Meyering Mon, 9 Apr 2007 23:01:44 +0000 (01:01 +0200)

sscanf/strtoul: parse integers robustly

* builtin-grep.c (strtoul_ui): Move function definition from here, to...
* git-compat-util.h (strtoul_ui): ...here, with an added "base" parameter.
* builtin-grep.c (cmd_grep): Update use of strtoul_ui to include base, "10".
* builtin-update-index.c (read_index_info): Diagnose an invalid mode integer
that is out of range or merely larger than INT_MAX.
(cmd_update_index): Use strtoul_ui, not sscanf.
* convert-objects.c (write_subdirectory): Likewise.

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
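
The kind of checking that makes integer parsing "robust" can be sketched in Python. This is not the C helper from the commit: `parse_uint` is a hypothetical name, a 32-bit unsigned int is assumed for the range check, and the sketch is stricter than strtoul(3) in that it also rejects leading whitespace and signs.

```python
UINT_MAX = 2**32 - 1  # assumption: unsigned int is 32 bits wide


def parse_uint(text, base=10):
    """Parse `text` as an unsigned integer, or return None on any problem.

    Rejected inputs: empty string, any character that is not a digit of
    the given base (so trailing garbage fails), and values above UINT_MAX.
    """
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    if not text or any(ch not in digits for ch in text.lower()):
        return None
    value = int(text, base)
    return value if value <= UINT_MAX else None
```

A scanf-style parse of "12abc" would quietly yield 12; here the whole string must be consumed, so the same input is reported as invalid.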

Junio C Hamano Sun, 15 Apr 2007 02:45:16 +0000 (19:45 -0700)

Merge git://git2./pub/scm/gitk/gitk into maint

* git://git2.kernel.org/pub/scm/gitk/gitk:
[PATCH] Improve look-and-feel of the gitk tool.
[PATCH] Teach gitk to use the user-defined UI font everywhere.

Linus Torvalds Sat, 14 Apr 2007 23:22:08 +0000 (16:22 -0700)

Fix some "git ls-files -o" fallout from gitlinks

Since "git ls-files" doesn't really pass down any details on what it
really wants done to the directory walking code, the directory walking
code doesn't really know whether the caller wants to know about gitlink
directories, or whether it wants to just know about ignored files.

So the directory walking code will return those gitlink directories unless
the caller has explicitly told it not to ("dir->show_other_directories"
tells the directory walker to only show "other" directories).

This kind of confuses "git ls-files -o", because
- it didn't really expect to see entries listed that were already in the
index, unless they were unmerged, and would die on that unexpected
setup, rather than just "continue".
- it didn't know how to match directory entries with the final "/"

This trivial change updates the "show_other_files()" function to handle
both of these issues gracefully. There really was no reason to die, when
the obviously correct thing for the function was to just ignore files it
already knew about (that's what "other" means here!).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Michael Spang Sat, 14 Apr 2007 21:26:20 +0000 (17:26 -0400)

git-blame: Fix overrun in fake_working_tree_commit()

git-blame would overflow commit->buffer when annotating files with long paths.

Signed-off-by: Michael Spang <mspang@uwaterloo.ca>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Fri, 13 Apr 2007 06:05:29 +0000 (23:05 -0700)

Teach 'diff' about 'diff' attribute.

This makes paths that explicitly unset 'diff' attribute not to
produce "textual" diffs from 'git-diff' family.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Fri, 13 Apr 2007 05:30:05 +0000 (22:30 -0700)

Define 'crlf' attribute.

This defines the semantics of 'crlf' attribute as an example.
When a path has this attribute unset (i.e. '!crlf'), autocrlf
line-end conversion is not applied.

Eventually we would want to let users build a pipeline of
processing to munge blob data to filesystem format (and in the
other direction) based on combination of attributes, and at that
point the mechanism in convert_to_{git,working_tree}() that
looks at 'crlf' attribute needs to be enhanced. Perhaps the
existing 'crlf' would become the first step in the input chain,
and the last step in the output chain.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Thu, 12 Apr 2007 08:07:32 +0000 (01:07 -0700)

Add basic infrastructure to assign attributes to paths

This adds the basic infrastructure to assign attributes to
paths, in a way similar to what the exclusion mechanism does
based on $GIT_DIR/info/exclude and .gitignore files.

An attribute is just a simple string that does not contain any
whitespace. They can be specified in $GIT_DIR/info/attributes
file, and .gitattributes file in each directory.

Each line in these files defines a pattern matching rule.
Similar to the exclusion mechanism, a later match overrides an
earlier match in the same file, and entries from .gitattributes
file in the same directory take precedence over the ones from
parent directories. Lines in $GIT_DIR/info/attributes file are
used as the lowest precedence default rules.

A line is either a comment (an empty line, or a line that begins
with a '#'), or a rule, which is a whitespace separated list of
tokens. The first token on the line is a shell glob pattern.
The rest are names of attributes, each of which can optionally
be prefixed with '!'. Such a line means "if a path matches this
glob, this attribute is set (or unset -- if the attribute name
is prefixed with '!')". For glob matching, the same "if the
pattern does not have a slash in it, the basename of the path is
matched with fnmatch(3) against the pattern, otherwise, the path
is matched with the pattern with FNM_PATHNAME" rule as the
exclusion mechanism is used.

This does not define what an attribute means. Tying an
attribute to various effects it has on git operation for paths
that have it will be specified separately.

Signed-off-by: Junio C Hamano <junkio@cox.net>
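
The line format and matching rule described in the commit above can be sketched in Python for a single attributes file. The helper names `parse_line`, `matches` and `attrs_for` are hypothetical. Python's fnmatch module has no FNM_PATHNAME flag, so patterns containing a slash are matched component by component as an approximation, and precedence between files in different directories is not modelled.

```python
import fnmatch
import posixpath


def parse_line(line):
    """Return (pattern, {attribute: bool}) for a rule, or None for a comment."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    pattern, *tokens = stripped.split()
    attrs = {}
    for token in tokens:
        if token.startswith("!"):
            attrs[token[1:]] = False  # '!' prefix: attribute is unset
        else:
            attrs[token] = True
    return pattern, attrs


def matches(pattern, path):
    """Match a path against a glob, following the basename/pathname split."""
    if "/" not in pattern:
        # No slash in the pattern: only the basename is compared.
        return fnmatch.fnmatchcase(posixpath.basename(path), pattern)
    # Slash in the pattern: '*' must not cross a '/', so compare per component.
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatch.fnmatchcase(part, pat)
        for pat, part in zip(pat_parts, path_parts)
    )


def attrs_for(path, lines):
    """Apply rules in order so that a later match overrides an earlier one."""
    result = {}
    for line in lines:
        rule = parse_line(line)
        if rule is not None and matches(rule[0], path):
            result.update(rule[1])
    return result


rules = ["# comment", "", "* diff crlf", "*.ps !diff", "doc/*.txt !crlf"]
ps_attrs = attrs_for("a/b/x.ps", rules)
txt_attrs = attrs_for("doc/a.txt", rules)
```

In this sketch `ps_attrs` ends up with diff unset and crlf set, because the later `*.ps` rule overrides the earlier catch-all for the one attribute it names.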

Junio C Hamano Sat, 14 Apr 2007 11:18:46 +0000 (04:18 -0700)

Merge branch 'maint'

* maint:
git-quiltimport complaining yet still working
config.txt: Fix grammatical error in description of http.noEPSV
config.txt: Change pserver to server in description of gitcvs.*
config.txt: Document core.autocrlf
config.txt: Document gitcvs.allbinary
Do not default to --no-index when given two directories.
Use rev-list --reverse in git-rebase.sh

Linus Torvalds Fri, 13 Apr 2007 21:34:18 +0000 (14:34 -0700)

git-quiltimport complaining yet still working

There were two bugs: "stop_here" doesn't exist, but the bug that causes
this code to trigger in the *first* place is the wrong use of "$dotest".
It should be ".dotest".

This is essentially the same bug introduced by 87ab7992, one was
fixed with 0d38ab25 but this was somehow left behind.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Sat, 14 Apr 2007 10:21:56 +0000 (03:21 -0700)

Replace a pair of patches with updated ones for subproject support.

This series of three patches is a *replacement* for the patch series of
two patches (plus one-liner fixup) I sent yesterday.

It fixes the issue I noted with "git status" incorrectly
claiming that a non-checked out subproject wasn't clean - that
was just a total thinko in the code (we were checking the
filesystem mode against S_IFDIRLNK, which obviously cannot work,
since S_IFDIRLNK is a git-internal state, not a filesystem
state).

It then re-sends the two patches on top of that, with the fix
for checking out superprojects (we should *not* mess up any
existing subproject directories, certainly not remove them - if
we already have a directory in the place where we now want a
subproject, we should leave it well alone!)

The first one really is a fix, and it makes the commit
commentary about a remaining bug in the patch I sent out
yesterday go away.

Linus Torvalds Fri, 13 Apr 2007 16:26:04 +0000 (09:26 -0700)

Teach "git-read-tree -u" to check out submodules as a directory

This actually allows us to check out a supermodule after cloning, although
the submodules themselves will obviously not be checked out, and will just
be empty directories.

Checking out the submodules will be up to higher levels - we may not even
want to!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Linus Torvalds Fri, 13 Apr 2007 16:25:01 +0000 (09:25 -0700)

Teach git list-objects logic to not follow gitlinks

This allows us to pack superprojects and thus clone them (but not yet
check them out on the receiving side. That's the next patch.)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Linus Torvalds Fri, 13 Apr 2007 16:24:13 +0000 (09:24 -0700)

Fix gitlink index entry filesystem matching

The code to match up index entries with the filesystem was stupidly
broken. We shouldn't compare the filesystem stat() information with
S_IFDIRLNK, since that's purely a git-internal value, and not what the
filesystem uses (on the filesystem, it's just a regular directory).

Also, don't bother to make the stat() time comparisons etc for DIRLNK
entries in ce_match_stat_basic(), since we do an exact match for these
things, and the hints in the stat data simply don't matter.

This fixes "git status" with submodules that haven't been checked out in
the supermodule.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Fri, 13 Apr 2007 16:13:42 +0000 (18:13 +0200)

config.txt: Add gitcvs.db* variables

Adds documentation for gitcvs.{dbname,dbdriver,dbuser,dbpass}
Texts are mostly taken from git-cvsserver.txt with some
adaptations so that they make more sense out of the context
of the original man page.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Fri, 13 Apr 2007 16:02:33 +0000 (18:02 +0200)

config.txt: Fix grammatical error in description of http.noEPSV

s/doesn't/don't/ since "ftp servers" is plural

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Fri, 13 Apr 2007 16:02:32 +0000 (18:02 +0200)

config.txt: Change pserver to server in description of gitcvs.*

These variables apply to the SSH access as well, so don't use
pserver here which might confuse users.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Fri, 13 Apr 2007 16:02:31 +0000 (18:02 +0200)

config.txt: Document core.autocrlf

Text shamelessly stolen from the 1.5.1 release notes.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Fri, 13 Apr 2007 16:02:30 +0000 (18:02 +0200)

config.txt: Document gitcvs.allbinary

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Fri, 13 Apr 2007 10:23:20 +0000 (03:23 -0700)

Do not default to --no-index when given two directories.

git-diff -- a/ b/ always defaulted to --no-index, primarily
because the function is_in_index() was implemented quite
incorrectly.

Noticed by Patrick Maaß and Simon Schubert independently,
initial patch was provided by Patrick but I fixed it
differently.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Alex Riesen Fri, 13 Apr 2007 22:19:05 +0000 (00:19 +0200)

Use rev-list --reverse in git-rebase.sh

...and drop the last perl dependency in the script.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Linus Torvalds Fri, 13 Apr 2007 04:08:52 +0000 (21:08 -0700)

Teach "git-read-tree -u" to check out submodules as a directory

This actually allows us to check out a supermodule after cloning, although
the submodules will obviously not be checked out, and will just be an
empty subdirectory.

[ Side note: this also shows that we currently don't correctly handle
such subprojects that aren't checked out correctly yet. They should
always show up as not being modified, but failing to resolve the
gitlink HEAD does not properly trigger the "not modified" logic in all
places it needs to.

So more work to be done, but that's a separate issue, unrelated to
the action of checking out the superproject. ]

The bulk of this patch is simply because we need to check the type of the
index entry *before* we try to read the object it points to, and that
meant that the code needed some re-organization. So I moved some of the
code in common to both symlinks and files to be a trivial helper function.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Linus Torvalds Fri, 13 Apr 2007 04:03:39 +0000 (21:03 -0700)

Teach git list-objects logic not to follow gitlinks

This allows us to pack superprojects and thus clone them (but not yet
check them out on the receiving side - that's the next patch)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Fri, 13 Apr 2007 04:04:27 +0000 (21:04 -0700)

Merge branch 'jc/cherry'

* jc/cherry:
Documentation: --cherry-pick
git-log --cherry-pick A...B
Refactor patch-id filtering out of git-cherry and git-format-patch.
Add %m to '--pretty=format:'

Junio C Hamano Fri, 13 Apr 2007 04:04:09 +0000 (21:04 -0700)

Merge branch 'maint'

* maint:
handle_options in git wrapper miscounts the options it handled.

Matthias Lederhofer Thu, 12 Apr 2007 18:52:03 +0000 (20:52 +0200)

handle_options in git wrapper miscounts the options it handled.

handle_options did not count the number of used arguments
correctly. When --git-dir was used the extra argument was
not added to the number of handled arguments.

Signed-off-by: Matthias Lederhofer <matled@gmx.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
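
The counting issue in the commit above is easy to see in a small Python model. This is a sketch of the idea only, not the wrapper's C code: this `handle_options` is a hypothetical stand-in, and it recognizes just `--git-dir` (in both spellings) and `--bare`.

```python
def handle_options(argv):
    """Consume leading global options.

    Returns (handled, settings), where `handled` is the number of argv
    entries used, so that argv[handled:] is the command and its arguments.
    """
    settings = {}
    handled = 0
    while handled < len(argv):
        arg = argv[handled]
        if arg == "--git-dir":
            if handled + 1 >= len(argv):
                raise ValueError("--git-dir requires a value")
            settings["git_dir"] = argv[handled + 1]
            # Two entries are consumed here: the option and its value.
            handled += 2
        elif arg.startswith("--git-dir="):
            settings["git_dir"] = arg[len("--git-dir="):]
            handled += 1
        elif arg == "--bare":
            settings["bare"] = True
            handled += 1
        else:
            break
    return handled, settings


argv = ["--git-dir", "/tmp/repo", "log", "-p"]
handled, settings = handle_options(argv)
rest = argv[handled:]
```

If the separate-value form counted only one entry, `rest` would start with "/tmp/repo" and the path would be taken for the command name. That off-by-one is the kind of miscount the commit describes.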

Junio C Hamano Thu, 12 Apr 2007 10:04:05 +0000 (03:04 -0700)

Fix git {log,show,...} --pretty=email

An earlier --subject-prefix patch forgot that format-patch is
not the only codepath that adds the "[PATCH]" prefix, and broke
everybody else in the log family.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Shawn O. Pearce Thu, 12 Apr 2007 05:21:18 +0000 (01:21 -0400)

Don't yap about merge-subtree during make

By default we are pretty quiet about the actual commands that
we are running. So we should continue to be quiet about the new
merge-subtree hardlink to merge-recursive. Technically this is not
a builtin, but it is close because subtree is actually builtin to
a non-builtin. So let's just make things easy and call it a builtin.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Linus Torvalds Thu, 12 Apr 2007 21:32:21 +0000 (14:32 -0700)

Don't show gitlink directories when we want "other" files

When "show_other_directories" is set, that implies that we are looking
for untracked files, which obviously means that we should ignore
directories that are marked as gitlinks in the index.

This fixes "git status" in a superproject, that would otherwise always
report that subprojects were "Untracked files:"

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Thu, 12 Apr 2007 14:54:28 +0000 (16:54 +0200)

cvsserver: Document the GIT branches -> CVS modules mapping more prominently

Add a note about the branches -> modules mapping to LIMITATIONS because
I really think it should be noted there and not just at the end of
the installation step-by-step HOWTO.

I used "git branches" there and changed "heads" to "branches" in
my section about database configuration. I'm reluctant to replace
all occurrences of "head" with "branch" though because you always
have to say "git branch" because CVS also has the concept of
branches. You can say "head" though, because there is no such
concept in CVS. In all the existing occurrences of head other than
the one I changed I think "head" flows better in the text.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Linus Torvalds Thu, 12 Apr 2007 19:29:40 +0000 (12:29 -0700)

Teach git-update-index about gitlinks

I finally got around to looking at Alex' patch to teach update-index about
gitlinks too, so that "git commit -a" along with any other explicit
update-index scripts can work.

I don't think there was anything wrong with Alex' patch, but the code he
patched I felt was just so ugly that the added cases just pushed it over
the edge. Especially as I don't think that patch necessarily did the right
thing for a gitlink entry that already existed in the index, but that
wasn't actually a real git repository in the working tree (just an empty
subdirectory or a non-git snapshot because it hadn't wanted to track that
particular subproject).

So I ended up deciding to clean up the git-update-index handling the same
way I tackled the directory traversal used by git-add earlier: by
splitting the different cases up into multiple smaller functions, and just
making the code easier to read (and adding more comments about the
different cases).

So this replaces the old "process_file()" with a new "process_path()"
function that then just calls out to different helper functions depending
on what kind of path it is. Processing a nondirectory ends up being just
one of the simpler cases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Thu, 12 Apr 2007 14:43:36 +0000 (16:43 +0200)

cvsserver: Reword documentation on necessity of write access

Reworded the section about git-cvsserver needing to update the
database.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Frank Lichtenheld Wed, 11 Apr 2007 22:51:33 +0000 (00:51 +0200)

cvsserver: Allow to "add" a removed file

CVS allows you to add a removed file (where the
removal is not yet committed) which will
cause the server to send the latest revision of the
file and to delete the "removed" status.

Copy this behaviour.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Tue, 10 Apr 2007 22:28:32 +0000 (15:28 -0700)

Documentation: --cherry-pick

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Mon, 9 Apr 2007 10:40:38 +0000 (03:40 -0700)

git-log --cherry-pick A...B

This is meant to be a saner replacement for "git-cherry".

When used with "A...B", this filters out commits whose patch
text has the same patch-id as a commit on the other side. It
would probably be most useful to use with --left-right.

Signed-off-by: Junio C Hamano <junkio@cox.net>
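
The filtering described in the commit above can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: `cherry_pick_filter` is a hypothetical name, each commit arrives already paired with its patch-id, and how the patch-id is computed is out of scope.

```python
def cherry_pick_filter(left, right):
    """Drop commits whose patch-id also appears on the other side.

    left, right: lists of (commit, patch_id) for the two sides of A...B.
    Returns the surviving commits of each side, in their original order.
    """
    left_ids = {patch_id for _, patch_id in left}
    right_ids = {patch_id for _, patch_id in right}
    kept_left = [commit for commit, patch_id in left if patch_id not in right_ids]
    kept_right = [commit for commit, patch_id in right if patch_id not in left_ids]
    return kept_left, kept_right


left = [("L1", "p1"), ("L2", "p2")]
right = [("R1", "p2"), ("R2", "p3")]
kept_left, kept_right = cherry_pick_filter(left, right)
```

Here L2 and R1 carry the same patch text, so both disappear and only L1 and R2 remain. Keeping the two result lists separate corresponds to the left/right distinction that --left-right displays.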

Junio C Hamano Tue, 10 Apr 2007 00:01:27 +0000 (17:01 -0700)

Refactor patch-id filtering out of git-cherry and git-format-patch.

This implements the patch-id computation and recording library,
patch-ids.c, and rewrites the get_patch_ids() function used in
cherry and format-patch to use it, so that they do not pollute
the object namespace. Earlier code threw non-objects into the
in-core object database, and hoped for not getting bitten by
SHA-1 collisions. While it may be practically Ok, it still was
an ugly hack.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Junio C Hamano Mon, 9 Apr 2007 09:34:05 +0000 (02:34 -0700)

Add %m to '--pretty=format:'

When used with '--boundary A...B', this shows the -/</> marker
you would get with --left-right option to 'git-log' family.
When symmetric diff is not used, everybody is shown to be on the
"right" branch.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Nicolas Pitre Wed, 11 Apr 2007 02:54:36 +0000 (22:54 -0400)

clean up add_object_entry()

This function used to call locate_object_entry_hash() _twice_ per added
object while only once should suffice. Let's reorganize that code a bit.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

tests for various pack index featuresNicolas Pitre Tue, 10 Apr 2007 20:26:10 +0000 (16:26 -0400)

tests for various pack index features

This is a fairly complete list of tests for various aspects of pack
index versions 1 and 2.

Tests on index v2 include 32-bit and 64-bit offsets, as well as a nice
demonstration of the flawed repacking integrity checks that index
version 2 intends to solve over index version 1 with the per-object CRC.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

use test-genrandom in tests instead of /dev/urandomNicolas Pitre Wed, 11 Apr 2007 17:35:13 +0000 (13:35 -0400)

use test-genrandom in tests instead of /dev/urandom

This way tests are completely deterministic and possibly more portable.

Signed-off-by: Nicolas Pitre <nico@cam.org>

simple random data generator for testsNicolas Pitre Wed, 11 Apr 2007 17:59:51 +0000 (13:59 -0400)

simple random data generator for tests

Reliance on /dev/urandom produces test vectors that are, well, random.
This can cause problems impossible to track down when the data is
different from one test invocation to another.

The goal is not to have random data to test, but rather to have a
convenient way to create sets of large files with non compressible and
non deltifiable data in a reproducible way.
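
A minimal sketch of such a generator, assuming a seed string is folded into an integer state that then drives a linear congruential generator. The function name, the seed folding and the constants are assumptions for illustration, not necessarily what the helper added by this patch does.

```python
def genrandom(seed: str, count: int) -> bytes:
    """Return `count` pseudo-random bytes that depend only on `seed`."""
    state = 0
    for byte in seed.encode():
        # Fold the seed into the initial state (assumed scheme).
        state = (state * 11 + byte) & 0xFFFFFFFF
    out = bytearray()
    for _ in range(count):
        # Classic LCG step; take bits 16..23 as the output byte.
        state = (state * 1103515245 + 12345) & 0xFFFFFFFF
        out.append((state >> 16) & 0xFF)
    return bytes(out)
```

The same seed always yields the same bytes, so a failing test can be re-run against exactly the same data.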

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

sscanf/strtoul: parse integers robustlyJim Meyering Mon, 9 Apr 2007 23:01:44 +0000 (01:01 +0200)

sscanf/strtoul: parse integers robustly

* builtin-grep.c (strtoul_ui): Move function definition from here, to...
* git-compat-util.h (strtoul_ui): ...here, with an added "base" parameter.
* builtin-grep.c (cmd_grep): Update use of strtoul_ui to include base, "10".
* builtin-update-index.c (read_index_info): Diagnose an invalid mode integer
that is out of range or merely larger than INT_MAX.
(cmd_update_index): Use strtoul_ui, not sscanf.
* convert-objects.c (write_subdirectory): Likewise.
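
For illustration only, the kind of checking this is after can be sketched in Python: accept the string only if every character is a digit in the given base and the value fits in an unsigned int. The real helper is C; the 32-bit UINT_MAX and the return-None convention below are assumptions of this sketch, and it is stricter than strtoul() in that it rejects leading whitespace and signs.

```python
UINT_MAX = 2**32 - 1  # assumption: unsigned int is 32 bits wide


def strtoul_ui(text: str, base: int = 10):
    """Return the parsed value, or None unless `text` is entirely a valid unsigned int."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    if not text or any(ch not in digits for ch in text.lower()):
        return None  # empty input or trailing garbage
    value = int(text, base)
    return value if value <= UINT_MAX else None  # out-of-range values are rejected
```

A bare sscanf("%o") or sscanf("%d") would silently accept inputs such as "12abc" or an overflowing number, which is what the change is meant to diagnose.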

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Teach directory traversal about subprojectsLinus Torvalds Wed, 11 Apr 2007 21:49:44 +0000 (14:49 -0700)

Teach directory traversal about subprojects

This is the promised cleaned-up version of teaching directory traversal
(ie the "read_directory()" logic) about subprojects. That teaches "git add"
how to add/update subprojects.

It now knows to look at the index file to see if a directory is marked as
a subproject, and uses that information to decide whether it should be
recursed into or not.

It also generally cleans up the handling of directory entries when
traversing the working tree, by splitting up the decision-making process
into small functions of their own, and adding a fair number of comments.

Finally, it teaches "add_file_to_cache()" that directory names can have
slashes at the end, since the directory traversal adds them to make the
difference between a file and a directory clear (it always did that, but
my previous too-ugly-to-apply subproject patch had a totally different
path for subproject directories and avoided the slash for that case).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Add testcase for format-patch --subject-prefix (take 3)Robin H. Johnson Wed, 11 Apr 2007 23:58:08 +0000 (16:58 -0700)

Add testcase for format-patch --subject-prefix (take 3)

Add testcase for format-patch --subject-prefix support.

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Add custom subject prefix support to format-patch ... Robin H. Johnson Wed, 11 Apr 2007 23:58:07 +0000 (16:58 -0700)

Add custom subject prefix support to format-patch (take 3)

Add a new option to git-format-patch, entitled --subject-prefix that allows
control of the subject prefix '[PATCH]'. Using this option, the text 'PATCH' is
replaced with whatever input is provided to the option. This allows easily
generating patches like '[PATCH 2.6.21-rc3]' or properly numbered series like
'[-mm3 PATCH N/M]'. This patch provides the implementation and documentation.
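
The effect on the subject line can be sketched as below. The function and its parameters are hypothetical and only mirror the behaviour described above; they are not the code in the patch.

```python
def subject_line(subject, prefix="PATCH", n=None, total=None):
    """Build a subject line, replacing the default 'PATCH' text with `prefix`."""
    if n is not None and total is not None:
        # Numbered series: the N/M counter follows the prefix.
        return f"[{prefix} {n}/{total}] {subject}"
    return f"[{prefix}] {subject}"
```

For example, subject_line("Fix foo", prefix="PATCH 2.6.21-rc3") gives "[PATCH 2.6.21-rc3] Fix foo".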

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Merge branch 'maint'Junio C Hamano Thu, 12 Apr 2007 01:43:01 +0000 (18:43 -0700)

Merge branch 'maint'

* maint:
GIT 1.5.1.1
cvsserver: Fix handling of diappeared files on update
fsck: do not complain on detached HEAD.
(encode_85, decode_85): Mark source buffer pointer as "const".

Fix thinko in subproject entry sortingLinus Torvalds Wed, 11 Apr 2007 21:39:12 +0000 (14:39 -0700)

Fix thinko in subproject entry sorting

This fixes a total thinko in my original series: subprojects do *not* sort
like directories, because the index is sorted purely by full pathname, and
since a subproject shows up in the index as a normal NUL-terminated
string, it never has the issues with sorting with the '/' at the end.

So if you have a subproject "proj" and a file "proj.c", the subproject
sorts alphabetically before the file in the index (and must thus also sort
that way in a tree object, since trees sort as the index).

In contrast, if you have two files "proj/file" and "proj.c", the "proj.c"
will sort alphabetically before "proj/file" in the index. The index
itself, of course, does not actually contain an entry "proj/", but in the
*tree* that gets written out, the tree entry "proj" will sort after the
file entry "proj.c", which is the only real magic sorting rule.

In other words: the magic sorting rule only affects tree entries, and
*only* affects tree entries that point to other trees (ie are of the type
S_IFDIR).

Anyway, that thinko just means that we should remove the special case to
make S_ISDIRLNK entries sort like S_ISDIR entries. They don't. They sort
like normal files.
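
A small sketch of the two orderings described above, assuming plain bytewise comparison. The kind names "tree", "blob" and "gitlink" are labels chosen for this example.

```python
def index_key(path: str) -> bytes:
    """The index sorts purely by full pathname."""
    return path.encode()


def tree_key(name: str, kind: str) -> bytes:
    """Tree entries sort by name, except that entries pointing to other
    trees compare as if their name ended with '/'."""
    if kind == "tree":
        return (name + "/").encode()
    return name.encode()  # blobs and gitlinks sort like normal files


def sort_tree(entries):
    """Sort (name, kind) pairs the way they would appear in a tree object."""
    return sorted(entries, key=lambda entry: tree_key(*entry))
```

Since '.' (0x2e) sorts before '/' (0x2f), a real directory "proj" ends up after "proj.c", while a subproject "proj" stays before it.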

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

GIT 1.5.1.1 v1.5.1.1Junio C Hamano Wed, 11 Apr 2007 21:39:07 +0000 (14:39 -0700)

GIT 1.5.1.1

Signed-off-by: Junio C Hamano <junkio@cox.net>

cvsserver: Fix handling of diappeared files on updateFrank Lichtenheld Wed, 11 Apr 2007 20:38:19 +0000 (22:38 +0200)

cvsserver: Fix handling of diappeared files on update

Only send a modified response if the client sent a
"Modified" entry. This fixes the case where the
file was locally deleted on the client without
being removed from CVS. In this case the client
will only have sent the Entry for the file but nothing
else.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Acked-by: Martin Langhoff <martin@catalyst.net.nz>
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

fsck: do not complain on detached HEAD.Junio C Hamano Wed, 11 Apr 2007 08:28:43 +0000 (01:28 -0700)

fsck: do not complain on detached HEAD.

Detached HEAD is just a normal state of a repository. Do not
say anything about it.

Do not give worrying "error:" messages when we let the user know
that the HEAD points at nothing (i.e. a yet to be born branch),
or that we do not have any default refs from which to start
following the object chain. Reword them as "notice:".

Signed-off-by: Junio C Hamano <junkio@cox.net>

(encode_85, decode_85): Mark source buffer pointer... Jim Meyering Mon, 9 Apr 2007 22:56:33 +0000 (00:56 +0200)

(encode_85, decode_85): Mark source buffer pointer as "const".

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>

gitweb: Allow configuring the default projects order... Frank Lichtenheld Fri, 6 Apr 2007 21:58:24 +0000 (23:58 +0200)

gitweb: Allow configuring the default projects order and add order 'none'

Introduce new configuration variable $default_projects_order
that can be used to specify the default order of projects on
the index page if no 'o' parameter is given.

Allow a new value 'none' for order that will cause the projects
to be in the order we learned about them. In case of reading the
list of projects from a file, this should be the order as they are
listed in the file. In case of reading the list of projects from
a directory this will probably give random results depending on the
filesystem in use.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Acked-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>

gitweb: Allow forks with project list fileFrank Lichtenheld Fri, 6 Apr 2007 21:58:11 +0000 (23:58 +0200)

gitweb: Allow forks with project list file

Make it possible to use the forks feature even when
reading the list of projects from a file, by creating
a list of known prefixes as we go. Forks have to be
listed after the main project in order to be recognised
as such.

Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Acked-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Merge branch 'maint'Junio C Hamano Tue, 10 Apr 2007 20:53:07 +0000 (13:53 -0700)

Merge branch 'maint'

* maint:
Documentation: show-ref: document --exclude-existing
cvsexportcommit -p : fix the usage of git-apply -C.

Teach core object handling functions about gitlinksLinus Torvalds Tue, 10 Apr 2007 04:20:29 +0000 (21:20 -0700)

Teach core object handling functions about gitlinks

This teaches the really fundamental core SHA1 object handling routines
about gitlinks. We can compare trees with gitlinks in them (although we
can not actually generate patches for them yet - just raw git diffs),
and they show up as commits in "git ls-tree".

We also know to compare gitlinks as if they were directories (ie the
normal "sort as trees" rules apply).

[jc: amended a cut&paste error]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Teach "fsck" not to follow subproject linksLinus Torvalds Tue, 10 Apr 2007 04:15:29 +0000 (21:15 -0700)

Teach "fsck" not to follow subproject links

Since the subprojects don't necessarily even exist in the current tree,
much less in the current git repository (they are totally independent
repositories), we do not want to try to follow the chain from one git
repository to another through a gitlink.

This involves teaching fsck to ignore references to gitlink objects from
a tree and from the current index.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Add "S_IFDIRLNK" file mode infrastructure for git linksLinus Torvalds Tue, 10 Apr 2007 04:14:58 +0000 (21:14 -0700)

Add "S_IFDIRLNK" file mode infrastructure for git links

This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.

The next commit will start actually adding actual subproject support.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Add 'resolve_gitlink_ref()' helper functionLinus Torvalds Tue, 10 Apr 2007 04:14:26 +0000 (21:14 -0700)

Add 'resolve_gitlink_ref()' helper function

This new function resolves a ref in *another* git repository. It's
named for its intended use: to look up the git link to a subproject.

It's not actually wired up to anything yet, but we're getting closer to
having fundamental plumbing support for "links" from one git directory
to another, which is the basis of subproject support.

[jc: amended a FILE* leak]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

git-fetch: use fetch--tool pick-rref to avoid local... Junio C Hamano Thu, 5 Apr 2007 10:22:55 +0000 (03:22 -0700)

git-fetch: use fetch--tool pick-rref to avoid local fetch from alternate

When we are fetching from a repository that is on a local
filesystem, first check if we have all the objects that we are
going to fetch available locally, by not just checking the tips
of what we are fetching, but with a full reachability analysis
to our existing refs. In such a case, we do not have to run
git-fetch-pack which would send many needless objects. This is
especially true when the other repository is an alternate of the
current repository (e.g. perhaps the repository was created by
running "git clone -l -s" from there).

The useless objects transferred used to be discarded when they
were expanded by git-unpack-objects called from git-fetch-pack,
but recent git-fetch-pack prefers to keep the data it receives
from the other end without exploding them into loose objects,
resulting in a pack full of duplicated data when fetching from
your own alternate.

This also uses fetch--tool pick-rref on dumb transport side to
remove a shell loop to do the same.

Signed-off-by: Junio C Hamano <junkio@cox.net>

git-fetch--tool pick-rrefJunio C Hamano Thu, 5 Apr 2007 10:22:54 +0000 (03:22 -0700)

git-fetch--tool pick-rref

This script helper takes a list of fully qualified refnames and
the results from ls-remote, and grabs only the lines for the named
refs from the latter.
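
In rough terms the helper behaves like the sketch below. It assumes ls-remote output lines of the form "<sha1><TAB><refname>" and ignores any extra syntax the real helper may accept; the function name and argument shapes are made up for the example.

```python
def pick_rref(wanted_refs, ls_remote_output):
    """Return only the ls-remote lines whose refname is in wanted_refs."""
    wanted = set(wanted_refs)
    picked = []
    for line in ls_remote_output.splitlines():
        _sha1, _, refname = line.partition("\t")
        if refname in wanted:
            picked.append(line)
    return picked
```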

Signed-off-by: Junio C Hamano <junkio@cox.net>

t3030: merge-recursive backend test.Junio C Hamano Sat, 7 Apr 2007 14:17:35 +0000 (07:17 -0700)

t3030: merge-recursive backend test.

We have fairly extensive coverage of read-tree 3-way machinery,
and many Porcelain-ish tests use git-merge front-end tests, but
we did not have a good basic test for merge-recursive, which made
it very hard to hack on it.

I used this during the recent work to teach D/F conflicts to
merge-recursive.

Signed-off-by: Junio C Hamano <junkio@cox.net>

merge-recursive: handle D/F conflict case more carefully.Junio C Hamano Sat, 7 Apr 2007 13:41:13 +0000 (06:41 -0700)

merge-recursive: handle D/F conflict case more carefully.

When a path D that was originally a blob in the ancestor was
modified on our branch while it was removed on the other branch,
we keep stages 1 and 2, and leave our version in the working
tree. If the other branch created a path D/F, however, that
path can cleanly be resolved in the index (after all, neither the
ancestor nor we have it, and only the other side added it),
but it cannot be checked out. The issue is the same when the
other branch had D and we had renamed it to D/F, or the ancestor
had D/F instead of D (so there are four combinations).

Do not stop the merge, but leave both D and D/F paths in the
index so that the user can clear things up.

Signed-off-by: Junio C Hamano <junkio@cox.net>

merge-recursive: do not barf on "to be removed" entries.Junio C Hamano Sat, 7 Apr 2007 12:52:57 +0000 (05:52 -0700)

merge-recursive: do not barf on "to be removed" entries.

When unpack-trees::threeway_merge() decides that a path that
exists in the current index (and HEAD) is to be removed, it
leaves a stage 0 entry whose mode bits are set to 0. The code
mistook it as "this stage wants the blob here", and proceeded
to call update_file_flags() which ended up trying to put the
mode=0 entry in the index, got very confused, and ended up
barfing with "do not know what to do with 000000".

Since threeway_merge() does not handle case #10 (one side
removes while the other side does not do anything), this is not
a problem while we refuse to merge branches that have D/F
conflicts, but when we start resolving them, we would need to be
able to remove cache entries, and at that point it starts to
matter.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Treat D/F conflict entry more carefully in unpack-trees... Junio C Hamano Sat, 7 Apr 2007 12:49:19 +0000 (05:49 -0700)

Treat D/F conflict entry more carefully in unpack-trees.c::threeway_merge()

This fixes three buglets in threeway_merge() regarding D/F
conflict entries.

* After finishing with path D and handling path D/F, some stages
  have D/F conflict entries, which are obviously non-NULL. For the
  purpose of determining if the path D/F is missing in the
  ancestor, they should not be taken into account.

* D/F conflict entry is a marker to say "this stage does _not_
  have the path", so do not send them to keep_entry().

Signed-off-by: Junio C Hamano <junkio@cox.net>

t1000: fix case table.Junio C Hamano Sat, 7 Apr 2007 12:42:01 +0000 (05:42 -0700)

t1000: fix case table.

Case #10 is not handled with unpack-trees.c:threeway_merge()
internally, unless under the aggressive rule, and it is not a
bug. As the test expects, ND (one side did not do anything,
other side deleted) case was meant to be handled by the caller's
policy (e.g. git-merge-one-file or git-merge-recursive).

Signed-off-by: Junio C Hamano <junkio@cox.net>

shortlog -w: make wrap-line behaviour optional.Junio C Hamano Sun, 8 Apr 2007 08:28:00 +0000 (01:28 -0700)

shortlog -w: make wrap-line behaviour optional.

Signed-off-by: Junio C Hamano <junkio@cox.net>

Use print_wrapped_text() in shortlogJohannes Schindelin Fri, 22 Dec 2006 21:15:59 +0000 (22:15 +0100)

Use print_wrapped_text() in shortlog

Some oneline descriptions are just too long. In shortlog, it looks much
nicer when they are wrapped. Since print_wrapped_text() is UTF-8 aware,
it also works with those descriptions.

[jc: with minimum fixes]

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>

validate reused pack data with CRC when possibleNicolas Pitre Tue, 10 Apr 2007 04:15:41 +0000 (00:15 -0400)

validate reused pack data with CRC when possible

This replaces the inflate validation with a CRC validation when reusing
data from a pack which uses index version 2. That makes repacking much
safer against corruptions, and it should be a bit faster too.
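
The check amounts to something like the sketch below: recompute the CRC32 of the raw bytes about to be copied and compare it with the value recorded in the index. The function name and the error handling are assumptions for illustration.

```python
import zlib


def copy_if_intact(raw_entry: bytes, recorded_crc: int) -> bytes:
    """Return the raw pack entry for reuse only if its CRC32 matches the index."""
    if zlib.crc32(raw_entry) != recorded_crc:
        raise ValueError("CRC mismatch: refusing to reuse corrupted pack data")
    return raw_entry
```

Unlike the inflate check, this also covers bytes that are not deflated, such as the object header.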

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

allow forcing index v2 and 64-bit offset tresholdNicolas Pitre Mon, 9 Apr 2007 21:32:03 +0000 (17:32 -0400)

allow forcing index v2 and 64-bit offset treshold

This is necessary for testing the new capabilities in some automated
way without having an actual 4GB+ pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

pack-redundant.c: learn about index v2Nicolas Pitre Mon, 9 Apr 2007 05:06:37 +0000 (01:06 -0400)

pack-redundant.c: learn about index v2

Initially the conversion was made using nth_packed_object_sha1() which
made this file completely index version agnostic. Unfortunately the
overhead was quite significant, so I went back to raw index walking but
with selectable base and step values, which brought performance back
to a level similar to the original.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

show-index.c: learn about index v2Nicolas Pitre Mon, 9 Apr 2007 05:06:36 +0000 (01:06 -0400)

show-index.c: learn about index v2

When index v2 is encountered, the CRC32 of each object is also displayed
in parentheses at the end of the line.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

sha1_file.c: learn about index version 2Nicolas Pitre Mon, 9 Apr 2007 05:06:35 +0000 (01:06 -0400)

sha1_file.c: learn about index version 2

With this patch, packs larger than 4GB are usable, even on a 32-bit machine
(at least on Linux). If off_t is not large enough to deal with a large
pack then die() is called instead of attempting to use the pack and
producing garbage.

This was tested with an 8GB pack specially created for the occasion on
a 32-bit machine.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

index-pack: learn about pack index version 2Nicolas Pitre Mon, 9 Apr 2007 05:06:34 +0000 (01:06 -0400)

index-pack: learn about pack index version 2

Like previous patch but for index-pack.

[ There is quite some code duplication between pack-objects and index-pack
for generating a pack index (and fast-import as well I suppose). This
should be reworked into a common function eventually. But not now. ]

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

pack-objects: learn about pack index version 2Nicolas Pitre Mon, 9 Apr 2007 05:06:33 +0000 (01:06 -0400)

pack-objects: learn about pack index version 2

Pack index version 2 goes as follows:

- 8 bytes of header with signature and version.

- 256 entries of 4-byte first-level fan-out table.

- Table of sorted 20-byte SHA1 records for each object in pack.

- Table of 4-byte CRC32 entries for raw pack object data.

- Table of 4-byte offset entries for objects in the pack if offset is
representable with 31 bits or less, otherwise it is an index in the next
table with top bit set.

- Table of 8-byte offset entries indexed from previous table for offsets
which are 32 bits or more (optional).

- 20-byte SHA1 checksum of sorted object names.

- 20-byte SHA1 checksum of the above.

The object SHA1 table is all contiguous so future pack format that would
contain this table directly won't require big changes to the code. It is
also tighter for slightly better cache locality when looking up entries.

Support for large packs exceeding 31 bits in size won't impose an index
size bloat for packs within that range that don't need a 64-bit offset.
And because newer objects which are likely to be the most frequently used
are located at the beginning of the pack, they won't pay the 64-bit offset
lookup at run time either even if the pack is large.

Right now an index version 2 is created only when the biggest offset in a
pack reaches 31 bits. It might be a good idea to always use index version
2 eventually to benefit from the CRC32 it contains when reusing pack data
while repacking.
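
The two offset tables in the layout above can be sketched as follows. Big-endian (network) byte order is an assumption of this sketch, as are the function names; only the 31-bit rule and the "top bit set means index into the next table" convention come from the description.

```python
import struct


def encode_offsets(offsets):
    """Build the 4-byte offset table and the optional 8-byte offset table."""
    small = bytearray()
    large = bytearray()
    for off in offsets:
        if off < 0x80000000:
            # Representable with 31 bits: stored directly.
            small += struct.pack(">I", off)
        else:
            # Otherwise store an index into the 8-byte table, top bit set.
            small += struct.pack(">I", 0x80000000 | (len(large) // 8))
            large += struct.pack(">Q", off)
    return bytes(small), bytes(large)


def decode_offset(small, large, i):
    """Look up the pack offset of the i-th object."""
    (entry,) = struct.unpack_from(">I", small, 4 * i)
    if entry & 0x80000000:
        (off,) = struct.unpack_from(">Q", large, 8 * (entry & 0x7FFFFFFF))
        return off
    return entry
```

Objects whose offsets fit in 31 bits never touch the second table, which is why packs within that range see no index size bloat.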

[jc: with the "oops" fix to keep track of the last offset correctly]

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

compute object CRC32 with index-packNicolas Pitre Mon, 9 Apr 2007 05:06:32 +0000 (01:06 -0400)

compute object CRC32 with index-pack

Same as previous patch but for index-pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

compute a CRC32 for each object as stored in a packNicolas Pitre Mon, 9 Apr 2007 05:06:31 +0000 (01:06 -0400)

compute a CRC32 for each object as stored in a pack

The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.

The problem with this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object. This is a
real issue that already happened at least once in the past.

In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib. But this is still not
perfect, as a significant portion of a pack's content is made of object
headers and references to delta base objects, which are not deflated and
therefore not validated when repacking. That makes the pack data reuse
still not as safe as it could be.

Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying, which is extremely costly and is
the very cost the data reuse optimization was designed to avoid in the first place.

So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index. This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.

Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia:

Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
short messages. He wrote "Briefly, the problem is that, for very short
packets, Adler32 is guaranteed to give poor coverage of the available
bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
that sum A does not wrap for short messages. The maximum value of A for
a 128-byte message is 32640, which is below the value 65521 used by the
modulo operation. An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32 for SCTP, the
Stream Control Transmission Protocol.

In the context of a GIT pack, we have lots of small objects, especially
deltas, which are likely to be quite small and in a size range for which
Adler32 is deemed not to be sufficient. Another advantage of CRC32 is the
possibility for recovery from certain types of small corruptions like
single bit errors which are the most probable type of corruptions.
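
The short-message weakness quoted above is easy to observe with zlib itself. In the sketch below the helper names are made up; the only facts relied upon are that zlib.adler32() returns sum A in the low 16 bits and sum B in the high 16 bits, and that zlib starts A at 1.

```python
import zlib


def adler_sums(data: bytes):
    """Split an Adler-32 checksum into its (A, B) halves."""
    value = zlib.adler32(data)
    return value & 0xFFFF, value >> 16


def max_a_for_length(n: int) -> int:
    """Sum A for the worst case of n bytes that are all 0xff."""
    a, _ = adler_sums(b"\xff" * n)
    return a
```

For 128 bytes the largest possible A stays well below the modulus 65521, so the top bit of that half of the checksum is never used for objects this small.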

OK, what this patch does is compute the CRC32 of each object written to
a pack within pack-objects. It is not written to the index yet and it is
obviously not validated when reusing pack data yet either.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

add overflow tests on pack offset variablesNicolas Pitre Mon, 9 Apr 2007 05:06:30 +0000 (01:06 -0400)

add overflow tests on pack offset variables

Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets. This prevents any bad data from being
generated/processed if off_t happens to not be large enough to handle
some big packs.

Better be safe than sorry.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

make overflow test on delta base offset work regardless... Nicolas Pitre Mon, 9 Apr 2007 05:06:29 +0000 (01:06 -0400)

make overflow test on delta base offset work regardless of variable size

This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.

It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.
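
The idea can be sketched in Python, with the caveat that Python integers have no fixed width, so the width the C macro would take from the variable's type has to be passed explicitly here. The 7-bit shift reflects how a base offset is assumed to be accumulated 7 bits at a time; both function names are hypothetical.

```python
def msb(value: int, width: int, bits: int) -> int:
    """Return the `bits` most significant bits of a `width`-bit value, left in place."""
    mask = ((1 << bits) - 1) << (width - bits)
    return value & mask


def shift_would_overflow(value: int, width: int, shift: int = 7) -> bool:
    """True if shifting `value` left by `shift` would lose bits in a `width`-bit variable."""
    return msb(value, width, shift) != 0
```

The same test gives the right answer whether the offset variable is 32 or 64 bits wide, which is the property the patch is after.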

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

get rid of num_packed_objects()Nicolas Pitre Mon, 9 Apr 2007 05:06:28 +0000 (01:06 -0400)

get rid of num_packed_objects()

The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly. Instead, let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.

While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Avoid overflowing name buffer in deep directory structuresLinus Torvalds Tue, 10 Apr 2007 04:13:58 +0000 (21:13 -0700)

Avoid overflowing name buffer in deep directory structures

This just makes sure that when we do a read_directory(), we check
that the filename fits in the buffer we allocated (with a bit of
slop).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

diff-lib: use ce_mode_from_stat() rather than messing... Linus Torvalds Tue, 10 Apr 2007 04:13:29 +0000 (21:13 -0700)

diff-lib: use ce_mode_from_stat() rather than messing with modes manually

The diff helpers used to do the magic mode canonicalization and all the
other special mode handling by hand ("trust executable bit" and "has
symlink support" handling).

That's bogus. Use "ce_mode_from_stat()" that does this all for us.

This is also going to be required when we add support for links to other
git repositories.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>

Documentation: show-ref: document --exclude-existingJulian Phillips Mon, 9 Apr 2007 20:57:36 +0000 (21:57 +0100)

Documentation: show-ref: document --exclude-existing

Use the comment in the code to document the --exclude-existing
option of git-show-ref.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>