* maint:
Document -g (--walk-reflogs) option of git-log
sscanf/strtoul: parse integers robustly
git-blame: Fix overrun in fake_working_tree_commit()
[PATCH] Improve look-and-feel of the gitk tool.
[PATCH] Teach gitk to use the user-defined UI font everywhere.
At the same time, we do not want to allow arbitrary strings for
attribute names, as we are likely to want to extend the syntax
later. Allow only alnum, dash, underscore and dot for now.
This adds "attribute macros" (for lack of better name). So far,
we have low-level attributes such as crlf and diff, which are
defined in operational terms --- setting or unsetting them on a
particular path directly affects what is done to the path. For
example, in order to decline diffs or crlf conversions on a
binary blob, no diffs on PostScript files, and treat all other
files normally, you would have something like these:
* diff crlf
*.ps !diff
proprietary.o !diff !crlf
That is fine as the operation goes, but gets unwieldy rather
rapidly, when we start adding more low-level attributes that are
defined in operational terms. A near-term example of such an
attribute would be 'merge-3way' which would control if git
should attempt the usual 3-way file-level merge internally, or
leave merging to a specialized external program of user's
choice. When it is added, we do _not_ want to force the users
to update the above to:
The way this patch solves this issue is to realize that the
attributes the user is assigning to paths are not defined in
terms of operations but in terms of what they are.
All of the three low-level attributes usually make sense for
most of the files that sane SCM users have git operate on (these
files are typically called "text'). Only a few cases, such as
binary blob, need exception to decline the "usual treatment
given to text files" -- and people mark them as "binary".
So this allows the $GIT_DIR/info/alternates and .gitattributes
at the toplevel of the project to also specify attributes that
assigns other attributes. The syntax is '[attr]' followed by an
attribute name followed by a list of attribute names:
[attr] binary !diff !crlf !merge-3way
When "binary" attribute is set to a path, if the path has not
got diff/crlf/merge-3way attribute set or unset by other rules,
this rule unsets the three low-level attributes.
It is expected that the user level .gitattributes will be
expressed mostly in terms of attributes based on what the files
are, and the above sample would become like this:
* As described above, you can define these macros only in
$GIT_DIR/info/attributes and toplevel .gitattributes.
* There is no attempt to detect circular definition of macro
attributes, and definitions are evaluated from bottom to top
as usual to fill in other attributes that have not yet got
values. The following would work as expected:
[attr] text diff crlf
[attr] ps text !diff
*.ps ps
while this would most likely not (I haven't tried):
[attr] ps text !diff
[attr] text diff crlf
*.ps ps
* When a macro says "[attr] A B !C", saying that a path does
not have attribute A does not let you tell anything about
attributes B or C. That is, given this:
[attr] text diff crlf
[attr] ps text !diff
*.txt !ps
path hello.txt, which would match "*.txt" pattern, would have
"ps" attribute set to zero, but that does not make text
attribute of hello.txt set to false (nor diff attribute set to
true).
This is in the same spirit as the previous one. Earlier 'diff'
meant 'do the built-in binary heuristics and disable patch text
generation based on it' while '!diff' meant 'do not guess, do
not generate patch text'. There was no way to say 'do generate
patch text even when the heuristics says it has NUL in it'.
Earlier we said 'crlf lets the path go through core.autocrlf
process while !crlf disables it altogether'. This fixes the
semantics to:
- Lack of 'crlf' attribute makes core.autocrlf to apply
(i.e. we guess based on the contents and if platform
expresses its desire to have CRLF line endings via
core.autocrlf, we do so).
- Setting 'crlf' attribute to true forces CRLF line endings in
working tree files, even if blob does not look like text
(e.g. contains NUL or other bytes we consider binary).
- Setting 'crlf' attribute to false disables conversion.
Expose subprojects as special files to "git diff" machinery
The same way we generate diffs on symlinks as the the diff of text of the
symlink, we can generate subproject diffs (when not recursing into them!)
as the diff of the text that describes the subproject.
Of course, since what descibes a subproject is just the SHA1, that's what
we'll use. Add some pretty-printing to make it a bit more obvious what is
going on, and we're done.
So with this, we can get both raw diffs and "textual" diffs of subproject
changes:
NOTE! We'll also want to have the ability to recurse into the subproject
and actually diff it recursively, but that will involve a new command line
option (I'd suggest "--subproject" and "-S", but the latter is in use by
pickaxe), and some very different code.
But regardless of ay future recursive behaviour, we need the non-recursive
version too (and it should be the default, at least in the absense of
config options, so that large superprojects don't default to something
extremely expensive).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-gui: Display the directory basename in the title
By showing the basename of the directory very early in the
title bar I can more easily locate a particular git-gui
session when I have 8 open at once and my Windows taskbar
is overflowing with items.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* er/ui:
Always bind the return key to the default button
Do not break git-gui messages into multiple lines.
Improve look-and-feel of the git-gui tool.
Teach git-gui to use the user-defined UI font everywhere.
Allow wish interpreter to be defined with TCLTK_PATH
* builtin-grep.c (strtoul_ui): Move function definition from here, to...
* git-compat-util.h (strtoul_ui): ...here, with an added "base" parameter.
* builtin-grep.c (cmd_grep): Update use of strtoul_ui to include base, "10".
* builtin-update-index.c (read_index_info): Diagnose an invalid mode integer
that is out of range or merely larger than INT_MAX.
(cmd_update_index): Use strtoul_ui, not sscanf.
* convert-objects.c (write_subdirectory): Likewise.
Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
* git://git2.kernel.org/pub/scm/gitk/gitk:
[PATCH] Improve look-and-feel of the gitk tool.
[PATCH] Teach gitk to use the user-defined UI font everywhere.
Since "git ls-files" doesn't really pass down any details on what it
really wants done to the directory walking code, the directory walking
code doesn't really know whether the caller wants to know about gitlink
directories, or whether it wants to just know about ignored files.
So the directory walking code will return those gitlink directories unless
the caller has explicitly told it not to ("dir->show_other_directories"
tells the directory walker to only show "other" directories).
This kind of confuses "git ls-files -o", because
- it didn't really expect to see entries listed that were already in the
index, unless they were unmerged, and would die on that unexpected
setup, rather than just "continue".
- it didn't know how to match directory entries with the final "/"
This trivial change updates the "show_other_files()" function to handle
both of these issues gracefully. There really was no reason to die, when
the obviously correct thing for the function was to just ignore files it
already knew about (that's what "other" means here!).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
This defines the semantics of 'crlf' attribute as an example.
When a path has this attribute unset (i.e. '!crlf'), autocrlf
line-end conversion is not applied.
Eventually we would want to let users to build a pipeline of
processing to munge blob data to filesystem format (and in the
other direction) based on combination of attributes, and at that
point the mechanism in convert_to_{git,working_tree}() that
looks at 'crlf' attribute needs to be enhanced. Perhaps the
existing 'crlf' would become the first step in the input chain,
and the last step in the output chain.
Add basic infrastructure to assign attributes to paths
This adds the basic infrastructure to assign attributes to
paths, in a way similar to what the exclusion mechanism does
based on $GIT_DIR/info/exclude and .gitignore files.
An attribute is just a simple string that does not contain any
whitespace. They can be specified in $GIT_DIR/info/attributes
file, and .gitattributes file in each directory.
Each line in these files defines a pattern matching rule.
Similar to the exclusion mechanism, a later match overrides an
earlier match in the same file, and entries from .gitattributes
file in the same directory takes precedence over the ones from
parent directories. Lines in $GIT_DIR/info/attributes file are
used as the lowest precedence default rules.
A line is either a comment (an empty line, or a line that begins
with a '#'), or a rule, which is a whitespace separated list of
tokens. The first token on the line is a shell glob pattern.
The rest are names of attributes, each of which can optionally
be prefixed with '!'. Such a line means "if a path matches this
glob, this attribute is set (or unset -- if the attribute name
is prefixed with '!'). For glob matching, the same "if the
pattern does not have a slash in it, the basename of the path is
matched with fnmatch(3) against the pattern, otherwise, the path
is matched with the pattern with FNM_PATHNAME" rule as the
exclusion mechanism is used.
This does not define what an attribute means. Tying an
attribute to various effects it has on git operation for paths
that have it will be specified separately.
* maint:
git-quiltimport complaining yet still working
config.txt: Fix grammatical error in description of http.noEPSV
config.txt: Change pserver to server in description of gitcvs.*
config.txt: Document core.autocrlf
config.txt: Document gitcvs.allbinary
Do not default to --no-index when given two directories.
Use rev-list --reverse in git-rebase.sh
There were two bugs: "stop_here" doesn't exist, but the bug that causes
this code to trigger in the *first* place is the wrong use of "$dotest".
It should be ".dotest"
This is essentially the same bug introduced by 87ab7992, one was
fixed with 0d38ab25 but this was somehow left behind.
Replace a pair of patches with updated ones for subproject support.
This series of three patches is a *replacement* for the patch series of
two patches (plus one-liner fixup) I sent yesterday.
It fixes the issue I noted with "git status" incorrectly
claiming that a non-checked out subproject wasn't clean - that
was just a total thinko in the code (we were checking the
filesystem mode against S_IFDIRLNK, which obviously cannot work,
since S_IFDIRLINK is a git-internal state, not a filesystem
state).
It then re-sends the two patches on top of that, with the fix
for checking out superprojects (we should *not* mess up any
existing subproject directories, certainly not remove them - if
we already have a directory in the place where we now want a
subproject, we should leave it well alone!)
The first one really is a fix, and it makes the commit
commentary about a remaining bug in the patch I sent out
yesterday go away.
Teach "git-read-tree -u" to check out submodules as a directory
This actually allows us to check out a supermodule after cloning, although
the submodules themselves will obviously not be checked out, and will just
be empty directories.
Checking out the submodules will be up to higher levels - we may not even
want to!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The code to match up index entries with the filesystem was stupidly
broken. We shouldn't compare the filesystem stat() information with
S_IFDIRLNK, since that's purely a git-internal value, and not what the
filesystem uses (on the filesystem, it's just a regular directory).
Also, don't bother to make the stat() time comparisons etc for DIRLNK
entries in ce_match_stat_basic(), since we do an exact match for these
things, and the hints in the stat data simply doesn't matter.
This fixes "git status" with submodules that haven't been checked out in
the supermodule.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Adds documentation for gitcvs.{dbname,dbdriver,dbuser,dbpass}
Texts are mostly taken from git-cvsserver.txt whith some
adaptions so that they make more sense out of the context
of the original man page.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
Teach "git-read-tree -u" to check out submodules as a directory
This actually allows us to check out a supermodule after cloning, although
the submodules will obviously not be checked out, and will just be an
empty subdirectory.
[ Side note: this also shows that we currently don't correctly handle
such subprojects that aren't checked out correctly yet. They should
always show up as not being modified, but failing to resolve the
gitlink HEAD does not properly trigger the "not modified" logic in all
places it needs to..
So more work to be done, but that's a separate issue, unrelated to
the action of checking out the superproject. ]
The bulk of this patch is simply because we need to check the type of the
index entry *before* we try to read the object it points to, and that
meant that the code needed some re-organization. So I moved some of the
code in common to both symlinks and files to be a trivial helper function.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/cherry:
Documentation: --cherry-pick
git-log --cherry-pick A...B
Refactor patch-id filtering out of git-cherry and git-format-patch.
Add %m to '--pretty=format:'
handle_options in git wrapper miscounts the options it handled.
handle_options did not count the number of used arguments
correctly. When --git-dir was used the extra argument was
not added to the number of handled arguments.
Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
An earlier --subject-prefix patch forgot that format-patch is
not the only codepath that adds the "[PATCH]" prefix, and broke
everybody else in the log family.
By default we are pretty quiet about the actual commands that
we are running. So we should continue to be quiet about the new
merge-subtree hardlink to merge-recursive. Technically this is not
a builtin, but it is close because subtree is actually builtin to
a non-builtin. So lets just make things easy and call it a builtin.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Don't show gitlink directories when we want "other" files
When "show_other_directories" is set, that implies that we are looking
for untracked files, which obviously means that we should ignore
directories that are marked as gitlinks in the index.
This fixes "git status" in a superproject, that would otherwise always
report that subprojects were "Untracked files:"
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
cvsserver: Document the GIT branches -> CVS modules mapping more prominently
Add a note about the branches -> modules mapping to LIMITATIONS because
I really think it should be noted there and not just at the end of
the installation step-by-step HOWTO.
I used "git branches" there and changed "heads" to "branches" in
my section about database configuration. I'm reluctant to replace
all occourences of "head" with "branch" though because you always
have to say "git branch" because CVS also has the concept of
branches. You can say "head" though, because there is no such
concept in CVS. In all the existing occourences of head other than
the one I changed I think "head" flows better in the text.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
I finally got around to looking at Alex' patch to teach update-index about
gitlinks too, so that "git commit -a" along with any other explicit
update-index scripts can work.
I don't think there was anything wrong with Alex' patch, but the code he
patched I felt was just so ugly that the added cases just pushed it over
the edge. Especially as I don't think that patch necessarily did the right
thing for a gitlink entry that already existed in the index, but that
wasn't actually a real git repository in the working tree (just an empty
subdirectory or a non-git snapshot because it hadn't wanted to track that
particular subproject).
So I ended up deciding to clean up the git-update-index handling the same
way I tackled the directory traversal used by git-add earlier: by
splitting the different cases up into multiple smaller functions, and just
making the code easier to read (and adding more comments about the
different cases).
So this replaces the old "process_file()" with a new "process_path()"
function that then just calls out to different helper functions depending
on what kind of path it is. Processing a nondirectory ends up being just
one of the simpler cases.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
CVS allows you to add a removed file (where the
removal is not yet committed) which will
cause the server to send the latest revision of the
file and to delete the "removed" status.
Copy this behaviour.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
This is meant to be a saner replacement for "git-cherry".
When used with "A...B", this filters out commits whose patch
text has the same patch-id as a commit on the other side. It
would probably most useful to use with --left-right.
Refactor patch-id filtering out of git-cherry and git-format-patch.
This implements the patch-id computation and recording library,
patch-ids.c, and rewrites the get_patch_ids() function used in
cherry and format-patch to use it, so that they do not pollute
the object namespace. Earlier code threw non-objects into the
in-core object database, and hoped for not getting bitten by
SHA-1 collisions. While it may be practically Ok, it still was
an ugly hack.
When used with '--boundary A...B', this shows the -/</> marker
you would get with --left-right option to 'git-log' family.
When symmetric diff is not used, everybody is shown to be on the
"right" branch.
This is a fairly complete list of tests for various aspects of pack
index versions 1 and 2.
Tests on index v2 include 32-bit and 64-bit offsets, as well as a nice
demonstration of the flawed repacking integrity checks that index
version 2 intend to solve over index version 1 with the per object CRC.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Reliance on /dev/urandom produces test vectors that are, well, random.
This can cause problems impossible to track down when the data is
different from one test invokation to another.
The goal is not to have random data to test, but rather to have a
convenient way to create sets of large files with non compressible and
non deltifiable data in a reproducible way.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* builtin-grep.c (strtoul_ui): Move function definition from here, to...
* git-compat-util.h (strtoul_ui): ...here, with an added "base" parameter.
* builtin-grep.c (cmd_grep): Update use of strtoul_ui to include base, "10".
* builtin-update-index.c (read_index_info): Diagnose an invalid mode integer
that is out of range or merely larger than INT_MAX.
(cmd_update_index): Use strtoul_ui, not sscanf.
* convert-objects.c (write_subdirectory): Likewise.
Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
This is the promised cleaned-up version of teaching directory traversal
(ie the "read_directory()" logic) about subprojects. That makes "git add"
understand to add/update subprojects.
It now knows to look at the index file to see if a directory is marked as
a subproject, and use that as information as whether it should be recursed
into or not.
It also generally cleans up the handling of directory entries when
traversing the working tree, by splitting up the decision-making process
into small functions of their own, and adding a fair number of comments.
Finally, it teaches "add_file_to_cache()" that directory names can have
slashes at the end, since the directory traversal adds them to make the
difference between a file and a directory clear (it always did that, but
my previous too-ugly-to-apply subproject patch had a totally different
path for subproject directories and avoided the slash for that case).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Add custom subject prefix support to format-patch (take 3)
Add a new option to git-format-patch, entitled --subject-prefix that allows
control of the subject prefix '[PATCH]'. Using this option, the text 'PATCH' is
replaced with whatever input is provided to the option. This allows easily
generating patches like '[PATCH 2.6.21-rc3]' or properly numbered series like
'[-mm3 PATCH N/M]'. This patch provides the implementation and documentation.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
GIT 1.5.1.1
cvsserver: Fix handling of diappeared files on update
fsck: do not complain on detached HEAD.
(encode_85, decode_85): Mark source buffer pointer as "const".
This fixes a total thinko in my original series: subprojects do *not* sort
like directories, because the index is sorted purely by full pathname, and
since a subproject shows up in the index as a normal NUL-terminated
string, it never has the issues with sorting with the '/' at the end.
So if you have a subproject "proj" and a file "proj.c", the subproject
sorts alphabetically before the file in the index (and must thus also sort
that way in a tree object, since trees sort as the index).
In contrast, it you have two files "proj/file" and "proj.c", the "proj.c"
will sort alphabetically before "proj/file" in the index. The index
itself, of course, does not actually contain an entry "proj/", but in the
*tree* that gets written out, the tree entry "proj" will sort after the
file entry "proj.c", which is the only real magic sorting rule.
In other words: the magic sorting rule only affects tree entries, and
*only* affects tree entries that point to other trees (ie are of the type
S_IFDIR).
Anyway, that thinko just means that we should remove the special case to
make S_ISDIRLNK entries sort like S_ISDIR entries. They don't. They sort
like normal files.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
cvsserver: Fix handling of diappeared files on update
Only send a modified response if the client sent a
"Modified" entry. This fixes the case where the
file was locally deleted on the client without
being removed from CVS. In this case the client
will only have sent the Entry for the file but nothing
else.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de> Acked-by: Martin Langhoff <martin@catalyst.net.nz> Acked-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Detached HEAD is just a normal state of a repository. Do not
say anything about it.
Do not give worrying "error:" messages when we let the user know
that the HEAD points at nothing (i.e. yet to be born branch),
nor we do not have any default refs to start following the
objects chain. Reword them as "notice:".
gitweb: Allow configuring the default projects order and add order 'none'
Introduce new configuration variable $default_projects_order
that can be used to specify the default order of projects on
the index page if no 'o' parameter is given.
Allow a new value 'none' for order that will cause the projects
to be in the order we learned about them. In case of reading the
list of projects from a file, this should be the order as they are
listed in the file. In case of reading the list of projects from
a directory this will probably give random results depending on the
filesystem in use.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de> Acked-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
Make it possible to use the forks feature even when
reading the list of projects from a file, by creating
a list of known prefixes as we go. Forks have to be
listed after the main project in order to be recognised
as such.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de> Acked-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
Teach core object handling functions about gitlinks
This teaches the really fundamental core SHA1 object handling routines
about gitlinks. We can compare trees with gitlinks in them (although we
can not actually generate patches for them yet - just raw git diffs),
and they show up as commits in "git ls-tree".
We also know to compare gitlinks as if they were directories (ie the
normal "sort as trees" rules apply).
[jc: amended a cut&paste error]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Since the subprojects don't necessarily even exist in the current tree,
much less in the current git repository (they are totally independent
repositories), we do not want to try to follow the chain from one git
repository to another through a gitlink.
This involves teaching fsck to ignore references to gitlink objects from
a tree and from the current index.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Add "S_IFDIRLNK" file mode infrastructure for git links
This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.
The next commit will start actually adding actual subproject support.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
This new function resolves a ref in *another* git repository. It's
named for its intended use: to look up the git link to a subproject.
It's not actually wired up to anything yet, but we're getting closer to
having fundamental plumbing support for "links" from one git directory
to another, which is the basis of subproject support.
[jc: amended a FILE* leak]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-fetch: use fetch--tool pick-rref to avoid local fetch from alternate
When we are fetching from a repository that is on a local
filesystem, first check if we have all the objects that we are
going to fetch available locally, by not just checking the tips
of what we are fetching, but with a full reachability analysis
to our existing refs. In such a case, we do not have to run
git-fetch-pack which would send many needless objects. This is
especially true when the other repository is an alternate of the
current repository (e.g. perhaps the repository was created by
running "git clone -l -s" from there).
The useless objects transferred used to be discarded when they
were expanded by git-unpack-objects called from git-fetch-pack,
but recent git-fetch-pack prefers to keep the data it receives
from the other end without exploding them into loose objects,
resulting in a pack full of duplicated data when fetching from
your own alternate.
This also uses fetch--tool pick-rref on dumb transport side to
remove a shell loop to do the same.
We have fairly extensive coverage of read-tree 3-way machinery,
and many Porcelain-ish tests use git-merge front-end tests, but
we did not have good basic test for merge-recursive, which made
it very hard to hack on it.
I used this during the recent work to teach D/F conflicts to
merge-recursive.
merge-recursive: handle D/F conflict case more carefully.
When a path D that originally was blob in the ancestor was
modified on our branch while it was removed on the other branch,
we keep stages 1 and 2, and leave our version in the working
tree. If the other branch created a path D/F, however, that
path can cleanly be resolved in the index (after all, the
ancestor nor we do not have it and only the other side added),
but it cannot be checked out. The issue is the same when the
other branch had D and we had renamed it to D/F, or the ancestor
had D/F instead of D (so there are four combinations).
Do not stop the merge, but leave both D and D/F paths in the
index so that the user can clear things up.
merge-recursive: do not barf on "to be removed" entries.
When update-trees::threeway_merge() decides that a path that
exists in the current index (and HEAD) is to be removed, it
leaves a stage 0 entry whose mode bits are set to 0. The code
mistook it as "this stage wants the blob here", and proceeded
to call update_file_flags() which ended up trying to put the
mode=0 entry in the index, got very confused, and ended up
barfing with "do not know what to do with 000000".
Since threeway_merge() does not handle case #10 (one side
removes while the other side does not do anything), this is not
a problem while we refuse to merge branches that have D/F
conflicts, but when we start resolving them, we would need to be
able to remove cache entries, and at that point it starts to
matter.
Treat D/F conflict entry more carefully in unpack-trees.c::threeway_merge()
This fixes three buglets in threeway_merge() regarding D/F
conflict entries.
* After finishing with path D and handling path D/F, some stages
have D/F conflict entry which are obviously non-NULL. For the
purpose of determining if the path D/F is missing in the
ancestor, they should not be taken into account.
* D/F conflict entry is a marker to say "this stage does _not_
have the path", so do not send them to keep_entry().
Case #10 is not handled with unpack-trees.c:threeway_merge()
internally, unless under the agressive rule, and it is not a
bug. As the test expects, ND (one side did not do anything,
other side deleted) case was meant to be handled by the caller's
policy (e.g. git-merge-one-file or git-merge-recursive).
Some oneline descriptions are just too long. In shortlog, it looks much
nicer when they are wrapped. Since print_wrapped_text() is UTF-8 aware,
it also works with those descriptions.
[jc: with minimum fixes]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces the inflate validation with a CRC validation when reusing
data from a pack which uses index version 2. That makes repacking much
safer against corruptions, and it should be a bit faster too.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Initially the conversion was made using nth_packed_object_sha1() which
made this file completely index version agnostic. Unfortunately the
overhead was quite significant so I went back to raw index walking but
with selectable base and step values which brought back similar
performances as the original.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
With this patch, packs larger than 4GB are usable, even on a 32-bit machine
(at least on Linux). If off_t is not large enough to deal with a large
pack then die() is called instead of attempting to use the pack and
producing garbage.
This was tested with a 8GB pack specially created for the occasion on
a 32-bit machine.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
[ There is quite some code duplication between pack-objects and index-pack
for generating a pack index (and fast-import as well I suppose). This
should be reworked into a common function eventually. But not now. ]
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
- 256 entries of 4-byte first-level fan-out table.
- Table of sorted 20-byte SHA1 records for each object in pack.
- Table of 4-byte CRC32 entries for raw pack object data.
- Table of 4-byte offset entries for objects in the pack if offset is
representable with 31 bits or less, otherwise it is an index in the next
table with top bit set.
- Table of 8-byte offset entries indexed from previous table for offsets
which are 32 bits or more (optional).
- 20-byte SHA1 checksum of sorted object names.
- 20-byte SHA1 checksum of the above.
The object SHA1 table is all contiguous so future pack format that would
contain this table directly won't require big changes to the code. It is
also tighter for slightly better cache locality when looking up entries.
Support for large packs exceeding 31 bits in size won't impose an index
size bloat for packs within that range that don't need a 64-bit offset.
And because newer objects which are likely to be the most frequently used
are located at the beginning of the pack, they won't pay the 64-bit offset
lookup at run time either even if the pack is large.
Right now an index version 2 is created only when the biggest offset in a
pack reaches 31 bits. It might be a good idea to always use index version
2 eventually to benefit from the CRC32 it contains when reusing pack data
while repacking.
[jc: with the "oops" fix to keep track of the last offset correctly]
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
compute a CRC32 for each object as stored in a pack
The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.
The problem with this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object. This is a
real issue that already happened at least once in the past.
In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib. But this is still not
perfect as a significant portion of a pack content is made of object
headers and references to delta base objects which are not deflated and
therefore not validated when repacking actually making the pack data reuse
still not as safe as it could be.
Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying which is extremely costly, which
cost the data reuse optimization was designed to avoid in the first place.
So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index. This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.
Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia:
Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
short messages. He wrote "Briefly, the problem is that, for very short
packets, Adler32 is guaranteed to give poor coverage of the available
bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
that sum A does not wrap for short messages. The maximum value of A for
a 128-byte message is 32640, which is below the value 65521 used by the
modulo operation. An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32 for SCTP, the
Stream Control Transmission Protocol.
In the context of a GIT pack, we have lots of small objects, especially
deltas, which are likely to be quite small and in a size range for which
Adler32 is dimed not to be sufficient. Another advantage of CRC32 is the
possibility for recovery from certain types of small corruptions like
single bit errors which are the most probable type of corruptions.
OK what this patch does is to compute the CRC32 of each object written to
a pack within pack-objects. It is not written to the index yet and it is
obviously not validated when reusing pack data yet either.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets. This prevents any bad data to be
generated/processed if off_t happens to not be large enough to handle
some big packs.
Better be safe than sorry.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
make overflow test on delta base offset work regardless of variable size
This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.
It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly. Instead, Let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.
While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>