This moves the path selection logic from individual programs to a new
diffcore transformer (diff-tree still needs to have its own for
performance reasons). Also the header printing code in diff-tree was
tweaked not to produce anything when pickaxe is in effect and there is
nothing interesting to report. An interesting example is the following
in the GIT archive itself:
$ git-whatchanged -p -C -S'or something in a real script'
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was a screwy math bug in the estimator that confused what
-C1 meant and what -C9 meant, only in one of the early "cheap"
check, which resulted in quite confusing behaviour.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the diff-raw format as Linus and I discussed, except that
it does not use sequence of underscore '_' letters to express
nonexistence. All '0' mode is used for that purpose instead.
The new diff-raw format can express rename/copy, and the earlier
restriction that -M and -C _must_ be used with the patch format
output is no longer necessary. The patch makes -M and -C flags
independent of -p flag, so you need to say git-whatchanged -M -p
to get the diff/patch format.
Updated are both documentations and tests.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The heuristics so far was to compare file size change and xdelta
size against the average of file size before and after the
change. This patch uses the smaller of pre- and post- change
file size instead.
It also makes a very small performance fix. I didn't measure
it; I do not expect it to make any practical difference, but
while scanning an already sorted list, breaking out in the
middle is the right thing.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the commit explanation buffer larger, and make sure that if
we truncate it, we put a "..." marker there to visually tell people
about the truncation (tested with a much smaller buffer to make
sure it looks sane).
Also make sure that the explanation is properly line-terminated,
and add an extra newline iff we have a diff.
[PATCH] Diff overhaul, adding the other half of copy detection.
This patch extends diff-cache and diff-files to report the
unmodified files to diff-core as well when -C (copy detection)
is in effect, so that the unmodified files can also be used as
the source candidates. The existing test t4003 has been
extended to cover this case.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This steals the "pickaxe" feature from JIT and make it available
to the bare Plumbing layer. From the command line, the user
gives a string he is intersted in.
Using the diff-core infrastructure previously introduced, it
filters the differences to limit the output only to the diffs
between <src> and <dst> where the string appears only in one but
not in the other. For example:
$ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M
would show the diffs that touch the string "diff-tree-helper".
In real software-archaeologist application, you would typically
look for a few to several lines of code and see where that code
came from.
The "pickaxe" module runs after "rename/copy detection" module,
so it even crosses the file rename boundary, as the above
example demonstrates.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Diff overhaul, adding half of copy detection.
This introduces the diff-core, the layer between the diff-tree
family and the external diff interface engine. The calls to the
interface diff-tree family uses (diff_change and diff_addremove)
have not changed and will not change. The purpose of the
diff-core layer is to provide an infrastructure to transform the
set of differences sent from the applications, before sending
them to the external diff interface.
The recently introduced rename detection code has been rewritten
to use the diff-core facility. When applications send in
separate creates and deletes, matching ones are transformed into
a single rename-and-edit diff, and sent out to the external diff
interface as such.
This patch also enhances the rename detection code further to be
able to detect copies. Currently this happens only as long as
copy sources appear as part of the modified files, but there
already is enough provision for callers to report unmodified
files to diff-core, so that they can be also used as copy source
candidates. Extending the callers this way will be done in a
separate patch.
Please see and marvel at how well this works by trying out the
newly added t/t4003-diff-rename-1.sh test script.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the ability to actually create delta objects using a new tool:
git-mkdelta. It uses an ordered list of potential objects to deltafy
against earlier objects in the list. A cap on the depth of delta
references can be provided as well, otherwise the default is to not have
any limit. A limit of 0 will also undeltafy any given object.
Also provided is the beginning of a script to deltafy an entire
repository.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds knowledge of delta objects to fsck-cache and various object
parsing code. A new switch to git-fsck-cache is provided to display the
maximum delta depth found in a repository.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes the core code aware of delta objects and undeltafy them as
needed. The convention is to use read_sha1_file() to have
undeltafication done automatically (most users do that already so this
is transparent).
If the delta object itself has to be accessed then it must be done
through map_sha1_file() and unpack_sha1_file().
In that context mktag.c has been switched to read_sha1_file() as there
is no reason to do the full map+unpack manually.
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various things that sparse complains about:
- use NULL instead of 0
- make sure we declare everything properly, or mark it static
- use proper function declarations ("fn(void)" instead of "fn()")
[PATCH] Simplify "reverse-diff" logic in the diff core.
Instead of swapping the arguments just before output, this patch
makes the swapping happen on the input side of the diff core,
when "reverse-diff" is in effect. This greatly simplifies the
logic, but more importantly it is necessary for upcoming "copy
detection" work.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The same check we added earlier to update-cache to catch ENOTDIR
turns out to be missing from diff-files. This causes a
difference not being reported when you have DF/DF (a file in a
subdirectory) in the cache and DF is a file on the filesystem.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This one compares two pathnames that may be partial basenames, not
full paths. We need to get the path sorting right, since a directory
name will sort as if it had the final '/' at the end.
Add '-R' flag to diff-tree, and change the test subdirectory
shell files to be executable (something that Junio couldn't
get me to do through the pure patch with my current patch
handling infrastructure).
This cleans up the way calls are made into the diff core from diff-tree
family and diff-helper. Earlier, these programs had "if
(generating_patch)" sprinkled all over the place, but those ugliness are
gone and handled uniformly from the diff core, even when not generating
patch format.
This also allowed diff-cache and diff-files to acquire -R
(reverse) option to generate diff in reverse. Users of
diff-tree can swap two trees easily so I did not add -R there.
[ Linus' note: I'll add -R to "diff-tree" too, since a "commit
diff" doesn't have another tree to switch around: the other
tree is always the parent(s) of the commit ]
Also -M<digits-as-mantissa> suggestion made by Linus has been
implemented.
Documentation updates are also included.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Implement git-checkout-cache -u to update stat information in the cache.
With -u flag, git-checkout-cache picks up the stat information
from newly created file and updates the cache. This removes the
need to run git-update-cache --refresh immediately after running
git-checkout-cache.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rips out the rename detection engine from diff-helper and moves it
to the diff core, and updates the internal calling convention used by
diff-tree family into the diff core. In order to give the same option
name to diff-tree family as well as to diff-helper, I've changed the
earlier diff-helper '-r' option to '-M' (stands for Move; sorry but the
natural abbreviation 'r' for 'rename' is already taken for 'recursive').
Although I did a fair amount of test with the git-diff-tree with
existing rename commits in the core GIT repository, this should still be
considered beta (preview) release. This patch depends on the diff-delta
infrastructure just committed.
This implements almost everything I wanted to see in this series of
patch, except a few minor cleanups in the calling convention into diff
core, but that will be a separate cleanup patch.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just remembered why I placed that bogus "sb->len ==0 implies
sb->eof" condition there. We need at least something like this
to catch the normal EOF (that is, line termination immediately
followed by EOF) case. "if (feof(fp))" fires when we have
already read the eof, not when we are about read it.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff-tree: don't match non-directories as partial pathnames
This normally doesn't matter, but if you have a filename that is
sometimes a directory and sometimes a regular file (or symlink),
we don't want the regular file case to trigger a "partial match".
diff-tree: fix up comparison of "interesting" sub-trees
We used to trigger the "interesting subdirectory" check for any
matching name that started with the same character series, regardless
of whether it had the matching slash or not.
We use "--" to mark end of command line switches, not "-". Also,
allow more flexibility in the passed-in sha1 names, in that a
single sha1 uses the "commit-diff" logic that compares against
its parent(s).
This patch adds a framework and a stub implementation of rename
detection to diff-helper program.
The current stub code is just enough to detect pure renames in
diff-tree output and not fancier. The plan is perhaps to use
the same delta code when Nico's delta storage patch is merged
for similarity evaluation purposes.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fsck-cache: walk the 'refs' directory if the user doesn't give any
explicit references for reachability analysis.
We already had that as separate logic in git-prune-script, so this
is not a new special case - it's an old special case moved into
fsck, making normal usage be much simpler.
This implements the output format suggested by Linus in
<Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>, except the
imaginary diff option is spelled "diff --git" with double dashes as
suggested by Matthias Urlichs.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH 3/3] Rename git-diff-tree-helper to git-diff-helper (part 2).
It used to be that diff-tree needed helper support to parse its
raw output to generate diffs, but these days git-diff-* family
produces the same output and the helper is not tied to diff-tree
anymore. Drop "tree" from its name.
This follows the "rename only" commit to adjust the contents of
the files involved.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
[PATCH 2/3] Rename git-diff-tree-helper to git-diff-helper.
It used to be that diff-tree needed helper support to parse its
raw output to generate diffs, but these days git-diff-* family
produces the same output and the helper is not tied to diff-tree
anymore. Drop "tree" from its name.
This commit is done separately to record just the rename and no
file content changes. The changes in the renamed files are recorded
in the next commit.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Bundled with the changes in the unrenamed files.
This test comes from "[PATCH 2/2] The core GIT tests: recent additions and
fixes" but couldn't be included before since it depended on the modechange
diff output changes.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
[PATCH 1/3] Update mode-change strings in diff output.
This updates the mode change strings to be a bit more machine
friendly. Although this might go against the spirit of
readability for human consumption, these mode bits strings are
shown only when unusual things (mode change, file creation and
deletion) happens, output normalized for machine consumption
would be permissible.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
[PATCH] Add the merge test Linus called "test script from hell".
This is an adaptation to the test framework of a historic test
that was used before three way merge form of read-tree was
introduced, and subsequently used to validate the read-tree -m
merge works correctly. It covers all the tricky cases known
back then and also have been updated to cover conflicting
files/directories cases since then.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
This test makes sure that use of deprecated environment variables still
works, using both new and old names makes new one take
precedence, and GIT_DIR and GIT_ALTERNATE_OBJECT_DIRECTORIES mechanisms
work.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
Rename some test scripts and describe the naming convention
First digit: "family", e.g. the absolute basics and global stuff (0),
the basic db-side commands (read-tree, write-tree, commit-tree), the
basic working-tree-side commands (checkout-cache, update-cache), the
other basic commands (ls-files), the diff commands, the pull commands,
exporting commands, revision tree commands...
Second digit: the particular command we are testing
Third digit: (optionally) the particular switch or group of switches
we are testing
Exposing test_expect_success and test_expect_failure turns out
to be enough for the test scripts and there is no need for
exposing test_ok or test_failure. This patch cleans it up and
fixes the users of test_ok and test_failure.
Also test scripts have acquired a new command line flag
'--immediate' to cause them to exit upon the first failure.
This is useful especially during the development of a new test.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
NO changed to FAIL and ok was right-aligned with it so that it is easier
to visually identify the failed tests, and the removal of # should reduce
the clutter on the line and aid the eye to spot the test number better.
t/Makefile now does not use double-colon rules (why would it?), the rm
-fr trash in the all rule is silent, and OPTS aren't set to blank so
that they can be taken from the environment.
[PATCH 2/2] The core GIT tests: recent additions and fixes.
This set of scripts are designed to test the features and fixes
we recently added to core GIT. The convention to call test
helper function has been changed during the framework cleanup
(take two), and these tests have been updated to use the cleaned
up test-lib.sh interface.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Note that this does not include the t2000-diff.sh script since it
tests a patch which was not applied yet.
This adds t/ directory to host test suite, a test helper
library and a basic set of tests.
Petr Baudis raised many valid points at the earlier attempts in
git mailing list. This round, test-lib.sh has been updated to a
bit more modern style, and the default output is made easier to
read. Also included is one sample test script that tests the
very basics. This test has already found one leftover bug
missed when we introduced symlink support, which has been fixed
since then. The supplied Makefile is designed to run all the
available tests.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
When checkout-cache attempts to check out a non-directory where
a directory exists on the work tree, or to check out a file
under directory D when path D is a non-directory on the work
tree, the attempt fails. Before running checkout-cache, the
user can run git-ls-files with the -k (killed) option to get a
list of such paths. The tagged output format uses "K" to denote
them. This is useful for Porcelain layer to be careful when
dealing with the recently corrected behaviour of checkout-cache.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
[PATCH 2/3] Support symlinks in git-ls-files --others.
It is kind of surprising that this was missed in the last round,
but the work tree scanner in git-ls-files was still deliberately
ignoring symlinks. This patch fixes it, so that --others will
correctly report unregistered symlinks.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
Fix checkout-cache when existing work tree interferes with the checkout.
This is essentially the same one as the last one I sent to the
GIT list, except that the patch is rebased to the current tip of
the git-pb tree, and an unnecessary call to create_directories()
removed.
The checkout-cache command gets confused when checking out a
file in a subdirectory and the work tree has a symlink to the
subdirectory. Also it fails to check things out when there is a
non-directory in the work tree when cache expects a directory
there, and vice versa. This patch fixes the first problem by
making sure all the leading paths in the file being checked out
are indeed directories, and also fixes directory vs
non-directory conflicts when '-f' is specified by removing the
offending paths.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Petr Baudis <pasky@ucw.cz>
(Technically it's not a memory leak yet because the buffer is allocated
once by the function and then the utility exits - but it's a tad cleaner
to not leave such assumptions in the code, so that if someone reuses the
function (or extends the utility to include a loop) the uncleanliness
doesnt develop into a real memory leak.)
this patch fixes a 1-byte overflow in update-cache.c (probably not
exploitable). A specially crafted db object might trigger this overflow.
the bug is that normally the 'type' field is parsed by read_sha1_file(),
via:
if (sscanf(buffer, "%10s %lu", type, size) != 2)
i.e. 0-10 long strings, which take 1-11 bytes of space. Normally the
type strings are stored in char [20] arrays, but in update-cache.c that
is char [10], so a 1 byte overflow might occur.
This should not happen with a 'friendly' DB, as the longest type string
("commit") is 7 bytes long. The fix is to use the customary char [20].
(someone might want to clean those open-coded constants up with a
TYPE_LEN define, they do tend to cause problems like this. I'm not
against open-coded constants (they make code much more readable), but
for fields that get filled in from possibly hostile objects this is
playing with fire.)
hey, this might be the first true security fix for GIT? ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Petr Baudis <pasky@ucw.cz>