Introduction
------------
-The diff commands git-diff-cache, git-diff-files, and
-git-diff-tree can be told to manipulate differences they find
-in unconventional ways before showing diff(1) output. The
-manipulation is collectively called "diffcore transformation".
-This short note describes what they are and how to use them to
-produce diff outputs that are easier to understand than the
-conventional kind.
+The diff commands git-diff-index, git-diff-files, and git-diff-tree
+can be told to manipulate differences they find in
+unconventional ways before showing diff(1) output. The manipulation
+is collectively called "diffcore transformation". This short note
+describes what they are and how to use them to produce diff outputs
+that are easier to understand than the conventional kind.
The chain of operation
The git-diff-* family works by first comparing two sets of
files:
- - git-diff-cache compares contents of a "tree" object and the
- working directory (when --cached flag is not used) or a
- "tree" object and the index file (when --cached flag is
+ - git-diff-index compares contents of a "tree" object and the
+ working directory (when '\--cached' flag is not used) or a
+ "tree" object and the index file (when '\--cached' flag is
used);
- git-diff-files compares contents of the index file and the
working directory;
- - git-diff-tree compares contents of two "tree" objects.
+ - git-diff-tree compares contents of two "tree" objects;
In all of these cases, the commands themselves compare
corresponding paths in the two sets of files. The result of
called "diffcore", in a format similar to what is output when
the -p option is not used. E.g.
- in-place edit :100644 100644 bcd1234... 0123456... M file0
- create :000000 100644 0000000... 1234567... N file4
- delete :100644 000000 1234567... 0000000... D file5
- unmerged :000000 000000 0000000... 0000000... U file6
+------------------------------------------------
+in-place edit :100644 100644 bcd1234... 0123456... M file0
+create :000000 100644 0000000... 1234567... A file4
+delete :100644 000000 1234567... 0000000... D file5
+unmerged :000000 000000 0000000... 0000000... U file6
+------------------------------------------------
The diffcore mechanism is fed a list of such comparison results
(each of which is called "filepair", although at this point each
of them talks about a single file), and transforms such a list
-into another list. There are currently 7 such transformations:
+into another list. There are currently 6 such transformations:
- - diffcore-pathspec
- - diffcore-break
- - diffcore-rename
- - diffcore-merge-broken
- - diffcore-pickaxe
- - diffcore-order
+- diffcore-pathspec
+- diffcore-break
+- diffcore-rename
+- diffcore-merge-broken
+- diffcore-pickaxe
+- diffcore-order
-These are applied in sequence. The set of filepairs git-diff-*
+These are applied in sequence. The set of filepairs git-diff-\*
commands find are used as the input to diffcore-pathspec, and
the output from diffcore-pathspec is used as the input to the
next transformation. The final result is then passed to the
output routine and generates either diff-raw format (see Output
-format sections of the manual for git-diff-* commands) or
+format sections of the manual for git-diff-\* commands) or
diff-patch format.
-diffcore-pathspec
------------------
+diffcore-pathspec: For Ignoring Files Outside Our Consideration
+---------------------------------------------------------------
The first transformation in the chain is diffcore-pathspec, and
is controlled by giving the pathname parameters to the
git-diff-* commands on the command line. The pathspec is used
to limit the world diff operates in. It removes the filepairs
-outside the specified set of pathnames.
+outside the specified set of pathnames. E.g. If the input set
+of filepairs included:
+
+------------------------------------------------
+:100644 100644 bcd1234... 0123456... M junkfile
+------------------------------------------------
+
+but the command invocation was "git-diff-files myfile", then the
+junkfile entry would be removed from the list because only "myfile"
+is under consideration.
Implementation note. For performance reasons, git-diff-tree
uses the pathname parameters on the command line to cull set of
use diffcore-pathspec, but the end result is the same.
-diffcore-break
---------------
+diffcore-break: For Splitting Up "Complete Rewrites"
+----------------------------------------------------
The second transformation in the chain is diffcore-break, and is
controlled by the -B option to the git-diff-* commands. This is
break such filepair into two filepairs that represent delete and
create. E.g. If the input contained this filepair:
- :100644 100644 bcd1234... 0123456... M file0
+------------------------------------------------
+:100644 100644 bcd1234... 0123456... M file0
+------------------------------------------------
and if it detects that the file "file0" is completely rewritten,
it changes it to:
- :100644 000000 bcd1234... 0000000... D file0
- :000000 100644 0000000... 0123456... N file0
+------------------------------------------------
+:100644 000000 bcd1234... 0000000... D file0
+:000000 100644 0000000... 0123456... A file0
+------------------------------------------------
For the purpose of breaking a filepair, diffcore-break examines
the extent of changes between the contents of the files before
after "-B" option (e.g. "-B75" to tell it to use 75%).
-diffcore-rename
----------------
+diffcore-rename: For Detection Renames and Copies
+-------------------------------------------------
This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option
(to detect copies as well) to the git-diff-* commands. If the
input contained these filepairs:
- :100644 000000 0123456... 0000000... D fileX
- :000000 100644 0000000... 0123456... N file0
+------------------------------------------------
+:100644 000000 0123456... 0000000... D fileX
+:000000 100644 0000000... 0123456... A file0
+------------------------------------------------
and the contents of the deleted file fileX is similar enough to
the contents of the created file file0, then rename detection
merges these filepairs and creates:
- :100644 100644 0123456... 0123456... R100 fileX file0
+------------------------------------------------
+:100644 100644 0123456... 0123456... R100 fileX file0
+------------------------------------------------
-When the "-C" option is used, the original contents of modified
-files and contents of unchanged files are considered as
-candidates of the source files in rename/copy operation, in
-addition to the deleted files. If the input were like these
-filepairs, that talk about a modified file fileY and a newly
+When the "-C" option is used, the original contents of modified files,
+and deleted files (and also unmodified files, if the
+"\--find-copies-harder" option is used) are considered as candidates
+of the source files in rename/copy operation. If the input were like
+these filepairs, that talk about a modified file fileY and a newly
created file file0:
- :100644 100644 0123456... 1234567... M fileY
- :000000 100644 0000000... 0123456... N file0
+------------------------------------------------
+:100644 100644 0123456... 1234567... M fileY
+:000000 100644 0000000... bcd3456... A file0
+------------------------------------------------
the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:
- :100644 100644 0123456... 1234567... M fileY
- :100644 100644 0123456... 0123456... C100 fileY file0
+------------------------------------------------
+:100644 100644 0123456... 1234567... M fileY
+:100644 100644 0123456... bcd3456... C100 fileY file0
+------------------------------------------------
In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two
files are "similar enough", and can be customized to use
-similarity score different from the default 50% by giving a
-number after "-M" or "-C" option (e.g. "-M8" to tell it to use
+a similarity score different from the default of 50% by giving a
+number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).
-Note. When the "-C" option is used, git-diff-cache and
-git-diff-file commands feed not just modified filepairs but
-unmodified ones to diffcore mechanism as well. This lets the
-copy detector consider unmodified files as copy source
-candidates at the expense of making it slower. Currently
-git-diff-tree does not feed unmodified filepairs even when the
-"-C" option is used, so it can detect copies only if the file
-that was copied happened to have been modified in the same
-changeset.
+Note. When the "-C" option is used with `\--find-copies-harder`
+option, git-diff-\* commands feed unmodified filepairs to
+diffcore mechanism as well as modified ones. This lets the copy
+detector consider unmodified files as copy source candidates at
+the expense of making it slower. Without `\--find-copies-harder`,
+git-diff-\* commands can detect copies only if the file that was
+copied happened to have been modified in the same changeset.
-diffcore-merge-broken
----------------------
+diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
+--------------------------------------------------------------------
This transformation is used to merge filepairs broken by
-diffcore-break, and were not transformed into rename/copy by
+diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification. This always
runs when diffcore-break is used.
single modification) by giving a second number to -B option,
like these:
- -B50/60 (give 50% "break score" to diffcore-break, use
- 60% for diffcore-merge-broken).
- -B/60 (the same as above, since diffcore-break defautls to
- 50%).
+* -B50/60 (give 50% "break score" to diffcore-break, use 60%
+ for diffcore-merge-broken).
+
+* -B/60 (the same as above, since diffcore-break defaults to 50%).
+Note that earlier implementation left a broken pair as a separate
+creation and deletion patches. This was an unnecessary hack and
+the latest implementation always merges all the broken pairs
+back into modifications, but the resulting patch output is
+formatted differently for easier review in case of such
+a complete rewrite by showing the entire contents of old version
+prefixed with '-', followed by the entire contents of new
+version prefixed with '+'.
-diffcore-pickaxe
-----------------
+
+diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
+---------------------------------------------------------------------
This transformation is used to find filepairs that represent
changes that touch a specified string, and is controlled by the
--S option and the --pickaxe-all option to the git-diff-*
+-S option and the `\--pickaxe-all` option to the git-diff-*
commands.
When diffcore-pickaxe is in use, it checks if there are
string appeared in this changeset". It also checks for the
opposite case that loses the specified string.
-When --pickaxe-all is not in effect, diffcore-pickaxe leaves
-only such filepairs that touches the specified string in its
-output. When --pickaxe-all is used, diffcore-pickaxe leaves all
+When `\--pickaxe-all` is not in effect, diffcore-pickaxe leaves
+only such filepairs that touch the specified string in its
+output. When `\--pickaxe-all` is used, diffcore-pickaxe leaves all
filepairs intact if there is such a filepair, or makes the
output empty otherwise. The latter behaviour is designed to
make reviewing of the changes in the context of the whole
changeset easier.
-diffcore-order
---------------
+diffcore-order: For Sorting the Output Based on Filenames
+---------------------------------------------------------
This is used to reorder the filepairs according to the user's
(or project's) taste, and is controlled by the -O option to the
git-diff-* commands.
-This takes a text file each of whose line is a shell glob
+This takes a text file each of whose lines is a shell glob
pattern. Filepairs that match a glob pattern on an earlier line
in the file are output before ones that match a later line, and
filepairs that do not match any glob pattern are output last.
-As an example, typical orderfile for the core GIT probably
-should look like this:
-
- README
- Makefile
- Documentation
- *.h
- *.c
- t
+As an example, a typical orderfile for the core git probably
+would look like this:
+
+------------------------------------------------
+README
+Makefile
+Documentation
+*.h
+*.c
+t
+------------------------------------------------