diff API
========

The diff API is for programs that compare two sets of files (e.g. two
trees, one tree and the index) and present the differences found in
various ways.  The calling program is responsible for feeding the API
pairs of files, one from the "old" set and the corresponding one from
the "new" set, that are different.  The library called through this API
is called diffcore, and is responsible for two things:

* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and
  changes that touch a string (`-S`), as specified by the caller.

* outputting the differences in various formats, as specified by the
  caller.

Calling sequence
----------------

* Prepare `struct diff_options` to record the set of diff options, and
  then call `diff_setup()` to initialize this structure.  This sets up
  the vanilla default.

* Fill in the options structure to specify desired output format, rename
  detection, etc.  `diff_opt_parse()` can be used to parse options given
  from the command line in a way consistent with the existing git-diff
  family of programs.

* Call `diff_setup_done()`; this inspects the options set up so far for
  internal consistency and makes any necessary tweaks to them (e.g. if
  textual patch output was asked for, recursive behaviour is turned on);
  the `set_default` callback in `diff_options` can be used to tweak this
  further.

* As you find different pairs of files, call `diff_change()` to feed
  modified files, `diff_addremove()` to feed created or deleted files,
  or `diff_unmerge()` to feed a file whose state is 'unmerged' to the
  API.  These are thin wrappers around a lower-level `diff_queue()`
  function that is flexible enough to record any of these kinds of
  changes.

* Once you finish feeding the pairs of files, call `diffcore_std()`.
  This will tell the diffcore library to go ahead and do its work.

* Calling `diff_flush()` will produce the output.
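
The call order above can be modelled in a few lines of self-contained C.
This is a toy sketch, not git's implementation: every `toy_`-prefixed
name is hypothetical, and the real `diff_change()`, `diff_addremove()`,
`diffcore_std()` and `diff_flush()` take more arguments than shown here
(see `diff.h`).  It only illustrates that feeding a pair records it
without producing output, and that output happens at flush time.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_PAIRS 16

/* Hypothetical stand-in for one queued change; not git's diff_filepair. */
struct toy_pair {
	char status;        /* 'M' modified, 'A' added, 'D' deleted */
	const char *path;
};

/* Hypothetical stand-in for the options plus the queue of fed pairs. */
struct toy_diff {
	int setup_done;     /* set by toy_setup_done() */
	int processed;      /* set by toy_std() */
	int nr;             /* number of queued pairs */
	struct toy_pair queue[TOY_MAX_PAIRS];
};

/* Reset everything to the vanilla default. */
void toy_setup(struct toy_diff *d)
{
	d->setup_done = 0;
	d->processed = 0;
	d->nr = 0;
}

/* Finalize the options; must happen before any pair is fed. */
void toy_setup_done(struct toy_diff *d)
{
	d->setup_done = 1;
}

/* Feeding only records the pair; nothing is printed yet. */
void toy_feed(struct toy_diff *d, char status, const char *path)
{
	assert(d->setup_done && d->nr < TOY_MAX_PAIRS);
	d->queue[d->nr].status = status;
	d->queue[d->nr].path = path;
	d->nr++;
}

/* Placeholder for the stage where rename detection etc. would run. */
void toy_std(struct toy_diff *d)
{
	d->processed = 1;
}

/* Emit every queued pair, empty the queue, return the lines written. */
int toy_flush(struct toy_diff *d, FILE *out)
{
	int i, written = 0;

	assert(d->processed);
	for (i = 0; i < d->nr; i++) {
		fprintf(out, "%c\t%s\n", d->queue[i].status, d->queue[i].path);
		written++;
	}
	d->nr = 0;
	return written;
}
```

A caller would invoke these in the same order as the list above:
`toy_setup()`, `toy_setup_done()`, one `toy_feed()` per pair,
`toy_std()`, and finally `toy_flush()`.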


Data structures
---------------

* `struct diff_filespec`

This is the internal representation for a single file (blob).  It
records the blob object name (if known -- for a work tree file it
typically is a null SHA-1), filemode and pathname.  This is what
`diff_addremove()`, `diff_change()` and `diff_unmerge()` synthesize and
feed to the `diff_queue()` function.

* `struct diff_filepair`

This records a pair of `struct diff_filespec`; the filespec for a file
in the "old" set (i.e. preimage) is called `one`, and the filespec for a
file in the "new" set (i.e. postimage) is called `two`.  A change that
represents file creation has NULL in `one`, and file deletion has NULL
in `two`.

A `filepair` starts pointing at `one` and `two` that are from the same
filename, but `diffcore_std()` can break pairs and match component
filespecs with other filespecs from a different filepair to form new
filepairs.  This is called 'rename detection'.
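
To make the `one`/`two` convention concrete, here is a minimal,
self-contained sketch that follows the description above.  The `toy_`
types and `toy_classify()` are hypothetical and far smaller than git's
real structures; the NULL convention is taken from this document, and
the actual code may represent a missing side differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down filespec: just a mode and a path. */
struct toy_filespec {
	unsigned mode;
	const char *path;
};

/* Hypothetical filepair: `one` is the preimage, `two` the postimage. */
struct toy_filepair {
	struct toy_filespec *one;
	struct toy_filespec *two;
};

/*
 * Classify a pair using the convention described in this document:
 * NULL in `one` means creation, NULL in `two` means deletion.
 * Returns 'A' (added), 'D' (deleted) or 'M' (modified).
 */
char toy_classify(const struct toy_filepair *p)
{
	assert(p->one != NULL || p->two != NULL);
	if (p->one == NULL)
		return 'A';
	if (p->two == NULL)
		return 'D';
	return 'M';
}
```

A pair with both sides present is reported as modified here; telling a
rename or copy apart from a plain modification is the job of the
rename detection step and is not modelled.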

* `struct diff_queue`

This is a collection of filepairs.  Notable members are:

`queue`::

        An array of pointers to `struct diff_filepair`.  This
        dynamically grows as you add filepairs;

`alloc`::

        The allocated size of the `queue` array;

`nr`::

        The number of elements in the `queue` array.
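
The interplay of `queue`, `alloc` and `nr` can be illustrated with a
small growable array.  This is a sketch under assumptions, not git's
code: the `toy_` names are hypothetical, and the growth policy (start at
4 slots, then double) is chosen only for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_filepair;   /* opaque here; only pointers are stored */

/* Hypothetical model of the three members described above. */
struct toy_queue {
	struct toy_filepair **queue;  /* array of pointers to filepairs */
	int alloc;                    /* allocated capacity of `queue` */
	int nr;                       /* number of slots in use */
};

/*
 * Append one filepair, growing the array when it is full.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
int toy_queue_add(struct toy_queue *q, struct toy_filepair *p)
{
	assert(q->nr <= q->alloc);
	if (q->nr == q->alloc) {
		/* Growth policy is an assumption: 4 slots, then double. */
		int new_alloc = q->alloc ? q->alloc * 2 : 4;
		struct toy_filepair **grown =
			realloc(q->queue, new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		q->queue = grown;
		q->alloc = new_alloc;
	}
	q->queue[q->nr++] = p;
	return 0;
}

/* Release the array itself (not the filepairs it points to). */
void toy_queue_clear(struct toy_queue *q)
{
	free(q->queue);
	q->queue = NULL;
	q->alloc = 0;
	q->nr = 0;
}
```

The queue must start out zeroed (`{ NULL, 0, 0 }`) so that the first
call allocates the initial array.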


* `struct diff_options`

This describes the set of options with which the calling program wants
to affect the operation of the diffcore library.

Notable members are:

`output_format`::
        The output format used when `diff_flush()` is run.

`context`::
        Number of context lines to generate in patch output.

`break_opt`, `detect_rename`, `rename_score`, `rename_limit`::
        Affect the way the detection logic for complete rewrites,
        renames and copies works.

`abbrev`::
        Number of hex digits to abbreviate raw format output to.

`pickaxe`::
        A constant string (can and typically does contain newlines to
        look for a block of text, not just a single line) used to filter
        out of the diff_queue the filepairs whose preimage and postimage
        contain the same number of occurrences of the string.

`flags`::
        This is mostly a collection of boolean options that affect the
        operation, but some do not have anything to do with the diffcore
        library.

`touched_flags`::
        Records whether a flag has been changed due to user request
        (rather than just set/unset by default).

`set_default`::
        Callback which allows tweaking the options in `diff_setup_done()`.

BINARY, TEXT;;
        Affects how a file that appears to be binary is treated.

FULL_INDEX;;
        Tells the patch output format not to use abbreviated object
        names on the "index" lines.

FIND_COPIES_HARDER;;
        Tells the diffcore library that the caller is feeding unchanged
        filepairs to allow copies from unmodified files to be detected.

COLOR_DIFF;;
        Output should be colored.

COLOR_DIFF_WORDS;;
        Output is a colored word-diff.

NO_INDEX;;
        Tells diff-files that the input is not tracked files but files
        in arbitrary locations on the filesystem.

ALLOW_EXTERNAL;;
        Tells the output routine that it is OK to call a user-specified
        patch output routine.  Plumbing disables this to ensure stable
        output.

QUIET;;
        Do not show any output.

REVERSE_DIFF;;
        Tells the library that the calling program is feeding the
        filepairs reversed; `one` is two, and `two` is one.

EXIT_WITH_STATUS;;
        For communication between the calling program and the options
        parser; tells the calling program to signal the presence of
        differences using the program exit code.

HAS_CHANGES;;
        Internal; used for optimization to see if there is any change.

SILENT_ON_REMOVE;;
        Affects whether diff-files shows removed files.

RECURSIVE, TREE_IN_RECURSIVE;;
        Tells whether tree traversal done by tree-diff should recursively
        descend into a tree object pair that differs between the preimage
        and postimage sets.
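
The relationship between the `flags` and `touched_flags` members
described above can be sketched with two bitmasks.  This is an
illustrative model only: the `TOY_` bit values and `toy_` helpers are
hypothetical and not taken from git, whose actual flag representation
and accessors are defined in `diff.h`.

```c
#include <assert.h>

/* Hypothetical bit values chosen for this sketch; not taken from git. */
#define TOY_RECURSIVE  (1u << 0)
#define TOY_QUIET      (1u << 1)
#define TOY_ALL_FLAGS  (TOY_RECURSIVE | TOY_QUIET)

struct toy_options {
	unsigned flags;          /* current value of each boolean option */
	unsigned touched_flags;  /* options the user set or cleared explicitly */
};

/* Set a flag as a built-in default; does not mark it as touched. */
void toy_default_set(struct toy_options *o, unsigned flag)
{
	assert((flag & ~TOY_ALL_FLAGS) == 0);
	o->flags |= flag;
}

/* Set a flag on behalf of the user and remember that they asked. */
void toy_user_set(struct toy_options *o, unsigned flag)
{
	assert((flag & ~TOY_ALL_FLAGS) == 0);
	o->flags |= flag;
	o->touched_flags |= flag;
}

/* Clear a flag on behalf of the user; this also counts as touching it. */
void toy_user_clear(struct toy_options *o, unsigned flag)
{
	assert((flag & ~TOY_ALL_FLAGS) == 0);
	o->flags &= ~flag;
	o->touched_flags |= flag;
}

/* Apply a default only if the user has not expressed a preference. */
void toy_default_unless_touched(struct toy_options *o, unsigned flag)
{
	assert((flag & ~TOY_ALL_FLAGS) == 0);
	if (!(o->touched_flags & flag))
		o->flags |= flag;
}
```

Keeping the "touched" mask separate lets later code distinguish a flag
the user explicitly turned off from one that was simply never set, so
an automatic tweak does not override an explicit request.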

(JC)