diff-highlight
==============

Line-oriented diffs are great for reviewing code, because for most
hunks, you want to see the old and the new segments of code next to each
other. Sometimes, though, when an old line and a new line are very
similar, it's hard to immediately see the difference.

You can use "--color-words" to highlight only the changed portions of
lines. However, this can often be hard to read for code, as it loses
the line structure, and you end up with oddly formatted bits.

Instead, this script post-processes the line-oriented diff, finds pairs
of lines, and highlights the differing segments. It's currently very
simple and stupid about doing these tasks. In particular:

  1. It will only highlight hunks in which the number of removed and
     added lines is the same, and it will pair lines within the hunk by
     position (so the first removed line is compared to the first added
     line, and so forth). This is simple and tends to work well in
     practice. More complex changes don't highlight well, so we tend to
     exclude them due to the "same number of removed and added lines"
     restriction. And even if we do try to highlight them, they end up
     not being highlighted because of our "don't highlight if the whole
     line would be highlighted" rule.

  2. It will find the common prefix and suffix of two lines, and
     consider everything in the middle to be "different". It could
     instead do a real diff of the characters between the two lines and
     find common subsequences. However, the point of the highlight is to
     call attention to a certain area. Even if some small subset of the
     highlighted area actually didn't change, that's OK. In practice it
     ends up being more readable to just have a single blob on the line
     showing the interesting bit.

The goal of the script is therefore not to be exact about highlighting
changes, but to call attention to areas of interest without being
visually distracting. Non-diff lines and existing diff coloration are
preserved; the intent is that the output should look exactly the same as
the input, except for the occasional highlight.
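
The two heuristics above can be illustrated with a short Perl sketch. It is a simplified model, not the DiffHighlight implementation: it ignores ANSI colors, takes line bodies without their leading "-" or "+", treats a pair as "whole line changed" only when the two lines share neither a prefix nor a suffix, and marks highlights with the "-{...}" / "+{...}" notation used in the Bugs section. The helper names common_affixes, mark and highlight_hunk are hypothetical.

```perl
#!/usr/bin/perl
# Simplified sketch of diff-highlight's heuristics (hypothetical helper
# names, not the real DiffHighlight code). Colors are ignored and lines
# are passed in without their leading "-" or "+".
use strict;
use warnings;

# Length of the common prefix and common suffix of two strings. The
# suffix is not allowed to overlap the prefix on either string.
sub common_affixes {
    my ($x, $y) = @_;
    my $max = length($x) < length($y) ? length($x) : length($y);
    my $p = 0;
    $p++ while $p < $max && substr($x, $p, 1) eq substr($y, $p, 1);
    my $s = 0;
    $s++ while $s < $max - $p
        && substr($x, -1 - $s, 1) eq substr($y, -1 - $s, 1);
    return ($p, $s);
}

# Wrap everything between the prefix and the suffix in "$open ... }".
sub mark {
    my ($body, $p, $s, $open) = @_;
    my $len = length($body) - $p - $s;
    return $body if $len <= 0;    # nothing changed on this side
    return substr($body, 0, $p) . $open . substr($body, $p, $len) . '}'
         . substr($body, $p + $len);
}

# Pair removed and added lines by position, but only when the hunk
# removes and adds the same number of lines.
sub highlight_hunk {
    my ($removed, $added) = @_;    # array refs of line bodies
    my @old = map { "-$_" } @$removed;
    my @new = map { "+$_" } @$added;
    return (\@old, \@new) if @$removed != @$added;
    for my $i (0 .. $#$removed) {
        my ($p, $s) = common_affixes($removed->[$i], $added->[$i]);
        # Nothing in common: the whole line would be highlighted, so skip.
        next if $p == 0 && $s == 0;
        $old[$i] = '-' . mark($removed->[$i], $p, $s, '-{');
        $new[$i] = '+' . mark($added->[$i], $p, $s, '+{');
    }
    return (\@old, \@new);
}

# The pathological case from the Bugs section.
my ($old, $new) = highlight_hunk(
    [qw(one two three four)],
    ['two 2', 'three 3', 'four 4', 'five 5'],
);
print "$_\n" for @$old, @$new;
```

Running it prints the same eight lines as the "which gets highlighted as" listing in the Bugs section, including the misleading "-t-{wo}" / "+t+{hree 3}" pair. For "foo(buf, size)" versus "foo(obj->buf, obj->size)" it also marks the removed side ("-foo(-{buf, }size);"), which that section's listing leaves plain.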

Use
---

You can try out the diff-highlight program with:

---------------------------------------------
git log -p --color | /path/to/diff-highlight
---------------------------------------------

If you want to use it all the time, drop it in your $PATH and put the
following in your git configuration:

---------------------------------------------
[pager]
        log = diff-highlight | less
        show = diff-highlight | less
        diff = diff-highlight | less
---------------------------------------------


Color Config
------------

You can configure the highlight colors and attributes using git's
config. The colors for "old" and "new" lines can be specified
independently. There are two "modes" of configuration:

  1. You can specify a "highlight" color and a matching "reset" color.
     This will retain any existing colors in the diff, and apply the
     "highlight" and "reset" colors before and after the highlighted
     portion.

  2. You can specify a "normal" color and a "highlight" color. In this
     case, existing colors are dropped from that line. The non-highlighted
     bits of the line get the "normal" color, and the highlights get the
     "highlight" color.

If no "new" colors are specified, they default to the "old" colors. If
no "old" colors are specified, the default is to reverse the foreground
and background for highlighted portions.

Examples:

---------------------------------------------
# Underline highlighted portions
[color "diff-highlight"]
oldHighlight = ul
oldReset = noul
---------------------------------------------

---------------------------------------------
# Varying background intensities
[color "diff-highlight"]
oldNormal = "black #f8cbcb"
oldHighlight = "black #ffaaaa"
newNormal = "black #cbeecb"
newHighlight = "black #aaffaa"
---------------------------------------------
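
To make the two modes concrete, here is a small Perl illustration of how a line already split into its unchanged start, changed middle, and unchanged end might be wrapped in each mode. It is a sketch, not DiffHighlight's code; the sub names are hypothetical, and the escape sequences are assumptions chosen for illustration ("\e[7m" and "\e[27m" turn reverse video on and off, "\e[m" is a full SGR reset).

```perl
#!/usr/bin/perl
# Illustration of the two color "modes" (hypothetical sub names; this is
# not DiffHighlight's code). Each line is passed pre-split into the
# unchanged start, the changed middle, and the unchanged end.
use strict;
use warnings;

my $RESET = "\e[m";    # SGR reset: back to default attributes

# Mode 1 ("highlight" + "reset"): existing colors in the line are kept,
# and only the middle is bracketed by the two sequences.
sub wrap_highlight_reset {
    my ($start, $mid, $end, $highlight, $reset) = @_;
    return $start . $highlight . $mid . $reset . $end;
}

# Mode 2 ("normal" + "highlight"): existing colors are stripped, then each
# piece is painted with its own color and followed by a full reset.
sub wrap_normal_highlight {
    my ($start, $mid, $end, $normal, $highlight) = @_;
    s/\e\[[0-9;]*m//g for $start, $mid, $end;    # drop SGR color codes
    return $normal . $start . $RESET
         . $highlight . $mid . $RESET
         . $normal . $end . $RESET;
}

# A removed line colored red, as "git log -p --color" might emit it.
my ($pre, $chg, $post) = ("\e[31m-foo(", 'buf, ', "size);$RESET");

# Mode 1 with reverse video on/off, assumed here as the default pair.
print wrap_highlight_reset($pre, $chg, $post, "\e[7m", "\e[27m"), "\n";

# Mode 2 with truecolor backgrounds assumed to correspond to the
# oldNormal/oldHighlight values from the second example above.
print wrap_normal_highlight($pre, $chg, $post,
    "\e[30;48;2;248;203;203m", "\e[30;48;2;255;170;170m"), "\n";
```

Mode 1 leaves git's own red or green in place around the highlight, which is why its "reset" should undo only the highlight attribute (as "noul" undoes "ul" in the first example) rather than reset everything. Mode 2 repaints every piece itself, so a full reset after each piece is harmless.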


Using diff-highlight as a module
--------------------------------

If you want to pre- or post-process the highlighted lines as part of
another Perl script, you can use the DiffHighlight module. You can
either "require" it or just cat the module together with your script (to
avoid run-time dependencies).

Your script may set up one or more of the following variables:

  - $DiffHighlight::line_cb - this should point to a function which is
    called whenever DiffHighlight has lines (which may contain
    highlights) to output. The default function prints each line to
    stdout. Note that the function may be called with multiple lines.

  - $DiffHighlight::flush_cb - this should point to a function which
    flushes the output (because DiffHighlight believes it has completed
    processing a logical chunk of input). The default function flushes
    stdout.

The script may then feed lines, one at a time, to DiffHighlight::handle_line().
When lines have been processed, they will be fed to $line_cb. Note that
DiffHighlight may queue up many input lines (to analyze a whole hunk)
before calling $line_cb. After providing all lines, call
DiffHighlight::flush() to flush any unprocessed lines.

If you just want to process stdin, DiffHighlight::highlight_stdin()
is a convenience helper which will loop and flush for you.
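
As a rough sketch of the calls described above, the following Perl code buffers DiffHighlight's output in memory instead of printing it, so a caller can post-process it. It assumes the built DiffHighlight.pm from contrib/diff-highlight is somewhere in @INC (for example via PERL5LIB); collect_highlighted is a hypothetical helper name.

```perl
#!/usr/bin/perl
# Sketch: run diff lines through DiffHighlight and keep the output in
# memory. Assumes the built DiffHighlight.pm is in @INC (e.g. PERL5LIB).
use strict;
use warnings;

# Load the module at compile time; same as BEGIN { require DiffHighlight }.
use DiffHighlight ();

# Hypothetical helper: feed @lines (each ending in "\n") through the
# module and return everything it passes to $line_cb, in order.
sub collect_highlighted {
    my @lines = @_;
    my @out;
    # $line_cb may be called with several lines at once; keep them all.
    local $DiffHighlight::line_cb = sub { push @out, @_ };
    # Output is buffered in @out, so there is nothing to flush.
    local $DiffHighlight::flush_cb = sub { };
    DiffHighlight::handle_line($_) for @lines;
    # Emit any hunk that is still queued at the end of the input.
    DiffHighlight::flush();
    return @out;
}
```

A caller could then write, for instance, print map { "| $_" } collect_highlighted(<STDIN>); to prefix every output line with a marker. Using local means the custom callbacks are only in effect for the duration of the call.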


Bugs
----

Because diff-highlight relies on heuristics to guess which parts of
changes are important, there are some cases where the highlighting is
more distracting than useful. Fortunately, these cases are rare in
practice, and when they do occur, the worst case is simply a little
extra highlighting. This section documents some cases known to be
sub-optimal, in case somebody feels like working on improving the
heuristics.

1. Two changes on the same line get highlighted in a blob. For example,
   highlighting:

----------------------------------------------
-foo(buf, size);
+foo(obj->buf, obj->size);
----------------------------------------------

   yields (where the inside of "+{}" would be highlighted):

----------------------------------------------
-foo(buf, size);
+foo(+{obj->buf, obj->}size);
----------------------------------------------

   whereas a more semantically meaningful output would be:

----------------------------------------------
-foo(buf, size);
+foo(+{obj->}buf, +{obj->}size);
----------------------------------------------

   Note that doing this right would probably involve a set of
   content-specific boundary patterns, similar to word-diff. Otherwise
   you get junk like:

-----------------------------------------------------
-this line has some -{i}nt-{ere}sti-{ng} text on it
+this line has some +{fa}nt+{a}sti+{c} text on it
-----------------------------------------------------

   which is less readable than the current output.

2. The multi-line matching assumes that lines in the pre- and post-image
   match by position. This is often the case, but can be fooled when a
   line is removed from the top and a new one added at the bottom (or
   vice versa). Unless the lines in the middle are also changed, diffs
   will show this as two hunks, and it will not get highlighted at all
   (which is good). But if the lines in the middle are changed, the
   highlighting can be misleading. Here's a pathological case:

-----------------------------------------------------
-one
-two
-three
-four
+two 2
+three 3
+four 4
+five 5
-----------------------------------------------------

   which gets highlighted as:

-----------------------------------------------------
-one
-t-{wo}
-three
-f-{our}
+two 2
+t+{hree 3}
+four 4
+f+{ive 5}
-----------------------------------------------------

   because it matches "two" to "three 3", and so forth. It would be
   nicer as:

-----------------------------------------------------
-one
-two
-three
-four
+two +{2}
+three +{3}
+four +{4}
+five 5
-----------------------------------------------------

   which would probably involve pre-matching the lines into pairs
   according to some heuristic.